Message from discussion How do you htmlentities in Python

From:  Matimus <mccre...@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: How do you htmlentities in Python
Date: Mon, 04 Jun 2007 17:17:27 -0000
Message-ID: <1180977447.745432.109040@q19g2000prn.googlegroups.com>
References: <mailman.8674.1180963921.32031.python-list@python.org>

On Jun 4, 6:31 am, "js " <ebgs...@gmail.com> wrote:
> Hi list.
>
> If I'm not mistaken, Python's standard library has no function that
> converts HTML entities, like &amp; or &gt;, into their corresponding characters.
>
> htmlentitydefs provides maps that help with this conversion,
> but it offers no function, so you have to write your own function
> that makes use of htmlentitydefs, probably using a regex or something.
>
> To me this seemed odd because Python is known as
> a 'Batteries Included' language.
>
> So my questions are
> 1. Why doesn't Python have/need entity encoding/decoding?
> 2. Is there any idiom for entity encoding/decoding in Python?
>
> Thank you in advance.

I think this is the standard idiom:

>>> import xml.sax.saxutils as saxutils
>>> saxutils.escape("&")
'&amp;'
>>> saxutils.unescape("&gt;")
'>'
>>> saxutils.unescape("A bunch of text with entities: &amp; &gt; &lt;")
'A bunch of text with entities: & > <'

Note that by default escape and unescape handle only &amp;, &lt; and
&gt;. Both take an optional parameter (a dict) that can be used to
define additional entities as well.
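
To illustrate the optional dict, here is a minimal sketch that feeds the
named HTML entities into saxutils.unescape. It assumes Python 3, where
htmlentitydefs is named html.entities; the names HTML_ENTITIES and
unescape_html are made up for this example and are not part of any
library.

```python
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs
from xml.sax.saxutils import escape, unescape

# Map "&name;" -> character for every named HTML entity.
# amp, lt and gt are left out because unescape() already handles them
# itself, and it deliberately replaces "&amp;" last.
HTML_ENTITIES = {
    "&%s;" % name: chr(codepoint)
    for name, codepoint in name2codepoint.items()
    if name not in ("amp", "lt", "gt")
}


def unescape_html(text):
    """Decode named HTML entities (hypothetical helper for this example)."""
    return unescape(text, HTML_ENTITIES)


decoded = unescape_html("&quot;caf&eacute;&quot; &amp; &copy;")

# escape() takes the mapping in the other direction: character -> entity.
encoded = escape('say "hi" & <bye>', {'"': "&quot;"})
```

This only covers named entities. Numeric character references such as
&#233; are not touched by saxutils.unescape, so they would still need
separate handling.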

Matt

