Mental Jetsam

By Peter Finch

Using Unicode characters in Python

Posted by pcfinch on June 7, 2007

It’s really easy to work with strings in Python, but when it comes to handling Unicode there are a few issues that you may have to deal with. The main problem is using Unicode characters with devices (consoles) or databases that do not support Unicode. If you have tried printing a Unicode string and got the following message, then you have experienced the issue.

>>> string = u'\u7279\u6b8a Unicode'
>>> print string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

The issue is that Python is unable to convert the Unicode string into the encoding of the current terminal. A similar problem can happen when you put Unicode data into a database that does not accept (or has not been configured to accept) Unicode characters, or when you send it via E-Mail. A couple of simple tricks using the very powerful “encode()” method can help a lot and make your code more resilient.
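To see where the message comes from, the failing conversion can be reproduced without a terminal. This is only a minimal sketch, assuming a Latin-1 target as in the traceback; it is written so that it also runs on Python 3, and the names "text" and "failed_span" are illustrative rather than anything from the session above.

```python
import sys

text = u"\u7279\u6b8a Unicode"

# The encoding Python associates with standard output (may be None on some streams).
print(getattr(sys.stdout, "encoding", None))

try:
    # Strict encoding (the default) raises on characters the codec cannot represent.
    text.encode("latin-1")
    failed_span = None
except UnicodeEncodeError as exc:
    # start/end mark the run of characters that could not be encoded.
    failed_span = (exc.start, exc.end)

print(failed_span)
```

The span covers the first two characters, which matches the "position 0-1" part of the error message.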

To convert a Unicode string so it can be displayed on an ASCII screen:

>>> print string.encode('ascii','replace')
?? Unicode
>>> print string.encode('ascii','ignore')
 Unicode
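The same idea can be wrapped in a small helper so that it is applied wherever text is printed. The sketch below is one possible way to do it, not a standard recipe: "console_safe" is a hypothetical name, and falling back to ASCII when the stream reports no encoding is an assumption. It is written so that it also runs on Python 3.

```python
import sys

def console_safe(text, encoding=None, errors="replace"):
    """Return text with characters the target encoding lacks substituted."""
    # Hypothetical helper: fall back to ASCII when the stream reports no encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encode with the chosen error handler, then decode back to a Unicode string.
    return text.encode(encoding, errors).decode(encoding)

text = u"\u7279\u6b8a Unicode"
print(console_safe(text, "ascii"))            # each unencodable character becomes "?"
print(console_safe(text, "ascii", "ignore"))  # unencodable characters are dropped
print(console_safe(text))                     # result depends on the current terminal
```

With 'replace' the length of the string is preserved, which helps when column alignment matters; 'ignore' silently loses information, so it is probably best kept for cases where the dropped characters really do not matter.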

A more useful approach is to escape the characters so that nothing is lost. My favorite is to use XML numeric character references. The string can then be safely put into a database field, sent out in E-Mails or placed in an HTML page.

>>> string.encode('ascii', 'xmlcharrefreplace')
'&#29305;&#27530; Unicode'
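Unlike 'replace' and 'ignore', this escaping is reversible. The following sketch assumes Python 3, where the standard library's html.unescape() can turn the references back into characters; the variable names are illustrative.

```python
import html

text = u"\u7279\u6b8a Unicode"

# Escape everything ASCII cannot represent as numeric character references.
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # &#29305;&#27530; Unicode

# Convert the references back to the original characters.
restored = html.unescape(escaped)
print(restored == text)
```

Note that the error handler only rewrites characters the codec cannot encode, so an "&" or "<" that is already in the text passes through unchanged and still needs normal HTML escaping before it goes into a page.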


