It's really easy to work with strings in Python, but handling Unicode raises a few issues that you may have to deal with. The main problem is using Unicode characters with devices (consoles) or databases that do not support Unicode. If you have tried printing a Unicode string and got the following message, you have experienced the issue.
>>> string = u'\u7279\u6b8a Unicode'
>>> print string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)
The issue is that Python is unable to convert the Unicode string into the encoding of the current terminal. A similar problem can occur when putting Unicode data into a database that has not been configured to accept Unicode characters, or when sending it via E-Mail. A couple of simple tricks using the very powerful “encode()” method can help a lot and make your code more resilient.
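The failure described above can be reproduced without a terminal by calling encode() explicitly and catching the exception. This is only a minimal sketch: the variable names `text` and `failure` are illustrative, and 'latin-1' is hard-coded here to stand in for whatever encoding your terminal reports.

```python
text = u'\u7279\u6b8a Unicode'
failure = None

try:
    # Printing performs a conversion like this implicitly, using the
    # encoding of the output device instead of a hard-coded one.
    text.encode('latin-1')
except UnicodeEncodeError as exc:
    # The exception records which slice of the string could not be encoded.
    failure = (exc.encoding, exc.start, exc.end, exc.reason)

print(failure)
```

The `start` and `end` attributes delimit the offending characters (end is exclusive), which is the same range the traceback reports as "position 0-1".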
To convert a Unicode string so that it can be displayed on an ASCII screen:
>>> print string.encode('ascii','replace')
?? Unicode
>>> print string.encode('ascii','ignore')
 Unicode
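The two calls above could be wrapped in a small helper that picks the output encoding for you. The sketch below is an assumption about how you might structure it, not an established recipe: the function name `printable` and its parameters are hypothetical, and it falls back to ASCII when the output stream does not report an encoding.

```python
import sys


def printable(text, encoding=None, errors='replace'):
    """Return text limited to characters the target encoding can represent."""
    # Use the caller's encoding, else the one stdout reports, else ASCII
    # (stdout may report no encoding at all, e.g. when output is piped).
    target = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Encode with the chosen error handler, then decode back to text.
    return text.encode(target, errors).decode(target)


text = u'\u7279\u6b8a Unicode'
print(printable(text, 'ascii'))            # ?? Unicode
print(printable(text, 'ascii', 'ignore'))  # same text without the two characters
```

With 'replace' each unsupported character becomes a question mark, while 'ignore' drops it silently and leaves only the leading space. Either way the original characters are lost, so this suits display but not storage.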
A more useful approach is to escape the characters into another encoding. My favorite is to use XML character entities. The string can then be safely put into a database field, sent out in E-Mails or placed in an HTML page.
>>> string.encode('ascii', 'xmlcharrefreplace')
'&#29305;&#27530; Unicode'
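One reason this approach is more useful is that the escaping is reversible, so nothing is lost. The sketch below assumes Python 3, where the standard library's html.unescape() is available to turn the references back into characters. The helper name `to_char_refs` is hypothetical.

```python
import html


def to_char_refs(text):
    """Return ASCII-only text with non-ASCII characters as numeric references."""
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


text = u'\u7279\u6b8a Unicode'
escaped = to_char_refs(text)
print(escaped)  # &#29305;&#27530; Unicode

# Unlike 'replace' or 'ignore', the original string can be recovered.
restored = html.unescape(escaped)
print(restored == text)  # True
```

Note that 'xmlcharrefreplace' only touches characters the codec cannot encode. It does not escape &, < or >, so text destined for an HTML page still needs the usual markup escaping.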