Python unicode error. UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a'

Question

So, I have this code to fetch a JSON string from a URL:

import json
import urllib2

url = 'http://....'
response = urllib2.urlopen(url)  # fixed typo: was "rul"
string = response.read()         # raw bytes (a Python 2 str)
data = json.loads(string)

for x in data:
    print x['foo']

The problem is x['foo']: if I try to print it as shown above, I get this error:

Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1

If I use x['foo'].decode("utf-8"), I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)

If I try x['foo'].encode('ascii', 'ignore').decode('ascii'), I get this error:

AttributeError: 'NoneType' object has no attribute 'encode'

Is there any way to fix this problem?

Daenyth · Accepted Answer · 2015-10-07T12:49:29

x['foo'].decode("utf-8") raising UnicodeEncodeError means that x['foo'] is of type unicode. str.decode takes a str and translates it to unicode. Python 2 tries to be helpful here: it implicitly converts your unicode to str so that it can call decode on it. It does this with the default encoding (sys.getdefaultencoding()), which is ascii and cannot encode most of Unicode, hence the exception. That hidden encode step is why a decode call produces an *Encode*Error.
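The hidden conversion can be reproduced by hand. This is a minimal sketch that assumes Python 3, where Python 2's unicode type is called str, Python 2's str is called bytes, and no implicit conversion happens, so the ascii encode step has to be written out explicitly:

```python
# Python 3 sketch. Python 2 ran an implicit encode with the default codec
# ('ascii') before str.decode; here that hidden step is written out.
value = "\u4e3a"  # the character from the traceback (unicode in Py2, str in Py3)

error_message = None
try:
    value.encode("ascii")  # the step Python 2 performed implicitly
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
# 'ascii' codec can't encode character '\u4e3a' in position 0: ordinal not in range(128)

# UTF-8 can represent every code point, so the same encode succeeds.
print(value.encode("utf-8"))  # b'\xe4\xb8\xba'
```

The UTF-8 bytes E4 B8 BA are the same bytes that appear in the "Incorrect string value: '\xE4\xB8\xBA Co...'" warning in the question.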

The solution here is to remove the decode call - the value is already unicode.

Read Ned Batchelder's presentation - Pragmatic Unicode - it will greatly enhance your understanding of this and help prevent similar errors in the future.

It's worth noting that every string returned by json.loads (or json.load) is unicode, not str.
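A quick way to see what json.loads hands back is to inspect the types. This is a sketch assuming Python 3 (where decoded JSON strings are str, the counterpart of Python 2's unicode); the payload is a hypothetical stand-in for the question's data. It also shows that JSON null becomes None, which matches the "'NoneType' object has no attribute 'encode'" error in the question:

```python
import json

# Hypothetical payload shaped like the question's data: a list of objects
# with a "foo" key, one of which is JSON null.
raw = '[{"foo": "\\u4e3a"}, {"foo": null}]'

data = json.loads(raw)

for x in data:
    # Strings come back already decoded (unicode in Python 2, str in Python 3);
    # JSON null comes back as None, which has no .encode or .decode method.
    print(type(x["foo"]).__name__)  # prints "str", then "NoneType"
```

Because the strings are already decoded, calling decode on them again is never needed.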

Addressing the new question after edits:

When you print, you need bytes; unicode is an abstract concept. You need a mapping from the abstract unicode string to bytes. In Python terms, you must convert your unicode object to str. You do this by calling encode with an encoding that tells it how to translate the abstract string into concrete bytes. Generally you want the utf-8 encoding.

This should work:

print x['foo'].encode('utf-8')
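The AttributeError in the question means some x['foo'] values are None (JSON null), so the encode call needs a guard. A minimal sketch in Python 3 syntax; the helper name to_utf8 and the sample payload are hypothetical. In Python 2 the equivalent loop body would be "if x['foo'] is not None: print x['foo'].encode('utf-8')".

```python
import json


def to_utf8(value):
    """Hypothetical helper: return UTF-8 bytes for a text value, or None for JSON null."""
    if value is None:
        return None
    return value.encode("utf-8")


data = json.loads('[{"foo": "\\u4e3a"}, {"foo": null}]')

# Encode each text value to bytes, passing None through unchanged.
encoded = [to_utf8(x["foo"]) for x in data]
print(encoded)  # [b'\xe4\xb8\xba', None]
```

Whether to skip null values, replace them with an empty string, or treat them as an error depends on what the data means; passing None through is just one choice.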
