Python 2.7 UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

Question

I am using Python 2.7 to scrape a Chinese website.

How can I convert the unicode object below into a string?

    u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

A plain str() call does not work and raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

Thanks in advance.

Possible duplicate of UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) — ImportanceOfBeingErnest

wim · Accepted Answer · 2016-11-14T21:45:36

Your data was already encoded (it is UTF-8 bytes), so it should be a bytes object, not a unicode object. Try to solve that problem at the source instead, i.e. the repr of your scraped data should look like this:

    '\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

not like this:

    u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
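A minimal sketch of how the two objects relate, assuming the site serves UTF-8 and that somewhere in the scraper those bytes get decoded as latin-1. The question does not show the scraping code, so that step is a guess, and the variable names are only illustrative:

```python
# Raw bytes as they would arrive from the site (assumed to be UTF-8).
raw = b'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

# Decoding with the wrong codec maps every byte to one code point,
# which yields the unicode object shown in the question.
wrong = raw.decode('latin-1')

# Decoding with the right codec yields the four Chinese characters
# (plus the two newlines).
right = raw.decode('utf-8')
```

If the scraping library exposes the undecoded response body, decoding that body as UTF-8 yourself avoids the problem entirely.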

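To recover the Chinese text from the unicode object, you can jump to bytes and back. latin-1 maps every code point below 256 to the byte with the same value, so encoding with it gives back the original bytes, which can then be decoded as UTF-8: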

    >>> text = u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
    >>> print text.encode('latin-1').decode('utf-8')

    中国深圳
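If the round trip has to be applied to many scraped values, it could be wrapped in a small helper. This is only a sketch: `fix_mojibake` is a hypothetical name, and falling back to the unchanged input when the round trip fails is an assumption about what is wanted, not part of the answer above.

```python
def fix_mojibake(text):
    """Undo a UTF-8-decoded-as-latin-1 mix-up; return text unchanged on failure."""
    try:
        # latin-1 turns code points 0-255 back into the original bytes,
        # which are then decoded with the codec they were written in.
        return text.encode('latin-1').decode('utf-8')
    except UnicodeError:
        # Raised when text has characters above U+00FF (cannot be latin-1)
        # or when the recovered bytes are not valid UTF-8.
        return text


fixed = fix_mojibake(u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n')
```

In Python 2.7, calling `fixed.encode('utf-8')` afterwards gives a byte `str`, which is what the question originally tried to get with `str()`.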
