how does the unicode thing works on python2? i just dont get it.
here i download data from a server and parse it for JSON.
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/hubs/poll.py", line 92, in wait
readers.get(fileno, noop).cb(fileno)
File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.12-py2.6.egg/eventlet/greenthread.py", line 202, in main
result = function(*args, **kwargs)
File "android_suggest.py", line 60, in fetch
suggestions = suggest(chars)
File "android_suggest.py", line 28, in suggest
return [i['s'] for i in json.loads(opener.open('https://market.android.com/suggest/SuggRequest?json=1&query='+s+'&hl=de&gl=DE').read())]
File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib/python2.6/json/decoder.py", line 217, in JSONArray
value, end = iterscan(s, idx=end, context=context).next()
File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib/python2.6/json/decoder.py", line 183, in JSONObject
value, end = iterscan(s, idx=end, context=context).next()
File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib/python2.6/json/decoder.py", line 155, in JSONString
return scanstring(match.string, match.end(), encoding, strict)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data
thank you!!
EDIT: the following string causes the error: '[{"t":"q","s":"abh\xf6ren"}]'
. \xf6
should be decoded to ö
(abhören)
.decode("zlib")
first. – Sven Marnachö
in ISO-8895-1 and other similar 8-bit encodings. If the original message being ISO-8859-1, rather than UTF-8, is beyond your control, then you can always domessage = unicode(message, "ISO-8859-1")
– Tadeusz A. Kadłubowski/usr/lib/python2.6/urllib.py:1222: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
and later:KeyError: u'\xf6'
– ihucos