Figuring out unicode: 'ascii' codec can't decode

Question

I currently use Sublime 2 and run my Python code there. When I try to run this code, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

# -*- coding: utf-8 -*-  
s = unicode('abcdefö') 
print s

I have been reading the Python documentation on Unicode, and as far as I understand, this should work. Or is it the console that's not working?

Edit: Using s = u'abcdefö' as the string produces almost the same result. The result I get is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 6: ordinal not in range(128)

A [Google search][1] of 'ascii' codec can't decode returns 12,400 matches just for Stack Overflow... You might want to do a little research on this site first... [1]: google.com/… — dda
Do yourself a favor and use the latest version of Python which doesn't have these problems. — Oleh Prypin

Jochen Ritzel · Accepted Answer · 2012-12-08T15:34:41

What happens is that unicode('abcdefö') tries to decode the encoded string to unicode at runtime. The coding: utf-8 line only tells Python that the source file is encoded in UTF-8. By the time the script runs, it has been compiled and the string has been stored as an encoded byte string. So when Python tries to decode the string, it uses ASCII by default. As the string is actually UTF-8 encoded, this fails.
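A minimal sketch of that failure, assuming the source file is saved as UTF-8 so that 'abcdefö' is stored as the bytes written out below. The names raw and error are illustrative only, and the ASCII decode is made explicit here, whereas unicode('abcdefö') does it implicitly on Python 2:

```python
# The UTF-8 bytes of 'abcdefö', written with escapes so the sketch
# behaves the same on Python 2 and Python 3.
raw = b'abcdef\xc3\xb6'

try:
    # Explicit version of the implicit default decode described above.
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The offending byte is 0xc3 at index 6, the first byte of the encoded 'ö'.
print(error)
```

The position reported in the error is a byte offset into the encoded string, which is why it points at index 6, right after the six ASCII letters.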

You can do s = u'abcdefö', which tells the compiler to decode the string with the encoding declared for the file and store it as unicode. s = unicode('abcdefö', 'utf8') or s = 'abcdefö'.decode('utf8') would do the same thing at runtime.
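As a rough illustration of the runtime variant, the sketch below decodes the same bytes with an explicit codec. The bytes are spelled out with escapes (an assumption standing in for a UTF-8 source file), and raw and text are hypothetical names:

```python
# UTF-8 bytes of 'abcdefö': six ASCII letters plus two bytes for 'ö'.
raw = b'abcdef\xc3\xb6'

# Naming the codec explicitly avoids the ASCII default.
# On Python 2, unicode(raw, 'utf8') is the equivalent spelling.
text = raw.decode('utf-8')

print(len(raw))    # number of bytes
print(len(text))   # number of characters
print(text == u'abcdef\xf6')
```

The byte string is one element longer than the decoded text because 'ö' takes two bytes in UTF-8 but is a single character once decoded.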

However, this does not necessarily mean that you can print s now. First, the internal unicode string has to be encoded in a character set that stdout (the console/editor/IDE) can actually display. Sadly, Python often fails at figuring out the right character set and defaults to ASCII again, so you get an error when the string contains non-ASCII characters. The Python Wiki describes a few ways to set up stdout properly.
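The output side can be sketched in the same way. This is not one of the Python Wiki recipes, only an illustration of what the ASCII fallback does and of two explicit encodings one might choose instead. The variable names are hypothetical:

```python
import sys

text = u'abcdef\xf6'  # the unicode string 'abcdefö'

# Encoding Python detected for stdout; it may be None or an ASCII variant
# when output goes to an editor build panel or a pipe.
detected = getattr(sys.stdout, 'encoding', None)

try:
    # The fallback that produces the UnicodeEncodeError from the question.
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

encoded_utf8 = text.encode('utf-8')             # bytes a UTF-8 console can display
encoded_safe = text.encode('ascii', 'replace')  # lossy, but never raises

print(detected)
print(error)
print(repr(encoded_utf8))
print(repr(encoded_safe))
```

Encoding explicitly before writing takes the guesswork out of Python's stdout detection, at the cost of having to know what the console really expects.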
