0 votes

I currently use Sublime Text 2 and run my Python code there. When I try to run this code, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

# -*- coding: utf-8 -*-  
s = unicode('abcdefö') 
print s

I have been reading the Python documentation on Unicode, and as far as I understand this should work. Or is it the console that's not working?

Edit: Using s = u'abcdefö' as the string produces almost the same result. The error I get is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 6: ordinal not in range(128)

A Google search (google.com/…dda) for 'ascii' codec can't decode returns 12,400 matches just for Stack Overflow. You might want to do a little research on this site first.
Do yourself a favor and use the latest version of Python, which doesn't have these problems. – Oleh Prypin

3 Answers

6 votes

What happens is that unicode('abcdefö') tries to decode the encoded string to unicode at runtime. The coding: utf-8 line only tells Python that the source file is encoded in UTF-8. By the time the script runs, it has been compiled and the literal has been stored as an encoded byte string. When Python then decodes that string without being told an encoding, it uses ASCII by default, and because the bytes are actually UTF-8 encoded, this fails.
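A minimal sketch of that failure, written so it runs on Python 2 and 3: the byte string below is assumed to be what a UTF-8 editor saves for 'abcdefö' ('ö' is the two bytes 0xc3 0xb6). Decoding it as ASCII reproduces the error from the question; decoding it as UTF-8 works.

```python
# Bytes a UTF-8 editor stores for the literal 'abcdefö' ('ö' -> 0xc3 0xb6).
raw = b'abcdef\xc3\xb6'

# Python 2's unicode(raw) implicitly decodes with ASCII, which is equivalent to this:
error = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

# Decoding with the encoding the file really uses succeeds.
text = raw.decode('utf-8')
print(repr(text))
```

Position 6 is simply the first byte after the six ASCII letters, which is why the question's error points there.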

You can write s = u'abcdefö', which tells the compiler to decode the literal using the encoding declared for the file and store it as a unicode object. s = unicode('abcdefö', 'utf8') or s = 'abcdefö'.decode('utf8') does the same thing at runtime.
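To illustrate that the compile-time and runtime routes give the same result, here is a small sketch. It uses bytes.decode() rather than unicode(), because unicode() does not exist in Python 3; in Python 2, unicode(raw, 'utf8') behaves the same way for this input.

```python
# -*- coding: utf-8 -*-
# Compile time: the u'' literal is decoded using the coding declaration above.
literal = u'abcdefö'

# Run time: decode the UTF-8 bytes explicitly.
raw = b'abcdef\xc3\xb6'
runtime = raw.decode('utf8')

print(literal == runtime)  # True
```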

However, that does not necessarily mean you can print s yet. The unicode string first has to be encoded in a character set that stdout (the console, editor, or IDE) can actually display. Unfortunately, Python often fails to detect the right character set, falls back to ASCII, and raises an error when the string contains non-ASCII characters. The Python Wiki describes a few ways to set up stdout properly.
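The sketch below shows both halves of that: encoding to ASCII (roughly what print does when it assumes an ASCII stdout) raises the UnicodeEncodeError from the question's edit, while encoding to UTF-8 yourself and writing bytes sidesteps the guess. The helper name write_utf8 is hypothetical, and the sketch assumes the console or editor panel actually displays UTF-8. Setting the PYTHONIOENCODING environment variable before starting Python is another commonly used option, not shown here.

```python
import io
import sys

text = u'abcdef\u00f6'  # same value as u'abcdefö'

# Roughly what print does when it decides stdout can only take ASCII.
encode_error = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    encode_error = exc
    print(exc)  # ... can't encode character ... in position 6 ...


def write_utf8(value, stream=None):
    """Hypothetical helper: encode explicitly as UTF-8 and write raw bytes."""
    if stream is None:
        sys.stdout.flush()  # keep ordering with earlier print() output
        # Python 3 text streams wrap a binary .buffer; Python 2's stdout takes bytes directly.
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    stream.write(value.encode('utf-8') + b'\n')
    stream.flush()


# Writing into an in-memory byte stream shows exactly which bytes are produced.
buf = io.BytesIO()
write_utf8(text, buf)

# Writing to the real stdout; the terminal must decode UTF-8 to show 'ö'.
write_utf8(text)
```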

1 vote

You need to mark the string as a unicode string:

s = u'abcdefö'

0 votes

s = 'abcdefö'

DO NOT call unicode() with an encoding on a string that is already unicode: if s is already a unicode object, unicode(s, 'utf-8') raises a TypeError.
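One way to avoid decoding twice is to check the type first. The helper to_text below is a hypothetical sketch, not a standard function; it relies on the fact that bytes is an alias of str in Python 2.6+ and the byte-string type in Python 3, so the same check works in both.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return text, decoding only if value is still bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: decoding it again would be an error


print(repr(to_text(b'abcdef\xc3\xb6')))  # decoded from UTF-8 bytes
print(repr(to_text(u'abcdef\u00f6')))    # returned unchanged
```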

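If type(s) == str but s contains non-ASCII characters (for example, UTF-8 encoded bytes):

  1. First decode it to unicode:

    str_val = unicode(s, 'utf-8')
    # or, to replace bytes that are not valid UTF-8 with U+FFFD instead of raising:
    str_val = unicode(s, 'utf-8', 'replace')

  2. Finally, encode it back to a byte string:

    out = str_val.encode('utf-8')

Now you can print it:

print out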
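The decode-then-encode round trip above can be sketched in a form that runs on both Python 2 and 3, using bytes.decode() in place of unicode() (which does not exist in Python 3). The truncated input is a made-up example, included only to show what the 'replace' error handler does.

```python
# UTF-8 bytes: what s = 'abcdefö' holds in a Python 2 str from a UTF-8 source file.
s = b'abcdef\xc3\xb6'

# Step 1: decode bytes -> text (strict by default: raises on invalid UTF-8).
str_val = s.decode('utf-8')

# Same step with 'replace': invalid bytes become U+FFFD instead of raising.
damaged = b'abcdef\xc3'  # hypothetical input cut off in the middle of 'ö'
repaired = damaged.decode('utf-8', 'replace')

# Step 2: encode text -> bytes for output.
out = str_val.encode('utf-8')

print(out == s)  # True: the round trip is lossless for valid UTF-8
```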