5
votes

I get an encoding error on this line:

s =  "%s:%s: %s: %s\n" % (filename, lineno, category.__name__, message)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc4' in position 44: ordinal not in range(128)

I tried to reproduce this error by passing all combinations of parameters to the string format, but the closest I got was an "ascii decode" error (by passing a unicode string and a high-ASCII str simultaneously, which forced conversion of the str to unicode using the ASCII decoder).

However, I did not manage to get an "ascii encode" error. Does anybody have an idea?

3
Oh, you get it when warnings.warn is called... Couldn't you have said so? It was unclear that the code wasn't your code but in the standard library. You should say what your problem is, not a generic question that you think is the problem, because it generally isn't. I've updated my answer below with more details. – Lennart Regebro

3 Answers

8
votes

This happens when Python tries to coerce an argument:

s = u"\u00fc"
print str(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)

This happens because one of your arguments is an object (not a string of any kind) and Python calls str() on it. There are two solutions: Use a unicode string for the format (s = u"%s...") or wrap each argument with repr().

8
votes

You are mixing unicode and str objects.

Explanation: In Python 2.x, there are two kinds of objects that can contain text strings: str and unicode. str is a string of bytes, so each element can only hold a value between 0 and 255. unicode is a string of Unicode characters.

You can convert between str and unicode with the "encode" and "decode" methods:

>>> "thisisastring".decode('ascii')
u'thisisastring'

>>> u"This is ä string".encode('utf8')
'This is \xc3\xa4 string'

Note the encodings. An encoding is a way of representing Unicode text as a string of bytes.

If you try to add str and unicode together, Python will try to convert one to the other. But by default it will use ASCII as the encoding, which means a-z, A-Z, digits, and some extra characters like !"#$%&/()=?'{[]]} etc. Anything else will fail.

You will at that point get either an encoding error or a decoding error, depending on whether Python tries to convert the unicode to str or the str to unicode. Usually it tries to decode, that is, convert to unicode. But sometimes it decides not to and coerces to str instead. I'm not entirely sure why.
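The two failure directions can be made visible by doing the ASCII conversion explicitly instead of letting Python do it implicitly. This is only a sketch: the helper names ascii_decode_outcome and ascii_encode_outcome are made up for illustration, and the explicit .decode("ascii") / .encode("ascii") calls stand in for the implicit coercion described above.

```python
def ascii_decode_outcome(raw_bytes):
    """Byte string -> unicode direction, using the ASCII codec explicitly."""
    try:
        raw_bytes.decode("ascii")
        return "ok"
    except UnicodeDecodeError:
        return "UnicodeDecodeError"


def ascii_encode_outcome(text):
    """Unicode -> byte string direction, using the ASCII codec explicitly."""
    try:
        text.encode("ascii")
        return "ok"
    except UnicodeEncodeError:
        return "UnicodeEncodeError"


# Pure ASCII survives both directions.
print(ascii_decode_outcome(b"abc"))
print(ascii_encode_outcome(u"abc"))

# The UTF-8 bytes for a-umlaut cannot be decoded as ASCII ...
print(ascii_decode_outcome(b"\xc3\xa4"))
# ... and the character u'\xc4' from the question cannot be encoded as ASCII.
print(ascii_encode_outcome(u"\xc4"))
```

Which of the two errors you see therefore tells you which direction the implicit conversion went.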

Update: The reason you get an encode error and not a decode error above is that message in the above code is neither str nor unicode. It's another object that has a __str__ method. Python therefore calls str(message) before passing it in, and that fails, since the internally stored message is a unicode object that can't be coerced to ASCII.

Or, more simply answered: It fails because warnings.warn() doesn't accept unicode messages.

Now, the solution:

Don't mix str and unicode. If you need to use unicode, and you apparently do, try to make sure all strings are unicode all the time. That's the only way to be sure you avoid this. This means that whenever you read in a string from disk, or call a function that may return anything other than pure ASCII str, decode it to unicode as soon as possible. And when you need to save it to disk, send it over a network, or pass it to a method that does not understand unicode, encode it to str as late as possible.
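A minimal sketch of that "decode early, encode late" rule, assuming the data is UTF-8. The helpers read_text and write_text are hypothetical names, and io.BytesIO stands in for a file on disk or a network socket.

```python
import io


def read_text(stream, encoding="utf-8"):
    """Decode bytes to unicode immediately after reading them."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding="utf-8"):
    """Encode unicode to bytes only at the moment of writing."""
    stream.write(text.encode(encoding))


# Input boundary: bytes come in and are decoded right away.
source = io.BytesIO(b"This is \xc3\xa4 string")
text = read_text(source)

# Everything in between works on unicode only.
shouted = text.upper()

# Output boundary: encode as late as possible.
sink = io.BytesIO()
write_text(sink, shouted)
print(repr(sink.getvalue()))
```

Because no byte string ever meets a unicode string in the middle section, Python never has to guess an encoding there.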

In this specific case, the problem is that you pass unicode to warnings.warn() and you can't do that. Pass a str. If you don't know what it is (as seems to be the case here) because it comes from somewhere else, your try/except solution with repr() works fine, although doing an encode would be a possibility too.
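One way the encode option could look, as a sketch rather than the definitive fix: escape every non-ASCII character before the message reaches warnings.warn(). The helper name ascii_safe is made up for this example, and it assumes its argument is already a unicode object; "backslashreplace" is a standard codec error handler for encoding.

```python
import warnings


def ascii_safe(text):
    """Replace every non-ASCII character in a unicode string with a backslash escape."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


# Record the warning instead of printing it, so the result can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn(ascii_safe(u"Bad value: \xc4"))

warning_text = str(caught[0].message)
print(warning_text)
```

The message loses the original character but stays readable, and it no longer matters how the warning is formatted downstream.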

1
votes

One of the operands you are passing is not suitable for ASCII encoding - perhaps it contains either Unicode or Latin-1 characters. Change the format string to Unicode and see what happens.