I have read the Unicode HOWTO in the official docs and a full, very detailed article as well, but I still don't understand why I get this error.
Here is what I am attempting: I open an XML file that contains characters outside the ASCII range (but inside the range XML allows). I do that with

    cfg = codecs.open(filename, encoding='utf-8', mode='r')

which runs fine. Looking at the string with repr() also shows me a unicode string.
Now I go ahead and parse it with

    parseString(cfg.read().encode('utf-8'))

Of course, my XML file starts with this:

    <?xml version="1.0" encoding="utf-8"?>

Although I suppose it is not relevant, I also declared utf-8 as the source encoding of my Python script, but since I am not writing non-ASCII characters directly in it, that should not matter here. The same goes for the following line, which is also right at the beginning:

    from __future__ import unicode_literals

Next I pass the resulting object to my own class, where I read tags into variables like this:

    xmldata.getElementsByTagName(tagName)[0].firstChild.data

and assign the result to a variable in my class.
What works perfectly are these commands (obj is an instance of the class):

    for element in obj:
        print element

This command works as well:

    print obj.__repr__()

I defined __iter__() to just yield every variable, while __repr__() uses the usual string formatting: "%s" % self.varname
Both commands print perfectly and can output the unicode character. What does not work is this:

    print obj

And now I am stuck, because this throws the dreaded
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 47:
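For reference, the same kind of message can be triggered in isolation by encoding that character with the ascii codec explicitly, which is what Python 2 falls back to whenever a unicode object has to become a byte string and no encoding is stated. A minimal sketch, assuming a made-up sample value:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

text = "M\xfcller"  # hypothetical value containing u'\xfc'

try:
    text.encode("ascii")  # the ascii codec cannot represent u'\xfc'
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

# Naming a codec that covers the character succeeds.
utf8_bytes = text.encode("utf-8")
```

Here the failure is explicit; in the traceback above the encoding step happens implicitly somewhere inside print obj.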
So what am I missing? What am I doing wrong? I am looking for a general solution: I always want to handle strings as unicode, to avoid any possible errors and to write a compatible program.
Edit: I also defined this:

    def __str__(self):
        return self.__repr__()

    def __unicode__(self):
        return self.__repr__()
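
Put together, the class as described would look roughly like the sketch below. The class name Config and the attributes name and city are placeholders invented for illustration, not the real code; only the four special methods follow the description.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


class Config(object):
    """Hypothetical stand-in for the class in the post."""

    def __init__(self, name, city):
        self.name = name
        self.city = city

    def __iter__(self):
        # Just yield every variable.
        yield self.name
        yield self.city

    def __repr__(self):
        # With unicode_literals the format string is unicode,
        # so on Python 2 this returns a unicode object, not a byte str.
        return "%s, %s" % (self.name, self.city)

    def __str__(self):
        return self.__repr__()

    def __unicode__(self):
        return self.__repr__()


obj = Config("M\xfcller", "Z\xfcrich")
elements = list(obj)     # same values the for loop prints one by one
direct = obj.__repr__()  # plain method call, result is handed over unchanged
```

On Python 2, both the loop and the direct obj.__repr__() call hand unicode objects straight to print, whereas print obj goes through str(obj) first, so that appears to be the only one of the three paths where a unicode return value has to be turned into a byte string.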
From the documentation I gather that print obj will use the object's __str__, not __repr__. – BrenBarn