Encoding error using Python

Question

I wrote code that connects to an IMAP server, parses the message body, and inserts it into a database. But I am having problems with accented characters.

From the email header I got this information:

Content-Type: text/html; charset=ISO-8859-1

But I am not sure whether I can trust this information.

The email was written in Portuguese, so it contains many accented words. For example, I extracted the following phrase from the email source (using my browser):

"...instalação de eletrônicos..."

So I connected to the IMAP server and fetched some emails:

... typ, data = M.fetch(num, '(RFC822)') ...

When I print the content, I get the following text:

print data[0][1]

instala+º+úo de eletr+¦nicos

I tried using .decode('utf-8'), but without success. I got:

instalaÃ§Ã£o de eletrÃ´nicos

How can I make it human-readable? My database uses UTF-8.

What does print(type(data[0][1])); print(repr(data[0][1])) print? — Martijn Pieters
@MartijnPieters - type: <type 'str'> and "print(repr(" returned accents with the following format: fun\xc3\xa7\xc3\xa3o (sorry, this is another accented word) — Thomas
No, that's exactly what I wanted to see. That's função in UTF8. And .decode('utf8') should work, perhaps you need to show us more code? — Martijn Pieters
@MartijnPieters, I tried: print repr(data[0][1]).decode('utf8') but still showing "fun\xc3\xa7\xc3\xa3o" (you are right about decoded word "função") — Thomas
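
The byte sequence from the comments can be checked directly. This is a minimal sketch in Python 3 syntax (the question uses Python 2, where the same literal is a plain str); the variable names are only illustrative. It shows that the bytes decode cleanly as UTF-8, that decoding them as ISO-8859-1 gives the "Ã§Ã£" pattern seen in the question, and that repr() yields escaped text with literal backslashes rather than the bytes themselves.

```python
# Bytes as reported by repr() in the comments above.
raw = b'fun\xc3\xa7\xc3\xa3o'

# Decoding with the codec that matches the bytes gives the intended word.
as_utf8 = raw.decode('utf-8')          # 'função' (6 characters)

# Decoding with a single-byte codec turns each byte into one character,
# so every accented letter becomes two wrong characters.
as_latin1 = raw.decode('iso-8859-1')   # 'funÃ§Ã£o' (8 characters)

# repr() produces a printable description of the bytes: the backslash,
# 'x', 'c' and '3' are separate ASCII characters, so decoding that text
# cannot recover the accents.
escaped = repr(raw)
has_literal_backslash = '\\xc3' in escaped   # True
```

So the decode call has to be applied to data[0][1] itself, not to its repr().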

marianobianchi · Accepted Answer · 2013-02-11T18:23:43

The header says the message uses the ISO-8859-1 charset, so you need to decode the string with that encoding.

Try this:

data[0][1].decode('iso-8859-1')
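
Since a fetch of '(RFC822)' returns the whole message (headers plus body), and the body may also carry a transfer encoding such as quoted-printable, it can be safer to let the standard email module split the message and report each part's charset instead of decoding the raw blob with one codec. What follows is only a sketch under assumptions: it is written for Python 3 (the question uses Python 2, where email.message_from_string plays the role of message_from_bytes), and the helper name text_parts and its fallback parameter are hypothetical, not part of any library.

```python
import email


def text_parts(raw_bytes, fallback='iso-8859-1'):
    """Yield (content_type, text) for each text/* part of a raw message.

    `fallback` is a hypothetical default used when a part declares no
    charset, or declares one that does not fit its bytes.
    """
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        # Multipart containers have no text of their own; skip them.
        if part.get_content_maintype() != 'text':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or fallback
        try:
            text = payload.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown or wrong charset label: fall back instead of failing.
            text = payload.decode(fallback, errors='replace')
        yield part.get_content_type(), text
```

With the question's variables this would be called as text_parts(data[0][1]). The resulting text is a Unicode string, which can then be encoded as UTF-8 when it is written to the database.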
