Help Replacing Non-ASCII character in Python

Question

I have a bunch of HTML files that I downloaded with the httplib2 package in Python. Non-breaking spaces are showing up as 'Â '.

<font color="#ff0000">02/12/2004Â </font> is what I get, while <font color="#ff0000">02/12/2004&nbsp;</font> is the desired output.

How do I replace the 'Â ' with ' ' in Python? Thanks a lot!
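The symptom above can be reproduced in a few lines. This is a minimal sketch, assuming the page was served as UTF-8 but decoded as Latin-1 (cp1252 behaves the same for these two bytes). A non-breaking space is encoded in UTF-8 as the bytes 0xC2 0xA0, and reading those bytes with the wrong codec turns 0xC2 into 'Â':

```python
# A non-breaking space (U+00A0), which is what &nbsp; renders as.
nbsp = "\u00a0"

# Encoded as UTF-8 it becomes two bytes: 0xC2 0xA0.
raw = nbsp.encode("utf-8")
print(raw)  # b'\xc2\xa0'

# Decoding those bytes with the wrong codec (Latin-1, an assumption here)
# maps 0xC2 to 'Â' and leaves 0xA0 as a non-breaking space.
garbled = raw.decode("latin-1")
print(repr(garbled))  # 'Â\xa0'

# Decoding with the correct codec gives back the single original character.
print(raw.decode("utf-8") == nbsp)  # True
```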

Yes, it is slightly different from the original HTML. I am using httplib2 to download the pages, not a real browser. Is there something I have to include in the headers for httplib2 to download the page as is? — ThinkCode

e-satis · Accepted Answer · 2011-12-22T10:18:30

You've got an encoding problem. Instead of trying to remove these characters, find out the encoding of the page; then, when you read the file, use the codecs module instead of open() and pass it the proper character encoding.
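A sketch of that approach follows, under a few assumptions. The helper `charset_from_content_type` is hypothetical; it pulls the charset out of a Content-Type header value (with httplib2 that value would presumably come from the response headers, e.g. `resp['content-type']`, but that wiring is not shown). The file is then read with `codecs.open` and the detected encoding. Once the text is decoded correctly, the lone non-breaking space can be turned back into `&nbsp;` if that is the desired output. In Python 3 the built-in `open(path, encoding=...)` does the same decoding and is the preferred choice in current versions.

```python
import codecs
import os
import tempfile


def charset_from_content_type(content_type, default="utf-8"):
    """Hypothetical helper: return the charset parameter of a
    Content-Type header value, or `default` if none is present."""
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            return value.strip("\"'")
    return default


def read_html(path, encoding):
    # codecs.open decodes the file's bytes with the given encoding,
    # so the result is text (str), not raw bytes.
    with codecs.open(path, "r", encoding=encoding) as f:
        return f.read()


# Demo: save a UTF-8 page containing a non-breaking space, then read it back.
html = '<font color="#ff0000">02/12/2004\u00a0</font>'
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(html.encode("utf-8"))

encoding = charset_from_content_type("text/html; charset=UTF-8")
text = read_html(path, encoding)
os.remove(path)

# Correctly decoded: one U+00A0 character, no stray 'Â'.
print(text == html)  # True

# Optionally write the non-breaking space back out as an HTML entity.
escaped = text.replace("\u00a0", "&nbsp;")
print(escaped)  # <font color="#ff0000">02/12/2004&nbsp;</font>
```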
