I'm working with the BeautifulSoup Python library. I used urllib2 to download the HTML of a page and then parsed it with BeautifulSoup. I want to save some of the HTML content into a MySQL table, but I'm having problems with the encoding. The MySQL table uses the 'utf-8' charset.
Some examples:
When I download the HTML and parse it with BeautifulSoup, I get something like:
"Ver las \xc3\xbaltimas noticias. Ent\xc3\xa9rate de las noticias de \xc3\xbaltima hora con la mejor cobertura con fotos y videos"
The correct text would be:
"Ver las últimas noticias. Entérate de las noticias de última hora con la mejor cobertura con fotos y videos"
I have tried encoding and decoding that text with multiple charsets, but when I insert it into MySQL I get something like:
"Ver las últimas noticias y todos los titulares de hoy en Yahoo! Noticias Argentina. Entérate de las noticias de última hora con la mejor cobertura con fotos y videos"
I'm having problems with the encoding, but I don't know how to solve them.
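The symptom can be reproduced in isolation, which may help narrow it down. The sketch below is only an assumption about the cause, not a confirmed diagnosis: `raw` is a hypothetical stand-in for a shortened version of the byte string shown earlier, and the code shows that decoding those UTF-8 bytes as Latin-1 produces the same garbled characters that end up in the table.

```python
# -*- coding: utf-8 -*-
# Hypothetical reproduction; `raw` stands in for the parsed byte string.
raw = b"Ver las \xc3\xbaltimas noticias. Ent\xc3\xa9rate"

# Decoding the bytes as UTF-8 gives the intended text.
correct = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 maps each byte to one character,
# which yields the garbled form that shows up in MySQL.
garbled = raw.decode("latin-1")

# The damage is reversible: re-encoding as Latin-1 restores the original
# bytes, which can then be decoded as UTF-8.
recovered = garbled.encode("latin-1").decode("utf-8")
```

If this matches, the bytes themselves are probably valid UTF-8 and something between Python and MySQL is reading them as Latin-1. The connection charset is one possible culprit, but that is a guess.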
Any suggestions?