So I'm using BeautifulSoup to get the text of some HTML nodes, but those nodes contain Unicode characters, which show up as escape sequences in the resulting string.
For example, an HTML element that contains this:
    50 €
is retrieved by BeautifulSoup with:
    soup.find("h2").text
as this string: 50\u20ac, which is only readable in the Python console.
But then it becomes unreadable when written to a JSON file.
Note: I save to JSON using this code:
    import json

    with open('file.json', 'w') as fp:
        json.dump(fileToSave, fp)
How can I convert those Unicode characters back to UTF-8 or whatever makes them readable again?
repr() of the content of the string. - Mark Tolonen
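For what it's worth, here is a minimal sketch of one way to get readable characters into the file. It assumes the goal is to store the € sign itself rather than its \u20ac escape. The names `file_to_save`, the `price` key, and the temporary path are placeholders for illustration, not taken from the question. By default `json.dump` escapes every non-ASCII character; passing `ensure_ascii=False` writes the characters as-is, and opening the file with `encoding="utf-8"` keeps the result independent of the platform's default encoding.

```python
import json
import os
import tempfile

# Placeholder for the data built from soup.find("h2").text in the question.
# "50 \u20ac" and "50 €" are the same Python string.
file_to_save = {"price": "50 \u20ac"}

# Hypothetical output location; the question writes to 'file.json'.
path = os.path.join(tempfile.mkdtemp(), "file.json")

# Default behaviour: non-ASCII characters are escaped in the JSON text.
escaped = json.dumps(file_to_save)

# ensure_ascii=False writes the characters themselves; the explicit
# encoding makes sure the file is UTF-8 on every platform.
with open(path, "w", encoding="utf-8") as fp:
    json.dump(file_to_save, fp, ensure_ascii=False)

# Read the raw file text back to inspect what was actually written.
with open(path, encoding="utf-8") as fp:
    raw_text = fp.read()

# The escaped form decodes back to the original data.
round_tripped = json.loads(escaped)
```

Either form is valid JSON and loads back to the same string, so the escaped file is not corrupted, just harder to read by eye.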