I'm using Python 2.7 here (which is very relevant). Let's say I have a string containing an "em" dash, "—". This isn't encoded in ASCII. Therefore, when my Django app processes it, it complains. A lot. I want to to replace some such characters with unicode equivalents for string tokenization and use with a spell-checking API (PyEnchant, which considers non-ASCII apostrophes to be misspellings), for example by using the shorter "-" dash instead of an em dash. Here's what I'm doing:
s = unicode(s).replace(u'\u2014', '-').replace(u'\u2018', "'").replace(u'\u2019', "'").replace(u'\u201c', '"').replace(u'\u201d', '"')
Unfortunately, this isn't actually replacing any of the unicode characters, and I'm not sure why. I don't really have time to upgrade to Python 3 right now, importing unicode_literals from future at the top of the page or setting the encoding there does not let me place actual unicode literals in the code, as it should, and I have tried endless tricks with encode() and decode(). Can anyone give me a straightforward, failsafe way to do this in Python 2.7?