Replace or delete specific Unicode characters in Python

Question

There seem to be a lot of posts about doing this in other languages, but I can't seem to figure out how in Python (I'm using 2.7).

To be clear, I would ideally like to keep the string in unicode, just be able to replace certain specific characters.

For instance:

thisToken = u'tandh\u2013bm'
print(thisToken)

This prints the word with a dash in the middle (U+2013, which is an en dash rather than an em dash). I would just like to delete the dash, but not by indexing, because I want to be able to do this anywhere I find these specific characters.

I tried using replace, as you would with any other character:

newToke = thisToken.replace('\u2013','')
print(newToke)

but it has no effect: the dash is still there. Any help is much appreciated. Seth

If you put from __future__ import unicode_literals at the top of your file, all string literals automatically become unicode, which would have helped here. Watch out for surprises when some strings need to be bytes; you can use the b prefix for those. - RemcoGerlich

Kevin · Accepted Answer · 2016-11-16T14:17:34

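The string you're searching for must also be a Unicode string. In Python 2, '\u2013' without the u prefix is a byte string, and \u is not an escape sequence there, so it holds six literal characters (backslash, u, 2, 0, 1, 3) and never matches the dash. Try: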

newToke = thisToken.replace(u'\u2013','')
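If there are several characters to delete or swap, chaining replace calls gets tedious. One possible alternative is translate with a dict keyed by code point. This is a sketch rather than part of the original answer: the names CLEANUP_TABLE and clean_token, and the particular characters chosen, are made up for illustration. It should behave the same on Python 2.7 unicode strings and Python 3 str.

```python
# -*- coding: utf-8 -*-

# Hypothetical cleanup table: keys are code points, values are either
# None (delete the character) or a Unicode replacement string.
CLEANUP_TABLE = {
    0x2013: None,    # EN DASH: delete
    0x2014: None,    # EM DASH: delete
    0x2019: u"'",    # RIGHT SINGLE QUOTATION MARK: use an ASCII apostrophe
}


def clean_token(token):
    """Return token with the characters in CLEANUP_TABLE removed or replaced.

    Assumes token is a Unicode string (unicode on Python 2, str on Python 3).
    """
    return token.translate(CLEANUP_TABLE)


this_token = u'tandh\u2013bm'

# Single character: note the u prefix on both arguments.
new_token = this_token.replace(u'\u2013', u'')

# Several characters in one pass.
cleaned = clean_token(u'it\u2019s tandh\u2013bm\u2014ok')
```

Here new_token is u'tandhbm' and cleaned is u"it's tandhbmok". The dict form of translate only works on Unicode strings; a Python 2 byte string would need to be decoded first.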
