Does Python have a function that I can use to escape special characters in a regular expression?
For example, I'm "stuck" :\
should become I\'m \"stuck\" :\\
.
Does Python have a function that I can use to escape special characters in a regular expression?
For example, I'm "stuck" :\
should become I\'m \"stuck\" :\\
.
Use re.escape
>>> import re
>>> re.escape(r'\ a.*$')
'\\\\\\ a\\.\\*\\$'
>>> print(re.escape(r'\ a.*$'))
\\\ a\.\*\$
>>> re.escape('www.stackoverflow.com')
'www\\.stackoverflow\\.com'
>>> print(re.escape('www.stackoverflow.com'))
www\.stackoverflow\.com
Repeating it here:
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
As of Python 3.7 re.escape()
was changed to escape only characters which are meaningful to regex operations.
I'm surprised no one has mentioned using regular expressions via re.sub()
:
import re
print re.sub(r'([\"])', r'\\\1', 'it\'s "this"') # it's \"this\"
print re.sub(r"([\'])", r'\\\1', 'it\'s "this"') # it\'s "this"
print re.sub(r'([\" \'])', r'\\\1', 'it\'s "this"') # it\'s\ \"this\"
Important things to note:
\
as well as the character(s) you're looking for.
You're going to be using \
to escape your characters, so you need to escape
that as well.([\"])
, so that the substitution
pattern can use the found character when it adds \
in front of it. (That's what
\1
does: uses the value of the first parenthesized group.)r
in front of r'([\"])'
means it's a raw string. Raw strings use different
rules for escaping backslashes. To write ([\"])
as a plain string, you'd need to
double all the backslashes and write '([\\"])'
. Raw strings are friendlier when
you're writing regular expressions.\
to distinguish it from a
backslash that precedes a substitution group, e.g. \1
, hence r'\\\1'
. To write
that as a plain string, you'd need '\\\\\\1'
— and nobody wants that.Use repr()[1:-1]. In this case, the double quotes don't need to be escaped. The [-1:1] slice is to remove the single quote from the beginning and the end.
>>> x = raw_input()
I'm "stuck" :\
>>> print x
I'm "stuck" :\
>>> print repr(x)[1:-1]
I\'m "stuck" :\\
Or maybe you just want to escape a phrase to paste into your program? If so, do this:
>>> raw_input()
I'm "stuck" :\
'I\'m "stuck" :\\'
As it was mentioned above, the answer depends on your case. If you want to escape a string for a regular expression then you should use re.escape(). But if you want to escape a specific set of characters then use this lambda function:
>>> escape = lambda s, escapechar, specialchars: "".join(escapechar + c if c in specialchars or c == escapechar else c for c in s)
>>> s = raw_input()
I'm "stuck" :\
>>> print s
I'm "stuck" :\
>>> print escape(s, "\\", ['"'])
I'm \"stuck\" :\\