22
votes

I'm trying to check if a string is a number, so the regex "\d+" seemed good. However that regex also fits "78.46.92.168:8000" for some reason, which I do not want, a little bit of code:

class Foo():
    _rex = re.compile("\d+")
    def bar(self, string):
         m = _rex.match(string)
         if m != None:
             doStuff()

And doStuff() is called when the ip adress is entered. I'm kind of confused, how does "." or ":" match "\d"?

5

5 Answers

31
votes

\d+ matches any positive number of digits within your string, so it matches the first 78 and succeeds.

Use ^\d+$.

Or, even better: "78.46.92.168:8000".isdigit()

12
votes

re.match() always matches from the start of the string (unlike re.search()) but allows the match to end before the end of the string.

Therefore, you need an anchor: _rex.match(r"\d+$") would work.

To be more explicit, you could also use _rex.match(r"^\d+$") (which is redundant) or just drop re.match() altogether and just use _rex.search(r"^\d+$").

11
votes

There are a couple of options in Python to match an entire input with a regex.

Python 2 and 3

In Python 2 and 3, you may use

re.match(r'\d+$') # re.match anchors the match at the start of the string, so $ is what remains to add

or - to avoid matching before the final \n in the string:

re.match(r'\d+\Z') # \Z will only match at the very end of the string

Or the same as above with re.search method requiring the use of ^ / \A start-of-string anchor as it does not anchor the match at the start of the string:

re.search(r'^\d+$')
re.search(r'\A\d+\Z')

Note that \A is an unambiguous string start anchor, its behavior cannot be redefined with any modifiers (re.M / re.MULTILINE can only redefine the ^ and $ behavior).

Python 3

All those cases described in the above section and one more useful method, re.fullmatch (also present in the PyPi regex module):

If the whole string matches the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

So, after you compile the regex, just use the appropriate method:

_rex = re.compile("\d+")
if _rex.fullmatch(s):
    doStuff()
7
votes

\Z matches the end of the string while $ matches the end of the string or just before the newline at the end of the string, and exhibits different behaviour in re.MULTILINE. See the syntax documentation for detailed information.

>>> s="1234\n"
>>> re.search("^\d+\Z",s)
>>> s="1234"
>>> re.search("^\d+\Z",s)
<_sre.SRE_Match object at 0xb762ed40>
5
votes

Change it from \d+ to ^\d+$