
I am looking to match a string that is input from a website to check if it is alphanumeric and possibly contains an underscore. My code:

import re

if re.match('[a-zA-Z0-9_]', playerName):
    pass  # do stuff

For some reason, this also matches strings containing odd characters, for example: nIg○▲ ☆ ★ ◇ ◆

I only want the regular characters A-Z, 0-9 and _ to match. Is there something I am missing here?
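For reference, here is a minimal sketch of why this happens (the sample string is hypothetical, modelled on the input in the question): `re.match` only requires the pattern to match at the start of the string, and a single character class consumes exactly one character, so any input whose first character is a letter, digit or underscore is accepted.

```python
import re

# Hypothetical sample: starts with an ASCII letter, then symbols like the question's.
player_name = "a○▲ ☆ ★ ◇ ◆"

m = re.match('[a-zA-Z0-9_]', player_name)

# re.match only needs the pattern to match at the start of the string,
# and a single character class consumes exactly one character.
print(m is not None)  # True
print(m.group())      # a
```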


3 Answers


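Python has a special sequence \w that matches letters, digits and the underscore. In Python 2 it is limited to [a-zA-Z0-9_] when the LOCALE and UNICODE flags are not specified; in Python 3, str patterns are Unicode-aware by default, so you need the re.ASCII flag to get the same restriction. So you can change your pattern to: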

pattern = r'^\w+$'  # in Python 3, use together with the re.ASCII flag
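A short sketch of the Python 3 difference (the test strings are made up for illustration): without re.ASCII, \w also matches non-ASCII letters such as 'é'; with it, \w is equivalent to [a-zA-Z0-9_].

```python
import re

# Python 3 str patterns are Unicode-aware by default, so \w matches 'é' too.
unicode_word = re.compile(r'^\w+$')
# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_word = re.compile(r'^\w+$', re.ASCII)

print(bool(unicode_word.match('café')))    # True
print(bool(ascii_word.match('café')))      # False
print(bool(ascii_word.match('Player_1')))  # True
```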


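Your regex only matches one character, and re.match only requires a match at the start of the string, so any name that begins with a letter, digit or underscore passes. Try this instead: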

if re.match('^[a-zA-Z0-9_]+$', playerName):
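One caveat worth knowing: `$` also matches just before a trailing newline, so the pattern above accepts "abc\n". A minimal sketch using `re.fullmatch` (available since Python 3.4) instead, which requires the whole string to match; `is_valid_name` is a hypothetical helper name:

```python
import re

# '$' matches just before a trailing newline, so this is accepted.
print(re.match(r'^[a-zA-Z0-9_]+$', 'abc\n') is not None)  # True

def is_valid_name(name: str) -> bool:
    # fullmatch requires the pattern to cover the entire string.
    return re.fullmatch(r'[a-zA-Z0-9_]+', name) is not None

print(is_valid_name('abc\n'))     # False
print(is_valid_name('Player_1'))  # True
print(is_valid_name('a○▲'))       # False
```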

…check if it is alphanumeric and possibly contains an underscore.

Do you mean this literally, so that only one underscore is allowed, total? (Not unreasonable for player names; adjacent underscores in particular can be hard for other players to read.) Should "a_b_c" not match?

If so:

if playerName and re.match("^[a-zA-Z0-9]*_?[a-zA-Z0-9]*$", playerName):

The new first part of the condition rejects an empty name. That keeps the regex simple: on its own, the regex would accept an empty string, because every part of it is optional.
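A runnable sketch of the same rule, assuming a hypothetical helper `is_valid_name`; it uses the answer's character classes but swaps in `re.fullmatch` for the `^...$` anchors so that a trailing newline is not accepted:

```python
import re

# Letters and digits, with at most one underscore anywhere in the name.
ONE_UNDERSCORE = re.compile(r'[a-zA-Z0-9]*_?[a-zA-Z0-9]*')

def is_valid_name(name: str) -> bool:
    # The emptiness check is needed because every part of the pattern can match nothing.
    return bool(name) and ONE_UNDERSCORE.fullmatch(name) is not None

for candidate in ['abc', 'a_b', '_a', 'a_', '_', 'a_b_c', '']:
    print(repr(candidate), is_valid_name(candidate))
```

With this rule "_a", "a_" and "_" are accepted, while "a_b_c" and the empty string are rejected.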

This places no restrictions on where the underscore can occur, so all of "_a", "a_", and "_" will match. If you instead want to prevent both leading and trailing underscores, which is again reasonable for player names, change to:

if re.match("^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?$", playerName):  # this regex can't match an empty string, so no separate emptiness check is needed
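A sketch of this stricter rule under the same assumptions (hypothetical `is_valid_name` helper, `re.fullmatch` in place of the `^...$` anchors). Because each side of the optional underscore needs at least one letter or digit, leading and trailing underscores are rejected:

```python
import re

# One or more letters/digits, optionally followed by a single underscore
# and one or more letters/digits. It cannot match an empty string.
NO_EDGE_UNDERSCORE = re.compile(r'[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?')

def is_valid_name(name: str) -> bool:
    return NO_EDGE_UNDERSCORE.fullmatch(name) is not None

for candidate in ['abc', 'a_b', '_a', 'a_', '_', 'a_b_c', '']:
    print(repr(candidate), is_valid_name(candidate))
```

Here only "abc" and "a_b" are accepted from the sample list.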