2
votes

I was playing around with some patterns today to try to match some specific characters in a string, and ran into something unusual that I'm hoping someone can explain.

I had created a set looking for a list of characters within some strings, and noticed I was getting back some unexpected results. I eliminated the characters in the set until I got down to just three, and it seems to be these three that are responsible:

-- Named str rather than string so the variable does not shadow the string library
local str = "alpha.5dc1704B40bc7f.beta.123456789.gamma.987654321.delta.abc123ABC321"

local result = ""
for a in string.gmatch(str, '[+-_]') do
    result = result .. a .. " "
end

> print(result)
. 5 1 7 0 4 B 4 0 7 . . 1 2 3 4 5 6 7 8 9 . . 9 8 7 6 5 4 3 2 1 . . 1 2 3 A B C 3 2 1

Why are these characters getting returned here (looks like any number or uppercase letter, plus dots)? I note that if I change up the order of the set, I don't get the same output - '[_+-]' or '[-_+]' or '[+_-]' or '[-+_]' all return nothing, as expected.

What is it about '[+-_]' that's causing a match here? I can't figure out what I'm telling Lua that is being interpreted as an instruction to match these characters.


1 Answer

3
votes

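When a - appears between two other characters inside square brackets, it denotes a range: every character from the first through the second. For example, [a-z] is all of the lowercase letters, and [A-F] is A, B, C, D, E, and F. [+-_] therefore means every ASCII character from + (code 43) through _ (code 95), which includes all the digits, all the uppercase letters, and a lot of punctuation, including the dot. That is also why the other orderings behave as expected: a - at the very start or end of the set has nothing to form a range with, so it is taken literally. You can also escape it as %- to keep it literal in any position.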
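
To see exactly which characters the range picks up, here is a minimal sketch in plain Lua. The helper name collect and the sample string are made up for illustration; only string.char, find, gmatch, and table.concat from the standard library are used.

```lua
-- List every printable ASCII character that the set [+-_] matches.
local matched = {}
for code = 32, 126 do
    local ch = string.char(code)
    if ch:find("[+-_]") then
        matched[#matched + 1] = ch
    end
end
local range_chars = table.concat(matched)
print(range_chars)
--> +,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_

-- Hypothetical helper: gather all single-character matches of a pattern.
local function collect(s, pattern)
    local out = {}
    for ch in s:gmatch(pattern) do
        out[#out + 1] = ch
    end
    return table.concat(out, " ")
end

local sample = "a+b-c_d.5X"
print(collect(sample, "[+-_]"))   --> + - _ . 5 X   (range from + to _)
print(collect(sample, "[-+_]"))   --> + - _         (hyphen first: literal)
print(collect(sample, "[+_-]"))   --> + - _         (hyphen last: literal)
print(collect(sample, "[+%-_]"))  --> + - _         (escaped hyphen: literal)
```

Of the three ways to match a literal hyphen, escaping it as %- is probably the safest choice, since the set keeps working if someone later reorders or appends characters.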