63
votes

I'm currently learning Lua. Regarding pattern matching in Lua, I found the following sentence in the Lua documentation on lua.org:

Nevertheless, pattern matching in Lua is a powerful tool and includes some features that are difficult to match with standard POSIX implementations.

As I'm familiar with POSIX regular expressions, I would like to know if there are any common examples where Lua pattern matching is "better" than regular expressions, or did I misinterpret the sentence? And if there are such examples: why is pattern matching better suited than regular expressions in those cases?

4
A link to where you read this in the docs would be nice. - user67416
@g33kz0r The docs are available at lua.org/pil/20.1.html. The citation is the last sentence of the second paragraph (the one starting with "Unlike several other scripting languages, ..."). - aurora

4 Answers

68
votes

Are there any common samples where Lua pattern matching is "better" compared to regular expressions?

It is not so much particular examples as the overall design: Lua patterns have a higher signal-to-noise ratio than POSIX regular expressions, and that is often what makes them preferable.

Here are some factors that contribute to the good design:

  • Very lightweight syntax for matching common character types including uppercase letters (%u), decimal digits (%d), space characters (%s) and so on. Any character type can be complemented by using the corresponding capital letter, so pattern %S matches any nonspace character.

  • Quoting is extremely simple and regular. The quoting character is %, so it is always distinct from the string-quoting character \, which makes Lua patterns much easier to read than POSIX regular expressions (when quoting is necessary). It is always safe to quote symbols, and it is never necessary to quote letters, so you can just go by that rule of thumb instead of memorizing what symbols are special metacharacters.

  • Lua offers "captures" and can return multiple captures as the result of a match call. This interface is much, much better than capturing substrings through side effects or having some hidden state that has to be interrogated to find captures. Capture syntax is simple: just use parentheses.

  • Lua has a "shortest match" - modifier to go along with the "longest match" * operator. So for example s:find '%s(%S-)%.' finds the shortest sequence of nonspace characters that is preceded by space and followed by a dot.

  • The expressive power of Lua patterns is comparable to POSIX "basic" regular expressions, without the alternation operator |. (Note also that repetition operators apply only to a single character class, not to a parenthesized group.) What you are giving up is "extended" regular expressions with |. If you need that much expressive power, I recommend going all the way to LPeg, which is based on parsing expression grammars and gives you essentially the power of context-free grammars at quite reasonable cost.
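A few of the points above, sketched in code. The sample strings and variable names are made up for illustration; only the standard string library is used.

```lua
-- Character classes: %d is a digit, %S is the complement of %s (any non-space).
local digits = ("order 66 shipped"):match("%d+")
local word   = ("  hello world"):match("%S+")

-- Quoting: % escapes punctuation, so "%." is a literal dot.
local ext = ("archive.tar.gz"):match("%.(%a+)$")

-- Captures come back as multiple return values of one match call.
local key, value = ("timeout = 30"):match("(%w+)%s*=%s*(%d+)")

-- Shortest match (-) versus longest match (*) on the same input.
local s = "see a.b.c. now"
local shortest = s:match("%s(%S-)%.")
local longest  = s:match("%s(%S*)%.")

print(digits, word, ext, key, value, shortest, longest)
-- prints: 66  hello  gz  timeout  30  a  a.b.c  (tab-separated)
```

Note that captures are always strings (or positions), so value here is "30", not the number 30.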

8
votes

http://lua-users.org/wiki/LibrariesAndBindings contains a listing of functionality including regex libraries if you wish to continue using them.

To answer the question (and note that I'm by no means a Lua guru): the language has a strong tradition of being used in embedded applications, where a full regex engine would unduly increase the code size on the platform. Such an engine can sometimes be larger than the entire Lua library itself.

[Edit] I just found in the online version of Programming in Lua (an excellent resource for learning the language) where this is described by one of the principles of the language: see the comments below [/Edit]

Personally, I find that the default pattern matching Lua provides satisfies most of my regex-y needs. Your mileage may vary.

1
votes

Ok, just a slight noob note for this discussion; I particularly got confused by this page:

SciTE Regular Expressions

since that one says \s matches whitespace, as I know from other regular expression syntaxes... And so I'm trying it in a shell:

$ lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> c="   d"
> print(c:match(" "))

> print(c:match("."))

> print(c:match("\s"))
nil
> print("_".. c:match("[ ]") .."_")
_ _
> print("_".. c:match("[ ]*") .."_")
_   _
> print("_".. c:match("[\s]*") .."_")
__

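Hmmm... it seems \s doesn't get recognized here. In Lua 5.1 the string literal "\s" is simply the letter s (newer Lua versions reject it as an invalid escape sequence), so the pattern was looking for a literal s. That page therefore probably refers to the regular expressions in SciTE's Find/Replace, not to Lua's regex syntax (which SciTE also uses).

Then I reread the lua-users wiki Patterns Tutorial, and started to understand the comment in @NormanRamsey's answer about the escape character being %, not \. So, trying this: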

> print("_".. c:match("[%s]*") .."_")
_   _

... does indeed work.
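The same experiment as a small script, for anyone who wants to rerun it. This is just a sketch with my own variable names; it avoids writing "\s" directly, since newer Lua versions refuse to compile that literal, and uses "s" instead (which is what "\s" turns into on 5.1).

```lua
local c = "   d"

-- %s is the whitespace class, so this grabs the three leading spaces.
local spaces = c:match("[%s]*")

-- A plain "s" looks for the letter s, which the string does not contain.
local no_s = c:match("s")

-- %s also covers tabs and newlines, not just the space character.
local tab = ("a\tb"):match("%s")

print("_" .. spaces .. "_", no_s, tab == "\t")
```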

So, I originally thought that Lua's "patterns" were a different set of commands or a different engine from Lua's "regular expressions". I guess a better way to say it is: Lua's "patterns" are the Lua-specific "regular expression" syntax and engine (in other words, there aren't two of them :) )

Cheers!

1
votes

At the risk of getting some downvotes for speaking the truth, I'll be bluntly honest about it (like an answer should be, after all). Two features qualify as useful, powerful and "better": being able to return multiple captures from a single match call (possible in regular expressions, but in a much more convoluted manner), and the %bxy pattern, which matches a balanced pair of delimiters (e.g. all kinds of brackets and such). Aside from those, almost everything Lua patterns can do, regular expressions can do as well.
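For reference, a minimal sketch of those two features. The input strings are invented for the example.

```lua
-- %b() matches from an opening "(" to its balancing ")", nesting included.
local expr = "f(a(b)c) + g(x)"
local balanced = expr:match("%b()")

-- One match call returns every capture as a separate value.
local year, month, day = ("2011-03-07"):match("(%d+)%-(%d+)%-(%d+)")

print(balanced, year, month, day)
-- prints: (a(b)c)  2011  03  07  (tab-separated)
```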

The shortcomings of Lua patterns compared to regular expressions when it comes to "features", on the other hand, are significant and too many to mention (e.g. lack of OR, lack of non-capturing groups, lack of lookaround expressions, etc). Now, that would be balanced if, say, Lua patterns were significantly faster than the usually slower regular expressions, but I'm not sure whether, and where, such a comparison exists: one that would exclude the general native Lua speed due to its lightweight nature, the use of tables and so on.
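The missing OR can often be worked around in plain Lua by trying several patterns in turn. The helper below is hypothetical (match_any is my own name, not part of the standard library), and it is only a sketch: it covers alternation at the top level of a pattern, not alternation nested inside a larger expression.

```lua
-- unpack moved to table.unpack in Lua 5.2; support both.
local unpack = table.unpack or unpack

-- Hypothetical helper: return the captures of the first pattern that matches,
-- or nil if none of them do.
local function match_any(s, patterns)
  for _, p in ipairs(patterns) do
    local results = { s:match(p) }
    if results[1] ~= nil then
      return unpack(results)
    end
  end
  return nil
end

-- Roughly what the regex ^(?:color|colour): (\w+) would do.
local alternatives = { "^color: (%a+)", "^colour: (%a+)" }
local hit  = match_any("colour: red", alternatives)
local miss = match_any("size: 3", alternatives)
print(hit, miss)
```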

The real reason Lua didn't bother to add regular expressions to its toolbox can't be the length of the required code (that's nonsense; modern computers don't even blink when it comes to 4000 lines of code vs "just" 500, even if it translates a bit differently into a library). It is probably because, Lua being a scripting language, it was assumed that the "parent" language already includes the ability to use regular expressions. It is plainly obvious when looking at the overall picture that Lua as a language was designed with simplicity, speed and only the necessary features in mind. It works well in most cases, but if you need more capabilities in this area and you cannot replicate them using Lua's other features, regular expressions are more comprehensive.

The good thing is that the differences in syntax between Lua patterns and regular expressions are mostly minor, so if you know one you can adapt to the other relatively easily.