I'm writing a recursive descent parser in Ruby that uses regexp to match terminals. Terminals are actually regexps, and are matched against the current position in the string.
The problem is that the terminal regex can contain anything, including regular expressions that match newlines. For example, a terminal that matches anything between parenthesis /\([^\)]*\)/
will consume whitespace, including the newlines I need to count. I've come up with a several solutions, but they all have drawbacks which I don't particularly like:
Whenever a terminal is matched, count all occurences of
\n
in the match. This would effectively mean that every string is matched twice instead of once,Instead of storing the current line, I could store the current position of the string, and obtain the line and column number only when I need it by traversing the string. Obviously problematic because the whole string is traversed every time the line number is needed.
Instead of allowing regexps as terminals, I could allow a simpler form of matchers, similar to what ANTLR allows, and then manually match the string, counting newlines. However, it would require no small amount of extra work and a loss of the matching power regular expressions have.
I'm leaning towards the third solution, however I'd like to see if anyone has dealt with a similar problem and has a better solution which would save me the trouble.