
I need to parse a language that is similar to a stripped-down version of Java. Since efficiency is the most important factor, I chose a hand-written parser instead of LALR parser generators like GOLD, bison and yacc.

However, I can't find the theory behind good hand-written parsers. There seem to be only tutorials on those generators and the mechanisms behind them.

Do I have to stop using regular expressions? I imagine they are slow compared to hand-written tokenizers.
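One way to settle this for a specific language is to write both versions and measure them. The sketch below is a minimal illustration, assuming a toy token set (ASCII identifiers, integer literals, single-character punctuation); the function names and the sample input are hypothetical. Results depend heavily on the runtime: in CPython, for instance, the `re` engine is implemented in C, so a per-character Python loop may well turn out to be the slower of the two, while in a compiled language the picture can be different.

```python
import re
import timeit

# Toy token set (an assumption for this sketch): ASCII identifiers,
# integer literals, and single non-space punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[^\sA-Za-z0-9_]")

IDENT_START = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_")
DIGITS = set("0123456789")
IDENT_PART = IDENT_START | DIGITS


def regex_tokens(src: str) -> list[str]:
    # findall skips whitespace, since no alternative can match it.
    return TOKEN_RE.findall(src)


def hand_tokens(src: str) -> list[str]:
    # Hand-written scanner: dispatch on the current character.
    out, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
            continue
        start = i
        if c in IDENT_START:
            while i < n and src[i] in IDENT_PART:
                i += 1
        elif c in DIGITS:
            while i < n and src[i] in DIGITS:
                i += 1
        else:
            i += 1  # any other character is a one-character token
        out.append(src[start:i])
    return out


SAMPLE = "int total = 0; while (i < 100) { total = total + i * 2; i = i + 1; }\n" * 2000


def best_time(fn, repeat: int = 5) -> float:
    # Best-of-N wall-clock time for one full pass over SAMPLE.
    return min(timeit.repeat(lambda: fn(SAMPLE), number=1, repeat=repeat))


if __name__ == "__main__":
    # Make sure both scanners agree before comparing their speed.
    assert regex_tokens(SAMPLE) == hand_tokens(SAMPLE)
    print(f"regex:        {best_time(regex_tokens):.4f} s")
    print(f"hand-written: {best_time(hand_tokens):.4f} s")
```

Only numbers from the real lexer, in the real implementation language, show whether regular expressions are actually the bottleneck; the equality check guards against timing two tokenizers that do not produce the same tokens.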

Does anybody know a good class or tutorial on hand-written parsing?

SK-logic: Compiled regular expressions (e.g., parallel FSMs) are usually faster than handwritten LL(n) code. That said, I'd recommend lexerless parsing instead. A handwritten PEG parser (with Pratt parsing for expressions) can be very fast, and you can still use some higher-level templates to generate efficient code. Read up on PEGs, and probably on Packrat parsing and Pratt parsing; that should be more than enough theory.
SK-logic: P.S. LLVM's Kaleidoscope tutorial includes a simple handwritten parser, which in turn reflects the more complex parsing approaches of LLVM and Clang (which are notoriously efficient).
Useless: Honestly, just do it the easy way and then benchmark it. At least you'll have a functionally correct prototype for comparison if you do need to hand-code something. "I can imagine they are slow" isn't a good reason for writing something this complex from scratch.
Tim: @SK-logic Thanks for that information! Just what I needed.
Tim: @Useless It is. But I already have a grammar and a tool generated by a parser generator, so this is my next step.
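The Pratt-parsing part of SK-logic's suggestion is short enough to sketch. This is a minimal illustration, not code from the comments: it parses expressions directly from the character string (no separate token stream, so it is lexerless in that narrow sense), builds plain tuples as an AST, and uses binding-power values that are assumptions chosen to mimic Java-like precedence. It covers neither PEG rules nor packrat memoization.

```python
# Minimal lexerless Pratt (top-down operator precedence) expression parser.
# Binding powers are assumptions that mimic Java-like precedence;
# the AST is made of plain tuples.

# Binary operator -> (left binding power, right binding power).
# rbp = lbp + 1 makes every operator left-associative.
INFIX = {
    "||": (1, 2), "&&": (3, 4),
    "==": (5, 6), "!=": (5, 6),
    "<": (7, 8), "<=": (7, 8), ">": (7, 8), ">=": (7, 8),
    "+": (9, 10), "-": (9, 10),
    "*": (11, 12), "/": (11, 12), "%": (11, 12),
}
PREFIX = {"-": "neg", "!": "not"}  # unary operators and their AST tags
PREFIX_BP = 13                     # unary binds tighter than any binary op
DIGITS = set("0123456789")


class PrattParser:
    def __init__(self, src: str):
        self.src = src
        self.i = 0

    def skip_ws(self) -> None:
        while self.i < len(self.src) and self.src[self.i].isspace():
            self.i += 1

    def scan_while(self, pred) -> str:
        """Consume characters while pred(ch) holds and return them."""
        start = self.i
        while self.i < len(self.src) and pred(self.src[self.i]):
            self.i += 1
        return self.src[start:self.i]

    def peek_infix(self):
        """Return the binary operator at the cursor (longest match first)."""
        self.skip_ws()
        for width in (2, 1):
            op = self.src[self.i:self.i + width]
            if op in INFIX:
                return op
        return None

    def parse_prefix(self):
        self.skip_ws()
        c = self.src[self.i:self.i + 1]
        if c in DIGITS:
            return ("int", int(self.scan_while(lambda ch: ch in DIGITS)))
        if c.isalpha() or c == "_":
            return ("var", self.scan_while(lambda ch: ch.isalnum() or ch == "_"))
        if c in PREFIX:
            self.i += 1
            return (PREFIX[c], self.parse_expr(PREFIX_BP))
        if c == "(":
            self.i += 1
            inner = self.parse_expr(0)
            self.skip_ws()
            if self.src[self.i:self.i + 1] != ")":
                raise SyntaxError(f"expected ')' at offset {self.i}")
            self.i += 1
            return inner
        raise SyntaxError(f"unexpected input at offset {self.i}")

    def parse_expr(self, min_bp: int = 0):
        lhs = self.parse_prefix()
        while (op := self.peek_infix()) is not None:
            lbp, rbp = INFIX[op]
            if lbp < min_bp:  # binds too loosely: leave it to the caller
                break
            self.i += len(op)
            lhs = (op, lhs, self.parse_expr(rbp))
        return lhs


def parse(src: str):
    p = PrattParser(src)
    tree = p.parse_expr(0)
    p.skip_ws()
    if p.i != len(src):
        raise SyntaxError(f"trailing input at offset {p.i}")
    return tree
```

For example, `parse("1 + 2 * 3")` yields `("+", ("int", 1), ("*", ("int", 2), ("int", 3)))`. Each operator carries a pair of binding powers, so adding a precedence level or making an operator right-associative is a change to the `INFIX` table rather than to the parsing code.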

1 Answer


In case it helps, here is an example of a hand-written parser (not a class or a tutorial): https://github.com/tabatkins/css-parser. Note, however, that it is explicitly coded for a simple, direct correspondence to the specification, not optimized for high performance.

The bigger problem, I expect, is developing the specification for the parsing. Examples of parser specifications include http://dev.w3.org/csswg/css3-syntax/ and a similar one for parsing HTML5.

The prerequisite for using a parser generator is that the language syntax is defined by a grammar (in a format the parser generator supports), rather than by a parsing algorithm.
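When a grammar does exist, a common hand-written technique, recursive descent, can follow it almost mechanically: one function per nonterminal, a loop for each repetition, and a branch on the next token for each alternative. The sketch below illustrates this on a hypothetical Java-like fragment; the grammar, the tuple AST, and all names are assumptions for illustration, and for brevity the tokens come from a regular expression.

```python
import re

# Hypothetical grammar (EBNF) for a tiny Java-like fragment;
# each nonterminal becomes one method below.
#   block     ::= "{" statement* "}"
#   statement ::= "while" "(" expr ")" statement
#               | "return" expr ";"
#               | block
#               | IDENT "=" expr ";"
#   expr      ::= term (("+" | "-") term)*
#   term      ::= IDENT | INT

KEYWORDS = {"while", "return"}
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INT_RE = re.compile(r"[0-9]+")
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\S")


class RecursiveDescentParser:
    def __init__(self, src: str):
        self.toks = TOKEN_RE.findall(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, want: str) -> None:
        if self.peek() != want:
            raise SyntaxError(f"expected {want!r}, got {self.peek()!r}")
        self.pos += 1

    # block ::= "{" statement* "}"
    def block(self):
        self.expect("{")
        body = []
        while self.peek() not in ("}", None):
            body.append(self.statement())
        self.expect("}")
        return ("block", body)

    # statement: choose the alternative by looking at one token
    def statement(self):
        tok = self.peek()
        if tok == "while":
            self.advance()
            self.expect("(")
            cond = self.expr()
            self.expect(")")
            return ("while", cond, self.statement())
        if tok == "return":
            self.advance()
            value = self.expr()
            self.expect(";")
            return ("return", value)
        if tok == "{":
            return self.block()
        name = self.advance()
        if not IDENT_RE.fullmatch(name) or name in KEYWORDS:
            raise SyntaxError(f"expected identifier, got {name!r}")
        self.expect("=")
        value = self.expr()
        self.expect(";")
        return ("assign", name, value)

    # expr ::= term (("+" | "-") term)*
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    # term ::= IDENT | INT
    def term(self):
        tok = self.advance()
        if INT_RE.fullmatch(tok):
            return ("int", int(tok))
        if IDENT_RE.fullmatch(tok) and tok not in KEYWORDS:
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse_program(src: str):
    parser = RecursiveDescentParser(src)
    tree = parser.block()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree
```

Every rule here can pick its alternative from a single token of lookahead, so this toy grammar is LL(1). Expressions with many precedence levels are often handled with precedence climbing or Pratt parsing instead of one function per level.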