12
votes

Mixing the lexing and parsing phases into a single phase not only makes Parsec parsers less readable in some cases, it also slows them down. One solution is to use Alex as a tokenizer and then Parsec as a parser of the token stream.

This is fine, but it would be even better if I could get rid of Alex, because it adds a preprocessing phase to the compilation pipeline, doesn't integrate well with Haskell "IDEs", etc. I was wondering whether there is such a thing as a Haskell EDSL for describing tokenizers, very much in the style of Alex, but as a library.

2
This is a question I have been looking into lately, but I haven't really seen anything. I'm imagining maybe a RegEx EDSL from which we make an untagged tokenizer (:: [RegEx] -> String -> [String]). - Jason Reich
I could come up with a quick solution using any regexp library by trying to match the current string against each regexp, but I would lose a lot of Alex's optimizations, which come from its knowledge of the set of all regexps. - Paul Brauner
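To make the idea in the two comments above concrete, here is a minimal sketch of such an untagged tokenizer. Everything in it is hypothetical and not taken from any library: the `RegEx` type, the smart constructors `alt` and `sq`, and the helpers `plus`, `longestMatch` and `tokenize` are made up for illustration. It matches with Brzozowski derivatives and tries each regex in turn, keeping the longest match, so it is exactly the naive approach that gives up Alex's combined-automaton optimizations. The result type is `Either String [String]` rather than plain `[String]` so that unmatched input can be reported.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- Hypothetical minimal regex EDSL (illustration only, not a real library).
data RegEx
  = Empty               -- matches nothing
  | Eps                 -- matches the empty string
  | Chr (Char -> Bool)  -- one character satisfying a predicate
  | Alt RegEx RegEx
  | Seq RegEx RegEx
  | Star RegEx

-- Smart constructors that prune dead branches, so a failed match becomes Empty.
alt :: RegEx -> RegEx -> RegEx
alt Empty b = b
alt a Empty = a
alt a b     = Alt a b

sq :: RegEx -> RegEx -> RegEx
sq Empty _ = Empty
sq _ Empty = Empty
sq Eps b   = b
sq a b     = Seq a b

-- One or more repetitions.
plus :: RegEx -> RegEx
plus r = Seq r (Star r)

-- Does the regex accept the empty string?
nullable :: RegEx -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Chr _)   = False
nullable (Alt a b) = nullable a || nullable b
nullable (Seq a b) = nullable a && nullable b
nullable (Star _)  = True

-- Brzozowski derivative: what remains to be matched after consuming c.
deriv :: Char -> RegEx -> RegEx
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Chr p)   = if p c then Eps else Empty
deriv c (Alt a b) = alt (deriv c a) (deriv c b)
deriv c (Seq a b)
  | nullable a    = alt (sq (deriv c a) b) (deriv c b)
  | otherwise     = sq (deriv c a) b
deriv c (Star a)  = sq (deriv c a) (Star a)

-- Length of the longest prefix of the input accepted by the regex, if any.
longestMatch :: RegEx -> String -> Maybe Int
longestMatch r0 = go r0 0 Nothing
  where
    go Empty _ best _ = best
    go r n best input =
      let best' = if nullable r then Just n else best
      in case input of
           []     -> best'
           c : cs -> go (deriv c r) (n + 1) best' cs

-- Try every regex at the current position and cut off the longest match.
-- Empty matches are rejected so that the loop always makes progress.
tokenize :: [RegEx] -> String -> Either String [String]
tokenize _ [] = Right []
tokenize rs s =
  case maximum (Nothing : [longestMatch r s | r <- rs]) of
    Just n | n > 0 ->
      let (tok, rest) = splitAt n s
      in fmap (tok :) (tokenize rs rest)
    _ -> Left ("no token matches at: " ++ take 10 s)

-- Example rules: numbers, identifiers, whitespace, single-character operators.
exampleRules :: [RegEx]
exampleRules =
  [ plus (Chr isDigit)
  , plus (Chr isAlpha)
  , plus (Chr isSpace)
  , Chr (`elem` "+-*/()")
  ]
```

With these rules, `tokenize exampleRules "foo + 42"` splits the input into `"foo"`, `" "`, `"+"`, `" "` and `"42"`. Each position costs one pass per regex, which is the overhead a generator like Alex avoids by compiling all rules into a single automaton.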

2 Answers

4
votes

Yes - http://www.cse.unsw.edu.au/~chak/papers/Cha99.html

Before Hackage, Manuel used to release the code in a package called CTK (compiler toolkit). I'm not sure what the status of the project is these days.

I think Thomas Hallgren's lexer from the paper "Lexing Haskell in Haskell" was dynamic rather than a code generator. Whilst the release is tailored to lexing Haskell, the machinery in the library is more general. Iavor Diatchki has put the code on Hackage.

http://hackage.haskell.org/package/haskell-lexer

3
votes

You can use Parsec as the lexer too. First you parse the string into tokens, then you parse the tokens into the target data type.
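A minimal sketch of that two-phase setup, assuming the `parsec` package. The `Token` and `Expr` types and the helpers `satisfyTok` and `tokP` are hypothetical names chosen for this tiny arithmetic example; only `tokenPrim`, `getPosition`, `chainl1`, `between` and friends are actual Parsec functions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical token type for a tiny arithmetic language.
data Token = TNum Integer | TPlus | TLParen | TRParen
  deriving (Show, Eq)

-- Phase 1: a character-level Parsec parser that produces positioned tokens.
lexer :: Parser [(SourcePos, Token)]
lexer = spaces *> many (positioned <* spaces) <* eof
  where
    positioned = (,) <$> getPosition <*> tok
    tok =  TNum . read <$> many1 digit
       <|> TPlus   <$ char '+'
       <|> TLParen <$ char '('
       <|> TRParen <$ char ')'

-- Phase 2: a Parsec parser whose input stream is the token list.
type TokParser = Parsec [(SourcePos, Token)] ()

-- Accept one token if the function returns Just; positions for error
-- messages are taken from the next token in the stream.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok f = tokenPrim (show . snd) nextPos (f . snd)
  where
    nextPos _ _ ((p, _) : _) = p
    nextPos p _ []           = p

-- Match exactly one given token.
tokP :: Token -> TokParser ()
tokP t = satisfyTok (\t' -> if t == t' then Just () else Nothing)

-- Hypothetical target data type.
data Expr = Num Integer | Add Expr Expr
  deriving (Show, Eq)

-- Left-associative sums of terms.
expr :: TokParser Expr
expr = chainl1 term (Add <$ tokP TPlus)

-- A term is a number or a parenthesised expression.
term :: TokParser Expr
term = satisfyTok num <|> between (tokP TLParen) (tokP TRParen) expr
  where
    num (TNum n) = Just (Num n)
    num _        = Nothing

-- Run the lexer, then feed its tokens to the parser.
parseExpr :: String -> Either ParseError Expr
parseExpr src = parse lexer "<input>" src >>= parse (expr <* eof) "<input>"
```

Here `parseExpr "1 + (2 + 3)"` first lexes the string into seven tokens and then parses them into `Add (Num 1) (Add (Num 2) (Num 3))`. Whitespace handling lives entirely in `lexer`, so the token-level grammar stays free of it.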