4
votes

I'm trying to build a simple lexer/parser with Alex/Happy in Haskell, and I would like to carry some source location information from the text file through to my final AST.

I managed to build a lexer with Alex that produces a list of tokens with their positions:

data Token = Token AlexPosn Foo Bar
lexer :: String -> [Token]

In my Happy file, when declaring the %token part, I can mark the semantic part of a token with the $$ symbol:

%token FOO  { Token _ $$ _ }

and in the parsing rules, $i will refer to this $$:

foo_list: FOO  { [$1] }
        | foo_list FOO { $2 : $1 }

Is there a way to refer to both the AlexPosn part and the Foo part of the FOO token? Right now I only know how to refer to one of them. I can't find any information on a way to "add several $$" and refer to them afterwards.

Is there a way to do so?

V.

2
In fact, it doesn't seem possible even in C flex/bison, so it should not be possible directly in Haskell or Caml either. However, I could use a tuple, data Token = Token (AlexPosn,Foo,Bar), instead of several arguments. I'm leaving the question open for a few days, but I think I'll close it soon. - Vinz

2 Answers

4
votes

In the end, I found two solutions:

  • pack all the meaningful data in a tuple, so that $$ points to this tuple, then extract the data by projection:

    data Token = Token (AlexPosn,Foo) Bar
    %token FOO { Token $$ some_bar }
    rule : FOO  { Ast (fst $1) (snd $1) }
    
  • do not use $$ at all: without $$, Happy gives you the full token during parsing, so it is up to you to extract what you really need from it:

    data Token = Token AlexPosn Foo Bar
    %token FOO { Token _ _ some_bar }
    rule : FOO  { Ast (get_pos $1) (get_foo $1) }
    
    get_pos :: Token -> AlexPosn
    get_foo :: Token -> Foo
    

    ...

I think the first one is the more elegant. The second can get quite heavy in terms of lines of code if you are carrying a lot of information: you have to write the projections by hand (pattern matching and so on), and doing so safely can be tricky if your token type is large.
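One possible way to lighten the second approach, as a sketch only: if the token type is declared with record syntax, the compiler derives the projections, so none have to be written by hand. Everything below is a stand-in written for illustration. AlexPosn mirrors the type produced by Alex's posn wrapper, while Foo, Bar, Ast, the field names tokPos/tokFoo/tokBar and the function mkAst are hypothetical names, not part of the original code.

```haskell
-- Stand-in for the position type generated by Alex's "posn" wrapper
-- (absolute offset, line, column); in a real project it comes from the lexer.
data AlexPosn = AlexPn !Int !Int !Int
  deriving (Eq, Show)

-- Hypothetical payload types standing in for Foo and Bar from the question.
newtype Foo = Foo String deriving (Eq, Show)
data Bar = SomeBar | OtherBar deriving (Eq, Show)

-- Record syntax provides the projections tokPos, tokFoo and tokBar for free.
data Token = Token
  { tokPos :: AlexPosn
  , tokFoo :: Foo
  , tokBar :: Bar
  } deriving (Eq, Show)

-- Hypothetical AST node that keeps the source position.
data Ast = Ast AlexPosn Foo
  deriving (Eq, Show)

-- What a semantic action such as { Ast (tokPos $1) (tokFoo $1) } would compute
-- when $1 is the whole token (no $$ in the %token declaration).
mkAst :: Token -> Ast
mkAst t = Ast (tokPos t) (tokFoo t)
```

Because Token has a single constructor here, the derived field accessors are total; with several constructors carrying different fields they would be partial, and explicit pattern matching would be the safer choice.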

1
votes

It is also possible to keep multiple values like this:

data Token = Token AlexPosn Foo Bar
%token FOO { Token pos foo some_bar }
rule : FOO { Ast pos foo }

I'm not sure, though, whether Happy actually guarantees that this will always work. The reason it (maybe) works is that Happy generates code that pattern matches on Token pos foo some_bar, making pos and foo available in Ast pos foo.
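The scoping idea behind this can be shown in plain Haskell. The sketch below is a hand-written analogue, not Happy's generated code, and it says nothing about whether Happy keeps the pattern and the action in the same scope. The types are stand-ins, and reduceFoo and the SomeBar constructor are hypothetical names chosen for the example.

```haskell
-- Stand-ins for the types in the question; AlexPosn mirrors Alex's posn wrapper.
data AlexPosn = AlexPn !Int !Int !Int deriving (Eq, Show)
newtype Foo = Foo String deriving (Eq, Show)
data Bar = SomeBar | OtherBar deriving (Eq, Show)
data Token = Token AlexPosn Foo Bar deriving (Eq, Show)
data Ast = Ast AlexPosn Foo deriving (Eq, Show)

-- Variables bound by a case pattern are in scope on the right-hand side of
-- that alternative, so pos and foo can be used to build the AST node.
reduceFoo :: Token -> Maybe Ast
reduceFoo tok = case tok of
  Token pos foo SomeBar -> Just (Ast pos foo)
  _                     -> Nothing
```

If the generated parser ever placed the semantic action outside the alternative that matched the token, pos and foo would no longer be in scope, which is why this approach depends on how Happy lays out its output.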