I'm trying to use Menhir's incremental parsing API and introspection APIs in a generated parser. I want to, say, determine the semantic value associated with a particular LR(1) stack entry; i.e. a token that's been previously consumed by the parser.
Given an abstract parsing checkpoint, encapsulated in Menhir's type 'a env
, I can extract a “stack element” from the LR automaton; it looks like this:
type element = | Element: 'a lr1state * 'a * position * position -> element
The type element describes one entry in the stack of the LR(1) automaton. In a stack element of the form
Element (s, v, startp, endp)
,s
is a (non-initial) state andv
is a semantic value. The valuev
is associated with the incoming symbol A of the states
. In other words, the valuev
was pushed onto the stack just before the states
was entered. Thus, for some type'a
, the states
has type'a lr1state
and the valuev
has type'a
...In order to do anything useful with the value
v
, one must gain information about the type'a
, by inspection of the states
. So far, the type'a lr1state
is abstract, so there is no way of inspectings
. The inspection API (§9.3) offers further tools for this purpose.
Okay, cool! So I go and dive into the inspection API:
The type 'a terminal is a generalized algebraic data type (GADT). A value of type 'a terminal represents a terminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol ...
type _ terminal = | T_A : unit terminal | T_B : int terminal
The type 'a nonterminal is also a GADT. A value of type 'a nonterminal represents a nonterminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol ...
type _ nonterminal = | N_main : thing nonterminal
Piecing these together, I get something like the following (where "command" is one of my grammar's nonterminals, and thus N_command
is a string nonterminal
):
let current_command (env : 'a env) =
let rec f i =
match Interpreter.get i env with
| None -> None
| Some Interpreter.Element (lr1state, v, _startp, _endp) ->
match Interpreter.incoming_symbol lr1state with
| Interpreter.N Interpreter.N_command -> Some v
| _ -> f (i + 1)
in
f 0
Unfortunately, this is puking up very confusing type-errors for me:
File "src/incremental.ml", line 110, characters 52-53:
Error: This expression has type string but an expression was expected of type
string
This instance of string is ambiguous:
it would escape the scope of its equation
This is a bit above my level! I'm pretty sure I understand why I can't do what I tried to do above; but I don't understand what my alternatives are. In fact, the Menhir manual specifically mentions this complexity:
This function can be used to gain access to the semantic value
v
in a stack elementElement (s, v, _, _)
. Indeed, by case analysis on the symbolincoming_symbol s
, one gains information about the type'a
, hence one obtains the ability to do something useful with the valuev
.
Okay, but that's what I thought I did, above: case-analysis by match
'ing on incoming_symbol s
, pulling out the case where v
is of a single, specific type: string
.
tl;dr: how do I extract the string payload from this GADT, and do something useful with it?