How do I extract useful information from the payload of a GADT / existential type?

Question

I'm trying to use Menhir's incremental parsing API and introspection APIs in a generated parser. I want to, say, determine the semantic value associated with a particular LR(1) stack entry; i.e. a token that's been previously consumed by the parser.

Given an abstract parsing checkpoint, encapsulated in Menhir's type 'a env, I can extract a “stack element” from the LR automaton; it looks like this:

type element =
  | Element: 'a lr1state * 'a * position * position -> element
The type element describes one entry in the stack of the LR(1) automaton. In a stack element of the form Element (s, v, startp, endp), s is a (non-initial) state and v is a semantic value. The value v is associated with the incoming symbol A of the state s. In other words, the value v was pushed onto the stack just before the state s was entered. Thus, for some type 'a, the state s has type 'a lr1state and the value v has type 'a ...

In order to do anything useful with the value v, one must gain information about the type 'a, by inspection of the state s. So far, the type 'a lr1state is abstract, so there is no way of inspecting s. The inspection API (§9.3) offers further tools for this purpose.

Okay, cool! So I go and dive into the inspection API:

The type 'a terminal is a generalized algebraic data type (GADT). A value of type 'a terminal represents a terminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol ...
type _ terminal =
| T_A : unit terminal
| T_B : int terminal
The type 'a nonterminal is also a GADT. A value of type 'a nonterminal represents a nonterminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol ...
type _ nonterminal =
| N_main : thing nonterminal

Piecing these together, I get something like the following (where "command" is one of my grammar's nonterminals, and thus N_command is a string nonterminal):

let current_command (env : 'a env) =
   let rec f i =
      match Interpreter.get i env with
      | None -> None
      | Some Interpreter.Element (lr1state, v, _startp, _endp) ->
      match Interpreter.incoming_symbol lr1state with
      | Interpreter.N Interpreter.N_command -> Some v
      | _ -> f (i + 1)
   in
   f 0

Unfortunately, this is puking up very confusing type-errors for me:

File "src/incremental.ml", line 110, characters 52-53:
Error: This expression has type string but an expression was expected of type
         string
       This instance of string is ambiguous:
       it would escape the scope of its equation

This is a bit above my level! I'm pretty sure I understand why I can't do what I tried to do above; but I don't understand what my alternatives are. In fact, the Menhir manual specifically mentions this complexity:

This function can be used to gain access to the semantic value v in a stack element Element (s, v, _, _). Indeed, by case analysis on the symbol incoming_symbol s, one gains information about the type 'a, hence one obtains the ability to do something useful with the value v.

Okay, but that's what I thought I did, above: case-analysis by match'ing on incoming_symbol s, pulling out the case where v is of a single, specific type: string.

tl;dr: how do I extract the string payload from this GADT, and do something useful with it?

octachron octachron · Accepted Answer · 2019-08-11T18:18:25

If your error sounds like

This instance of string is ambiguous: it would escape the scope of its equation

it means that the type checker is not really sure if outside of the pattern matching branch the type of v should be a string, or another type that is equal to string but only inside the branch. You just need to add a type annotation when leaving the branch to remove this ambiguity:

 | Interpreter.(N N_command) -> Some (v:string)

How do I extract useful information from the payload of a GADT / existential type?

1 Answers