4
votes

I've tried several different parser generators (Bison, DParser, etc.) that claim to be able to generate GLR parsers i.e., ones that can handle ambiguous grammars. Here is a very simple ambiguous grammar of the type I'm talking about:

START: A | B;
A: C | D;
B: C | D;
C: T1 | T2;
D: T3 | T4;
T1: 't1';
T2: 't2';
T3: 't3';
T4: 't4';

I can generate the parsers just fine, but I get "unresolved ambiguity" errors or just outright crashes when I give the parser input that should be valid. There are no problems of any kind when I change the grammar to an unambiguous version.

What am I not understanding about GLR parsers? I thought the whole point was that in cases of ambiguity, ALL possible parses are tracked up until they merge or reach a dead end. All I need is a parser that can tell me whether there is ANY valid parse of the input.

Thanks for any help.

edit:

This is frustrating. Using %dprec and %merge I've been able to get Bison to handle ambiguous rules and terminals, but it still chokes on very simple but highly pathological pseudo-English grammars of the kind that I need to handle:

S: NP VP | NPA VP;
NPA: D N | NP PP;
NP: D N | NP PP | NPA;
VP: V NP | VP PP;
PP: P NP;
D: "the" | "a";
P: "in" | "with";
V: "saw";
N: "saw" | "girl" | "boy" | "park" | "telescope";

With input "a boy saw a girl", Bison is unable to parse and returns with code 1. Tom on the other hand, deals with this grammar and this input sentence flawlessly, and even naturally handles unknown terminals by just assigning them to all possible terminal types. But unlike Bison, Tom chokes on large grammars. (By "chokes" I mean fails in various different ways. If the failure modes would be helpful, I can report those.)

Anyone have any other ideas?

2

2 Answers

4
votes

Unfortunately, bison really insists on producing a (single) parse, so you have to specify some way to merge ambiguous parses. If you don't, and there is more than one possible parse, bison's GLR parser will indeed complain that the parse is ambiguous.

If you don't really care which of the multiple parses is accepted, then it's not too difficult to bend bison to your will. The simplest way is to just assign a different %dprec to every possibly ambiguous production. Bison will then select whichever applicable production happens to have the best precedence.

You can even get bison to tell you about multiple parses with a simple %merge function; there is an example in the bison manual. (The documentation of this feature isn't great but it might be adequate to your needs. If not, feel free to ask a more specific question.)

I don't have much experience with DParser, but the manual indicates that its behaviour when faced with multiple possible parses is similar: the default is to complain, but you can provide a trivial merge function of your own: (The quote comes from Section 12, Ambiguities)

Ambiguities are resolved automatically based on priorities and associativities. In addition, when the other resolution techniques fail, user defined ambiguity resolution is possible. The default ambiguity handler produces a fatal error on an unresolved ambiguity. This behavior can be replaced with a user defined resolvers the signature of which is provided in dparse.h.


Here's an example bison GLR grammar for the second example. I left out the lexer, which is really not relevant (and slightly embarrassing because I was rushing).

%{
  int yylex();
  void yyerror(const char* msg);
%}

%error-verbose
%glr-parser

%token WORD_A "a"
%token WORD_BOY "boy"
%token WORD_GIRL "girl"
%token WORD_IN "in"
%token WORD_LIKED "liked"
%token WORD_PARK "park"
%token WORD_SAW "saw"
%token WORD_TELESCOPE "telescope"
%token WORD_THE "the"
%token WORD_WITH "with"

%%
S  : NP VP      {puts("S: NP VP");}     %dprec 1
   | NPA VP     {puts("S: NPA VP");}    %dprec 2
   ;
NPA: D N        {puts("NPA: D N");}     %dprec 3
   | NP PP      {puts("NPA: NP PP");}   %dprec 4
   ;
NP : D N        {puts("NP: D N");}      %dprec 6
   | NP PP      {puts("NP: NP PP");}    %dprec 7
   | NPA        {puts("NP: NPA");}      %dprec 10
   ;
VP : V NP       {puts("VP: V NP ");}    %dprec 11
   | VP PP      {puts("VP: VP PP");}    %dprec 12
   ;
PP : P NP       {puts("PP: P NP");}     %dprec 14
   ;
D  : "the"      {puts("D: the");}       %dprec 15
   | "a"        {puts("D: a");}         %dprec 16
   ;
P  : "in"       {puts("P: in");}        %dprec 17
   | "with"     {puts("P: with");}      %dprec 18
   ;
V  : "liked"    {puts("V: liked");}     %dprec 19
   | "saw"      {puts("V: saw");}       %dprec 20
   ;
N  : "girl"     {puts("N: girl");}      %dprec 21
   | "boy"      {puts("N: boy");}       %dprec 22
   | "park"     {puts("N: park");}      %dprec 23
   | "saw"      {puts("N: saw");}       %dprec 24
   | "telescope"{puts("N: telescope");} %dprec 25
   ;
%%

int main(int argc, char** argv) {
  printf("yyparse returned %d\n", yyparse());
  return 0;
}

Compilation:

$ make ambig2
bison30 -v -d -o ambig2.c ambig2.y 
ambig2.y: warning: 6 shift/reduce conflicts [-Wconflicts-sr]
ambig2.y: warning: 10 reduce/reduce conflicts [-Wconflicts-rr]
gcc-4.8 -ggdb -Wall -D_POSIX_C_SOURCE=200809L -std=c99 -c -o ambig2.o ambig2.c
gcc-4.8   ambig2.o   -o ambig2
rm ambig2.o ambig2.c

Sample parses:

$ ./ambig2 <<<"a boy saw a girl"
D: a
N: boy
NPA: D N
V: saw
D: a
N: girl
NPA: D N
NP: NPA
VP: V NP 
S: NPA VP
yyparse returned 0

$ ./ambig2 <<<"a saw saw the saw in a saw"
D: a
N: saw
NPA: D N
V: saw
D: the
N: saw
NPA: D N
NP: NPA
VP: V NP 
P: in
D: a
N: saw
NPA: D N
NP: NPA
PP: P NP
VP: VP PP
S: NPA VP
yyparse returned 0
1
votes

Your grammar doesn't cause GLR parsers to choke.

You need a GLR parsing engine that delivers what GLR parsers are supposed to deliver: parsing in the face of ambiguities, and handing you the result. Presumably you use additional context to resolve the ambiguities. (You can tangle context-checking into the parsing process if you really insist on avoiding producing context-prevented ambiguities. If you do that, you get the kind of complications that the GCC guys had when they tried to parse C and C++ with LALR).

Here's the output for OP's problem, given to our DMS Software Reengineering Toolkit's GLR parser generator. I had to define a lexer and a grammar compatible for DMS:

Lexer (defining individual tokens as words; a more scalable version might have defined word class tokens such as D P V N):

%%
%%main
#skip "\s+"
#skip "[\u000d\u000a]+"
#token 'the' "the"
#token 'a' "a"
#token 'in' "in"
#token 'with' "with"
#token 'saw' "saw"
#token 'girl' "girl"
#token 'boy' "boy"
#token 'park' "park"
#token 'telescope' "telescope"
%%

Grammar (DMS doesn't bother with EBNF):

S = NP VP ;
S = NPA VP ;
NPA = D N ;
NPA = NP PP ;
NP = D N ;
NP = NP PP ;
NP = NPA ;
VP = V NP ;
VP = VP PP ;
PP = P NP ;
D = 'the' ;
D = 'a';
P = 'in' ;
P = 'with' ;
V = 'saw' ;
N = 'saw' ;
N = 'girl' ;
N = 'boy' ;
N = 'park' ;
N = 'telescope' ;

Sample file "aboysawagirl.txt"

a boy saw a girl\n

From start to finish, building lexer and parser (about 10 minutes of fumbling...)

Parsing the sample file and dumping the automatically built tree:

C:\DMS\Domains\simpenglish\Tools\Parser\Source>run ..\domainparser ++AST ..\..\Lexer\aboysawagirl.txt
simpenglish Domain Parser Version 2.5.15
Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
24 tree nodes in tree.
3 ambiguity nodes in tree.
(AMBIGUITY<S=11>@simpenglish=31@#1f35140^0{2} Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
 (S@simpenglish=1@#1f350e0^1#1f35140:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
  (AMBIGUITY<NP=12>@simpenglish=31@#1f34ba0^1#1f350e0:1{2} Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   (NP@simpenglish=5@#1f34b80^1#1f34ba0:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |(D@simpenglish=12@#1f34aa0^2#1f34b80:1#1f34b40:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | ('a'@simpenglish=22@#1f349c0^1#1f34aa0:1[Keyword:0] Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'a'
   |)D#1f34aa0
   |(N@simpenglish=18@#1f34b20^2#1f34b80:2#1f34b40:2 Line 1 Column 3 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | ('boy'@simpenglish=27@#1f34a80^1#1f34b20:1[Keyword:0] Line 1 Column 3 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'boy'
   |)N#1f34b20
   )NP#1f34b80
   (NP@simpenglish=7@#1f34c60^1#1f34ba0:2 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |(NPA@simpenglish=3@#1f34b40^2#1f35040:1#1f34c60:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | (D@simpenglish=12@#1f34aa0^2... [ALREADY PRINTED] ...)
   | (N@simpenglish=18@#1f34b20^2... [ALREADY PRINTED] ...)
   |)NPA#1f34b40
   )NP#1f34c60
  )AMBIGUITY#1f34ba0
  (VP@simpenglish=8@#1f34fc0^1#1f350e0:2 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   (V@simpenglish=15@#1f34d60^1#1f34fc0:1 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |('saw'@simpenglish=25@#1f34b00^2#1f34d60:1#1f34d40:1[Keyword:0] Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'saw'
   )V#1f34d60
   (AMBIGUITY<NP=12>@simpenglish=31@#1f34f00^2#1f34f80:2#1f34fc0:2{2} Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |(NP@simpenglish=5@#1f34e60^1#1f34f00:1 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | (D@simpenglish=12@#1f34da0^2#1f34e60:1#1f34de0:1 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |  ('a'@simpenglish=22@#1f34ce0^1#1f34da0:1[Keyword:0] Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'a'
   | )D#1f34da0
   | (N@simpenglish=17@#1f34dc0^2#1f34e60:2#1f34de0:2 Line 1 Column 13 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |  ('girl'@simpenglish=26@#1f34d80^1#1f34dc0:1[Keyword:0] Line 1 Column 13 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'girl'
   | )N#1f34dc0
   |)NP#1f34e60
   |(NP@simpenglish=7@#1f34f20^1#1f34f00:2 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | (NPA@simpenglish=3@#1f34de0^1#1f34f20:1 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |  (D@simpenglish=12@#1f34da0^2... [ALREADY PRINTED] ...)
   |  (N@simpenglish=17@#1f34dc0^2... [ALREADY PRINTED] ...)
   | )NPA#1f34de0
   |)NP#1f34f20
   )AMBIGUITY#1f34f00
  )VP#1f34fc0
 )S#1f350e0
 (S@simpenglish=2@#1f35040^1#1f35140:2 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
  (NPA@simpenglish=3@#1f34b40^2... [ALREADY PRINTED] ...)
  (VP@simpenglish=8@#1f34f80^1#1f35040:2 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   (V@simpenglish=15@#1f34d40^1#1f34f80:1 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |('saw'@simpenglish=25@#1f34b00^2... [ALREADY PRINTED] ...)
   )V#1f34d40
   (AMBIGUITY<NP=12>@simpenglish=31@#1f34f00^2... [ALREADY PRINTED] ...)
  )VP#1f34f80
 )S#1f35040
)AMBIGUITY#1f35140

Your simple english grammar parser parses your example sentence in different ways.

This is a lot more spectacular with a full C++11 grammar.