
I'm trying to figure out why my ANTLR-generated parser does not recognize part of an input as matching one of my rules (the "and_converge" rule, which is part of "gateway"). My grammar looks like this:

process           :   PROCESS id CURLY_OPEN (stmt_list | pool_list) CURLY_CLOSE EOF ; /* starting rule */
stmt_list         :   (stmt STMT_TERM?)* ;
stmt              :   sequence | sequence_elem | association ;
sequence          :   sequence_elem sequence_flow sequence_elem (sequence_flow sequence_elem)* ;
sequence_elem     :   activity | gateway | event | link ;
activity          :   task | subprocess ;
task              :   SQRE_OPEN task_type id
                      (VERT_LINE (input_set)? (output_set)?)?   /* input/output sets */
                      (VERT_LINE attr_list)?                    /* attributes for activity */
                      (VERT_LINE boundary_event)*               /* associated boundary events */
                      SQRE_CLOSE ;
task_type         :   USER | SERVICE | SCRIPT ;
event             :   PAREN_OPEN event_type id (VERT_LINE attr_list)? PAREN_CLOSE ;
gateway           :   ANGLE_OPEN (fork_diverge | condition_diverge | event_diverge | and_converge | or_converge) ANGLE_CLOSE ;
fork_diverge      :   FORK id (VERT_LINE attr_list)? VERT_LINE outflows ;
event_diverge     :   EVENT_SPLIT VERT_LINE event_links ;
condition_diverge :   (OR_SPLIT | XOR_SPLIT) id (VERT_LINE attr_list)? VERT_LINE cond_outflows ;
and_converge      :   JOIN id (VERT_LINE attr_list)? (VERT_LINE inflows)? ;
or_converge       :   (XOR_JOIN | OR_JOIN) id (VERT_LINE attr_list)? (VERT_LINE inflows)? ;
inflows           :   IN ':' link_list ;
outflows          :   OUT ':' link_list ;
cond_outflows     :   OUT ':' cond_outflow (',' cond_outflow)* (DEFAULT ':' link)?;
cond_outflow      :   expression ':' link ;

For the sake of brevity I have omitted a large portion of the grammar, but you can see the full version here: https://github.com/bspies/dotbpm/blob/dotbpm-parser/src/main/java/dot/bpm/parser/antlr/DOTBPM.g4.

When I give input like the following to the grammar, it fails:

/* A fork followed by a join, 4 "sequences" altogether */
process fork_join {
   (> start) ==> [user t1] ==>
   <fork g1 | out: #[t2], #[t3]>
   [user t2] ==> #<g2>
   [user t3] ==> #<g2>
   <join g2> ==> (/ end)
}

It fails on the line with the "join" gateway: line 7:4 no viable alternative at input '<join'. Debugging shows that it fails in the "stmt" rule, which is unable to decide which alternative to take. This is rather puzzling, given that the "fork" rule that precedes it works fine and takes a very similar path through the grammar, i.e. through the "gateway" rule.

The parse tree is here:

[Image: parse tree for the input above]

I recommend printing the tokens your lexer recognized, to see whether they match what you (and your parser) expected. - Mike Lischke
Yes, you can see that from the parse tree. Everything up to #<g2> ("gateway_link") before the parse error is exactly as expected, and even the (/ end) ("event") after the error is recognized correctly. Only the < (ANGLE_OPEN), join, and ==> (sequence flow) tokens are lost. - Brennan Spies

1 Answer


If you run this code:

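import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;
// Also import DOTBPMLexer, the lexer ANTLR generates from DOTBPM.g4 (its package depends on your build).

public class PrintTokens { // class name is arbitrary
    public static void main(String[] args) {
        String source = "/* A fork followed by a join, 4 \"sequences\" altogether */\n" +
                "process fork_join {\n" +
                "   (> start) ==> [user t1] ==>\n" +
                "   <fork g1 | out: #[t2], #[t3]>\n" +
                "   [user t2] ==> #<g2>\n" +
                "   [user t3] ==> #<g2>\n" +
                "   <join g2> ==> (/ end)\n" +
                "}";

        DOTBPMLexer lexer = new DOTBPMLexer(CharStreams.fromString(source));

        // Run the lexer over the whole input and buffer every token it produces
        CommonTokenStream stream = new CommonTokenStream(lexer);
        stream.fill();

        // Print each token's display name next to its text, with newlines escaped
        for (Token t : stream.getTokens()) {
            System.out.printf("type=%-20s text=`%s`%n",
                    DOTBPMLexer.VOCABULARY.getDisplayName(t.getType()),
                    t.getText().replace("\n", "\\n"));
        }
    }
}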

and inspect its output:

type=PROCESS              text=`process`
type=ID                   text=`fork_join`
type='{'                  text=`{`
type='('                  text=`(`
type='>'                  text=`>`
type=ID                   text=`start`
type=')'                  text=`)`
type='==>'                text=`==>`
type='['                  text=`[`
type=USER                 text=`user`
type=ID                   text=`t1`
type=']'                  text=`]`
type='==>'                text=`==>`
type='<'                  text=`<`
type=FORK                 text=`fork`
type=ID                   text=`g1`
type='|'                  text=`|`
type=OUT                  text=`out`
type=':'                  text=`:`
type='#'                  text=`#`
type='['                  text=`[`
type=ID                   text=`t2`
type=']'                  text=`]`
type=','                  text=`,`
type='#'                  text=`#`
type='['                  text=`[`
type=ID                   text=`t3`
type=']'                  text=`]`
type='>'                  text=`>`
type='['                  text=`[`
type=USER                 text=`user`
type=ID                   text=`t2`
type=']'                  text=`]`
type='==>'                text=`==>`
type='#'                  text=`#`
type='<'                  text=`<`
type=ID                   text=`g2`
type='>'                  text=`>`
type='['                  text=`[`
type=USER                 text=`user`
type=ID                   text=`t3`
type=']'                  text=`]`
type='==>'                text=`==>`
type='#'                  text=`#`
type='<'                  text=`<`
type=ID                   text=`g2`
type='>'                  text=`>`
type='<'                  text=`<`
type=ID                   text=`join`
type=ID                   text=`g2`
type='>'                  text=`>`
type='==>'                text=`==>`
type='('                  text=`(`
type='/'                  text=`/`
type=ID                   text=`end`
type=')'                  text=`)`
type='}'                  text=`}`
type=EOF                  text=`<EOF>`

you would see that the input "join" was lexed as an ID, not as a JOIN. This is what Mike meant by printing the tokens first.

This is because you have a typo:

JOIN           :   [Jj}[Oo][Ii][Nn] ;                     /* synchronized merge (AND) */
OR_JOIN        :   [Oo][Rr]'-'[Jj}[Oo][Ii][Nn] ;          /* structured synchronized merge (OR) */
XOR_JOIN       :   [Xx][Oo][Rr]'-'[Jj}[Oo][Ii][Nn] ;      /* unsynchronized merge (XOR) */

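Note the [Jj}[Oo] in all of these lexer rules. Because a set only ends at the first ], this is one character set that matches a single character (one of J, j, }, [, O or o), not two sets. JOIN therefore matches three-character words such as "oin" rather than "join", so the lexer tokenizes "join" with the ID rule instead.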
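To make the set boundary concrete, here is a small Java sketch that reads a rule body built only from [...] sets the way ANTLR 4 does in this one respect: every character up to the first ] is a literal member of the set, including [ and }. It is a simplified model under stated assumptions (no ranges, escapes or quoted literals), not ANTLR's own grammar parser, and CharSetDemo, parseSets and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Simplified reader for ANTLR 4 lexer character sets such as [Jj][Oo].
 * Inside [...] every character up to the first ']' is treated as a literal member.
 * Assumption: ranges (a-z), escapes (\]) and quoted literals are NOT handled.
 */
class CharSetDemo {

    /** Splits a rule body made only of [...] sets into one Set per set. */
    static List<Set<Character>> parseSets(String body) {
        List<Set<Character>> sets = new ArrayList<>();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c != '[') throw new IllegalArgumentException("expected '[' at index " + i);
            int close = body.indexOf(']', i + 1); // the first ']' ends the set
            if (close < 0) throw new IllegalArgumentException("unterminated set at index " + i);
            Set<Character> members = new LinkedHashSet<>();
            for (int k = i + 1; k < close; k++) members.add(body.charAt(k));
            sets.add(members);
            i = close + 1;
        }
        return sets;
    }

    /** True if input has exactly one character per set, each drawn from its set. */
    static boolean matches(List<Set<Character>> sets, String input) {
        if (input.length() != sets.size()) return false;
        for (int k = 0; k < sets.size(); k++) {
            if (!sets.get(k).contains(input.charAt(k))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<Character>> typo = parseSets("[Jj}[Oo][Ii][Nn]");  // rule body with the typo
        List<Set<Character>> fixed = parseSets("[Jj][Oo][Ii][Nn]"); // intended rule body
        System.out.println("typo:  " + typo.size() + " sets, first = " + typo.get(0));
        System.out.println("fixed: " + fixed.size() + " sets");
        System.out.println("typo matches 'join'?  " + matches(typo, "join"));  // false
        System.out.println("fixed matches 'join'? " + matches(fixed, "join")); // true
    }
}
```

The typo version yields only three sets, the first holding six characters, which is why no four-letter input can ever match JOIN.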

What you (most probably) meant is this:

JOIN           :   [Jj][Oo][Ii][Nn] ;                     /* synchronized merge (AND) */
OR_JOIN        :   [Oo][Rr]'-'[Jj][Oo][Ii][Nn] ;          /* structured synchronized merge (OR) */
XOR_JOIN       :   [Xx][Oo][Rr]'-'[Jj][Oo][Ii][Nn] ;      /* unsynchronized merge (XOR) */
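To see why the typo made the lexer emit ID rather than JOIN, it helps to recall how an ANTLR 4 lexer chooses between rules: it takes the longest match, and if several rules match the same length, the rule declared first in the grammar wins. The sketch below approximates that choice with java.util.regex. It assumes an ID rule of the form [a-zA-Z_][a-zA-Z0-9_]* (the real one is in the full grammar and may differ) and assumes JOIN is declared before ID, as keyword rules usually are; LongestMatchDemo, pickRule and rules are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified model of ANTLR 4 lexer disambiguation: longest match wins,
 * ties go to the rule declared first. The regexes are hand-translated
 * approximations of lexer rules, not ANTLR's own matching engine.
 */
class LongestMatchDemo {

    /** Returns the winning rule name for the start of input, or null if no rule matches. */
    static String pickRule(Map<String, Pattern> rulesInOrder, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rulesInOrder.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input, as a lexer does;
            // only a strictly longer match replaces the current best, so earlier rules win ties
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    /** Builds the rule table in declaration order: JOIN first, then the assumed ID rule. */
    static Map<String, Pattern> rules(String joinRegex) {
        Map<String, Pattern> rules = new LinkedHashMap<>(); // keeps insertion (declaration) order
        rules.put("JOIN", Pattern.compile(joinRegex));
        // Assumed identifier rule; the real ID rule in DOTBPM.g4 may differ
        rules.put("ID", Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"));
        return rules;
    }

    public static void main(String[] args) {
        // Typo version: one set {J, j, }, [, O, o}; '[' must be escaped inside a Java regex class
        Map<String, Pattern> typo = rules("[Jj}\\[Oo][Ii][Nn]");
        // Fixed version: four single-letter sets
        Map<String, Pattern> fixed = rules("[Jj][Oo][Ii][Nn]");

        System.out.println("typo : join -> " + pickRule(typo, "join"));  // ID
        System.out.println("fixed: join -> " + pickRule(fixed, "join")); // JOIN
    }
}
```

With the fixed rule, JOIN and ID both match all four characters of "join", so declaration order decides in favor of JOIN; with the typo, JOIN cannot match "join" at all and ID wins by default.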