
I'm using ANTLRv3 to parse input that looks like this:

* this is an outline item at level 1
** item at level 2
*** item at level 3
* another item at level 1
* an item with *bold* text

Stars at the beginning of a line mark the start of an outline item. Stars can also be part of an item's text (e.g. *bold*).
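To make the intended split concrete, here is a minimal hand-written Java sketch (not ANTLR-generated code) of what I want per line: the leading stars give the outline level, and any later stars belong to the text. The class and method names are made up for illustration.

```java
class OutlineLineSketch {
    /** Returns the outline level (number of leading stars), or 0 if the line is not an item. */
    static int level(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == '*') {
            i++;
        }
        // A marker needs at least one star followed by a space or a tab.
        boolean terminated = i < line.length()
                && (line.charAt(i) == ' ' || line.charAt(i) == '\t');
        return (i > 0 && terminated) ? i : 0;
    }

    /** Returns the item text after the marker; stars inside it are kept as-is. */
    static String text(String line) {
        int n = level(line);
        return n == 0 ? line : line.substring(n + 1);
    }

    public static void main(String[] args) {
        String line = "* an item with *bold* text";
        System.out.println(level(line) + " -> " + text(line));
    }
}
```

A line such as `*bold* start` has no space or tab after its leading star, so the sketch treats it as plain text rather than as an outline item.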

This is the grammar to parse outline items without support for stars in the item text:

outline_item: OUTLINE_ITEM_MARKER ITEM_TEXT;
OUTLINE_ITEM_MARKER: STAR_IN_COLUMN_ZERO STAR* (' '|'\t');
ITEM_TEXT: ('a'..'z'|'A'..'Z'|'0'..'9'|'\r'|'\n'|' '|'\t')+;
fragment STAR_IN_COLUMN_ZERO: {getCharPositionInLine()==0}? '*';
fragment STAR: {getCharPositionInLine()>0}? '*';

For the input *** foo bar ANTLR produces the following parse tree:

[image: parse tree without star in item text]

So far this works as expected. Now I'm trying to add star to the possible characters for the item text, so I changed the lexer rule for ITEM_TEXT to the following:

ITEM_TEXT: ('a'..'z'|'A'..'Z'|'0'..'9'|'\r'|'\n'|' '|'\t'|STAR)+;

Now for the same input the following parse tree is produced:

[image: parse tree with star in item text]

This is the output in ANTLRWorks:

input.txt line 1:0 rule STAR failed predicate: {getCharPositionInLine()>0}?
input.txt line 1:1 missing OUTLINE_ITEM_MARKER at '** foo bar'

It seems that OUTLINE_ITEM_MARKER no longer matches, which results in a MissingTokenException. What is wrong with the grammar, and what do I need to change to allow stars to be part of ITEM_TEXT?

2 Answers

2
votes

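Instead of a validating semantic predicate, use a gated semantic predicate [1] in your fragments. A validating predicate ({...}?) is only checked once the rule has already been entered and throws a FailedPredicateException when it is false, whereas a gated predicate ({...}?=>) takes part in the prediction itself, so the alternative is simply not considered when the predicate is false.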

The following grammar:

grammar Test;

outline_items
 : outline_item+ EOF
 ;

outline_item
 : OUTLINE_ITEM_MARKER ITEM_TEXT
 ;

OUTLINE_ITEM_MARKER 
 : STAR_IN_COLUMN_ZERO STAR* (' '|'\t')
 ;

ITEM_TEXT
 : ('a'..'z'|'A'..'Z'|'0'..'9'|'\r'|'\n'|' '|'\t'|STAR)+
 ;

fragment STAR_IN_COLUMN_ZERO
 : {getCharPositionInLine()==0}?=> '*'
 ;

fragment STAR
 : {getCharPositionInLine()>0}?=> '*'
 ;

Your input:

* this is an outline item at level 1
** item at level 2
*** item at level 3
* another item at level 1
* an item with *bold* text

will then be parsed as this:

[image: parse tree of the input above]
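To illustrate the difference between the two predicate styles, here is a small hand-rolled Java model. It is not what ANTLR generates; it only mimics the behaviour under the assumption that, with validating predicates, the lexer picks ITEM_TEXT for a leading '*' based on the character alone and checks the column afterwards. All names in it (`PredicateSketch`, `FailedPredicate`, `validating`, `gated`) are hypothetical.

```java
class PredicateSketch {
    /** Hypothetical stand-in for ANTLR's FailedPredicateException. */
    static class FailedPredicate extends RuntimeException {
        FailedPredicate(String rule) {
            super("rule " + rule + " failed predicate");
        }
    }

    /** Validating style: the rule is chosen first, the predicate is checked afterwards. */
    static String validating(char c, int column) {
        if (c == '*') {
            // Assumed prediction: ITEM_TEXT (via STAR) was picked from the character alone.
            if (!(column > 0)) {
                throw new FailedPredicate("STAR");
            }
            return "ITEM_TEXT";
        }
        return "OTHER";
    }

    /** Gated style: the predicate takes part in choosing the alternative. */
    static String gated(char c, int column) {
        if (c == '*' && column == 0) {
            return "OUTLINE_ITEM_MARKER"; // STAR_IN_COLUMN_ZERO is viable
        }
        if (c == '*' && column > 0) {
            return "ITEM_TEXT"; // STAR is viable
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(gated('*', 0)); // prints OUTLINE_ITEM_MARKER
        System.out.println(gated('*', 5)); // prints ITEM_TEXT
        try {
            validating('*', 0);
        } catch (FailedPredicate e) {
            System.out.println(e.getMessage()); // prints rule STAR failed predicate
        }
    }
}
```

In the gated version a star in column zero can never reach the STAR branch, which corresponds to the "rule STAR failed predicate" error from the question going away.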

[1] What is a 'semantic predicate' in ANTLR?

0
votes

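Have you tried making your grammar simpler? If the marker is just a run of stars and the item text has to start with a space or tab, no predicates are needed: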

outline_item: OUTLINE_ITEM_MARKER ITEM_TEXT;

ITEM_TEXT:
    (' '|'\t') (' '|'\t'|'a'..'z'|'A'..'Z'|'0'..'9'| STAR)+
;

OUTLINE_ITEM_MARKER:
    STAR+ 
;

fragment STAR:   
    '*'
;

Or, if you don't need to keep STAR as an explicit fragment and you want the item text to capture every character up to the end of the line rather than a fixed subset:

outline_item: OUTLINE_ITEM_MARKER ITEM_TEXT;

ITEM_TEXT:
    (' '|'\t') (~('\n'|'\r'))+
;

OUTLINE_ITEM_MARKER:
    '*'+ 
;