
I'm trying to get a simple grammar to work using ANTLR4. Basically a list of keywords separated by ; that can be negated using Not. Something like this, for example:

Not negative keyword;positive

I wrote the following grammar:

grammar input;

input               : clauses;
keyword             : NOT? WORD;
clauses             : keyword (SEPARATOR clauses)?;

fragment N          : ('N'|'n') ;
fragment O          : ('O'|'o') ;
fragment T          : ('T'|'t') ;
fragment SPACE      : ' ' ;

SEPARATOR           : ';';
NOT                 : N O T SPACE;
WORD                : ~[;]+;

My issue is that, in the keyword rule, WORD seems to take priority over NOT. The input "Not something" is recognized as the single word "Not something" instead of a negated "something".

For instance, the parse tree I get is this:

[image: parse tree currently produced]

What I'm trying to achieve is something like this:

[image: desired parse tree]

How can I give one rule priority over another in ANTLR4? Any tips on fixing this?

Please note that while this grammar is very simple and ANTLR4 can seem unnecessary here, the real grammar I want to write is more complex; I have simplified it here to demonstrate my issue.

Thank you for your time!

Possible duplicate of ANTLRv4: non-greedy rules - sepp2k
Note that ~[';'] is equivalent to ~[';] and matches any character other than ' or ; (repeating a character in a character class does nothing). I assume you want just ~[;] or ~';' (both of which mean "any character other than ;"). - sepp2k
Thanks for the clarification, I fixed it in my code. - Tommy228

1 Answer


You have no explicit whitespace rule, and your WORD rule includes whitespace, yet you want words to be separated by whitespace. That cannot work: the lexer always takes the longest possible match, so WORD consumes "Not negative keyword" in one go and NOT never gets a chance. Don't include whitespace in words (that goes against the usual meaning of a word anyway). Instead, specify exactly what a word really is (usually a combination of letters and digits that does not start with a digit). Additionally, I would restructure the grammar so that positive and negative are not part of keyword but separate entities. Here I defined them as keywords of their own, but if that is not what you want, replace them with just WORD:

grammar input;

input               : clauses EOF;
keyword             : NOT? (POSITIVE | NEGATIVE) WORD?;
clauses             : keyword (SEPARATOR keyword)*;

fragment A: [aA];
fragment B: [bB];
fragment C: [cC];
fragment D: [dD];
fragment E: [eE];
fragment F: [fF];
fragment G: [gG];
fragment H: [hH];
fragment I: [iI];
fragment J: [jJ];
fragment K: [kK];
fragment L: [lL];
fragment M: [mM];
fragment N: [nN];
fragment O: [oO];
fragment P: [pP];
fragment Q: [qQ];
fragment R: [rR];
fragment S: [sS];
fragment T: [tT];
fragment U: [uU];
fragment V: [vV];
fragment W: [wW];
fragment X: [xX];
fragment Y: [yY];
fragment Z: [zZ];

SEPARATOR : ';';
NOT       : N O T;
POSITIVE  : P O S I T I V E;
NEGATIVE  : N E G A T I V E;

fragment LETTER: DIGIT | LETTER_NO_DIGIT;
fragment LETTER_NO_DIGIT: [a-zA-Z_$\u0080-\uffff];
WORD: LETTER_NO_DIGIT LETTER*;
WHITESPACE: [ \t\f\r\n] -> channel(HIDDEN);
fragment DIGIT:    [0-9];
fragment DIGITS: DIGIT+;

which gives you this parse tree for your input:

[image: resulting parse tree]
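To see why the lexer rules make the difference, here is a rough Python sketch that imitates how an ANTLR4 lexer chooses tokens: the longest match wins, and on a tie the rule defined first wins. It is only an illustration, not ANTLR itself. The `tokenize` helper and the regex translations of the lexer rules are my own assumptions, written to mirror the two grammars above.

```python
import re

def tokenize(text, rules, skip=()):
    """Hypothetical helper: longest match wins, ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two rules match the same length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Lexer rules from the question: WORD accepts spaces, so it out-matches NOT.
original = [
    ("SEPARATOR", r";"),
    ("NOT", r"[Nn][Oo][Tt] "),
    ("WORD", r"[^;]+"),
]

# Lexer rules from the answer: no whitespace inside words, whitespace is hidden.
revised = [
    ("SEPARATOR", r";"),
    ("NOT", r"[Nn][Oo][Tt]"),
    ("POSITIVE", r"[Pp][Oo][Ss][Ii][Tt][Ii][Vv][Ee]"),
    ("NEGATIVE", r"[Nn][Ee][Gg][Aa][Tt][Ii][Vv][Ee]"),
    ("WORD", r"[a-zA-Z_$\u0080-\uffff][0-9a-zA-Z_$\u0080-\uffff]*"),
    ("WHITESPACE", r"[ \t\f\r\n]"),
]

text = "Not negative keyword;positive"
print(tokenize(text, original))
print(tokenize(text, revised, skip={"WHITESPACE"}))
```

With the original rules the first token is a single WORD covering "Not negative keyword"; with the revised rules the same input splits into NOT, NEGATIVE, WORD, SEPARATOR, POSITIVE. Note that an input such as "Nothing" still comes out as one WORD under the revised rules, because WORD matches more characters than NOT does.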