
I've written an ANTLR4 grammar that expresses a search through a Map by key/value pairs:

START: 'SEARCH FOR';
VALUE_EXPRESSION: 'VALUE:'[a-zA-Z0-9]+;
MATCH: 'MATCHING';
COMMA: ',';
KEY_EXPRESSION: 'KEY:'[a-zA-Z0-9]*; 
KEY_VALUE_PAIR: KEY_EXPRESSION MATCH VALUE_EXPRESSION;
r : START KEY_VALUE_PAIR (COMMA KEY_VALUE_PAIR)*;
WS: [ \n\t\r]+ -> skip;
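It helps to see why this grammar cannot produce a node per pair. `KEY_VALUE_PAIR` is a lexer rule, so referencing `KEY_EXPRESSION`, `MATCH` and `VALUE_EXPRESSION` inside it just concatenates their patterns into one token pattern. `WS -> skip` only discards whitespace *between* tokens, never inside one. The sketch below approximates those lexer rules with Python regular expressions. This is an illustrative assumption: Python's `re` is not ANTLR's lexer, and `is_single_token` is a hypothetical helper.

```python
import re

# Rough regex equivalents of the lexer rules above (illustration only).
KEY_EXPRESSION = r"KEY:[a-zA-Z0-9]*"
MATCH = r"MATCHING"
VALUE_EXPRESSION = r"VALUE:[a-zA-Z0-9]+"

# A lexer rule built from other lexer rules is just their concatenation,
# so there is no place for whitespace between the parts.
KEY_VALUE_PAIR = KEY_EXPRESSION + MATCH + VALUE_EXPRESSION


def is_single_token(text: str) -> bool:
    """Return True if the whole text matches KEY_VALUE_PAIR as one token."""
    return re.fullmatch(KEY_VALUE_PAIR, text) is not None


if __name__ == "__main__":
    print(is_single_token("KEY:colour MATCHING VALUE:red"))  # False: spaces break it
    print(is_single_token("KEY:colourMATCHINGVALUE:red"))    # True: only the run-together form
```

This is why the usual ANTLR4 route, taken in the answer below, is to make the pair a parser rule instead of a lexer rule.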

The "Interpret Lexer" in ANTLRWorks produces:

[Image: ANTLRWorks "Interpret Lexer" output]

And the "Parse Tree" like this:

[Image: ANTLRWorks parse tree]

I'm not sure whether this is the correct (or even typical) way to parse an input string, but what I'd like is for each key/value pair to be split out and placed under its own parent node, like so:

[SEARCH FOR] [PAIR],     [PAIR]
                |           |
               / \         / \
              /   \       /   \
             /     \     /     \
         colour    red size   small

I believe that doing this will make life easier when I come to walk the tree.

I've searched around and tried using the caret '^' character to specify the parent, but ANTLRWorks always reports an error in my grammar.

Can anybody help with this, or possibly supply another solution (if this is an atypical approach)?


1 Answer


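You can probably simplify this even further. You might want a lexer rule for your keys so you can keep track of them. Below I simply use a string as the key, but you could define lexer rules for 'colour', 'size', etc. I also did away with MATCHING and instead made each pair a parser rule, collected into a set of pairs. Because pair is a parser rule (not a lexer rule like your KEY_VALUE_PAIR), whitespace between its parts is skipped and ANTLR4 gives each key/value pair its own node in the parse tree. As for the caret: '^' is ANTLR 3 tree-rewrite syntax that was removed in ANTLR 4, which is why ANTLRWorks flags it as an error. ANTLR 4 builds the parse tree from your parser rules instead.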

grammar GRAMMAR;

start: START set EOF ;   // EOF makes the parser reject trailing input

set
    :   pair (',' pair)*
    ;

pair:   STRING ':' value ;

value
    :   STRING
    |   NUMBER
    ;

START: 'SEARCH FOR:' ;   // any following whitespace is skipped by WS
STRING :  '"' [a-zA-Z_0-9]* '"' ;

NUMBER
    :   '-'? INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3, -4.5
    |   '-'? INT EXP             // 1e10 -3e4
    |   '-'? INT                 // -3, 45
    ;

fragment INT :   '0' | [1-9] [0-9]* ; // no leading zeros
fragment EXP :   [Ee] [+\-]? INT ; // \- since - means "range" inside [...]

WS  :   [ \t\n\r]+ -> skip ;
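To show the tree shape this grammar yields and how walking it might look, here is a hand-written recursive-descent parser in Python. It has one method per parser rule (`start`, `set`, `pair`, `value`). It is not ANTLR-generated code. The token regexes only approximate the lexer rules above, and the names `tokenize`, `Parser` and `to_dict` are hypothetical. Each pair becomes a `("pair", key, value)` node under a single `start` node, matching the shape asked for in the question.

```python
import re
from typing import List, Tuple, Union

# Token patterns approximating the lexer rules above (Python's re, not ANTLR).
_INT = r"(?:0|[1-9][0-9]*)"            # fragment INT: no leading zeros
_EXP = r"(?:[Ee][+-]?" + _INT + r")"   # fragment EXP
TOKEN_SPEC = [
    ("WS", r"[ \t\n\r]+"),             # WS -> skip
    ("START", r"SEARCH FOR:"),         # a following space is skipped as WS
    ("STRING", r'"[a-zA-Z_0-9]*"'),
    ("NUMBER", r"-?" + _INT + r"(?:\.[0-9]+" + _EXP + r"?|" + _EXP + r")?"),
    ("COLON", r":"),
    ("COMMA", r","),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

Token = Tuple[str, str]
Value = Union[str, int, float]


def tokenize(text: str) -> List[Token]:
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: one method per parser rule of the grammar."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.i = 0

    def _peek(self) -> str:
        return self.tokens[self.i][0] if self.i < len(self.tokens) else "EOF"

    def _expect(self, kind: str) -> str:
        if self._peek() != kind:
            raise ValueError(f"expected {kind}, found {self._peek()}")
        text = self.tokens[self.i][1]
        self.i += 1
        return text

    # start : START set ;  (plus an end-of-input check)
    def start(self):
        self._expect("START")
        pairs = self.set_()
        if self._peek() != "EOF":
            raise ValueError(f"unexpected trailing {self._peek()}")
        return ("start", pairs)

    # set : pair (',' pair)* ;  (named set_ to avoid confusion with Python's set)
    def set_(self):
        pairs = [self.pair()]
        while self._peek() == "COMMA":
            self._expect("COMMA")
            pairs.append(self.pair())
        return pairs

    # pair : STRING ':' value ;
    def pair(self):
        key = self._expect("STRING")[1:-1]  # strip the quotes
        self._expect("COLON")
        return ("pair", key, self.value())

    # value : STRING | NUMBER ;
    def value(self) -> Value:
        if self._peek() == "STRING":
            return self._expect("STRING")[1:-1]
        text = self._expect("NUMBER")
        return float(text) if any(c in text for c in ".eE") else int(text)


def to_dict(tree) -> dict:
    """Walk the tree: each ("pair", key, value) node becomes one dict entry."""
    _, pairs = tree
    return {key: value for _, key, value in pairs}


if __name__ == "__main__":
    tree = Parser(tokenize('SEARCH FOR: "colour":"red", "size":"small"')).start()
    print(tree)           # ('start', [('pair', 'colour', 'red'), ('pair', 'size', 'small')])
    print(to_dict(tree))  # {'colour': 'red', 'size': 'small'}
```

With ANTLR4 itself you would not hand-write this. The generated parser already creates one `PairContext` node per pair, and the work done in `to_dict` would go in a listener's `enterPair` method or a visitor's `visitPair` method.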