I want to parse Rulebook files such as the following "demo.rb":
rulebook Titanic-Normalization {
  version 1
  meta {
    description "Test"
    source "my-rules.xslx"
    user "joltie"
  }
  rule remove-first-line {
    description "Removes first line when offset is zero"
    when(present(offset) && offset == 0) then {
      filter-row-if-true true;
    }
  }
}
I wrote the following ANTLR4 grammar, Rulebook.g4. It parses most of a *.rb file, but it throws unexpected errors when it reaches the "expression" and "statement" rules.
grammar Rulebook;
rulebookStatement
    : KWRulebook
      (GeneralIdentifier | Identifier)
      '{'
      KWVersion
      VersionConstant
      metaStatement
      (ruleStatement)+
      '}'
    ;
metaStatement
    : KWMeta
      '{'
      KWDescription
      StringLiteral
      KWSource
      StringLiteral
      KWUser
      StringLiteral
      '}'
    ;
ruleStatement
    : KWRule
      (GeneralIdentifier | Identifier)
      '{'
      KWDescription
      StringLiteral
      whenThenStatement
      '}'
    ;
whenThenStatement
    : KWWhen '(' expression ')'
      KWThen '{' statement '}'
    ;
primaryExpression
    : GeneralIdentifier
    | Identifier
    | StringLiteral+
    | '(' expression ')'
    ;
postfixExpression
    : primaryExpression
    | postfixExpression '[' expression ']'
    | postfixExpression '(' argumentExpressionList? ')'
    | postfixExpression '.' Identifier
    | postfixExpression '->' Identifier
    | postfixExpression '++'
    | postfixExpression '--'
    ;
argumentExpressionList
    : assignmentExpression
    | argumentExpressionList ',' assignmentExpression
    ;
unaryExpression
    : postfixExpression
    | '++' unaryExpression
    | '--' unaryExpression
    | unaryOperator castExpression
    ;
unaryOperator
    : '&' | '*' | '+' | '-' | '~' | '!'
    ;
castExpression
    : unaryExpression
    | DigitSequence // for
    ;
multiplicativeExpression
    : castExpression
    | multiplicativeExpression '*' castExpression
    | multiplicativeExpression '/' castExpression
    | multiplicativeExpression '%' castExpression
    ;
additiveExpression
    : multiplicativeExpression
    | additiveExpression '+' multiplicativeExpression
    | additiveExpression '-' multiplicativeExpression
    ;
shiftExpression
    : additiveExpression
    | shiftExpression '<<' additiveExpression
    | shiftExpression '>>' additiveExpression
    ;
relationalExpression
    : shiftExpression
    | relationalExpression '<' shiftExpression
    | relationalExpression '>' shiftExpression
    | relationalExpression '<=' shiftExpression
    | relationalExpression '>=' shiftExpression
    ;
equalityExpression
    : relationalExpression
    | equalityExpression '==' relationalExpression
    | equalityExpression '!=' relationalExpression
    ;
andExpression
    : equalityExpression
    | andExpression '&' equalityExpression
    ;
exclusiveOrExpression
    : andExpression
    | exclusiveOrExpression '^' andExpression
    ;
inclusiveOrExpression
    : exclusiveOrExpression
    | inclusiveOrExpression '|' exclusiveOrExpression
    ;
logicalAndExpression
    : inclusiveOrExpression
    | logicalAndExpression '&&' inclusiveOrExpression
    ;
logicalOrExpression
    : logicalAndExpression
    | logicalOrExpression '||' logicalAndExpression
    ;
conditionalExpression
    : logicalOrExpression ('?' expression ':' conditionalExpression)?
    ;
assignmentExpression
    : conditionalExpression
    | unaryExpression assignmentOperator assignmentExpression
    | DigitSequence // for
    ;
assignmentOperator
    : '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|='
    ;
expression
    : assignmentExpression
    | expression ',' assignmentExpression
    ;
statement
    : expressionStatement
    ;
expressionStatement
    : expression+ ';'
    ;
KWRulebook: 'rulebook';
KWVersion: 'version';
KWMeta: 'meta';
KWDescription: 'description';
KWSource: 'source';
KWUser: 'user';
KWRule: 'rule';
KWWhen: 'when';
KWThen: 'then';
KWTrue: 'true';
KWFalse: 'false';
fragment
LeftParen : '(';
fragment
RightParen : ')';
fragment
LeftBracket : '[';
fragment
RightBracket : ']';
fragment
LeftBrace : '{';
fragment
RightBrace : '}';
Identifier
    : IdentifierNondigit
      ( IdentifierNondigit
      | Digit
      )*
    ;
GeneralIdentifier
    : Identifier
      ('-' Identifier)+
    ;
fragment
IdentifierNondigit
    : Nondigit
    //| // other implementation-defined characters...
    ;
VersionConstant
    : DigitSequence ('.' DigitSequence)*
    ;
DigitSequence
    : Digit+
    ;
fragment
Nondigit
    : [a-zA-Z_]
    ;
fragment
Digit
    : [0-9]
    ;
StringLiteral
    : '"' SCharSequence? '"'
    | '\'' SCharSequence? '\''
    ;
fragment
SCharSequence
    : SChar+
    ;
fragment
SChar
    : ~["\\\r\n]
    | '\\\n' // Added line
    | '\\\r\n' // Added line
    ;
Whitespace
    : [ \t]+
      -> skip
    ;
Newline
    : ( '\r' '\n'?
      | '\n'
      )
      -> skip
    ;
BlockComment
    : '/*' .*? '*/'
      -> skip
    ;
LineComment
    : '//' ~[\r\n]*
      -> skip
    ;
I tested the parser with a unit test like this:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
// RulebookLexer and RulebookParser are generated by ANTLR from Rulebook.g4

public void testScanRulebookFile() throws IOException {
    String fileName = "C:\\rulebooks\\demo.rb";
    // try-with-resources closes the file even if parsing throws
    try (InputStream fis = new FileInputStream(fileName)) {
        // create a CharStream that reads from the file
        CharStream input = CharStreams.fromStream(fis);
        // create a lexer that feeds off of the input CharStream
        RulebookLexer lexer = new RulebookLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the token buffer
        RulebookParser parser = new RulebookParser(tokens);
        // begin parsing at the top-level rule
        RulebookParser.RulebookStatementContext context = parser.rulebookStatement();
        // WhenThenStatementContext context = parser.whenThenStatement();
        // print the LISP-style parse tree
        System.out.println(context.toStringTree(parser));
    }
}
For the "demo.rb" above, the parser reports the following error. I also printed the RulebookStatementContext with toStringTree():
line 12:25 mismatched input '&&' expecting ')'
(rulebookStatement rulebook Titanic-Normalization { version 1 (metaStatement meta { description "Test" source "my-rules.xslx" user "joltie" }) (ruleStatement rule remove-first-line { description "Removes first line when offset is zero" (whenThenStatement when ( (expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression present)) ( (argumentExpressionList (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression offset))))))))))))))))) ))))))))))))))))) && offset == 0 ) then { filter-row-if-true true ;) }) })
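The one-line tree above is hard to scan. A small stdlib-only helper that indents the LISP-style output can make it easier to see where the parse stopped. This is a hypothetical sketch written for this post, not part of the ANTLR runtime. It assumes rule nodes are printed as "(ruleName", treats a ")" preceded by a space as a literal token, and does not handle escaped quotes inside string literals.

```java
import java.util.ArrayList;
import java.util.List;

public class LispTreeIndenter {

    private static String pad(int depth) {
        return "  ".repeat(Math.max(depth, 0)); // String.repeat needs Java 11+
    }

    // Index just past a bare leaf: stops at a space or a ')'.
    private static int leafEnd(String s, int start) {
        int end = start + 1;
        while (end < s.length() && s.charAt(end) != ' ' && s.charAt(end) != ')') {
            end++;
        }
        return end;
    }

    /** Turns one-line toStringTree() output into one indented line per node. */
    public static List<String> indent(String tree) {
        List<String> lines = new ArrayList<>();
        int depth = 0;
        int i = 0;
        while (i < tree.length()) {
            char c = tree.charAt(i);
            if (c == ' ') {
                i++;
            } else if (c == '(' && i + 1 < tree.length() && Character.isLetter(tree.charAt(i + 1))) {
                // "(ruleName" opens a rule node; its children go one level deeper.
                int end = leafEnd(tree, i + 1);
                lines.add(pad(depth) + tree.substring(i + 1, end));
                depth++;
                i = end;
            } else if (c == ')' && i > 0 && tree.charAt(i - 1) != ' ') {
                // A ')' glued to the previous text closes the current rule node.
                depth--;
                i++;
            } else if (c == '"' || c == '\'') {
                // Quoted leaf: keep embedded spaces together (escapes are not handled).
                int close = tree.indexOf(c, i + 1);
                int end = close < 0 ? tree.length() : close + 1;
                lines.add(pad(depth) + tree.substring(i, end));
                i = end;
            } else {
                // Any other leaf, including literal '(' and ')' tokens.
                int end = leafEnd(tree, i);
                lines.add(pad(depth) + tree.substring(i, end));
                i = end;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // A fragment copied from the tree printed above.
        String tree = "(metaStatement meta { description \"Test\" source \"my-rules.xslx\" user \"joltie\" })";
        indent(tree).forEach(System.out::println);
    }
}
```

Passing the full context.toStringTree(parser) string to indent() works the same way. In the tree above, the leaves "&&", "offset", "==" and "0" should then show up as direct children of whenThenStatement rather than inside expression, which points at the spot where error recovery took over.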
To narrow the problem down, I also wrote a unit test that parses a short input, "when (offset == 0) then {\n" + "filter-row-if-true true;\n" + "}\n", but it still fails with errors like:
line 1:16 mismatched input '0' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', Identifier, GeneralIdentifier, DigitSequence, StringLiteral}
line 2:19 extraneous input 'true' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', ';', Identifier, GeneralIdentifier, DigitSequence, StringLiteral}
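Both expected sets list DigitSequence and Identifier, so one useful check is which token types the lexer actually assigned to "0" and "true". ANTLR's TestRig (grun) prints the real token stream with its -tokens option. The stdlib-only sketch below instead mimics ANTLR's two lexer tie-breaking rules (the longest match wins; on equal length, the rule defined first in the grammar wins) for a hand-translated subset of Rulebook.g4. The class name and the regexes are illustrative assumptions, not generated code, and the literal tokens are simply listed first (their position does not change the result for this input).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerTieBreakDemo {

    // Hand-translated subset of Rulebook.g4; insertion order = rule priority.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'('", Pattern.compile("\\("));
        RULES.put("')'", Pattern.compile("\\)"));
        RULES.put("'{'", Pattern.compile("\\{"));
        RULES.put("'}'", Pattern.compile("\\}"));
        RULES.put("';'", Pattern.compile(";"));
        RULES.put("'=='", Pattern.compile("=="));
        RULES.put("KWWhen", Pattern.compile("when"));
        RULES.put("KWThen", Pattern.compile("then"));
        RULES.put("KWTrue", Pattern.compile("true"));
        RULES.put("KWFalse", Pattern.compile("false"));
        RULES.put("Identifier", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
        RULES.put("GeneralIdentifier", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*(-[a-zA-Z_][a-zA-Z_0-9]*)+"));
        RULES.put("VersionConstant", Pattern.compile("[0-9]+(\\.[0-9]+)*"));
        RULES.put("DigitSequence", Pattern.compile("[0-9]+"));
        RULES.put("Whitespace", Pattern.compile("[ \\t]+"));   // -> skip
        RULES.put("Newline", Pattern.compile("\\r\\n?|\\n"));  // -> skip
    }

    /** Returns "TokenType:text" for every token that is not skipped. */
    public static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            if (!bestType.equals("Whitespace") && !bestType.equals("Newline")) {
                out.add(bestType + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "when (offset == 0) then {\nfilter-row-if-true true;\n}\n";
        tokenize(input).forEach(System.out::println);
    }
}
```

Run on the short input, the sketch reports "0" as VersionConstant (defined before DigitSequence, same length) and "true" as KWTrue (defined before Identifier, same length), while "filter-row-if-true" becomes a single GeneralIdentifier. Neither VersionConstant nor KWTrue appears in the expected sets above, which is consistent with the reported errors; grun -tokens on the real grammar is the authoritative check.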
After two days of trying, I haven't made any progress. Sorry for the long question; could someone give me some advice on how to debug ANTLR4 "extraneous input" / "mismatched input" errors?