I have this parser grammar, in which I also want to support something similar to JavaScript template strings:
parser grammar Test;
options {
tokenVocab = TestLexer;
}
definition: sourceElements? EOF ;
sourceElements: sourceElement+ ;
sourceElement: mapping ;
templateString: '`' TemplateStringCharacter* ('${' variable '}' TemplateStringCharacter*)+ '`' ;
fieldName: varname | ('[' value ']') ;
mapping: fieldName ':' ( '{' sourceElements '}'
| variable ( '{' sourceElements '}' )? '?'?
| value
| array )
;
funParameter: '(' value? (',' value)* ')' ;
array: '[' value? (',' value)* ']';
variable: (varname | '{' value '}' | '[' boolEx ']' | templateString) funParameter? ('.' variable)* ;
value: INT | BOOL | FLOAT | STRING | variable ;
varname: VAR ;
And this lexer grammar:
lexer grammar TestLexer;
WS : [ \t\r\n\u000C]+ -> skip ;
NEWLINE : [\r\n] ;
BOOL : ('true'|'false') ;
TemplateStringLiteral : TemplateStringCharacter*;
VAR : [$]?[a-zA-Z0-9_]+|[@] ;
INT : '-'?[0-9]+ ;
FLOAT : '-'?[0-9]+'.'[0-9]+ ;
STRING : '"' DoubleStringCharacter* '"' | '\'' SingleStringCharacter* '\'' ;
TEMPSTART : '${' ;
TEMPEND : '}' ;
TemplateStart : '`' -> pushMode(template) ;
/// Comments
MultiLineComment : '/*' .*? '*/' -> channel(HIDDEN) ;
SingleLineComment : '//' ~[\r\n\u2028\u2029]* -> channel(HIDDEN) ;
mode template;
TemplateVariableStart: TEMPSTART -> pushMode(templateVariable);
TemplateStringLiteral : TemplateStringCharacter* ;
TemplateEnd : '`' -> popMode;
mode templateVariable;
WS : [ \t\r\n\u000C]+ -> skip ;
All : [^}]+ ;
TemplateVariableEnd : TEMPEND -> popMode;
fragment DoubleStringCharacter : ~["\r\n] ;
fragment SingleStringCharacter : ~['\r\n] ;
fragment TemplateStringCharacter : ~[`] ;
fragment DecimalDigit : [0-9] ;
When I input this:
test: {
abc: `Hello World`
}
The parse tree looks like this:
(definition
(sourceElements
(sourceElement
(statement
(mapping
(fieldName
(varname test)
) : {
(sourceElements
(sourceElement
(statement mapping)
)
(sourceElement
(statement
(mapping abc : `)
)
)
(sourceElement
(statement mapping)
)
(sourceElement
(statement
(mapping Hello)
)
)
(sourceElement
(statement
(mapping World `)
)
)
)
}
)
)
)
)
<EOF>
)
And I get the error: line 2:8 no viable alternative at input 'abc:`Hello'
I don't understand why it is even possible to match something like an empty mapping, or a mapping like "World `", since a mapping needs to have a ":" in the middle. And why doesn't the rule templateString match the whole "Hello World" from backtick to backtick?
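One thing that may be worth checking in the lexer above: an ANTLR lexer prefers the rule that matches the most characters at the current position (ties go to the rule defined first), and in the default mode TemplateStringLiteral can match any run of characters other than a backtick. The sketch below is not ANTLR. It is a small JavaScript simulation of that longest-match behaviour, using simplified regex stand-ins for a few of the rules; the regexes and the tokenize helper are assumptions made purely for illustration.

```javascript
// Simplified stand-ins for a few rules of the first lexer grammar, written as
// sticky regexes. This only imitates longest-match tokenization; it is not ANTLR.
const modes = {
  default: [
    { name: "WS", re: /[ \t\r\n\f]+/y, skip: true },
    { name: "TemplateStringLiteral", re: /[^`]*/y },
    { name: "VAR", re: /\$?[a-zA-Z0-9_]+|@/y },
    { name: "TemplateStart", re: /`/y, push: "template" },
  ],
  template: [
    { name: "TemplateStringLiteral", re: /[^`]*/y },
    { name: "TemplateEnd", re: /`/y, pop: true },
  ],
};

function tokenize(input) {
  const out = [];
  const stack = ["default"];
  let pos = 0;
  while (pos < input.length) {
    let best = null;
    for (const rule of modes[stack[stack.length - 1]]) {
      rule.re.lastIndex = pos;
      const m = rule.re.exec(input);
      // Longest non-empty match wins; on a tie the earlier rule is kept.
      if (m && m[0].length > 0 && (!best || m[0].length > best.text.length)) {
        best = { rule: rule, text: m[0] };
      }
    }
    if (!best) throw new Error("no rule matches at offset " + pos);
    if (!best.rule.skip) out.push({ type: best.rule.name, text: best.text });
    if (best.rule.push) stack.push(best.rule.push);
    if (best.rule.pop) stack.pop();
    pos += best.text.length;
  }
  return out;
}

const tokens = tokenize("test: {\nabc: `Hello World`\n}");
for (const t of tokens) console.log(t.type, JSON.stringify(t.text));
```

In this simulation the input comes out as five tokens: a TemplateStringLiteral covering everything up to the first backtick, TemplateStart, a TemplateStringLiteral for Hello World, TemplateEnd, and a final TemplateStringLiteral for the trailing newline and brace. No VAR token is ever produced. If the generated lexer really corresponds to the grammar as written, that would leave the parser without the tokens that mapping and fieldName expect.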
EDIT:
After noticing that the lexer had not been regenerated when I thought it had, I got errors like: "cannot create implicit token for string literal in non-combined grammar: ']'". So I had to move all implicit token declarations into the lexer grammar, and I changed the code to this:
parser grammar Test;
options {
tokenVocab = TestLexer;
}
definition: sourceElements? EOF ;
sourceElements: sourceElement+ ;
sourceElement: mapping ;
templateString: OpenBackTick TemplateStringLiteral* (TemplateVariableStart variable CloseBrace TemplateStringLiteral*)+ CloseBackTick ;
fieldName: varname | OpenBracket value CloseBracket ;
mapping: fieldName Colon (
OpenBrace sourceElements CloseBrace
| variable ( OpenBrace sourceElements CloseBrace )? IF?
| value
| array
)
;
funParameter: OpenParen value? (Comma value)* CloseParen ;
array: OpenBracket value? (Comma value)* CloseBracket;
variable: (varname | OpenBrace value CloseBrace | templateString) funParameter? (Dot variable)* ;
value: INT | BOOL | FLOAT | STRING | variable ;
varname: VAR ;
And the lexer grammar:
lexer grammar TestLexer;
OpenBracket: '[';
CloseBracket: ']';
OpenParen: '(';
CloseParen: ')';
OpenBrace: '{' ;
CloseBrace: '}' ;
IF: '?' ;
AND: 'AND' ;
OR: 'OR';
LessThan: '<';
MoreThan: '>';
LessThanEquals: '<=';
GreaterThanEquals: '>=';
Equals: '=';
NotEquals: '!=';
IN: 'IN';
NOT: '!';
Colon: ':';
Dot: '.' ;
Comma: ',' ;
OpenBackTick : '`' -> pushMode(template) ;
WS : [ \t\r\n\u000C]+ -> skip ;
NEWLINE : [\r\n] ;
BOOL : ('true'|'false') ;
VAR : [$]?[a-zA-Z0-9_]+|[@] ;
INT : '-'?[0-9]+ ;
FLOAT : '-'?[0-9]+'.'[0-9]+ ;
STRING : '"' DoubleStringCharacter* '"' | '\'' SingleStringCharacter* '\'' ;
/// Comments
MultiLineComment : '/*' .*? '*/' -> channel(HIDDEN) ;
SingleLineComment : '//' ~[\r\n\u2028\u2029]* -> channel(HIDDEN) ;
mode template;
TemplateVariableStart: '${' -> pushMode(templateVariable);
CloseBackTick : '`' -> popMode;
TemplateStringLiteral: TemplateStringCharacter ;
mode templateVariable;
WHS : [ \t\r\n\u000C]+ -> skip ;
All : [^}]+ ;
TemplateVariableEnd : CloseBrace -> popMode;
fragment DoubleStringCharacter : ~["\r\n] ;
fragment SingleStringCharacter : ~['\r\n] ;
fragment TemplateStringCharacter : ~[`] ;
fragment DecimalDigit : [0-9] ;
Now I get the error: line 1:0 mismatched input 'test' expecting {, '?', '[', VAR}
This is strange, because 'test' should be matched by VAR. Any ideas why this is happening?
test should be a VAR, but clearly the lexer does not agree, so it'd be important to know what the lexer thinks test is. With your old code it would have been a TemplateStringLiteral (except that that would have matched more than just test), but with your current code I don't see anything else that matches. Try to run your lexer with antlr4 TestLexer.g4 && javac *.java && grun TestLexer tokens -tokens or iterate over the token stream in JavaScript. – sepp2k

[...] this.tokenStream = new antlr4.CommonTokenStream(this.lexer); But this.tokenStream.tokens only returned a list like: [ CommonToken { source: [ [UpsatLexer], [InputStream] ], type: 3, channel: 0, start: 0, stop: 3, tokenIndex: 0, line: 1, column: 0, _text: null }, ...]. I tried some other functions, but not all Java functions seem to be present in JS. – Martin Cup

[...] lexer.getAllTokens() and then you can print each one properly by calling its toString with the lexer as an argument. See this gist – sepp2k

[...] VAR. – sepp2k

[...] toString, but that doesn't work for tokens), you can use lexer.ruleNames[tok.type] to actually get the token type as a string, but if the numbers match that will just print VAR as well. – sepp2k
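To make a token dump like the one in the comments readable, the numeric type of each token can be looked up in a name table and the text sliced out of the input using start and stop. The following is only a sketch under stated assumptions: describeTokens, sampleTokens and sampleNames are hypothetical names, and the token objects are hand-written to resemble the printed CommonToken fields rather than produced by a real lexer. With a generated lexer one would presumably pass the result of lexer.getAllTokens() and whichever name array of the generated class is indexed by token type. Which array lines up should be checked against the generated file.

```javascript
// Hypothetical token objects, shaped like the CommonToken dump shown above.
const sampleTokens = [
  { type: 3, channel: 0, start: 0, stop: 3, line: 1, column: 0 },
  { type: 1, channel: 0, start: 4, stop: 4, line: 1, column: 4 },
];

// Hypothetical name table indexed by token type (index 0 is unused).
const sampleNames = [null, "Colon", "OpenBrace", "VAR"];

// Turn each token into "line:column NAME "text"" so mismatches are easy to spot.
function describeTokens(tokens, names, input) {
  return tokens.map((tok) => {
    // ANTLR uses -1 as the token type for EOF.
    const name = tok.type === -1 ? "EOF" : names[tok.type] || "<unknown " + tok.type + ">";
    // start and stop are inclusive character offsets into the input.
    const text = input.slice(tok.start, tok.stop + 1);
    return tok.line + ":" + tok.column + " " + name + " " + JSON.stringify(text);
  });
}

console.log(describeTokens(sampleTokens, sampleNames, "test: {").join("\n"));
```

Comparing the name printed for test here with the token type the parser reports as expected is one way to see whether lexer and parser agree on the numbering, which is the point the comments are getting at.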