Consider this very simplified example, where an input of the following form should be matched:
mykey -> This is the value
My real case is much more complex, but this will do to show what I am trying to achieve. mykey is an ID, while the right side of -> holds a sequence of Words. If I use
grammar Root;
parse
: ID '->' value
;
value
: Word+
;
ID
: ('a'..'z')+
;
Word
: ('a'..'z' | 'A'..'Z' | '0'..'9')+
;
WS
: ' ' -> skip
;
the example won't be parsed, because the lexer emits an ID token for the word "is" (ID is defined before Word, so it wins when both rules match the same text), and an ID token is not matched by Word+. In my real case, the value language is vastly different, and I'd like to parse it with a different grammar.
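To make the tokenization problem concrete, here is a minimal Python sketch of the tie-breaking described above. It is only a simplified model of the lexer (longest match wins, and on equal length the rule defined first wins), not ANTLR itself. The token name ARROW is an assumption used for illustration; in the grammar above, '->' is an implicit literal token.

```python
import re

# Lexer rules in grammar order. On equal match length the earlier rule wins,
# which is a simplified model of how the generated lexer resolves ambiguity.
# ARROW is a hypothetical name for the implicit '->' token.
RULES = [
    ("ARROW", re.compile(r"->")),
    ("ID", re.compile(r"[a-z]+")),
    ("Word", re.compile(r"[a-zA-Z0-9]+")),
    ("WS", re.compile(r" ")),
]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so ties are kept by the rule that appears first.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end = best
        if name != "WS":  # mirrors 'WS : ' ' -> skip'
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


tokens = tokenize("mykey -> This is the value")
for token in tokens:
    print(token)
```

Under this model only "This" becomes a Word, because of its capital letter; "is", "the" and "value" all come out as ID, so Word+ stops matching after the first word.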
I have considered different solutions:
- Switching the lexer mode, but AFAIK, switching the lexer to a different mode can only happen in a lexer rule. This is problematic for this case, and for my real case as well, as there are no unique tokens that start and end the value part. What I would need is something like "tokenize value with different rules", which is, of course, stupid, because lexer and parser act independently and as soon as the parser starts, everything is already tokenized.
- Using a different grammar for value. If I see this right, the approach of importing a grammar won't work, since it always combines the two grammars, leading to the same situation of wrong tokenization.
- Creating a first crude parser that accepts the whole language but doesn't create the correct tree for value. I could then use a visitor and reparse value nodes with a different sub-parser, possibly inserting a new, correct subtree for value. This feels a bit clumsy.
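For reference, the last option (crude first pass, then reparsing value nodes) could look roughly like the following Python sketch. It does not use ANTLR; Node, parse_outer, parse_value and reparse_values are hypothetical names made up for illustration, and parse_value is only a stand-in for whatever parser the injected language would really need.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def parse_outer(line):
    """Crude first pass: split on the first '->' and keep the value as raw text."""
    key, sep, raw = line.partition("->")
    if not sep:
        raise ValueError("expected '->'")
    return Node("entry", children=[
        Node("id", key.strip()),
        Node("raw_value", raw.strip()),
    ])


def parse_value(raw):
    """Stand-in for the injected-language parser: here it only splits into words."""
    return Node("value", children=[Node("word", w) for w in raw.split()])


def reparse_values(node):
    """Visitor step: replace each raw_value node with the sub-parser's subtree."""
    node.children = [
        parse_value(child.text) if child.kind == "raw_value" else reparse_values(child)
        for child in node.children
    ]
    return node


tree = reparse_values(parse_outer("mykey -> This is the value"))
print(tree)
```

The outer pass never tokenizes the value, so the sub-parser is free to apply its own rules to the raw text. The cost is exactly the clumsiness mentioned above: the combined tree only exists after the second pass.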
If you need a simple real-world application, then you could consider strings in Java. Some of them might be a regex which needs to be parsed with a completely different parser. It is similar to injected languages you can use inside IDEA.
Question: Is there an idiomatic way in ANTLR4 to parse a specific rule with a different grammar? Ideally, I could specify this at the grammar level, so that the resulting AST is a combination of the outer language and a subtree of the injected language.