I am new here, looking for a replacement for my old friends flex & bison (I have been using lex/yacc for over 20 years). The main reason for the change is the, IMHO, poor C++ support.
But while looking for a replacement, context sensitivity would be my main selection criterion.
Am I right in understanding that ANTLR (be it 3 or 4; 4 does not yet have C++ support) does not have any automatic context sensitivity (as opposed to manual predicates)? And I am speaking of pure syntactic context, not semantic context like telling functions and constructors apart in C++, which is not the parser's job.
Take the following as an easy example:
Assume:
- The rule that lines up stmt's for the complete input is not included
- The rule that throws whitespace away is not included
The syntax may not be 100% correct, but it should be understandable for the purpose.
stmt: DEFINE SYSTEM ID '{' istmt '}' | DEFINE COMPONENT ID '{' istmt '}' ;
DEFINE : 'define' ;
SYSTEM : 'system' ;
COMPONENT : 'component' ;
ID : [A-Za-z][A-Za-z]* ;
INT : [0-9]+ ;
Input:
define system define { ...
Both defines will be returned as DEFINE by usual lexers, that being the first longest match. The second "define" could easily be resolved to ID by the parser passing the set of tokens valid in the current context to the lexer, which would then select the first match among these (if any). I.e., when it comes to the ID position, the lexer first identifies "define" as DEFINE, but DEFINE is not currently valid, so it takes the next match for the same word, which is ID, and returns that since it is in the valid set. If no match is found in the valid set, the usual first match is returned instead for error handling/recovery.
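To make the proposed mechanism concrete, here is a minimal sketch in Python. All names (RULES, next_token, the `valid` parameter) are illustrative inventions of mine, not any ANTLR or flex API: the lexer tries rules in declaration order, and among the rules matching the same longest word it prefers the first one whose token type the parser currently accepts, falling back to plain first-longest-match for error recovery.

```python
import re

# Token rules in declaration order, mirroring the grammar above
# (braces added so the sample input tokenizes completely).
RULES = [
    ("DEFINE",    re.compile(r"define")),
    ("SYSTEM",    re.compile(r"system")),
    ("COMPONENT", re.compile(r"component")),
    ("ID",        re.compile(r"[A-Za-z]+")),
    ("INT",       re.compile(r"[0-9]+")),
    ("LBRACE",    re.compile(r"\{")),
    ("RBRACE",    re.compile(r"\}")),
]

def next_token(text, pos, valid=None):
    """Return (type, lexeme) for the token starting at pos.

    `valid` is the set of token types the parser currently accepts,
    or None for the usual context-free longest/first-match behaviour.
    """
    # All rules matching at pos, kept in declaration order.
    matches = [(name, m.group(0)) for name, rx in RULES
               if (m := rx.match(text, pos))]
    if not matches:
        raise ValueError(f"no token at position {pos}")
    longest = max(len(lex) for _, lex in matches)
    candidates = [(n, l) for n, l in matches if len(l) == longest]
    if valid is not None:
        for name, lexeme in candidates:
            if name in valid:
                return name, lexeme   # first match that is currently valid
    return candidates[0]              # usual behaviour / error recovery
```

With the input "define system define", the third word sits where only ID is valid, so the parser would pass {"ID"} and the DEFINE interpretation is skipped; with no valid set, the same word comes back as DEFINE as usual.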
Of course there are more complex examples which cannot be resolved at the first level, but since every candidate always matches the same lexical word, looking forward and building and reducing a tree of possible paths will in most cases eventually come up with a single branch identifying the correct token to be used.
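The branching idea above can also be sketched briefly. Assuming (my invention, purely for illustration) a table of candidate token types per word and a toy predicate standing in for the parser's acceptance check, forking one path per candidate and pruning rejected paths leaves a single surviving branch:

```python
from itertools import product

# Each word's candidate token types, in first-match order (keyword before ID).
CANDIDATES = {
    "define":    ["DEFINE", "ID"],
    "system":    ["SYSTEM", "ID"],
    "component": ["COMPONENT", "ID"],
}

def interpretations(words):
    """Every possible token-type sequence for a list of words."""
    return [list(p) for p in product(*(CANDIDATES.get(w, ["ID"]) for w in words))]

def stmt_prefix_ok(tokens):
    """Toy stand-in for the parser: a stmt starts DEFINE (SYSTEM|COMPONENT) ID."""
    expected = [{"DEFINE"}, {"SYSTEM", "COMPONENT"}, {"ID"}]
    return all(t in exp for t, exp in zip(tokens, expected))

# 2 x 2 x 2 = 8 candidate paths, but pruning leaves exactly one branch:
paths = [p for p in interpretations("define system define".split())
         if stmt_prefix_ok(p)]
# paths == [['DEFINE', 'SYSTEM', 'ID']]
```

Real inputs would need the tree built and reduced incrementally rather than enumerated up front, but the principle is the same: all candidates cover the same word, so pruning by parser context converges on one interpretation.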
Context sensitivity could be enabled via an option, as well as at the level of individual lexer tokens. I could also elaborate on a more complex example if required.