ANTLR4: context-sensitive spaces?

Question

In a grammar I would like to allow text values such as xxx without string delimiters. The idea is to write definitions like

    a = xxx;

instead of

    a = "xxx";

to make typing easier. Apart from that, the language should also have variable definitions and other kinds of statements.

As a first approach I experimented with this grammar:

    grammar SpaceNoSpace;

    prog: stat+;

    stat:
        'somethingelse' ';'
      | typed description* content
      ;

    typed:
        'something' '-'
      | 'anotherthing' '-'
      ;

    description:
        'someSortOfDetails'  COLON ID HASH
      | 'otherSortOfDetails' COLON ID HASH
      ;

    content:
        contenttext ';'
      ;

    contenttext:
        (~';')*
      ;

    COLON: ':' ;
    HASH: '#';
    SEMI: ';';
    SPACE: ' ';
    ID: [a-zA-Z][a-zA-Z0-9]+;
    WS  :   [ \t\n\r]+ -> channel(HIDDEN);
    ANY_CHAR : . ;

This works fine for input files like this:

    something-someSortOfDetails: aVariableName#
    this is the content of this;

    anotherthing-someSortOfDetails: aVariableName#
    here spaces are accepted as much        as you like;

    somethingelse;

But modifying the last line to

    somethingelse ;

leads to a syntax error:

    line 7:15 extraneous input ' ' expecting ';'

This suggests that the lexer rule

    WS  :   [ \t\n\r]+ -> channel(HIDDEN);

is not applied here, and that the SPACE rule is used instead (?).
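The error is consistent with how an ANTLR4 lexer resolves ambiguities: it picks the rule that matches the longest input, and when several rules match the same number of characters, the rule defined first in the grammar wins. A single space is matched equally well by SPACE and WS, so SPACE (defined first, on the default channel) wins and reaches the parser; a run of several spaces is longer than anything SPACE can match, so WS wins and the spaces are hidden. The Python snippet below is only a toy model of that selection rule, not ANTLR's implementation; it covers a hypothetical subset of the grammar, and the name SOMETHINGELSE for the implicit 'somethingelse' token is made up.

```python
import re

# Toy model of ANTLR4 token selection: longest match wins, and on a tie
# the rule listed first wins. Only a hypothetical subset of rules is modelled.
RULES = [
    ("SOMETHINGELSE", re.compile(r"somethingelse")),  # stands in for the literal 'somethingelse'
    ("SEMI", re.compile(r";")),
    ("SPACE", re.compile(r" ")),
    ("WS", re.compile(r"[ \t\n\r]+")),
]
HIDDEN = {"WS"}  # WS -> channel(HIDDEN): the parser never sees these tokens


def tokenize(text):
    """Return the (rule name, text) pairs the parser would see."""
    pos = 0
    visible = []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # strict '>' keeps the earlier rule when two matches are equally long
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in HIDDEN:
            visible.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return visible


print(tokenize("somethingelse ;"))
# [('SOMETHINGELSE', 'somethingelse'), ('SPACE', ' '), ('SEMI', ';')]
print(tokenize("somethingelse   ;"))
# [('SOMETHINGELSE', 'somethingelse'), ('SEMI', ';')]
```

With one space the parser receives a SPACE token where it expects ';', which matches the "extraneous input ' '" message; with three spaces the same statement parses.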

On the other hand, if I delete the SPACE lexer rule, the space in "somethingelse ;" is ignored (by the WS lexer rule), so the parser rule stat: 'somethingelse' ';' is matched correctly. But without the SPACE rule, runs of spaces in the content text are reduced to single spaces, so "this    here" becomes "this here".

This is not a big problem, but it is still an interesting question:

is it possible to define context-sensitive WS or SPACE lexer rules in ANTLR4, so that all spaces are preserved within the content parser rule but ignored in every other rule?

This: stackoverflow.com/questions/29060496/… seems to be very close to an answer. Maybe this could also be done within the grammar? Or even more easily? — Mike75
That looks like an answer to me, which would make this question a duplicate. — rici

CoronA · Accepted Answer · 2016-02-02T05:03:50

Have you considered Lexer Modes? The section on mode(), pushMode() and popMode is probably of interest to you.
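Lexer modes let the lexer itself switch between sets of rules, using lexer commands such as `-> pushMode(NAME)` and `-> popMode`. In ANTLR4, modes are only allowed in a lexer grammar, so the combined grammar from the question would have to be split into a separate lexer grammar and parser grammar. The Python snippet below is a toy model of the idea, not ANTLR code: a default mode that skips whitespace, and a content mode in which spaces are kept. It rests on a hypothetical trigger that happens to fit the sample input (content starts right after the '#' that ends a description and runs to the next ';'); the token names MINUS and TEXT are made up.

```python
import re

# Toy model of ANTLR4 lexer modes: a stack of modes, each with its own rules.
# Hypothetical assumption: content starts right after the '#' that ends a
# description and runs up to the next ';'. Rules within a mode do not
# overlap, so trying them in order gives the same result as longest match.
MODES = {
    "DEFAULT": [
        # (token name, pattern, action) with action in {None, "skip", "push", "pop"}
        ("WS", re.compile(r"[ \t\r\n]+"), "skip"),   # whitespace ignored in this mode
        ("COLON", re.compile(r":"), None),
        ("MINUS", re.compile(r"-"), None),
        ("SEMI", re.compile(r";"), None),
        ("HASH", re.compile(r"#"), "push"),          # like: HASH : '#' -> pushMode(CONTENT) ;
        ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]+"), None),
    ],
    "CONTENT": [
        ("TEXT", re.compile(r"[^;]+"), None),        # everything except ';', spaces kept
        ("SEMI", re.compile(r";"), "pop"),           # like: ... ';' -> popMode ;
    ],
}


def tokenize(text):
    """Return (token name, text) pairs; skipped tokens are dropped."""
    stack = ["DEFAULT"]
    pos, tokens = 0, []
    while pos < len(text):
        for name, pattern, action in MODES[stack[-1]]:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected {text[pos]!r} at {pos} in mode {stack[-1]}")
        if action != "skip":
            tokens.append((name, m.group()))
        if action == "push":
            stack.append("CONTENT")
        elif action == "pop":
            stack.pop()
        pos = m.end()
    return tokens


src = "something-someSortOfDetails: aVariableName#\nthis is   the content;\nsomethingelse ;"
for tok in tokenize(src):
    print(tok)
```

Here the TEXT token is '\nthis is   the content' with its inner spaces intact, while the space in "somethingelse ;" is skipped. The sketch also shows the catch: the mode switch has to be triggered purely lexically. With this trigger, a statement with two descriptions (the second would be lexed as content) or content without any description (it would be lexed as ordinary tokens) comes out wrong, because the lexer cannot see which parser rule is active.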

Still, I think lexer modes cause more problems than they solve. Their purpose is to bring (parser) context into the lexer. Consequently, one should drop the paradigm of separating lexer and parser altogether and use a PEG parser instead.
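To make the PEG suggestion concrete, here is a hand-written, scannerless parser in Python in the PEG style (ordered choice with backtracking, no separate lexer). It is a sketch, not a PEG library or generator: because every rule reads characters directly, the rules for statements and descriptions skip whitespace while the content rule copies it verbatim. Method names mirror the question's grammar; treating whitespace before the content as a mere separator is an assumption.

```python
import re

IDENT = re.compile(r"[a-zA-Z][a-zA-Z0-9]+")  # mirrors the intent of the question's ID rule


class ParseError(Exception):
    pass


class Parser:
    """Scannerless, PEG-style parser: no separate lexer, so every rule
    decides for itself whether whitespace is skipped or kept."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def literal(self, s):
        self.skip_ws()  # whitespace before keywords and punctuation is ignored
        if not self.text.startswith(s, self.pos):
            raise ParseError(f"expected {s!r} at {self.pos}")
        self.pos += len(s)
        return s

    def choice(self, *alternatives):
        # PEG ordered choice: the first alternative that succeeds wins;
        # the position is reset after each failed attempt (backtracking)
        start = self.pos
        for alt in alternatives:
            try:
                return alt()
            except ParseError:
                self.pos = start
        raise ParseError(f"no alternative matched at {start}")

    # prog: stat+
    def prog(self):
        stats = [self.stat()]
        self.skip_ws()
        while self.pos < len(self.text):
            stats.append(self.stat())
            self.skip_ws()
        return stats

    # stat: 'somethingelse' ';' / typed description* content
    def stat(self):
        return self.choice(self.somethingelse, self.typed_stat)

    def somethingelse(self):
        self.literal("somethingelse")
        self.literal(";")
        return ("somethingelse",)

    def typed_stat(self):
        kind = self.choice(lambda: self.literal("something"),
                           lambda: self.literal("anotherthing"))
        self.literal("-")
        descriptions = []
        while True:  # description*
            start = self.pos
            try:
                descriptions.append(self.description())
            except ParseError:
                self.pos = start
                break
        return (kind, descriptions, self.content())

    # description: ('someSortOfDetails' / 'otherSortOfDetails') ':' ID '#'
    def description(self):
        name = self.choice(lambda: self.literal("someSortOfDetails"),
                           lambda: self.literal("otherSortOfDetails"))
        self.literal(":")
        self.skip_ws()
        m = IDENT.match(self.text, self.pos)
        if not m:
            raise ParseError(f"expected identifier at {self.pos}")
        self.pos = m.end()
        self.literal("#")
        return (name, m.group())

    # content: text up to ';' is kept verbatim, including runs of spaces
    def content(self):
        self.skip_ws()  # assumption: leading whitespace only separates header and text
        end = self.text.find(";", self.pos)
        if end < 0:
            raise ParseError(f"missing ';' after content at {self.pos}")
        value = self.text[self.pos:end]
        self.pos = end + 1
        return value


demo = "something-someSortOfDetails: aVariableName#\nthis is the content    of this;\nsomethingelse ;"
print(Parser(demo).prog())
```

For the demo input, the content value is 'this is the content    of this' with all four spaces kept, and "somethingelse ;" parses despite the space. Whitespace significance becomes a property of each rule, which is what the question asks for; the price is that every rule must handle whitespace explicitly.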
