
I'm matching user-defined HTML-template tags that look like this (simplified):

{% label %} ... {% endlabel %}

The "label" is an alphanumeric name that users can define themselves, e.g.:

{% mytag %}<div>...</div>{% endmytag %}

Is there a way to tell the parser that the LABEL start tag text has to match with the ENDLABEL end tag text? In other words, I want this to be invalid:

{% mytag %}<div>...</div>{% endnotmatchingtag %}

My lexer looks like this:

LABEL :                 ALPHA (ALPHA|DIGIT|UNDERSCORE)* ;
fragment UNDERSCORE:    '_' ;
fragment ALPHA:         [a-zA-Z] ;
fragment DIGIT:         [0-9] ;

END :                   'end' ;
ENDLABEL :              END LABEL ;
TAGSTART :              '{%' ;
TAGEND :                '%}' ;

WS :                    [ \t\r\n]+ -> skip ;

And the parser rule looks similar to this:

customtag: TAGSTART LABEL TAGEND block TAGSTART ENDLABEL TAGEND;    

(and a block matches text or other tags recursively)

Right now I'm checking for a match in the listener, but I was hoping I could do it in the parser. Is there any way to ensure at the parser level in ANTLR 4 that ENDLABEL is equal to 'end' + LABEL?

And is it possible to do this without prepending 'end' in the lexer?


2 Answers


Create two additional lexer rules:

// The optional [ \t]* lets a tag contain spaces, as in {% mytag %}
EndTag :   TAGSTART [ \t]* ENDLABEL [ \t]* TAGEND ;
StartTag : TAGSTART [ \t]* LABEL [ \t]* TAGEND ;

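Make sure that ENDLABEL is not subsumed by LABEL. LABEL also matches text such as endmytag, and when two lexer rules match input of the same length, ANTLR picks the one defined first in the grammar. So define ENDLABEL before LABEL, and EndTag before StartTag.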
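To see why the order matters, here is a small Java simulation of the lexer's tie-breaking policy. It does not use ANTLR; it only mimics the rule that the longest match wins and that, on a tie, the rule defined first wins. The regexes are translations of the question's LABEL and ENDLABEL rules; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics ANTLR 4's lexer tie-breaking, it does not call ANTLR.
final class LexerPriorityDemo {
    // Regex equivalents of the question's LABEL and ENDLABEL lexer rules.
    static final Pattern LABEL = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");
    static final Pattern ENDLABEL = Pattern.compile("end[a-zA-Z][a-zA-Z0-9_]*");

    // Builds the rule table in "grammar order"; LinkedHashMap preserves insertion order.
    static Map<String, Pattern> rules(boolean endLabelFirst) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        if (endLabelFirst) {
            rules.put("ENDLABEL", ENDLABEL);
            rules.put("LABEL", LABEL);
        } else {
            rules.put("LABEL", LABEL);
            rules.put("ENDLABEL", ENDLABEL);
        }
        return rules;
    }

    // Returns the rule that wins at the start of the input: the longest match wins,
    // and the strict '>' keeps the earlier rule when two matches have equal length.
    static String pick(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pick(rules(false), "endmytag")); // LABEL (ENDLABEL is shadowed)
        System.out.println(pick(rules(true), "endmytag"));  // ENDLABEL
        System.out.println(pick(rules(true), "mytag"));     // LABEL
    }
}
```

With LABEL listed first, endmytag always comes out as LABEL, so an ENDLABEL token never reaches the parser.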

Use the new tokens in your parser rule, similar to what you did:

taggedElement : StartTag othernodes EndTag;

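and add a semantic predicate (the trailing ? is what makes the action a predicate):

taggedElement : StartTag othernodes EndTag {matches($StartTag.text, $EndTag.text)}? ;

where matches is a helper (for example, defined in the grammar's @parser::members block) that returns true if the tags match. If it returns false, the parser reports a failed-predicate error for that element.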
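A minimal sketch of such a matches helper in Java, assuming the token text looks like {% mytag %} or {%mytag%}, i.e. the label sits between the {% and %} delimiters with optional whitespace around it. The class name TagMatcher and the helper tagName are hypothetical; in a real grammar the two static methods could be placed in the members block so the predicate can call matches directly.

```java
// Hypothetical helper class; the predicate only needs matches(...).
final class TagMatcher {
    // Extracts the label from a tag token's text, e.g. "{% mytag %}" -> "mytag".
    // Assumes the text starts with "{%" and ends with "%}", which the StartTag/EndTag rules ensure.
    static String tagName(String tokenText) {
        return tokenText.substring(2, tokenText.length() - 2).trim();
    }

    // True if the end tag's label is exactly "end" followed by the start tag's label.
    static boolean matches(String startTagText, String endTagText) {
        return tagName(endTagText).equals("end" + tagName(startTagText));
    }

    public static void main(String[] args) {
        System.out.println(matches("{% mytag %}", "{% endmytag %}"));          // true
        System.out.println(matches("{% mytag %}", "{% endnotmatchingtag %}")); // false
    }
}
```

Comparing against "end" + label keeps the check in one place, so the same helper works whether the lexer emits a dedicated end-tag token or the end label arrives as a plain LABEL.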


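A parser deals with syntax at the grammar level. What you are requesting cannot be expressed in a context-free grammar (CFG): requiring the end tag to repeat an arbitrary start-tag name is essentially the language { w c w }, which is not context-free. So I don't think you can solve this at the parser level with grammar rules alone.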

In your scenario, I would create a visitor that enforces these semantics. ANTLR 4 can generate a visitor interface and a base visitor class for you (with the -visitor option), which you can then extend.
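As a rough, self-contained sketch of the check such a visitor would perform: TagNode below is a hypothetical stand-in for the generated parse-tree contexts, and TagNameChecker walks it recursively the way a visitor would. In a real ANTLR visitor (a class extending the generated base visitor for your grammar), visitCustomtag would read ctx.LABEL().getText() and ctx.ENDLABEL().getText() for the customtag rule from the question, and call visitChildren(ctx) to reach nested tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed customtag: in real code this would be the
// generated CustomtagContext, with the block's nested tags as children.
final class TagNode {
    final String label;     // text of LABEL in the start tag
    final String endLabel;  // text of ENDLABEL in the end tag
    final List<TagNode> children = new ArrayList<>();

    TagNode(String label, String endLabel) {
        this.label = label;
        this.endLabel = endLabel;
    }

    // Adds a nested tag and returns this node so trees can be built inline.
    TagNode add(TagNode child) {
        children.add(child);
        return this;
    }
}

final class TagNameChecker {
    // Visits a tag and all nested tags, collecting one message per mismatched pair.
    List<String> check(TagNode root) {
        List<String> errors = new ArrayList<>();
        visit(root, errors);
        return errors;
    }

    private void visit(TagNode node, List<String> errors) {
        if (!node.endLabel.equals("end" + node.label)) {
            errors.add("{% " + node.label + " %} closed by {% " + node.endLabel + " %}");
        }
        for (TagNode child : node.children) {
            visit(child, errors); // recurse into nested tags, like visitChildren(ctx)
        }
    }
}
```

Collecting messages instead of throwing on the first mismatch lets you report every bad pair in a single pass over the tree.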