ANTLR parsing XML: lost or duplicated token matches

Question

I'm new to ANTLR and am trying the following grammar in ANTLRWorks 1.4.3:

grammar TextGra;
element :   starttag (element)* endtag
;
starttag:   '<' TAGNAME '>';
endtag  :   '</' TAGNAME '>';
TAGNAME :   ('a'..'z')|('A'..'Z')|('0'..'9');
WS  :   (' '|'\r'|'\n')+ {skip();} ;

When I try to parse a simple XML fragment like this:

<a><b><c></c></b></a>

The last two endtag elements are lost. How should I handle this, or is this the wrong approach? Unlike other XML parsing code, I can't constrain the tag names in my situation. Alternatively, can a grammar use something like $0 to reference a previously matched token (as in a regexp), so that the tag name in the endtag is determined by the previously matched starttag? Thanks, everyone, for the responses!

Bart Kiers · Accepted Answer · 2012-11-01T07:25:30

I am guessing you're using ANTLRWorks' interpreter: don't, it's buggy. Always use the debugger included in ANTLRWorks (press CTRL+D to start the debugger).

I didn't change your grammar or input, and this is what the interpreter produced:

[image: interpreter output]

And the debugger produced this:

[image: debugger output]
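
To sanity-check that the grammar itself accepts the input, independently of ANTLRWorks, here is a minimal hand-written sketch in Python that mirrors the three parser rules. It is not ANTLR-generated code, and the names tokenize, Parser and parse are made up for illustration. It also compares each end tag's name with its start tag's name and rejects trailing input; both are extra checks that the original grammar does not perform, shown only as one possible way to tie the endtag to the previously matched starttag.

```python
import re

# Mirrors the lexer: optional whitespace (skipped, like WS), then one of
# '</', '<', '>' or a one-character TAGNAME as in the original grammar.
TOKEN = re.compile(r"[ \r\n]*(</|<|>|[A-Za-z0-9])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip(" \r\n")
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate, what):
        tok = self.peek()
        if tok is None or not predicate(tok):
            raise SyntaxError(f"expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def tag(self, opener):
        # starttag: '<' TAGNAME '>'    endtag: '</' TAGNAME '>'
        self.expect(lambda t: t == opener, repr(opener))
        name = self.expect(str.isalnum, "TAGNAME")
        self.expect(lambda t: t == ">", "'>'")
        return name

    def element(self):
        # element: starttag (element)* endtag
        name = self.tag("<")
        children = []
        while self.peek() == "<":
            children.append(self.element())
        end_name = self.tag("</")
        # Extra check (not in the original grammar): names must match.
        if end_name != name:
            raise SyntaxError(f"</{end_name}> does not close <{name}>")
        return (name, children)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.element()
    # Extra check (not in the original grammar): no leftover tokens.
    if parser.peek() is not None:
        raise SyntaxError("trailing input after root element")
    return tree


if __name__ == "__main__":
    print(parse("<a><b><c></c></b></a>"))
```

All three end tags are consumed for the sample input, which is consistent with the grammar being fine and the interpreter being the problem.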
