Your W2 and NODE rules are the same, so nodes you intend to be NODE are matching W2.
grun with -tokens option: (notice, no NODE tokens)
[@0,0:0='(',<'('>,1:0]
[@1,1:1='1',<W2>,1:1]
[@2,3:3='C',<W1>,1:3]
[@3,5:6='10',<W2>,1:5]
[@4,8:8='2',<W2>,1:8]
[@5,9:9=')',<')'>,1:9]
[@6,10:10='.',<'.'>,1:10]
[@7,11:11='(',<'('>,1:11]
[@8,12:12='(',<'('>,1:12]
[@9,13:13='2',<W2>,1:13]
[@10,15:15='P',<W1>,1:15]
[@11,17:17='2',<W2>,1:17]
[@12,19:19='3',<W2>,1:19]
[@13,20:20=')',<')'>,1:20]
[@14,21:21='.',<'.'>,1:21]
[@15,22:22='(',<'('>,1:22]
[@16,23:23='3',<W2>,1:23]
[@17,25:25='S',<W1>,1:25]
[@18,27:27='3',<W2>,1:27]
[@19,29:29='2',<W2>,1:29]
[@20,30:30=')',<')'>,1:30]
[@21,31:31=')',<')'>,1:31]
[@22,32:32='*',<'*'>,1:32]
[@23,33:33='.',<'.'>,1:33]
[@24,34:34='(',<'('>,1:34]
[@25,35:35='2',<W2>,1:35]
[@26,37:37='T',<W1>,1:37]
[@27,39:39='2',<W2>,1:39]
[@28,41:41='4',<W2>,1:41]
[@29,42:42=')',<')'>,1:42]
[@30,43:42='<EOF>',<EOF>,1:43]
If I replace the NODEs in your parse rule with W2s (sorry, I have no idea what this is supposed to represent), I get:
It appears that your misconception is that the recursive descent parsing starts with the parser rule and when it encounters a Lexer rule, attempts to match it.
This is not how ANTLR works. With ANTLR, your input is first run through the Lexer (aka Tokenizer) to produce a stream of tokens. This step knows absolutely nothing about your parser rules. (That's why it's so often useful to use grun
to dump the stream of tokens, this gives you a picture of what your parser rules are acting upon (and you can see, in your example that there are no NODE tokens, because they all matched W2).
Also, a suggestion... It would appear that commas are an essential part of correct input (unless (1C102).((2P23).(3S32))*.(2T24)
is considered valid input. On that assumption, I removed the -> skip
and added them to your parser rule (that's why you see them in the parse tree). The resulting grammar I used was:
grammar Semi;
prog: expr+;
expr: expr '*' | expr ('.' | '+') expr | tuple | LP expr RP;
tuple: LP W2 COMMA W1 COMMA W2 COMMA W2 RP;
LP: '(';
RP: ')';
W1: [PCST0];
W2: [0-9]+;
NODE: [0-9]+;
WS: [ \t\r\n]+ -> skip; // toss out whitespace
COMMA: ',';
To take a bit more liberty with your grammar, I'd suggest that your Lexer rules should be raw type focused. And, that you can use labels to make the various elements or your tuple more easily accessible in your code. Here's an example:
grammar Semi;
prog: expr+;
expr: expr '*' | expr ('.' | '+') expr | tuple | LP expr RP;
tuple: LP nodef=INT COMMA w1=PCST0 COMMA w2=INT COMMA nodet=INT RP;
LP: '(';
RP: ')';
PCST0: [PCST0];
INT: [0-9]+;
COMMA: ',';
WS: [ \t\r\n]+ -> skip; // toss out whitespace
With this change, your tuple Context class will have accessors for w1, w1, and node. node will be an array of NBR tokens as I've defined it here.
'(' expr ')'
production, but with your lexical rules. If you look at the tokens generated for your input, you'll likely find that they aren't what you expect. – sepp2k0
by itself will produce aW1
token, but presumably you want to allow 0s as W2 or nodes as well. The easiest solution for that is probably to have dedicated lexer rule just for'0'
, which can be used anywhere that aW1
,W2
or node can be used. – sepp2k(0, 0, 0, 0)
, which should be valid according to my understanding of your format, would be lexed asLP W1 W1 W1 W1 RP
, which doesn't match yourtuple
rule. Now you could change tuple to allowW1
everywhere, but then(P, P, P, P)
would also be allowed, which I believe you don't want. So I'm suggesting to change it so that0 12 P
gets lexed asZERO NUMBER W1_LETTER
. Then you could definew2
andnode
(as parser rules, not lexer rules) asZERO | NUMBER
and you could definew1
asZERO | W1_LETTER
. – sepp2k