I'm still in the process of learning ANTLR. I recently posted two questions about parsing text and extracting information while leaving aside "unwanted" words or characters. Following a very interesting discussion with Bart Kiers on "parsing a noisy datastream Part 1" and "parsing a noisy datastream Part 2", I've ended up with one more problem.

Originally, my grammar looked like this:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT'|'DOG'|'BIRD'; 
INDIRECT_OBJECT : 'CAR'| 'SOFA';
ANY2            :'A'..'Z'+ {skip();};
ANY             : . {skip();};

parse 
  :  sentenceParts+ EOF 
  ;

sentenceParts  
  :  SUBJECT VERB INDIRECT_OBJECT  
  ;    

A sentence like "it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV." produces the following:

[image: resulting parse tree]

This is good, and it does what I want: it extracts only the words CAT, SLEEPING and SOFA and leaves the other words aside. Now, for another reason, I need to introduce a new token in my grammar; let's call it OTHER : 'PLANE'. It will be used later by another rule. I still want my primary rule, SUBJECT VERB INDIRECT_OBJECT, to work. Let's say the token 'PLANE' appears in my sentence, like this:

"it's 10PM and the Lazy CAT on the PLANE is currently SLEEPING heavily on the SOFA in front of the TV." This produces the following error (no surprise here, as the lexer has a clear definition of 'PLANE' as a token):

[image: parser error output]



Is there a way to tell ANTLR that when I enter the rule sentenceParts I only care about the 3 tokens I have defined, namely SUBJECT, VERB and INDIRECT_OBJECT, and that it should disregard any other token it comes across? I would like to do that without putting OTHER? everywhere in this rule.

2 Answers

1
votes

Well, in fact, I might have found a way to do it. Although it's questionable to introduce tokens at all if you don't want to parse them, this solution works:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT'|'DOG'|'BIRD'; 
INDIRECT_OBJECT : 'CAR'| 'SOFA';
OTHER           : 'PLANE';
OTHER2          : 'BEAUTIFUL';
OTHER3          : 'HEAVILLY';
ANY2            :'A'..'Z'+ {skip();};
ANY             : . {skip();};

parse : sentenceParts+ EOF ;

next : ( options {greedy=false;}: .)*;

sentenceParts
: SUBJECT next VERB next INDIRECT_OBJECT
;



it's 10PM and the Lazy CAT on the BEAUTIFUL PLANE is currently SLEEPING HEAVILLY on the SOFA in front of the TV

[image: resulting parse tree]
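For what it's worth, here is a sketch of a stricter variant of `next`. I have not tested it against this exact grammar, so treat it as an assumption. Instead of a non-greedy wildcard, it uses ANTLR 3's `~` operator, which in a parser rule matches any single token outside the given set. As far as I can tell, the practical difference is that a stray SUBJECT, VERB or INDIRECT_OBJECT between the expected ones is reported as an error instead of being silently consumed.

```
// Hypothetical alternative to the non-greedy wildcard version of 'next':
// consume only tokens that are NOT one of the three tokens of interest.
next
  :  ( ~(SUBJECT | VERB | INDIRECT_OBJECT) )*
  ;

// Unchanged from the answer above.
sentenceParts
  :  SUBJECT next VERB next INDIRECT_OBJECT
  ;
```

The lexer rules stay as they are; only the parser rule `next` changes.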
0
votes

Is there a way to tell ANTLR that when I enter the rule sentenceParts I only care about the 3 tokens I have defined, namely SUBJECT, VERB and INDIRECT_OBJECT, and that it should disregard any other token it comes across? I would like to do that without putting OTHER? everywhere in this rule.

No.

You either ignore the token, or you don't, in which case you'll have to make it optional in your parser rule(s).
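As a rough sketch of the "make it optional" route, assuming the lexer rules from the question plus OTHER : 'PLANE': the optional tokens can be factored into one helper rule so that OTHER is not repeated throughout sentenceParts. The rule name `noise` is made up for this illustration.

```
// Hypothetical helper rule: zero or more OTHER tokens.
noise
  :  OTHER*
  ;

// The primary rule tolerates OTHER tokens between its three parts.
sentenceParts
  :  SUBJECT noise VERB noise INDIRECT_OBJECT
  ;
```

This only covers OTHER tokens that appear between the three parts, as in the example sentence. An OTHER before the first SUBJECT or after the INDIRECT_OBJECT would still need its own handling in `parse`.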