
I am trying to use the OData v4 grammar for ANTLR 4 provided by the OASIS group. See the following link: https://tools.oasis-open.org/version-control/browse/wsvn/odata/trunk/spec/grammar/ANTLR/#_trunk_spec_grammar_ANTLR_

Using these files and the ANTLR 4 Maven plugin, I successfully generated the classes needed to parse OData URLs.

I use the parser as shown below:

String expression = "$top=2&$orderby=name";
ANTLRInputStream in = new ANTLRInputStream(expression);

ODataParserLexer lexer = new ODataParserLexer(in);
ODataParserParser parser = new ODataParserParser(
            new CommonTokenStream(lexer));

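ODataErrorListener errorListener = new ODataErrorListener();
// Register the listener so that syntax errors are reported to it
parser.addErrorListener(errorListener);

ODataParseListener listener = new ODataParseListener();

// Start parsing at the odataUri entry rule
OdataUriContext ctx = parser.odataUri();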

When I call the method odataUri, the error listener reports the following error:

line 1:66 mismatched input '<EOF>' expecting Protocol

This is strange, since the lexer is able to produce tokens for the string to parse.

Perhaps odataUri isn't the right method to call on the parser, but from reading the parser grammar file, it does appear to be the entry rule.

-- Edited on 12/01

I detected a problem with a rule name:

odataUri : Protocol ColSlaSla host ( COLON port )?
       ( ODataSignal_METADATA | ODataSignal_BATCH | odataRelativeUri )? EOF;

Protocol :  

The rule Protocol can't be found. If I rename it to protocol, things work much better...

Following Bart's advice, I printed the names of the rules associated with the tokens. With the lexer generated by the ANTLR 4 Maven plugin, I can't get the correct names. With the classic generation, I get this:

    index = 93, ODataParserLexer.tokenNames[index] = HTTPORHTTPS
    index = 92, ODataParserLexer.tokenNames[index] = ColSlaSla
    index = 23, ODataParserLexer.tokenNames[index] = Ls32
    index = 60, ODataParserLexer.tokenNames[index] = '/'
    index = 4, ODataParserLexer.tokenNames[index] = 'odata'
    index = 60, ODataParserLexer.tokenNames[index] = '/'
    index = 251, ODataParserLexer.tokenNames[index] = ODATA_ID_CHAR8
    index = 28, ODataParserLexer.tokenNames[index] = SubDelims
    index = 25, ODataParserLexer.tokenNames[index] = DecOctet
    index = 28, ODataParserLexer.tokenNames[index] = SubDelims
    index = 60, ODataParserLexer.tokenNames[index] = '/'
    index = 251, ODataParserLexer.tokenNames[index] = ODATA_ID_CHAR8
    index = 66, ODataParserLexer.tokenNames[index] = '?'
    index = 128, ODataParserLexer.tokenNames[index] = ODataSignal_TOP
    index = 28, ODataParserLexer.tokenNames[index] = SubDelims
    index = 25, ODataParserLexer.tokenNames[index] = DecOctet
    index = 28, ODataParserLexer.tokenNames[index] = SubDelims
    index = 126, ODataParserLexer.tokenNames[index] = ODataSignal_ORDERBY
    index = 28, ODataParserLexer.tokenNames[index] = SubDelims
    index = 250, ODataParserLexer.tokenNames[index] = ODATA_ID_CHAR4

The tokens and their associated rules seem correct.

I also enabled tracing on the parser (parser.setTrace(true)) and ran my code again. I still get an error:

enter   odataUri, LT(1)=<EOF>
enter   protocol, LT(1)=<EOF>
line 1:66 mismatched input '<EOF>' expecting HTTPORHTTPS
Error on query : 
=> line 1 : mismatched input '<EOF>' expecting HTTPORHTTPS
Context : [590]
exit    protocol, LT(1)=<EOF>
exit    odataUri, LT(1)=<EOF>

Thanks very much for your help. Thierry

That grammar is full of bad practices and even errors. I don't think you're working with a parser generated from the SVN repo you linked to. My guess is that "http" does not get tokenized as a Protocol token. To be sure, simply print all the token types of the tokens of the lexer. - Bart Kiers
Thanks Bart for these hints! As far as I can see, the method Token#getType returns an integer. How can I get the corresponding rule (perhaps with ODataParserLexer#tokenNames or ODataParserLexer#ruleNames)? - Thierry Templier
Bart, I updated my question content... - Thierry Templier

1 Answer


The grammar as specified has a lot of ambiguous matches and needs to be rewritten to eliminate them, possibly using semantic predicates or lexer modes. For example (I rewrote the grammar's start rules):

odataUri : serviceRoot? EOF  ;

serviceRoot : Protocol host segments relative? # OnSerivceRoot ;

segments    : Segments ;

host         : (addr | regName) port?;
addr         : ColSlaSla IPv4address ;

regName      : HOST ;

port          : PortDef ;

relative : (ODataSignal_METADATA | ODataSignal_BATCH) | odataRelativeUri;

odataRelativeUri : resourcePath ( question queryOptions )?;
question : QUESTION ;

PortDef     : COLON Digits ;
Segments    : SLASH ((Unreserved | PctEncoded | SubDelims | COLON | AT_SIGN)+ SLASH)* ;
HOST         : ColSlaSla HOST_DEF ;
HOST_DEF     : (Unreserved | PctEncoded | SubDelims)+ ;
Protocol :  HttpOrHttpsAnyCase;
Digits  : Digit+ ;
Digit  : [0-9] ;
Alpha  : [a-zA-Z];
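
To illustrate why overlapping lexer rules matter: an ANTLR lexer picks the rule that matches the longest piece of input, and when two rules match the same length, the rule defined first in the grammar wins. The following is only a toy sketch in plain Java, not ANTLR-generated code. The rule names DecOctet, Digits and SubDelims are borrowed from the grammar and the token dump in the question, but their patterns here are simplified assumptions, and the Word rule and the LongestMatchDemo class are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy lexer: the longest match wins, and a tie goes to the rule declared first. */
final class LongestMatchDemo {

    // Simplified, assumed patterns; insertion order stands in for grammar order.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("DecOctet", Pattern.compile("[0-9]{1,3}"));
        RULES.put("Digits", Pattern.compile("[0-9]+"));
        RULES.put("SubDelims", Pattern.compile("[$&=]"));
        RULES.put("Word", Pattern.compile("[a-z]+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so an earlier rule keeps the token when lengths are equal.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestRule + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("$top=2"));
        System.out.println(tokenize("$top=2015"));
    }
}
```

In this sketch a lone 2 comes out as DecOctet because that rule is declared before Digits, while 2015 comes out as Digits because that match is longer. The lexer cannot know which token the parser expects at that point, which is the kind of ambiguity that lexer modes or semantic predicates are meant to resolve.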