
Hi, I am trying to parse a SIP URI using ANTLR4. For the time being I have stripped out most of the complexity to keep the question simple.

Antlr4 Grammar

sipUri          : SIP_SCHEME coreUri EOF ;
coreUri         : USER_INFO? hostPort ;
hostPort        : 'abc.com' ;

SIP_SCHEME           : 'sip:';
USER_INFO            : USER PASSWORD? '@' ;
fragment USER        : ALPHA_NUM+ ;
fragment PASSWORD    : ':' ALPHA_NUM+ ;
fragment ALPHA_NUM   :  ALPHA | DIGIT ;
fragment ALPHA       : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT       : ('0'..'9') ;

String Input 1 : sip:user:[email protected]

output of Input 1

String Input 2 : sip:user@abc.com

output of Input 2

In the second input, "sip" was parsed as USER and "user" was parsed as PASSWORD, since "sip" qualifies as a USER/PASSWORD according to the grammar.

I hope that describes my problem. How should I proceed in this situation?


1 Answer


I don't know why the result is what it is, but it probably has to do with how the lexer works.
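A likely explanation is the lexer's longest-match rule: at each position an ANTLR 4 lexer picks the rule that consumes the most characters, and only falls back to rule order when two matches are equally long. The Python sketch below imitates that rule with regular expressions. It is not ANTLR's implementation; the `RULES` table, the `tokenize` helper, the `HOST` name for the implicit 'abc.com' token, and the password `pass` are all assumptions made for illustration.

```python
import re

# Hypothetical re-creation of the question's lexer rules as regular expressions.
# Rule order only matters for breaking ties between equally long matches.
RULES = [
    ("SIP_SCHEME", re.compile(r"sip:")),
    ("USER_INFO", re.compile(r"[A-Za-z0-9]+(?::[A-Za-z0-9]+)?@")),
    ("HOST", re.compile(r"abc\.com")),  # stands in for the implicit 'abc.com' token
]


def tokenize(text):
    """Split text into (rule, lexeme) pairs, longest match wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    print(tokenize("sip:user:pass@abc.com"))
    print(tokenize("sip:user@abc.com"))
```

With a password present, `USER_INFO` cannot match at position 0 (after "sip:user" it meets ':' where it needs '@'), so `SIP_SCHEME` is chosen. Without a password, `USER_INFO` matches "sip:user@", which is longer than "sip:", so the scheme token is never produced and the parser has nothing to match `SIP_SCHEME` against.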

However, by moving stuff to the parser you can avoid this particular problem:

sipUri          : SIP_SCHEME coreUri EOF ;
coreUri         : userInfo? hostPort ;
hostPort        : 'abc.com' ;
userInfo        : USER PASSWORD? '@';

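SIP_SCHEME           : 'sip:';
USER                 : ALPHA_NUM+ ;
PASSWORD             : ':' ALPHA_NUM+ ;
// fragment rules unchanged from the question
fragment ALPHA_NUM   :  ALPHA | DIGIT ;
fragment ALPHA       : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT       : ('0'..'9') ;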

That said, I think it is better not to assign semantic meaning (user, password) to lexer tokens at all, and to move that logic to the application instead. The difficulty, as you are probably aware, is that the allowed character sets differ between user, password, hostname and URI parameters, and I don't know the best way to handle that.
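As a rough idea of what "moving that logic to the application" could look like, here is a minimal Python sketch that splits the simplified URI from the question after the fact. The function name `split_sip_uri` is hypothetical, and the sketch assumes the reduced form `sip:[user[:password]@]host` only; it does not validate the per-field character sets and does not handle ports, URI parameters or headers.

```python
def split_sip_uri(uri):
    """Return (user, password, host) for a simplified sip: URI.

    Assumes the reduced form sip:[user[:password]@]host from the question.
    Missing parts are returned as None.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "sip":
        raise ValueError("not a sip: URI")

    # Everything before the last '@' is user info, the remainder is the host.
    userinfo, at, host = rest.rpartition("@")
    if not at:
        return None, None, host

    # An optional password follows the first ':' inside the user info.
    user, colon, password = userinfo.partition(":")
    return user, (password if colon else None), host


if __name__ == "__main__":
    print(split_sip_uri("sip:user:pass@abc.com"))
    print(split_sip_uri("sip:user@abc.com"))
    print(split_sip_uri("sip:abc.com"))
```

Splitting on delimiters first and checking each piece afterwards means each field can be validated against its own character set in ordinary code, without the lexer having to decide which field a run of characters belongs to.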