I am trying to write an ANTLR lexer grammar rule to validate email addresses. I have got most of it working, but I am unable to check that the character '.' does not appear twice in a row. For example, my grammar accepts an address that contains two consecutive dots, which it should not. I have tried several regexes, but nothing seems to work well. Can somebody please help me out here? I have just started learning this, so I don't know much. Here is what I have so far:

fragment LOCALCHARS_first_last : [a-zA-Z0-9-_~!$&'()*+,;=:]; //first and last character of the local part must not be '.'
fragment LOCALCHARS : [a-zA-Z0-9-_~!$&'.()*+,;=:]+;
fragment LOCALPART:  LOCALCHARS_first_last LOCALCHARS LOCALCHARS_first_last; //'.' cannot be first or last character
fragment DOMAINPART: [a-zA-Z0-9-.]+;
fragment EMAIL: LOCALPART '@' DOMAINPART;

CHECKEMAIL: (EMAIL) {
   System.out.println("valid email: "+getText());
};

Answer

Explanation

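When you define a fragment such as fragment EXAMPLE : [a-z.]+; it matches strings like abc and a.b.c, but also repetitive ones like aaa and even ... (nothing but dots). The + repeats any character from the set, and '.' is just one more member of that set, so nothing prevents two dots from appearing next to each other. The same is true of LOCALCHARS and DOMAINPART in the question.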
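To see the problem outside ANTLR, here is a small Java sketch that hand-translates the EXAMPLE fragment and the question's LOCALPART fragment into java.util.regex patterns. The translation is an assumption made for illustration (the class and method names are hypothetical), but for these simple character sets both notations accept the same strings.

```java
import java.util.regex.Pattern;

public class FragmentDemo {
    // Hand translation of: fragment EXAMPLE : [a-z.]+;
    // Inside a character class '.' is a literal dot, and '+' repeats any set member.
    static final Pattern EXAMPLE = Pattern.compile("[a-z.]+");

    // Hand translation of the question's LOCALPART:
    //   LOCALCHARS_first_last LOCALCHARS LOCALCHARS_first_last
    // The '-' after 0-9 in the ANTLR sets is assumed to be a literal dash;
    // it is moved to the end of each Java class so Java reads it literally.
    static final String FIRST_LAST = "[a-zA-Z0-9_~!$&'()*+,;=:-]";
    static final String MIDDLE = "[a-zA-Z0-9_~!$&'.()*+,;=:-]+";
    static final Pattern LOCALPART = Pattern.compile(FIRST_LAST + MIDDLE + FIRST_LAST);

    static boolean matchesExample(String s) {
        return EXAMPLE.matcher(s).matches();
    }

    static boolean matchesLocalPart(String s) {
        return LOCALPART.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "a.b.c", "aaa", "...", "a..b"}) {
            System.out.println(s + " -> EXAMPLE: " + matchesExample(s)
                    + ", LOCALPART: " + matchesLocalPart(s));
        }
    }
}
```

Every input matches EXAMPLE, and a..b also passes LOCALPART: only the first and last characters are kept dot-free, while the middle set still lets dots repeat. The sketch also exposes a side effect of the question's rule: because LOCALCHARS uses +, LOCALPART needs at least three characters, so a short local part such as ab is rejected.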

Solution

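You need to treat the dot as a separator between the sub-parts of the local part and of the domain part. Each sub-part is one or more characters that are not dots, so two dots can never be adjacent, and neither side of the '@' can start or end with a dot: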

fragment LOCAL_SUBPART : [a-zA-Z0-9-_~!$&\'()*+,;=:]+;
fragment DOMAIN_SUBPART : [a-zA-Z0-9-]+;

EMAIL: LOCAL_SUBPART ('.' LOCAL_SUBPART)* '@' DOMAIN_SUBPART ('.' DOMAIN_SUBPART)*;
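As a quick sanity check of the separator idea, the EMAIL rule can be hand-translated into a java.util.regex pattern. This is only a sketch: Java regex stands in for the ANTLR lexer, and the class and method names are hypothetical. The structure mirrors the rule one-to-one: a sub-part followed by any number of '.' plus sub-part groups, on each side of the '@'.

```java
import java.util.regex.Pattern;

public class EmailRuleDemo {
    // LOCAL_SUBPART : [a-zA-Z0-9-_~!$&\'()*+,;=:]+;
    // ('-' is moved last so Java treats it as a literal dash.)
    static final String LOCAL_SUBPART = "[a-zA-Z0-9_~!$&'()*+,;=:-]+";
    // DOMAIN_SUBPART : [a-zA-Z0-9-]+;
    static final String DOMAIN_SUBPART = "[a-zA-Z0-9-]+";

    // EMAIL : LOCAL_SUBPART ('.' LOCAL_SUBPART)* '@' DOMAIN_SUBPART ('.' DOMAIN_SUBPART)*;
    // Every '.' must sit between two non-empty sub-parts.
    static final Pattern EMAIL = Pattern.compile(
            LOCAL_SUBPART + "(?:\\." + LOCAL_SUBPART + ")*"
            + "@"
            + DOMAIN_SUBPART + "(?:\\." + DOMAIN_SUBPART + ")*");

    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "john.doe@example.com",   // ordinary address
            "john..doe@example.com",  // consecutive dots in the local part
            ".john@example.com",      // leading dot
            "john.@example.com",      // trailing dot before '@'
            "john@example..com",      // consecutive dots in the domain
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isEmail(s));
        }
    }
}
```

Only the first address prints true. The other four are rejected because every dot has to be followed by at least one non-dot character, and no sub-part sequence can begin or end with a dot.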