
It's quite strange, but DefaultErrorStrategy does nothing to catch unrecognized characters in the input stream. I tried a custom error strategy, a custom error listener and BailErrorStrategy, with no luck.

My grammar

grammar Polynomial;

parse           : canonical EOF
                ;

canonical       : polynomial+                                     #canonicalPolynom
                | polynomial+ EQUAL polynomial+                   #equality
                ;

polynomial      : SIGN? '(' (polynomial)* ')'                     #parens
                | monomial                                        #monom
                ;

monomial        : SIGN? coefficient? VAR ('^' INT)?               #addend
                | SIGN? coefficient                               #number
                ;

coefficient             : INT | DEC;

INT                     : ('0'..'9')+;
DEC                     : INT '.' INT;
VAR                     : [a-z]+;
SIGN                    : '+' | '-';
EQUAL                   : '=';
WHITESPACE              : (' '|'\t')+ -> skip;

I'm giving it the input 23*44=12 or @1234.

I expect the parser to throw a mismatched-token error, or some other kind of exception, for the characters * and @, which are not defined in my grammar.

Instead, the parser just skips * and @ and builds the tree as if they did not exist.

Here is my handler function, where I create the lexer and the parser and run the parse:

private static (IParseTree tree, string parseErrorMessage) TryParseExpression(string expression)
{
    ICharStream stream = CharStreams.fromstring(expression);
    ITokenSource lexer = new PolynomialLexer(stream);

    ITokenStream tokens = new CommonTokenStream(lexer);
    PolynomialParser parser = new PolynomialParser(tokens);

    //parser.ErrorHandler = new PolynomialErrorStrategy(); -> I tried a custom error strategy
    //parser.RemoveErrorListeners();
    //parser.AddErrorListener(new PolynomialErrorListener()); -> I tried a custom error listener
    parser.BuildParseTree = true;

    try
    {
        var tree = parser.canonical();
        return (tree, string.Empty);
    }
    catch (RecognitionException re)
    {
        return (null, re.Message);
    }
    catch (ParseCanceledException pce)
    {
        return (null, pce.Message);
    }
}

I tried adding a custom error listener:

public class PolynomialErrorListener : BaseErrorListener
{
    private const string Eof = "EOF";

    public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg,
        RecognitionException e)
    {
        if (msg.Contains(Eof))
        {
            throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. Missing an expression after '=' sign");
        }

        if (e is NoViableAltException || e is InputMismatchException)
        {
            throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. Probably, not closed operator");
        }

        throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. {msg}");
    }

    private static string GetSyntaxErrorHeader(int errorPosition)
    {
        return $"Expression is invalid. Input is not valid at {--errorPosition} position";
    }
}

After that, I tried implementing a custom error strategy:

public class PolynomialErrorStrategy : DefaultErrorStrategy
{
    public override void ReportError(Parser recognizer, RecognitionException e)
    {
        throw e;
    }

    public override void Recover(Parser recognizer, RecognitionException e)
    {
        for (ParserRuleContext context = recognizer.Context; context != null; context = (ParserRuleContext) context.Parent) {
            context.exception = e;
        }

        throw new ParseCanceledException(e);
    }

    public override IToken RecoverInline(Parser recognizer)
    {
        InputMismatchException e = new InputMismatchException(recognizer);
        for (ParserRuleContext context = recognizer.Context; context != null; context = (ParserRuleContext) context.Parent) {
            context.exception = e;
        }

        throw new ParseCanceledException(e);
    }

    protected override void ReportInputMismatch(Parser recognizer, InputMismatchException e)
    {
        string msg = "mismatched input " + GetTokenErrorDisplay(e.OffendingToken);
        // msg += " expecting one of " + e.GetExpectedTokens().ToString(recognizer.());
        RecognitionException ex = new RecognitionException(msg, recognizer, recognizer.InputStream, recognizer.Context);
        throw ex;
    }

    protected override void ReportMissingToken(Parser recognizer)
    {
        BeginErrorCondition(recognizer);
        IToken token = recognizer.CurrentToken;
        IntervalSet expecting = GetExpectedTokens(recognizer);
        string msg = "missing " + expecting.ToString() + " at " + GetTokenErrorDisplay(token);
        throw new RecognitionException(msg, recognizer, recognizer.InputStream, recognizer.Context);
    }
}

Is there some flag I forgot to set on the parser, or is my grammar incorrect?

The funny thing is that I'm using the ANTLR plugin in my IDE, and when I test my grammar there, the plugin correctly reports line 1:2 token recognition error at: '*'

Full source code: https://github.com/EvgeniyZ/PolynomialCanonicForm

I'm using ANTLR 4.8-complete.jar

Edit

I tried adding this grammar rule:

parse           : canonical EOF
                ;

Still no luck.

Remember there are two recognizers here: a lexer and a parser, and each has its own error listeners. Your code adds a listener to the parser, but not to the lexer (see CanonicalFormer.cs), so the lexer runs with its default listener. Both default error handlers are quite primitive. Also, you should use Antlr4BuildTasks for builds that use Antlr4.Runtime.Standard, i.e., in Polynomial.WebApi.csproj. Just search for Antlr4BuildTasks and read the instructions. -- Ken (kaby76)
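To make the point about two separate recognizers concrete, here is a minimal, self-contained C# model. It is not ANTLR code: ToyLexer, IToyErrorListener, ConsoleToyListener and ThrowingToyListener are hypothetical names used only for illustration. The lexer owns its own listener list, by default it prints the error and drops the character, and it only throws once a throwing listener is attached to the lexer itself. In the real C# runtime the corresponding wiring should be lexer.RemoveErrorListeners() followed by lexer.AddErrorListener(...) with a listener implementing IAntlrErrorListener<int>; note that this requires the lexer variable to be typed as PolynomialLexer rather than ITokenSource.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model, NOT the ANTLR runtime: it only mimics how a lexer
// reports unrecognized characters to its *own* error listeners.
public interface IToyErrorListener
{
    void SyntaxError(int charPositionInLine, string msg);
}

// Like the default console listener: print to stderr and keep going.
public sealed class ConsoleToyListener : IToyErrorListener
{
    public void SyntaxError(int charPositionInLine, string msg) =>
        Console.Error.WriteLine($"line 1:{charPositionInLine} {msg}");
}

// Like a custom "bail out" listener: abort on the first error.
public sealed class ThrowingToyListener : IToyErrorListener
{
    public void SyntaxError(int charPositionInLine, string msg) =>
        throw new InvalidOperationException($"line 1:{charPositionInLine} {msg}");
}

public sealed class ToyLexer
{
    // The lexer starts with its own default (console) listener; adding a
    // listener to some parser elsewhere does not change this list.
    private readonly List<IToyErrorListener> listeners =
        new List<IToyErrorListener> { new ConsoleToyListener() };

    public void RemoveErrorListeners() => listeners.Clear();
    public void AddErrorListener(IToyErrorListener listener) => listeners.Add(listener);

    // Recognizes digit runs (INT) and '=' (EQUAL) and skips spaces/tabs.
    // Any other character is reported to this lexer's listeners and dropped.
    public List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c >= '0' && c <= '9')
            {
                int start = i;
                while (i < input.Length && input[i] >= '0' && input[i] <= '9') i++;
                tokens.Add(input.Substring(start, i - start));
                continue;
            }
            if (c == '=')
                tokens.Add("=");
            else if (c != ' ' && c != '\t')
                foreach (var listener in listeners)
                    listener.SyntaxError(i, $"token recognition error at: '{c}'");
            i++;
        }
        return tokens;
    }
}
```

With the default listener, new ToyLexer().Tokenize("23*44=12") prints line 1:2 token recognition error at: '*' and returns the tokens 23, 44, =, 12, so anything consuming those tokens never sees the *. After RemoveErrorListeners() and AddErrorListener(new ThrowingToyListener()), the same call throws instead.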

Answer

What happens if you do this:

parse
 : canonical EOF
 ;

and also invoke this rule:

var tree = parser.parse();

By adding the EOF token (end of input), you force the parser to consume the entire token stream, so any tokens it cannot match produce an error instead of being silently left unconsumed.
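A rough way to see what the EOF anchor changes is a hypothetical recursive-descent model in C#. ToyParser, MatchInts, ParseCanonical and Parse are illustrative names, not ANTLR-generated code, and the grammar is simplified to canonical : INT+ ('=' INT+)? so the example stays short. A loop like INT+ simply stops at the first token it cannot use and returns successfully; only an explicit end-of-input check turns the leftover tokens into an error.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the simplified rules
//   parse     : canonical EOF ;
//   canonical : INT+ ('=' INT+)? ;
// Tokens are plain strings: "=" is EQUAL, a run of digits is INT.
public static class ToyParser
{
    private static bool IsInt(string token)
    {
        if (token.Length == 0) return false;
        foreach (char c in token)
            if (c < '0' || c > '9') return false;
        return true;
    }

    // Matches INT+ starting at pos and returns the position after it.
    private static int MatchInts(IReadOnlyList<string> tokens, int pos)
    {
        if (pos >= tokens.Count || !IsInt(tokens[pos]))
            throw new FormatException($"expected INT at token {pos}");
        while (pos < tokens.Count && IsInt(tokens[pos])) pos++;
        return pos;
    }

    // canonical: quietly stops at the first token it cannot use and
    // returns how many tokens it consumed.
    public static int ParseCanonical(IReadOnlyList<string> tokens)
    {
        int pos = MatchInts(tokens, 0);
        if (pos < tokens.Count && tokens[pos] == "=")
            pos = MatchInts(tokens, pos + 1);
        return pos;
    }

    // parse: canonical followed by EOF, so unconsumed tokens are an error.
    public static void Parse(IReadOnlyList<string> tokens)
    {
        int consumed = ParseCanonical(tokens);
        if (consumed != tokens.Count)
            throw new FormatException($"expected end of input but found '{tokens[consumed]}'");
    }
}
```

For the tokens 1 2 = 3 = 4, ParseCanonical consumes four tokens and returns without complaint, while Parse throws on the second =. The EOF check only helps with tokens the parser actually receives, though: characters the lexer has already discarded never show up in the token list at all.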

The funny thing is that I'm using the ANTLR plugin in my IDE, and when I test my grammar there, the plugin correctly reports line 1:2 token recognition error at: '*'

That is what the lexer writes to stderr. The lexer just reports this error and goes on its merry way: it drops these characters, so they never end up in the parser. If you add the following rule at the end of your lexer rules:

// Fallback rule: matches any single character if not matched by another lexer rule
UNKNOWN : . ;

then * and @ will be passed to the parser as UNKNOWN tokens, and since no parser rule accepts UNKNOWN, they should cause recognition errors that your parser's error listener or error strategy will see.
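The effect of the catch-all rule can be modeled the same way. The following C# sketch is not ANTLR; TokenKind, Tok, FallbackDemo.Lex and FallbackDemo.ParseTokens are hypothetical names. The toy lexer takes an unknownFallback flag: without it, * is reported and dropped and the token list looks perfectly valid; with it, * becomes a single-character UNKNOWN token that the toy parser stand-in rejects, which mirrors what UNKNOWN : . ; is expected to do in the real grammar.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for a toy lexer (not ANTLR-generated).
public enum TokenKind { Int, Equal, Unknown }

public sealed class Tok
{
    public Tok(TokenKind kind, string text) { Kind = kind; Text = text; }
    public TokenKind Kind { get; }
    public string Text { get; }
}

public static class FallbackDemo
{
    // unknownFallback == false: unrecognized characters are reported and
    // dropped, like the default lexer behaviour.
    // unknownFallback == true: each one becomes a single-character UNKNOWN
    // token, like adding `UNKNOWN : . ;` as the last lexer rule.
    public static List<Tok> Lex(string input, bool unknownFallback)
    {
        var tokens = new List<Tok>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c >= '0' && c <= '9')
            {
                int start = i;
                while (i < input.Length && input[i] >= '0' && input[i] <= '9') i++;
                tokens.Add(new Tok(TokenKind.Int, input.Substring(start, i - start)));
                continue;
            }
            if (c == '=')
                tokens.Add(new Tok(TokenKind.Equal, "="));
            else if (c != ' ' && c != '\t')
            {
                if (unknownFallback)
                    tokens.Add(new Tok(TokenKind.Unknown, c.ToString()));
                else
                    Console.Error.WriteLine($"line 1:{i} token recognition error at: '{c}'");
            }
            i++;
        }
        return tokens;
    }

    // Toy parser stand-in: no rule accepts UNKNOWN, so it is an error.
    public static void ParseTokens(List<Tok> tokens)
    {
        foreach (var t in tokens)
            if (t.Kind == TokenKind.Unknown)
                throw new FormatException($"mismatched input '{t.Text}'");
    }
}
```

Lexing 23*44=12 without the fallback yields four tokens and ParseTokens accepts them; with the fallback it yields five tokens, the second being UNKNOWN *, and ParseTokens throws.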