I'm reading a "Compiler Construction, Principles and Practice" book by Kenneth Louden and trying to understand error recovery in Yacc.
The author is giving an example using the following grammar:
%{
#include <stdio.h>
#include <ctype.h>
int yylex();
int yyerror();
%}
%%
command : exp { printf("%d\n", $1); }
; /* allows printing of the result */
exp : exp '+' term { $$ = $1 + $3; }
| exp '-' term { $$ = $1 - $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : NUMBER { $$ = $1; }
| '(' exp ')' { $$ = $2; }
;
%%
int main() {
return yyparse();
}
int yylex() {
int c;
/* eliminate blanks*/
while((c = getchar()) == ' ');
if (isdigit(c)) {
ungetc(c, stdin);
scanf("%d\n", &yylval);
return (NUMBER);
}
/* makes the parse stop */
if (c == '\n') return 0;
return (c);
}
int yyerror(char * s) {
fprintf(stderr, "%s\n", s);
return 0;
} /* allows for printing of an error message */
Which produces the following state table (referred to as table 5.11 later on)
Numbers in the reductions correspond to the following productions:
(1) command : exp.
(2) exp : exp + term.
(3) exp : exp - term.
(4) exp : term.
(5) term : term * factor.
(6) term : factor.
(7) factor : NUMBER.
(8) factor : ( exp ).
Then Dr. Louden gives the following example:
Consider what would hapen if an error production were added to the yacc definition
factor : NUMBER {$$ = $1;} | '(' exp ')' {$$=$2;} | error {$$ = 0;} ;
Consider first erroneous input 2++3 as in the previous example (We continue to use Table 5.11, although the additional error production results in a slightly different table.) As before the parser will reach the following point:
parsing stack input $0 exp 2 + 7 +3$
Now the error production for
factor
will provide that error is a legal lookahead in state 7 and error will be immediately shifted onto the stack and reduced tofactor
, causing the value 0 to be returned. Now the parser has reached the following point:parsing stack input $0 exp 2 + 7 factor 4 +3$
This is a normal situation, and the parser will continue to execute normally to the end. The effect is to interpret the input as 2+0+3 - the 0 between the two + symbols is there because that is where the error pseudotoken is inserted, and by the action for the error production, error is viewed as equivalent to a factor with value 0.
My question is very simple:
How did he know by looking at the grammar that in order to recover from this specific error (2++3) he needs to add an error pseudotoken to the factor
production? Is there a simple way to do it? Or the only way to do it is to work out multiple examples with the state table and recognize that this particular error will occur on this given state and therefore and if I add an error pseudotoken to a some specific production the error will be fixed.
Any help is appreciated.