I'm using the Lex and Bison parser generators. I have a .lex file, which defines the tokens, and a .ypp file, which defines the grammar and its semantic actions. In my .ypp I have this rule:

Statement : Type ID ASSIGN Exp {check_types_match($1.type, $4.type)} SC
  • Type can be int or boolean.
  • ID is an identifier.
  • ASSIGN is the = symbol.
  • Exp can be many things, among them is Exp : true, which saves the type of the expression as boolean.
  • SC is a semicolon ";".
  • check_types_match checks for type mismatch and prints the line (yylineno) of the error if there's any.

In this simple input file:

int x = true 
;

The error is reported on line 2 rather than line 1. How can I make it report line 1 instead?

1 Answer

The parser cannot tell that the expression is complete until it has read the semicolon as its lookahead token, and the semicolon is on line 2. So by the time check_types_match is called, yylineno already refers to line 2.

If you want to produce an error message with a different line number, you certainly need to decide which line should be printed. Here you have at least two possibilities, since the error is between the token int and the token true. In this case, both of those are on line 1, but what if the program text had been:

int x =
  true;

It seems reasonable that one of those tokens should be flagged as causing the error, so the problem reduces to figuring out what line the token appeared on. Since that token is ancient history by the time the reduction happens, the only way to do that is to remember the location of every token which might still be needed, which is normally every token still on the parser stack.

Fortunately, bison has a simple way of doing that. If needed, it will maintain a location stack parallel to the parser stack, and then you can access the location object for token 1 by simply referencing @1. Even better, simply using a reference to a location object somewhere in your bison file is sufficient to convince bison to maintain this information. So you could change your action to:

Statement : Type ID ASSIGN Exp {check_types_match($1.type, $4.type, @1)} SC

(Or @4, if you think that it is more appropriate to ascribe the error to the Exp.)
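As a rough illustration, the extended check_types_match could look something like the sketch below. It is written as standalone C so it can be compiled on its own: the Loc struct is a hypothetical stand-in for bison's location type, and the TypeKind enum is an assumption, since the question does not show how types are represented. Returning the reported line number is also just a convenience for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for bison's location type. */
typedef struct Loc {
    int first_line, first_column;
    int last_line, last_column;
} Loc;

/* Hypothetical type tags; the real representation is not shown in the question. */
typedef enum TypeKind { TYPE_INT, TYPE_BOOL } TypeKind;

/* Returns 0 if the types agree. Otherwise prints a diagnostic that uses
   the line stored in the location (not yylineno) and returns that line. */
int check_types_match(TypeKind declared, TypeKind actual, Loc where)
{
    assert(where.first_line > 0);   /* line numbers are 1-based */
    if (declared == actual)
        return 0;
    fprintf(stderr, "line %d: type mismatch\n", where.first_line);
    return where.first_line;
}
```

The point of the design is that the function no longer consults any global scanner state; whichever location the grammar action passes in (@1 or @4) decides the line that is printed.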

Of course, it is never quite that simple. It is also necessary to arrange for bison to know the location of every incoming token, and also to understand how to create a location for a newly created non-terminal (such as Exp in the above example.)

Since a location object may refer to the location of a sequence of tokens (as in the non-terminal case), which may be spread over several lines, it is normal for the location object to indicate both a starting and ending point. Furthermore, it is common to want both a line number and a column offset to produce accurate error messages. Consequently, the default location object has the following type:

typedef struct YYLTYPE {
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

And by default, the location object for a non-terminal is computed as though you had written something like

@$.first_line = @1.first_line;
@$.first_column = @1.first_column;
@$.last_line = @N.last_line;
@$.last_column = @N.last_column;

where N is the index of the last grammar symbol in the right-hand side. (Since bison doesn't have any notation for "the number of grammar symbols" and doesn't allow variables in $N constructs, you can't actually write that. But that's the idea.)
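The same idea can be written out as an ordinary C function, which may make the default rule easier to see. This is only a sketch: Loc is a hypothetical copy of the struct above, span_rhs is an invented name, the array is 0-based (so rhs[0] plays the role of @1), and the case of an empty right-hand side is not handled.

```c
#include <assert.h>

/* Hypothetical copy of the default location struct. */
typedef struct Loc {
    int first_line, first_column;
    int last_line, last_column;
} Loc;

/* rhs[0] .. rhs[n-1] hold the locations of the right-hand-side symbols.
   The result starts where the first symbol starts and ends where the
   last symbol ends. Requires n >= 1. */
Loc span_rhs(const Loc rhs[], int n)
{
    Loc result;
    assert(n >= 1);
    result.first_line   = rhs[0].first_line;
    result.first_column = rhs[0].first_column;
    result.last_line    = rhs[n - 1].last_line;
    result.last_column  = rhs[n - 1].last_column;
    return result;
}
```

For the two-line input shown earlier, the combined location would start on line 1 (where int is) and end on line 2 (where true is), which is why the location carries both a start and an end.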

Since all of that is pretty well what you want, there is no problem from bison's side. But you also need to get the information from flex in the first place.

If you use the simple interface between flex and bison, which relies on global variables, then the name of the location object corresponding to the current token is yylloc (similar to yylval). flex can maintain yylineno automatically (with %option yylineno), but it does not store it in yylloc, it has no built-in mechanism for tracking column numbers, and it does nothing special for a token that is spread over more than one line (which is possible for string constants, for example).

Getting all that infrastructure correct is a bit outside the scope of this question, since you only ask for line number information. If you only need to track line-numbers and you don't have multi-line tokens, it would be sufficient to add the following to every flex rule:

yylloc.first_line = yylloc.last_line = yylineno;

If you do have multi-line tokens, you could use the following instead:

yylloc.first_line = yylloc.last_line;
yylloc.last_line = yylineno;

That would have to be added to every token action, even the ones which don't do anything (comments and whitespace). Fortunately, flex has a macro which is added at the beginning of every action, so you don't have to complicate your entire flex file. It's sufficient to add something like:

/* Note the trailing semicolon: flex places this macro directly in front of each action. */
#define YY_USER_ACTION do {             \
  yylloc.first_line = yylloc.last_line; \
  yylloc.last_line = yylineno;          \
} while(0);

(If you end up tracking column numbers, too, you will need to modify that.)

You also need to ensure that yylloc.last_line is initialized to 1; otherwise, your first token will start at line 0.
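To see how this bookkeeping behaves on the input from the question, here is a small standalone simulation. It is a sketch under assumptions: Loc is a hypothetical copy of the location struct, advance_lines is an invented helper, and the line counter is kept by counting newlines in the matched text rather than by reading flex's yylineno.

```c
#include <assert.h>

/* Hypothetical copy of the default location struct. */
typedef struct Loc {
    int first_line, first_column;
    int last_line, last_column;
} Loc;

/* Mimics the YY_USER_ACTION shown above for one matched piece of text:
   the new token starts where the previous one ended, and ends on
   whatever line the scanner has reached after consuming the text. */
void advance_lines(Loc *loc, const char *text)
{
    const char *p;
    assert(loc && text);
    loc->first_line = loc->last_line;
    for (p = text; *p != '\0'; ++p)
        if (*p == '\n')
            loc->last_line++;
}
```

Starting with last_line set to 1 and feeding in the pieces of `int x = true \n;` one at a time, true ends up with first_line 1, the whitespace containing the newline spans lines 1 to 2, and the semicolon gets first_line 2. Passing the location of Type or Exp to the check therefore reports line 1.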

For more information, please read the bison and flex manuals.

If you are using reentrant/pure scanners and parsers, you'll need to refer to the documentation for how the location object is passed without globals. Note that flex's %option bison-locations declaration is not always what you want (and is definitely not what you want if you are not using a reentrant/pure scanner and parser).