I am an absolute beginner with yacc/lex and I have stumbled upon something that seems simple, but that I am unable to understand. I have the following two rules: S : E; and E : STR; (in the lexer, [a-z]+ is mapped to STR). My guess is that when I give the input "hithere", for example, the input is consumed and the parser should exit, no?

The thing is, the parser is still waiting for input, so somehow S : E is not consumed (or so I guess). If I continue giving input, a syntax error is raised (which is expected).

My question is: when does the parser stop asking for input? More precisely, why is the rule S : E; not satisfied for my specific example?

Here are my .l and .y files:

test1.l:

%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}

%option noyywrap

%%
[a-z]+                  {yylval.str = yytext; return (STR);}
.                       { ; }
%%

test1.y:

%{
#include <stdio.h>
#include <stdlib.h>
extern int yylex();
%}

%union {
    char    *str;
}

%token <str> STR
%type <str> E

%%

S : E                   {printf("%s\n", $1);}
  ;

E : STR                 {$$ = $1;}
  ;

%%

int yyerror(char *msg) {
    printf("%s\n", msg);
    return (0);
}

int main() {
    yyparse();
    return (0);
}

The thing that seems really weird to me is that if I give the input "hithere", "hithere" is printed back on my terminal, so that is a strong indicator to me that S : E; actually has been recognized and printf() executed.

2 Answers

0
votes

Bison/yacc (and many, though not all, derivatives) actually construct an "augmented" grammar by adding a new start production which is effectively:

$start: S END

where S is your start symbol (or the first non-terminal in the grammar if you don't specify one), and END is a token representing the end of input. It is a real token, whose value is 0. (f)lex scanners return 0 when they reach end-of-file, so to the parser it looks like it is being given an END token.

So the parser won't return until it sees an END token, which means that the scanner has seen an end of file. If your input is coming from a terminal, you need to send an EOF, typically by typing the EOF character: control-D on most Unix-like systems, or control-Z on Windows/DOS.

Unlike many parser generators, bison will perform a reduction without reading a lookahead symbol if the lookahead symbol is not necessary to decide that the reduction must be performed. In the case of your grammar, that is possible with the S: E production because there is no possible shift; either the reduction is correct (if the next token is END) or the input is not syntactically valid (if the next token is anything else). So the semantic value of the string is printed before the parser asks the scanner for another token. For an even slightly more complicated grammar, that wouldn't happen (until the EOF is recognized).
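To make the order of events concrete, here is a small hand-written C sketch that imitates what the generated parser does for this particular grammar. It is not bison's actual code: the names toy_parse, next_token and append are hypothetical, and 258 is only a stand-in for whatever number yacc assigns to STR. It records a trace instead of reading from a terminal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* END is 0, as in yacc; 258 is only a stand-in for the real STR token number. */
enum { END = 0, STR = 258 };

/* Append s to the NUL-terminated trace buffer without overflowing it. */
static void append(char *trace, size_t cap, const char *s) {
    size_t used = strlen(trace);
    snprintf(trace + used, cap - used, "%s", s);
}

/* Toy scanner: returns STR for a run of [a-z], skips any other character,
   and returns END (0) only once the input is exhausted. */
static int next_token(const char **p, char *text, size_t cap) {
    size_t n = 0;
    while (**p != '\0' && !islower((unsigned char)**p))
        (*p)++;
    if (**p == '\0')
        return END;
    while (islower((unsigned char)**p)) {
        if (n + 1 < cap)
            text[n++] = **p;
        (*p)++;
    }
    text[n] = '\0';
    return STR;
}

/* Imitates parsing with  $start: S END ;  S: E ;  E: STR ;
   Writes a trace and returns 0 on accept, 1 on syntax error. */
int toy_parse(const char *input, char *trace, size_t cap) {
    const char *p = input;
    char text[64];

    assert(trace != NULL && cap > 0);
    trace[0] = '\0';

    if (next_token(&p, text, sizeof text) != STR) {
        append(trace, cap, "syntax error\n");
        return 1;
    }
    /* E: STR and S: E are reduced here without any lookahead,
       so the action of S: E (the printf) runs immediately. */
    append(trace, cap, "print ");
    append(trace, cap, text);
    append(trace, cap, "\n");

    /* Only now is the next token requested. On a terminal, this is
       the read that blocks until more input or EOF arrives. */
    if (next_token(&p, text, sizeof text) != END) {
        append(trace, cap, "syntax error\n");
        return 1;
    }
    append(trace, cap, "accept\n");
    return 0;
}
```

Calling toy_parse on "hithere" records the print step and then the accept step, while "hi there" records the print of "hi" followed by a syntax error. That mirrors what you observed: the string is echoed first, and the parser only finishes once it is given either END or an unexpected token.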

0
votes

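It's waiting for the end of input so it can accept. The production S : E ; has already been reduced (which is why your string was printed), but the parser only returns once it sees end-of-file. You need to type Ctrl-D or Ctrl-Z, depending on your system.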