1
votes

I am learning Flex/Bison. The manual of Bison says:

A literal string token is written like a C string constant; for example, "<=" is a literal string token. A literal string token doesn’t need to be declared unless you need to specify its semantic value data type

But I cannot figure out how to use it, and I cannot find an example.

I have the following code for testing:

example.l

%option noyywrap nodefault

%{
#include "example.tab.h"
%}

%%

[ \t\n] {;}
[0-9] { return NUMBER; }
. { return yytext[0]; }

%%

example.y

%{
#include <stdio.h>
#define YYSTYPE char *
%}

%token NUMBER

%%

start: %empty | start tokens

tokens:
       NUMBER "<=" NUMBER { printf("<="); }
     | NUMBER "=>" NUMBER { printf("=>\n"); }
     | NUMBER '>' NUMBER { printf(">\n"); }
     | NUMBER '<' NUMBER { printf("<\n"); }

%%

main(int argc, char **argv) {
   yyparse();
}

yyerror(char *s) {
   fprintf(stderr, "error: %s\n", s);
}

Makefile

#!/usr/bin/make
# by RAM

all: example

example.tab.c example.tab.h: example.y
    bison -d $<

lex.yy.c: example.l example.tab.h
    flex $<

example: lex.yy.c example.tab.c
    cc -o $@ example.tab.c lex.yy.c -lfl

clean:
    rm -fr example.tab.c example.tab.h lex.yy.c example

And when I run it:

$ ./example 
3<4
<
6>9
>
6=>9
error: syntax error

Any idea?


UPDATE: I want to clarify that I know alternative ways to solve it, but I want to use literal string tokens.

One alternative: using multiple literal character tokens:

tokens:
       NUMBER '<' '=' NUMBER { printf("<="); }
     | NUMBER '=' '>' NUMBER { printf("=>\n"); }
     | NUMBER '>' NUMBER { printf(">\n"); }
     | NUMBER '<' NUMBER { printf("<\n"); }

When I run it:

$ ./example 
3<=9
<=

Another alternative:

In example.l:

"<="  { return LE; }
"=>"  { return GE; }

In example.y:

...
%token NUMBER
%token LE "<="
%token GE "=>"

%%

start: %empty | start tokens

tokens:
       NUMBER "<=" NUMBER { printf("<="); }
     | NUMBER "=>" NUMBER { printf("=>\n"); }
     | NUMBER '>' NUMBER { printf(">\n"); }
     | NUMBER '<' NUMBER { printf("<\n"); }
...

When I run it:

$ ./example 
3<=4
<=

But the manual says:

A literal string token doesn’t need to be declared unless you need to specify its semantic value data type

2
Are you sure that . matches multiple characters instead of just a single character? - mroman
Also yytext[0]? - mroman
. matches only one character; that is fine, but I don't think it is the problem. - RAM

2 Answers

1
votes

The quoted manual paragraph is correct, but you need to read the next paragraph, too:

You can associate the literal string token with a symbolic name as an alias, using the %token declaration (see Token Declarations). If you don’t do that, the lexical analyzer has to retrieve the token number for the literal string token from the yytname table.

So you don't need to declare the literal string token, but you still need to arrange for the lexer to send the correct token number, and if you don't declare an associated token name, the only way to find the correct value is to search for the code in the yytname table.

In short, your last example, where you define LE and GE as aliases, is by far the most common approach. Separating the tokens into individual characters is not a good idea: it might create shift-reduce conflicts, and it will definitely allow invalid inputs, such as whitespace between the characters.

If you want to try the yytname solution, there is sample code in the Bison manual. Be aware, though, that this code finds Bison's internal token number, which is not the number the scanner must return. There is no easy, portable, and documented way to get the external token number. The easy but undocumented way is to look it up in yytoknum, but that array is undocumented and only present when a particular preprocessor macro is defined, so there is no guarantee that it will work. Note also that these tables are declared static, so any function that relies on them must be included in the Bison input file. (Such functions can, of course, have external linkage so that they can be called from the lexer, but you can't use yytname directly in the lexer.)
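To make the lookup concrete, here is a minimal sketch of the kind of search the manual describes. It is not the manual's code verbatim. The table demo_yytname is a hypothetical stand-in for the generated yytname (the real one is static inside example.tab.c, and its exact contents depend on the Bison version), and find_literal_token is a name made up for illustration. In a real parser a function like this would go in the epilogue of example.y, and the index it returns would be Bison's internal symbol number, not the value yylex must return.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Bison's generated yytname table.
   Literal string tokens are stored with their double quotes,
   literal character tokens with single quotes. */
const char *const demo_yytname[] = {
    "\"end of file\"", "error", "\"invalid token\"", "NUMBER",
    "\"<=\"", "\"=>\"", "'>'", "'<'", NULL
};

/* Number of token entries in the stand-in table (assumed value;
   a real parser would use YYNTOKENS). */
enum { DEMO_NTOKENS = 8 };

/* Return the index of the entry spelled "text" (including the
   surrounding double quotes), or -1 if no such entry exists.
   This is an internal table index, not an external token number. */
int find_literal_token(const char *const *names, int ntokens,
                       const char *text)
{
    size_t len = strlen(text);
    for (int i = 0; i < ntokens; i++) {
        const char *name = names[i];
        if (name != NULL
            && name[0] == '"'
            && strncmp(name + 1, text, len) == 0
            && name[len + 1] == '"'
            && name[len + 2] == '\0')
            return i;
    }
    return -1;
}
```

With this stand-in table, find_literal_token(demo_yytname, DEMO_NTOKENS, "<=") yields the index of the "<=" entry, while a lookup for "<" fails because that token is stored as a character literal. Turning the index into the number the scanner should return is the undocumented step described above.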

0
votes

I haven't used flex/bison for a while, but two things:

As far as I remember, . matches only a single character. yytext is a pointer to a null-terminated string (char *), so yytext[0] is a single char, which means you can't match strings this way. You probably need a separate rule that matches the whole string and returns its token number (yylex returns an int, so it cannot return yytext itself). Otherwise . will create one token per character, and you'd have to write NUMBER '<' '=' NUMBER.