2
votes

I am starting a toy compiler, and I am making the simplest thing I can imagine, but it won't work.

Lex compiles, and Yacc compiles, and they link together, but the resulting program does not do what I expected.

Lex:

%{
#include <stdlib.h>
void yyerror(char *);
#include "y.tab.h"
%}

%%


a                       { 
                            yylval = atoi(yytext);
                            return AAA;
                        }
.                       yyerror("invalid character");

%%
int yywrap(void) {
 return 1;
}

Yacc:

%{
    void yyerror(char *);
    int yylex(void);
    int sym[26];
    #include <stdio.h>
%}

%token AAA

%%
daaaa:
AAA             {printf("%d\n", $1);}

%%

void yyerror(char *s) {
 fprintf(stderr, "%s\n", s);
}

int main(void) {
 yyparse();
 return 0;
}

The program I am trying to compile with this compiler is a file containing just the single character a.

I don't know what's happened!

Clarification: What I expected the compiled compiler to do was to accept a file into it, process the file, and spit out a compiled version of that file.

It would help if you revealed your expectations and how they are unfulfilled by your program. - rici
Can you explain the purpose of trying to convert the letter "a" to an integer with atoi? - Brian Tompsett - 汤莱恩
@BrianTompsett-汤莱恩 yylval returns an int by default, and by subtracting var by "a" gives the input a surefire way to get 26 different yylval values without remembering previous inputs. - Stegosaurus
@Stegosaurus atoi("a") returns zero, and so does atoi("b"). There is no 'subtracting var by "a"' here, and nothing surefire about this bug in your code. - user207421
When I compile the code and run it (./gl <<< 'a' in Bash), it prints 0 and a couple of newlines. I'm not sure what you expected it to print. It will only read from standard input unless you take steps to organize it differently (by setting yyin to point to a different file stream). - Jonathan Leffler

2 Answers

3
votes

Can you explain, maybe in an answer, exactly what you did, and how it worked, because as far as I can tell, and as far as I have tested the question, it shouldn't work as you say.

I took your code verbatim, creating files grammar.y and lexer.l. I then compiled the code. I'm working on Mac OS X 10.11.4, using GCC 6.1.0, Bison 2.3 (disguised as yacc) and Flex 2.5.35 (disguised as lex).

$ yacc -d grammar.y
$ lex lexer.l
$ gcc -o gl y.tab.c lex.yy.c
$ ./gl <<< 'a'
0

$

I subsequently made two changes. In grammar.y, I changed main() to:

int main(void) {
 #if YYDEBUG
 yydebug = 1;
 #endif
 yyparse();
 return 0;
}

and in lexer.l, I changed the default character rule to:

\n|.                    yyerror("invalid character");

(The . doesn't match newline, so the newline after the a in the input was echoed by default in the original output.)

With a similar compilation, the output becomes:

$ ./gl <<< 'a'
0
invalid character
$

With the compilation specifying -DYYDEBUG too:

$ gcc -DYYDEBUG -o gl lex.yy.c y.tab.c
$

the output includes useful debugging information:

$ ./gl <<< 'a'
Starting parse
Entering state 0
Reading a token: Next token is token AAA ()
Shifting token AAA ()
Entering state 1
Reducing stack by rule 1 (line 12):
   $1 = token AAA ()
0
-> $$ = nterm daaaa ()
Stack now 0
Entering state 2
Reading a token: invalid character
Now at end of input.
Stack now 0 2
Cleanup: popping nterm daaaa ()
$ ./gl <<< 'aa'
Starting parse
Entering state 0
Reading a token: Next token is token AAA ()
Shifting token AAA ()
Entering state 1
Reducing stack by rule 1 (line 12):
   $1 = token AAA ()
0
-> $$ = nterm daaaa ()
Stack now 0
Entering state 2
Reading a token: Next token is token AAA ()
syntax error
Error: popping nterm daaaa ()
Stack now 0
Cleanup: discarding lookahead token AAA ()
Stack now 0
$

The second a in the input correctly triggers a syntax error (it isn't allowed by the grammar). Other characters are accepted by the scanner, generate an 'invalid character' message, and are otherwise ignored (so ./gl <<< 'abc' generates 3 invalid character messages, one for the b, one for the c, and one for the newline).
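To make the message count concrete, here is a rough hand-written stand-in for the two scanner rules (a and the modified catch-all \n|.). It is only a sketch: count_invalid is a hypothetical name, it is not part of the generated scanner, and it models the scanner alone, so it ignores the fact that the parser stops reading after a syntax error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two scanner rules (not generated by lex):
 *   a     -> would return the AAA token
 *   \n|.  -> would call yyerror("invalid character")
 * Returns how many characters would fall through to the catch-all rule. */
int count_invalid(const char *input) {
    assert(input != NULL);
    int count = 0;
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] != 'a')   /* everything except 'a' hits the catch-all */
            count++;
    }
    return count;
}
```

A here string appends a newline, so ./gl <<< 'abc' corresponds to count_invalid("abc\n"), which counts the b, the c, and the newline.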

Changing the assignment to yylval in lexer.l to:

yylval = 'a'; // atoi(yytext);

changes the number printed from 0 to 97, which is the character code for 'a' in ASCII, ISO 8859-1, Unicode, etc.
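If the intent was the one described in the comments (26 distinct values, one per letter), a sketch along the following lines might be closer. The helper names via_atoi and letter_index are hypothetical, not from the original code, and the arithmetic assumes a character set such as ASCII in which the lowercase letters are contiguous.

```c
#include <assert.h>
#include <stdlib.h>

/* What the original action computes: atoi() stops at the first
 * non-digit, so any text starting with a letter converts to 0. */
int via_atoi(const char *text) {
    assert(text != NULL);
    return atoi(text);
}

/* Hypothetical alternative: map a lowercase letter to 0..25,
 * or return -1 for anything else. Assumes contiguous 'a'..'z'. */
int letter_index(const char *text) {
    assert(text != NULL);
    if (text[0] >= 'a' && text[0] <= 'z')
        return text[0] - 'a';   /* 'a' -> 0, 'b' -> 1, ..., 'z' -> 25 */
    return -1;
}
```

In the lexer action this would replace the atoi call, for example yylval = letter_index(yytext);, provided the pattern is widened from a to [a-z].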

I've been using a here string as the source of data. It would be equally feasible to have used a file as the input:

$ echo a > program
$ cat program
a
$ ./gl < program
Starting parse
Entering state 0
Reading a token: Next token is token AAA ()
Shifting token AAA ()
Entering state 1
Reducing stack by rule 1 (line 12):
   $1 = token AAA ()
97
-> $$ = nterm daaaa ()
Stack now 0
Entering state 2
Reading a token: invalid character
Now at end of input.
Stack now 0 2
Cleanup: popping nterm daaaa ()
$

If you want to read files specified by name on the command line, you have to write more code in main() to process those files.
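As a minimal sketch of that extra code, the helper below picks the stream the scanner should read from. The name open_input is hypothetical; the only scanner-specific piece is yyin, the FILE * that lex and flex read from, which defaults to standard input.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: choose the input stream for the scanner.
 * Returns stdin when no file name was given, the opened file when one
 * was, or NULL (after reporting the error) if it cannot be opened. */
FILE *open_input(int argc, char **argv) {
    assert(argv != NULL);
    if (argc < 2)
        return stdin;               /* no argument: keep the default */
    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL)
        perror(argv[1]);            /* e.g. "program: No such file or directory" */
    return fp;
}
```

In grammar.y, main() would then take argc and argv, declare extern FILE *yyin;, assign yyin = open_input(argc, argv);, return a failure status if the result is NULL, and only then call yyparse(). With that in place, ./gl program and ./gl < program should behave the same way.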

2
votes

The program does not accept a file because it was not told to.

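In the Yacc program, extern FILE *yyin; must be added in the definitions section, and main() must then point yyin at the opened file (for example, with fopen()) before calling yyparse().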

I believe that's it.