While trying to recreate Python's indentation-defined blocks, I've run into a problem right at the start.
When I test my lexer/scanner on its own, it returns the expected results and uses the start conditions I defined correctly. But when I couple it with the Bison parser, the right state is not kept and I receive tokens from an unexpected state.
The behavior I expect is "INDENT" tokens for tabs/spaces at the beginning of a line and, once any other symbol (not a tab/space) is found, "OTHER" tokens for every symbol until a new line starts.
First case, lexer returning expected results
scanner.l
%{
#include <iostream>
%}
%option noyywrap
%x INDENT
%%
    BEGIN(INDENT);
<INDENT>[ \t]   { std::cout << "INDENT "; }
<INDENT>.|\n    { yyless(0); BEGIN(INITIAL); }
\n              { std::cout << std::endl; BEGIN(INDENT); }
.               { std::cout << "OTHER "; }
%%
int main(){
    yylex();
    return 0;
}
Entering "  test  " (two spaces before and after "test") prints "INDENT INDENT OTHER OTHER OTHER OTHER OTHER OTHER".
Second case, parser returning unexpected results
scanner.l
%{
#include <iostream>
#include "parser.h"
%}
%option noyywrap
%x INDENT
%%
    BEGIN(INDENT);
<INDENT>[ \t]   { return T_INDENT; }
<INDENT>.|\n    { yyless(0); BEGIN(INITIAL); }
\n              { BEGIN(INDENT); return T_NEWLINE; }
.               { return T_OTHER; }
%%
parser.y
%{
#include <iostream>
extern int yylex();
void yyerror(const char *s);
%}
%define parse.error verbose
%token T_INDENT T_OTHER T_NEWLINE
%%
program : program symbol
        | %empty
        ;
symbol  : T_INDENT  { std::cout << "INDENT "; }
        | T_NEWLINE { std::cout << std::endl; }
        | T_OTHER   { std::cout << "OTHER "; }
        ;
%%
void yyerror(const char *s){
    std::cout << s;
}
int main(){
    yyparse();
    return 0;
}
Entering "  test  " (same as before) prints "INDENT INDENT OTHER OTHER OTHER OTHER INDENT INDENT", whereas I expected the same output as above.
The Bison parser seems to be receiving the wrong tokens, as if the start conditions were not being respected. I've read that the parser can interfere with start conditions because of its lookahead behavior, but I'm not sure whether that is the problem here or how I would work around it.
Use a rule like ^[ \t] BEGIN INDENT; rather than starting in that state. – user207421
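One thing that may explain the difference between the two cases: flex runs any indented code placed before the first rule every time yylex() is entered. The standalone scanner calls yylex() once, but Bison calls it once per token, so the BEGIN(INDENT) at the top of the rules section would be re-applied before each token. Below is a minimal sketch of one possible workaround, assuming that is indeed the cause. The first_call flag is a hypothetical name introduced for illustration and is not part of the original scanner.

```lex
%{
#include "parser.h"
/* Hypothetical flag (not in the original scanner):
   true until yylex() has been entered for the first time. */
static bool first_call = true;
%}
%option noyywrap
%x INDENT
%%
    /* Indented code before the first rule runs on every yylex() entry,
       so switch to INDENT only on the first call, not on each one. */
    if (first_call) { first_call = false; BEGIN(INDENT); }

<INDENT>[ \t]   { return T_INDENT; }
<INDENT>.|\n    { yyless(0); BEGIN(INITIAL); }
\n              { BEGIN(INDENT); return T_NEWLINE; }
.               { return T_OTHER; }
%%
```

With this guard in place, the input from the question should produce two T_INDENT tokens followed by six T_OTHER tokens, the same sequence the standalone scanner prints. Entering INDENT from a ^[ \t] pattern, as the comment suggests, is another route to the same goal that avoids the flag altogether.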