I'm trying to sort this out for a school assignment and need a little help. I want to split a single input line into multiple parts. Since flex has no capture groups, I'm trying to do it with start conditions, but it's not working as I expected.
%s LINE_NAME
%s LINE_GRADE
ws [ \t]
DNI [0-9]{7,8}-[A-Za-z]
DNIERROR [0-9A-Za-z]+-[0-9A-Za-z]+
NOTA [0-9].[0-9]{1,2}|10.[0]{1,2}
NOTAERROR [0-9]{2}.[0-9]{2,}
NOMBRE [A-Z][a-z]+
NOMBRECOMPLETO {NOMBRE}{ws}+{NOMBRE}","{ws}*{NOMBRE}
%%
<INITIAL>^{DNI} {
printf("%s;", yytext);
BEGIN LINE_NAME;}
<LINE_NAME>^{ws}*{NOMBRECOMPLETO} {
printf("%s;", yytext);
BEGIN LINE_GRADE;}
<LINE_GRADE>^{ws}*{NOTA} {
printf("%s\n", yytext);
;}
%%
int main(int argc, char* argv[]){
yylex();
}
My input file looks something like this:
11223344-Z Alonso Barreiro, Ana 5.68
01234567-B Alonso Barros, Antonio 4.8
12345678-X Alonso Calvo, Andres 2.8
13345678-X Barreiro Calvo, Luis 3.68
It should produce output like this:
11223344-Z;Alonso Barreiro, Ana;5.68
01234567-B;Alonso Barros, Antonio;4.8
12345678-X;Alonso Calvo, Andres;2.8
13345678-X;Barreiro Calvo, Luis;3.68
But it only recognizes the first field, printing 11223344-Z; and then dumping the rest of the line unparsed.
I understand that this code would work on input where each part is on its own line, but I need to know whether I can do the same thing on a single line, so that I can retrieve each part and separate them with a delimiter such as ";".
Thanks in advance.
UPDATE: After following rici's answer, I've edited my code to look like this:
%s LINE_NAME
%s LINE_GRADE
%s LINE_OK
%s LINE_ERROR_DNI
%s LINE_ERROR_GRADE
ws [ ]
DNI [0-9]{7,8}-[A-Za-z]
DNIERROR [0-9A-Za-z]+-[0-9A-Za-z]+
NOTA [0-9].[0-9]{1,2}|10.[0]{1,2}
NOTAERROR [0-9]{2}.[0-9]{2,}
NOMBRE [A-ZÁÉÍÓÚ][a-záéíóúü]+
NOMBRECOMPLETO {NOMBRE}{ws}+{NOMBRE}","{ws}*{NOMBRE}
%option nodefault
%%
<INITIAL>^{DNI} {
printf("%s;", yytext);
BEGIN LINE_NAME;}
<INITIAL>^{DNIERROR} {
printf("%s; x;", yytext);
BEGIN LINE_ERROR_DNI;}
<LINE_NAME>^\t{NOMBRECOMPLETO} {
printf("%s;", yytext);
BEGIN LINE_GRADE;}
<LINE_GRADE>^\t{NOTA} {
printf("%s", yytext);
BEGIN LINE_OK;}
<LINE_GRADE>^\t{NOTAERROR} {
printf("%s; x", yytext);
BEGIN LINE_ERROR_GRADE;}
<LINE_ERROR_DNI>.*\n {
printf(" - DNI ERROR\n");
BEGIN(INITIAL);}
<LINE_ERROR_GRADE>.*\n {
printf(" - GRADE ERROR\n");
BEGIN(INITIAL);}
<LINE_OK>{ws}*\n {
printf(" - GOOD\n");
BEGIN(INITIAL);}
\n {
printf(" - UNEXPECTED END OF LINE\n");
BEGIN(INITIAL);}
<<EOF>> {
yyterminate();}
.* { printf(" ");}
%%
int main(int argc, char* argv[]){
yylex();
}
It's still not working: for every line in my file it prints ' - UNEXPECTED END OF LINE'. What am I matching wrong?
Of course, if I add a rule like this:
<INITIAL>^{DNI}\t{NOMBRECOMPLETO}\t{NOTA} {
printf("%s;", yytext);
BEGIN LINE_OK;}
It recognizes the line as good, but that's not what I'm trying to achieve, since it would be no different from matching {DNI}\t{NOMBRECOMPLETO}\t{NOTA} in a single rule and then splitting the text with strtok.
^ means "at the beginning of a line". It doesn't mean "at the beginning of a line or the end of the last match," only "at the beginning of a line". Every match starts right after the end of the previous match. – rici
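Following that comment, one possible fix is to keep ^ only on the rule that really starts a line (the DNI) and drop it from the rules for the later fields, since those begin where the previous match ended. The sketch below is based on the first version of the scanner and makes several assumptions that are not in the original: fields are separated by spaces or tabs, whitespace is skipped by its own rules so it does not end up in yytext, the dots in NOTA are escaped, and the error states are left out.

```lex
%{
#include <stdio.h>
%}
%option noyywrap nodefault
%s LINE_NAME
%s LINE_GRADE
ws      [ \t]
DNI     [0-9]{7,8}-[A-Za-z]
NOTA    [0-9]\.[0-9]{1,2}|10\.0{1,2}
NOMBRE  [A-Z][a-z]+
NOMBRECOMPLETO {NOMBRE}{ws}+{NOMBRE}","{ws}*{NOMBRE}
%%
<INITIAL>^{DNI}              { printf("%s;", yytext); BEGIN(LINE_NAME); }
<LINE_NAME>{ws}+             { /* skip the separator before the name */ }
<LINE_NAME>{NOMBRECOMPLETO}  { printf("%s;", yytext); BEGIN(LINE_GRADE); }
<LINE_GRADE>{ws}+            { /* skip the separator before the grade */ }
<LINE_GRADE>{NOTA}           { printf("%s", yytext); BEGIN(INITIAL); }
\n                           { putchar('\n'); BEGIN(INITIAL); }
.                            { /* ignore anything unexpected */ }
%%
int main(void)
{
    return yylex();
}
```

Assuming the file is saved as scanner.l (a hypothetical name), it can be built with something like `flex scanner.l && cc lex.yy.c -o scanner` and run with the input file on standard input. It is intended to produce the semicolon-separated lines shown in the question.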