I'm trying to sort this out; it's for a school assignment and I need a little help. I'm trying to figure out how to split a single line into multiple parts. Since flex has no capture groups, I'm trying to do it with start conditions, but it's not working as I expected.

%s  LINE_NAME
%s  LINE_GRADE

ws [ \t]
DNI [0-9]{7,8}-[A-Za-z]
DNIERROR [0-9A-Za-z]+-[0-9A-Za-z]+
NOTA [0-9].[0-9]{1,2}|10.[0]{1,2}
NOTAERROR [0-9]{2}.[0-9]{2,}
NOMBRE [A-Z][a-z]+
NOMBRECOMPLETO {NOMBRE}{ws}+{NOMBRE}","{ws}*{NOMBRE}

%%
<INITIAL>^{DNI}    {
    printf("%s;", yytext);
    BEGIN LINE_NAME;}

<LINE_NAME>^{ws}*{NOMBRECOMPLETO}   {
    printf("%s;", yytext);
    BEGIN LINE_GRADE;}

<LINE_GRADE>^{ws}*{NOTA}  {
    printf("%s\n", yytext);
    ;}

%%

int main(int argc, char* argv[]){
    yylex();
}

My input file is something like

11223344-Z Alonso Barreiro, Ana 5.68
01234567-B Alonso Barros, Antonio 4.8
12345678-X Alonso Calvo, Andres 2.8
13345678-X Barreiro Calvo, Luis 3.68

It should be producing an output like

11223344-Z;Alonso Barreiro, Ana;5.68
01234567-B;Alonso Barros, Antonio;4.8
12345678-X;Alonso Calvo, Andres;2.8
13345678-X;Barreiro Calvo, Luis;3.68

But it only recognizes the first field, 11223344-Z;, and echoes the rest of the line unparsed.

I understand that this code would work on input where each part is on its own line, but I need to know whether I can do this on a single line, so that I can retrieve each part and separate the parts with a delimiter like ";" or whatever.

Thanks in advance.

UPDATE: After following rici's answer, I've edited my code to look like this:

%s  LINE_NAME
%s  LINE_GRADE
%s  LINE_OK
%s  LINE_ERROR_DNI
%s  LINE_ERROR_GRADE

ws [ ]
DNI [0-9]{7,8}-[A-Za-z]
DNIERROR [0-9A-Za-z]+-[0-9A-Za-z]+
NOTA [0-9].[0-9]{1,2}|10.[0]{1,2}
NOTAERROR [0-9]{2}.[0-9]{2,}
NOMBRE [A-ZÁÉÍÓÚ][a-záéíóúü]+
NOMBRECOMPLETO {NOMBRE}{ws}+{NOMBRE}","{ws}*{NOMBRE}

%option nodefault

%%
<INITIAL>^{DNI}    {
    printf("%s;", yytext);
    BEGIN LINE_NAME;}

<INITIAL>^{DNIERROR}    {
    printf("%s; x;", yytext);
    BEGIN LINE_ERROR_DNI;}

<LINE_NAME>^\t{NOMBRECOMPLETO}   {
    printf("%s;", yytext);
    BEGIN LINE_GRADE;}

<LINE_GRADE>^\t{NOTA}  {
    printf("%s", yytext);
    BEGIN LINE_OK;}

<LINE_GRADE>^\t{NOTAERROR}  {
    printf("%s; x", yytext);
    BEGIN LINE_ERROR_GRADE;}

<LINE_ERROR_DNI>.*\n {
    printf(" - DNI ERROR\n");
    BEGIN(INITIAL);}

<LINE_ERROR_GRADE>.*\n {
    printf(" - GRADE ERROR\n");
    BEGIN(INITIAL);}

<LINE_OK>{ws}*\n {
    printf(" - GOOD\n");
    BEGIN(INITIAL);}

\n { 
    printf(" - UNEXPECTED END OF LINE\n");
    BEGIN(INITIAL);}

<<EOF>> {
    yyterminate();}

.* { printf(" ");}

%%

int main(int argc, char* argv[]){
    yylex();
}

It's still not working: for every line in my file it prints ' - UNEXPECTED END OF LINE'. What am I matching wrong?

Of course, if I add a rule like this

<INITIAL>^{DNI}\t{NOMBRECOMPLETO}\t{NOTA}    {
    printf("%s;", yytext);
    BEGIN LINE_OK;}

it recognizes the line as good, but that's not what I'm trying to achieve, since that would be no different from just matching {DNI}\t{NOMBRECOMPLETO}\t{NOTA} and then strtok-ing the result.

^ means "at the beginning of a line". It doesn't mean "at the beginning of a line or the end of the last match," only "at the beginning of a line". Every match starts right after the end of the previous match. - rici

Oh, snap. You're right. I was so focused on the states thing that I forgot a simple thing like that. Thanks a lot again. - Trigork

1 Answer

Nothing in any of your patterns recognizes a newline. So when flex hits the newline, the default rule will match, and you'll still be in the start condition for {NOTA}.

I recommend using %option nodefault, which makes flex report an error rather than silently invoking the default action. You'll then have to add your own rules for any other input. A simple error action would be to match any character and then skip to the newline or EOF. Don't forget to reset the start condition to INITIAL when you hit the newline. In fact, you might just want to use the following:

\n { BEGIN(INITIAL); }

although that won't signal an error if a grade is missing.
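
Putting those pieces together, a minimal sketch of the scanner might look like the following. It is not the asker's final code, and it rests on a few assumptions that are not in the original: the ^ anchor is kept only on the DNI rule (the other fields start mid-line, so they cannot be anchored), the dots in NOTA are escaped so they match a literal ".", %option noyywrap is added so the result links without libfl, and the catch-all rules for whitespace and stray characters are illustrative choices.

```lex
%option noyywrap nodefault

%s LINE_NAME
%s LINE_GRADE

ws      [ \t]
DNI     [0-9]{7,8}-[A-Za-z]
NOTA    [0-9]\.[0-9]{1,2}|10\.0{1,2}
NOMBRE  [A-Z][a-z]+
NOMBRECOMPLETO {NOMBRE}{ws}+{NOMBRE}","{ws}*{NOMBRE}

%%
<INITIAL>^{DNI}             { /* only this field really starts a line */
                              printf("%s;", yytext);
                              BEGIN(LINE_NAME); }

<LINE_NAME>{NOMBRECOMPLETO} { /* no ^ here: the name starts mid-line */
                              printf("%s;", yytext);
                              BEGIN(LINE_GRADE); }

<LINE_GRADE>{NOTA}          { printf("%s", yytext); }

{ws}+                       { /* separators between fields: skip */ }

\n                          { /* end of record: go back to expecting a DNI */
                              putchar('\n');
                              BEGIN(INITIAL); }

.                           { /* assumption: silently drop anything else */ }

%%

int main(void)
{
    yylex();
    return 0;
}
```

Assuming the file is saved as scanner.l (a hypothetical name), it can be built with `flex scanner.l && cc lex.yy.c -o scanner` and run as `./scanner < input.txt`. On the four sample lines from the question it should print the semicolon-separated records the question asks for. Like the one-line \n rule above, it does not report a missing or malformed field; it just skips what it cannot match.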

flex isn't really the ideal tool for this sort of parsing, but the way you're using start conditions is reasonable.