Shift/reduce error in parsing a data format containing slash-separated fields that end in a double slash

Question

I am parsing a series of lines, where each line contains a number of fields. The fields are separated by slash. The final line ends with two slashes. Several fields at the end of each line are optional; when omitted, the corresponding slashes are likewise omitted. I am getting shift/reduce conflicts.

I created a Minimal Reproducible Example to illustrate. There is just one line, containing two fields, the second field is optional, the line ends in two slashes. Here are two sample inputs:

/1/2//

/1//

Below is my lexer and parser. Bison raises a shift/reduce error message. I think that I know why Bison raises a shift/reduce error: after parsing the first field (e.g., after parsing "1") it encounters a slash, Bison doesn't know whether to reduce (there's only one field) or to shift (there's an upcoming second field). So there is an ambiguity and hence the shift/reduce error. Is that correct? Is there a way to modify the grammar so that there isn't an ambiguity?

Here is my .l file:

%{
#include "helloworld.tab.h"
%}

%%
"/"             { return yytext[0]; }
[ ]+
[0-9]+          { return(DGT); }
%%
int yywrap(){ return 1;}

Here is my .y file:

int yylex(void);
extern FILE *yyin;
void yyerror(const char* msg);
%}

%token DGT

%%
test: row '/' '/' 
 ;
 
row: '/' DGT '/' DGT        
 | '/' DGT                   
 ;
%%

int main(int argc, char *argv[])
{  
    yyin = fopen(argv[1], "r");
    
    yyparse();
    
    fclose(yyin);
    
    return 0;
}

void yyerror(const char *msg)
{
  fprintf(stderr, "%s\n", msg);
}

I think I would treat // as a single separate token. You need row: /*empty*/ | row '/' DGT ; surely? or if there must be at least one field row : '/' DGT | row '/' DGT ;. — user207421

rici rici · Accepted Answer · 2021-08-26T01:37:19

Here's a simple solution, which is actually similar to the more general grammar you started with:

/* This extra production allows the parser to recognise multiple lines
 * It requires that your lexer returns `'\n'` when it reads one.
 */
file  : %empty | file line '\n'
line  : fields '/' '/'
fields: %empty | fields field
field : '/' DGT

The big difference here is that the parser always makes fields out of the fields seen up to that point, which means that there is no confusion between the / which starts a field and the / which starts the end of line indicator.

Of course, the end of line indicator is a little redundant in this case because the grammar also happens to insist that the line be terminated with a newline. Putting that in a separate production was a bit arbitrary, but it seemed more likely to correspond with your real problem.

Shift/reduce error in parsing a data format containing slash-separated fields that end in a double slash

1 Answers