I am new to lex/yacc and am writing a SQL parser with them. However, for a particular JOIN query (listed below), my parser consistently takes the 'select_statement' rule instead of the 'nested_join_statement' rule.

I get the correct output for this query: SELECT * FROM sample1 JOIN sample2 ON sample1.C1 = sample2.C4;
(this goes to the 'join_statement' rule, satisfying the simple_join rule within)

But when I try this query: SELECT * FROM (SELECT * FROM (SELECT * FROM sample1)T8)AS temp1 JOIN (SELECT * FROM (SELECT * FROM sample2)T9)as temp2 ON temp1.C1 = temp2.C4;

(this should ideally go to 'nested_join_statement', but instead it goes to the 'SELECT selection FROM LPAREN select_statement2 RPAREN VAR' rule within the 'select_statement' rule, and I get the following error message: ERROR:syntax error, unexpected AS, expecting VAR)

I have listed nested_join_statement before select_statement to give it priority, but I still get this error. I don't understand why.

VAR is defined in lex as [A-Za-z][A-Za-z0-9_#-]*

Any help would be appreciated. I am desperate.

    manipulation_statement: NEWLINE
            | join_statement SEMICOLON
            | nested_join_statement SEMICOLON
            | select_statement SEMICOLON { flag=0;q=0;}
            | join_statement SEMICOLON NEWLINE
            | nested_join_statement SEMICOLON NEWLINE
            | select_statement SEMICOLON NEWLINE { flag=0;q=0;}
            ;
    nested_join_statement: two_nest_select_join { for(x=0;x<q;x++) strcpy(sj[x], ""); i=0;j=0;vardot=0;q=0;flag=0;nest=0;}
            ;
    join_statement: 
            simple_join { for(x=0;x<q;x++) strcpy(sj[x], ""); i=0;j=0;vardot=0;q=0;flag=0;}
            | simple_join_nest_select { for(x=0;x<q;x++) strcpy(sj[x], ""); i=0;j=0;vardot=0;q=0;flag=0;}
           ;
    simple_join: SELECT selection FROM VAR JOIN VAR ON VAR DOTS VAR EQUAL VAR DOTS VAR { printf("inside simple join\n");                                                                
                                                            if(strcmp($4,$8) == 0) join_table($4,$6,sj,q,$10,$14,"","");
                                                            else join_table($4,$6,sj,q,$14,$10,"",""); p=q;}
        ;

    simple_join_nest_select: SELECT selection FROM VAR JOIN LPAREN select_statement RPAREN AS VAR ON VAR DOTS VAR EQUAL VAR DOTS VAR {
                                                   if (strcmp($4,$12) == 0) join_table($4,"_temp_",sj,q,$14,$18,"","");
                                                    else join_table($4,"_temp_",sj,q,$18,$14,"",""); p=q;}
        ;

    two_nest_select_join: SELECT selection FROM LPAREN select_statement RPAREN AS VAR JOIN LPAREN select_statement RPAREN AS VAR ON VAR DOTS VAR EQUAL VAR DOTS VAR {
                                        join_table("temp1","temp2",sj,q,$18,$22,"",""); }
        ;

    select_statement:  SELECT selection FROM VAR WHERE where_clause { select_table($4,s,i,whr_var,whr_val); 
                                                             for(x=0;x<i;x++) strcpy(s[x],"");i=0; strcpy(whr_var,""); strcpy(whr_val,"");j=0;}
            | SELECT selection FROM VAR {   select_table($4,s,i,whr_var,whr_val); 
                                            for(x=0;x<i;x++) strcpy(s[x],"");i=0;j=0;}
            | SELECT selection FROM LPAREN select_statement2 RPAREN VAR { printf("inner tab: %s\n", inner_tab); printf("dep:%d\n", dep+1);
                                                                            for(x=0; x<i;x++) printf("%s\n", col_array[x]);
                                                                            for(y=0; y<j;y++) printf("%d\n", col_count[y]);
                                                                            strcpy(inner_tab,""); nest =1; dep=0;
                                                                            for(x=0; x<i;x++) strcpy(col_array[x], "");
                                                                            for(y=0; y<j;y++) col_count[y] =0;
                                                                            i=0;j=0;k=0;m=0;}
            ;
    select_statement2: SELECT selection2 FROM VAR { dep++; inner_tab=malloc(strlen($4)+1); strcpy(inner_tab,$4); }
                | select_list
            ;
    select_list: SELECT selection2 FROM LPAREN select_statement2 RPAREN VAR { dep++; }
        ;
    selection2: ASTERISK { col_array[i] = $1; i++; m++; col_count[j] = m; j++; printf("in level:%d value of k:%d\n",j,m); k=0; m=0;}
      | comma_list2 { col_count[j] = m; j++; printf("in level:%d value of k:%d\n",j,m); k=0; m=0;  }
      ;
    comma_list2: VAR { col_array[i] = $1; i++; m++;}
        | comma_list2 COMMA VAR { col_array[i] = $3; i++; m++; 
                                  k=m;}
        ;
    selection: ASTERISK { if(q != 0 || nest == 1) { for(x=0;x<j;x++) col_count[x]=0; i=0; j=0;flag=1;}
                        s[i] = $1; col_array[i] = $1; i++; col_count[j] = i; j++; 
                        if(q == 0) {sj[q] = $1; q++;} printf("in level t:%d value of k:%d\n",j,i); printf("sj2:%s\n",sj[0]);}
      | comma_list { if (flag ==1 ) {col_count[j] = i; j++; printf("in level 2:%d value of k:%d\n",j,i); k=0; m=0; printf("temp:"); 
                                    for(x=0;x<i;x++) printf("%s ",s[x]); } else {col_count[j] = i; j++;} }
      ;
    comma_list: 
        VAR { if(q != 0 || nest == 1) { for(x=0;x<j;x++) col_count[x]=0; i=0; j=0;flag=1;}
                s[i] = $1; col_array[i] = $1; i++; if(q==0) {sj[q] = $1; q++; }}
        | VAR DOTS VAR { strcpy(temp,""); strcat(temp,$1); strcat(temp,$2); strcat(temp,$3); 
            sj[q] = temp;  q++; 
            printf("temps:"); for(x=0;x<q;x++) printf("%s\n",sj[x]);}
        | comma_list COMMA vardot {if (flag == 1) { s[i] = $3; col_array[i] = $3, i++;} else {sj[q] = $3; q++; vardot++;s[i] = $3; col_array[i] = $3, i++;}
                                                                            }
        ;
I replaced the Adobe Flex tag with gnu-Flex, as I think that is more appropriate for your question. - JeffryHouser
Yes, and you guys will keep having to do that for new GNU Flex questions until you choose a more reasonable spelling for the Adobe Flex tag! - Kaz

1 Answer

You have reduce/reduce conflicts in your grammar: between selection and selection2, and between comma_list and comma_list2. Since the '2' versions come first in your grammar file, the parser will always reduce using those rules. That means that wherever it can't tell a select_statement from a select_statement2 just by looking at everything up to the relevant FROM token, it may do the wrong thing.

In your example, after it sees SELECT * FROM (SELECT * FROM, it has to make this choice (since the nested select in the parens might be either a select_statement for a two_nest_select_join, or a select_statement2 for a simple select_statement), and it chooses the select_statement2. It therefore parses the input as a select_statement, leading to the error you see when it gets to the AS token: it's expecting a VAR.
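
To make the conflict concrete, here is a condensed sketch of the rules involved. The rule name outer is hypothetical (it stands in for two_nest_select_join and for the third alternative of select_statement), the rules are trimmed to the ASTERISK case, and all actions are omitted:

```yacc
/* Both alternatives share the prefix "SELECT selection FROM LPAREN",
   so the parser cannot yet know which kind of nested select follows. */
outer
    : SELECT selection FROM LPAREN select_statement  RPAREN AS VAR  /* join operand        */
    | SELECT selection FROM LPAREN select_statement2 RPAREN VAR     /* plain nested select */
    ;

select_statement  : SELECT selection  FROM VAR ;
select_statement2 : SELECT selection2 FROM VAR ;

/* Identical right-hand sides, both reachable inside the parentheses:
   with FROM as the lookahead, either reduction is possible. */
selection2 : ASTERISK ;   /* listed first, so this one wins the conflict */
selection  : ASTERISK ;
```

Running bison with -v writes a .output report that lists the conflicting states, which should let you confirm that this is what happens in your full grammar.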

If you want to fix this, you need to either use more lookahead, or change the grammar to get rid of the reduce/reduce conflict.

To use more lookahead, you may be able to use bison's %glr-parser option. Since you have no ambiguities here, you don't need to add any additional disambiguation code, but if you have ambiguities elsewhere, you may get runtime errors as a result.
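
As a rough sketch, the directive goes in the declarations section, before the first %%. The %token line below is only an assumption based on the token names used in your rules; keep whatever %token, %union and %type declarations you already have:

```yacc
/* Declarations section of the .y file */
%glr-parser   /* Bison-specific: generate a GLR parser instead of LALR(1) */

/* Assumed token list, taken from the names that appear in the rules */
%token SELECT FROM WHERE JOIN ON AS
%token VAR ASTERISK COMMA DOTS EQUAL
%token LPAREN RPAREN SEMICOLON NEWLINE

%%
/* grammar rules unchanged */
```

One thing to watch for: while a GLR parser is exploring more than one parse, bison records the semantic actions and only runs them once a single parse remains. Your actions update globals such as i, j and q, so it is worth checking that they still behave as you expect when they run later than they do now.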

To get rid of the reduce/reduce conflicts, you need to get rid of the duplicate '2' rules. If you change them all to the corresponding rules without the suffix (selection2 to selection, and so on), you get rid of the conflicts, BUT you now accept a few constructs that you previously would have rejected (such as SELECT * FROM (SELECT * FROM VAR WHERE where_clause) VAR), which you might want to disallow.
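
A minimal sketch of what the merged rules might look like, assuming select_statement2 and select_list are folded into select_statement as well. Actions are omitted here; since the same rule would now fire for both the outer and the nested select, the bookkeeping currently split between the two sets of actions would need to be reconciled (for example with a nesting-depth counter such as your dep variable):

```yacc
select_statement
    : SELECT selection FROM VAR WHERE where_clause
    | SELECT selection FROM VAR
    | SELECT selection FROM LPAREN select_statement RPAREN VAR  /* was select_statement2 / select_list */
    ;

selection               /* the single remaining version; selection2 and comma_list2 are gone */
    : ASTERISK
    | comma_list
    ;
```

With this shape the parser no longer has to decide what kind of nested select it is in until after the RPAREN, where the next token (AS or VAR) tells the join rules and the plain nested select apart.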