I am using the LALR(1) parser from the lark-parser library. I have written a grammar to parse an ORM-like language. An example of my language is pasted below:

Table1
.join(table=Table2, left_on=[column1], right_on=[column_2])
.group_by(col=[column1], agg=[sum])
.join(table=Table3, left_on=[column1], right_on=[column_3])
.some_column

My grammar is:

start: [CNAME (object)*]
object: "." (CNAME|operation)
operation: [(join|group) (object)*]

join: "join" "(" [(join_args ",")* join_args] ")"
join_args: "table" "=" CNAME
         | "left_on" "=" list
         | "right_on" "=" list

group: "group_by" "(" [(group_args ",")* group_args] ")"
group_args: "col" "=" list
          | "agg" "=" list 

list: "[" [CNAME ("," CNAME)*] "]"

%import common.CNAME     //# Variable name declaration
%import common.WS        //# White space declaration
%ignore WS

When I parse the language, it is parsed correctly, but I get a shift-reduce conflict warning. I believe this is due to a collision at object: "." (CNAME|operation), but I may be wrong. Is there another way to write this grammar?

1 Answer

I think you should replace

operation: [(join|group) (object)*]

with just

operation: join | group

You've already allowed repetition of object in

start: [CNAME (object)*]

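So also allowing object* at the end of operation is ambiguous: after a join or group, the parser cannot tell whether the next object belongs to operation or to start. That ambiguity is what causes the shift-reduce conflict.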

Personally, I would have gone for something like:

start    : [ CNAME ("." qualifier)* ]
qualifier: CNAME | join | group

because I don't see the point of a separate object rule. But that's just a minor style difference.