
I'm trying to implement a JavaScript parser in Haskell, but I'm stuck on automatic semicolon insertion. I have created a test project to experiment with the problem, but I cannot figure out how to solve it.

In my test project, a program is a list of expressions (unary or binary):

data Program = Program [Expression]

data Expression
    = UnaryExpression Number
    | PlusExpression Number Number

The input stream is a list of tokens:

data Token
    = SemicolonToken
    | NumberToken Number
    | PlusToken

I want to parse inputs like these:
1;  (a unary expression)
1 + 2;  (a binary expression)
1; 2 + 3;  (two expressions, one unary and one binary)
1 2 + 3;  (the same as the previous input, but the first semicolon is missing)

In the last case the parser consumes token 1, but token 2 is not allowed by any production of the grammar (the next expected token is a semicolon or a plus). The rule of automatic semicolon insertion says that in this case a semicolon is automatically inserted before token 2.
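
To make the "offending token" rule concrete for this toy grammar, here is a sketch that applies it as a separate pass over the token list. This is a swapped-in illustration, not the parser-based approach discussed in the answer. The Number constructors, the Position type and insertSemicolons are hypothetical, and the sketch follows the question's simplified rule (real JavaScript only inserts a semicolon here when a line break separates the two tokens).

```haskell
-- Hypothetical: the question never shows how Number is defined.
data Number = Number1 | Number2 | Number3
  deriving (Show, Eq)

data Token
  = SemicolonToken
  | NumberToken Number
  | PlusToken
  deriving (Show, Eq)

-- Hypothetical helper type: how far into the current expression we are.
data Position
  = ExpectFirst   -- at the start of a statement
  | AfterFirst    -- after "1": only "+" or ";" may follow
  | ExpectSecond  -- after "1 +": only a number may follow
  | AfterSecond   -- after "1 + 2": only ";" may follow

-- Insert a SemicolonToken before any token that cannot continue a complete
-- expression (the "offending token" case of automatic semicolon insertion).
insertSemicolons :: [Token] -> [Token]
insertSemicolons = go ExpectFirst
  where
    go _ [] = []
    go _ (SemicolonToken : ts) = SemicolonToken : go ExpectFirst ts
    go ExpectFirst (t@(NumberToken _) : ts) = t : go AfterFirst ts
    go AfterFirst (PlusToken : ts) = PlusToken : go ExpectSecond ts
    go ExpectSecond (t@(NumberToken _) : ts) = t : go AfterSecond ts
    -- The expression so far is complete and the next token cannot extend it,
    -- so a semicolon is inserted before that token.
    go AfterFirst ts = SemicolonToken : go ExpectFirst ts
    go AfterSecond ts = SemicolonToken : go ExpectFirst ts
    -- Any other token is a real syntax error; pass it through for the parser.
    go _ (t : ts) = t : go ExpectFirst ts
```

Applied to the tokens of 1 2 + 3;, it produces the tokens of 1; 2 + 3;, while input that already has all its semicolons passes through unchanged.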

So, what is the most elegant way to implement this parser behavior?


Answer


In your test project you have

expression = try unaryExpression <|> plusExpression

but that doesn't work, since a UnaryExpression is a prefix of a PlusExpression. So, for the input

input2 = [NumberToken Number1, PlusToken, NumberToken Number1, SemicolonToken]

the parser happily parses the first NumberToken as a UnaryExpression, and because that alternative succeeded, <|> never tries plusExpression. It then automatically inserts a semicolon, since the next token is a PlusToken and not a SemicolonToken. Next it tries to parse another Expression, but the next token is the PlusToken, and no Expression can start with that.

Change the order in which the parsers are tried:

expression = try plusExpression <|> unaryExpression

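Now the parser first tries to parse a PlusExpression and falls back to the shorter UnaryExpression parse only when that fails. The try is still needed: plusExpression can fail after it has already consumed the first NumberToken (as in 1 2 + 3;), and try rewinds the input so that unaryExpression can start again from that token.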
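
A minimal, self-contained sketch comparing both orderings with Parsec, assuming the token list is parsed directly with Parsec's token primitive. The Number constructors and the helper names (tok, exactly, programWith, parseProgram, badExpression, goodExpression) are hypothetical, since the test project's code is not shown, and the insertion rule is simplified: a semicolon is assumed wherever a real one is missing.

```haskell
import Text.Parsec (Parsec, eof, many, parse, token, try, (<|>))
import Text.Parsec.Pos (initialPos)

-- Hypothetical: the question never shows how Number is defined.
data Number = Number1 | Number2 | Number3
  deriving (Show, Eq)

data Token
  = SemicolonToken
  | NumberToken Number
  | PlusToken
  deriving (Show, Eq)

data Expression
  = UnaryExpression Number
  | PlusExpression Number Number
  deriving (Show, Eq)

data Program = Program [Expression]
  deriving (Show, Eq)

-- A parser over a list of tokens instead of characters.
type Parser a = Parsec [Token] () a

-- Match one token via a selector. Source positions are not tracked here,
-- so every token reports the same dummy position.
tok :: (Token -> Maybe a) -> Parser a
tok = token show (const (initialPos "tokens"))

number :: Parser Number
number = tok fromNumber
  where
    fromNumber (NumberToken n) = Just n
    fromNumber _ = Nothing

exactly :: Token -> Parser ()
exactly want = tok (\t -> if t == want then Just () else Nothing)

-- Simplified semicolon insertion: use a real SemicolonToken if one is next,
-- otherwise act as if one had been inserted, consuming nothing.
semicolon :: Parser ()
semicolon = exactly SemicolonToken <|> pure ()

unaryExpression :: Parser Expression
unaryExpression = UnaryExpression <$> number

plusExpression :: Parser Expression
plusExpression = PlusExpression <$> number <* exactly PlusToken <*> number

-- Original order: unaryExpression succeeds on the first number, so
-- plusExpression is never attempted on "1 + 1;".
badExpression :: Parser Expression
badExpression = try unaryExpression <|> plusExpression

-- Fixed order: longest alternative first. `try` rewinds the input when
-- plusExpression fails after consuming the first number (as in "1 2 ...").
goodExpression :: Parser Expression
goodExpression = try plusExpression <|> unaryExpression

-- A program is a sequence of statements, each an expression followed by a
-- real or inserted semicolon, and it must use up every token.
programWith :: Parser Expression -> Parser Program
programWith expr = Program <$> many (expr <* semicolon) <* eof

parseProgram :: Parser Expression -> [Token] -> Either String Program
parseProgram expr = either (Left . show) Right . parse (programWith expr) "tokens"
```

With goodExpression, the tokens of 1 2 + 3; parse as Program [UnaryExpression Number1, PlusExpression Number2 Number3], whereas badExpression rejects input2 because parsing stops at the PlusToken. Since semicolon falls back to pure (), this sketch inserts a semicolon wherever a real one is missing; a stricter version would also check that the next token really cannot continue the statement.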