2
votes

I try to parse the call of a function, here are the variants:

add 8 2
add x y
add (inc x) (dec y)
funcWithoutArgs

Depending on how I distribute my analyzers in the code, and perhaps also how they are coded, I get errors, as well as successful but unwanted analyses. For example, this:

add 4 7

returns the following AST:

[Call ("foo",[Number 4]);
 Number 7]

He therefore only takes the first parameter.

When I do that:

foo x y

He sends me back this AST:

[Call ("foo",[Call ("x",[Call ("y",[])])])]

And that's not what I want, since here, each parameter calls the next one as a parameter.

Another example, when I do this:

foo x y
inc x

I get:

[Call ("foo",[Call ("x",[Call ("y",[Call ("inc",[Call ("x",[])])])])])]

It does the same as above, but also calls the code that follows the line. When I ask my analyzer for a new line (see code), it sends me this:

[Call ("foo",[]); Call ("x",[]); Call ("y",[]); Call ("inc",[]); Call ("x",[])]

Even in brackets it doesn't work:

foo (x) (y)

Give:

[Call ("foo",[]); Call ("x",[]); Call ("y",[])]

And:

add (inc x) (dec y)

Give:

Error in Ln: 1 Col: 1
Note: The error occurred on an empty line.

The parser backtracked after:
  Error in Ln: 2 Col: 5
  add (inc x) (dec y)
      ^
  Expecting: end of input or integer number (32-bit, signed)

  The parser backtracked after:
    Error in Ln: 2 Col: 10
    add (inc x) (dec y)
             ^
    Expecting: ')'

[]

In short, my function call analyzer does not work properly. Every time I change something, like a new line, an attempt, or a different hierarchy, something doesn't work... Do you have any idea how to solve this very annoying problem?

Here is the minimum functional code that was used:

open FParsec

// Ast

type Expression =
    | Number of int
    | Call of string * Expression list

type Program = Expression list

// Tools

let private bws p =
    spaces >>? p .>>? spaces

let private suiteOf p =
    sepEndBy p spaces1

let inline private betweenParentheses p label =
    between (pstring "(") (pstring ")") p
    <?> (label + " between parentheses")

let private identifier =
    many1Satisfy2 isLetter (fun c -> isLetter c)

// Expressions

let rec private call = parse {
        let! call = pipe2 (spaces >>? identifier) (spaces >>? parameters)
                        (fun id parameters -> Call(id, parameters)) // .>>? newline
        return call
    }

and private parameters = suiteOf expression

and private callFuncWithoutArgs =
    identifier |>> fun id -> Call(id, [])

and private number = pint32 |>> Number

and private betweenParenthesesExpression =
    parse { let! ex = betweenParentheses expression "expression"
            return ex }

and private expression =
    bws (attempt betweenParenthesesExpression <|>
         attempt number <|>
         attempt call <|>
         callFuncWithoutArgs)

// -------------------------------

let parse code =
    let parser = many expression .>>? eof

    match run parser code with
        | Success(result, _, _) -> result
        | Failure(msg, _, _) ->
            printfn "%s" msg
            []

System.Console.Clear()

parse @"
add 4 7

foo x y

inc x

foo (x) (y)

add (inc x) (dec y)

" |> printfn "%A"
1
First question: why do you want funcWithoutArgs to be parsed as a function call rather than as an identifier? In every language I'm aware of, calling a function requires different syntax than referring to it: e.g., func() is a function call, while func is simply a reference to the function. I think part (though not all) of your problem may be stemming from the fact that funcWithoutArgs is parsing as a function call. - rmunn
Second question: you state that [Call ("foo",[Call ("x",[Call ("y",[])])])] is not what you want foo x y to parse to, but I can't tell from the types in your example what you would want foo x y to parse to. I would assume you would want it to parse to something like [Call ("foo", [Identifier "x"; Identifier "y"])], but there's no Identifier in the DU in your pared-down example. So what do you want foo x y to actually parse to? What's your actually desired result? - rmunn
@rmunn First answer: funcWithoutArgs is, as its name suggests, supposed to be a function call, without arguments, which can be seen as a simple identifier, in fact. For example, in OCaml, Haskell or even F#, there is, as far as I know, no distinction, since the variables/data are simply data that can be applied. That's what I've been trying to replicate. What would you have suggested? Maybe to create a new Variable node? - Foxy
Second answer: in fact, foo x y should be analyzed as follows: [call ("foo",[Call ("x", []); call ("y", []))]. According to what I said above. - Foxy

1 Answers

3
votes

Your main problem is that you have the wrong high-level design for your parser.

Your current design is that an expression can be:

  1. An expression (a "sub-expression", so to speak) between parentheses (no problem here)
  2. A number (no problem here)
  3. A call with parameters, which is an identifier followed by a space-separated list of expressions (this is the main part of the problem)
  4. A call without parameters, which is a single identifier (this contributes to the problem)

Looking at the expression foo x y, let's apply those rules in order as a parser would. There are no parentheses and foo isn't a number, so it's either 3 or 4. First we try 3. foo is followed by x y: does x y parse as an expression? Why, yes, it does: it parses as a call with parameters, where x is the function and y is the parameter. Since x y matches 3, it parses according to rule 3 without checking rule 4, and so foo x y matches like foo (x y) would: a call to foo with a single parameter, which is a call to x with parameter y.

How to fix this? Well, you could try swapping the order of 3 and 4, so that a function call without parameters is checked before a call with parameters (which would make x y parse as just x. But that would fail, because foo x y would match as just foo. So putting rule 4 before rule 3 doesn't work here.

The real solution is to split the rules for an expression apart into two levels. The "inner" level, which I'll call a "value", could be:

  1. An expression between parentheses
  2. A number
  3. A function call without parameters

And the "outer" level, the parse rules for expressions, would be:

  1. A function call with parameters, all of which are values, not expressions
  2. A value

Note that these parsing levels are mutually recursive, so you'll need to use createParserForwardedToRef in your implementation. Let's look at how foo x y would be parsed with this design:

First, foo parses as an identifier, so check if it could be a function call with parameters. Does x parse as a value? Yes, under rule 3 of values. And does y parse as a value? Yes, under rule 3 of values. So foo x y parses as a function call.

Now what about funcWithoutParameters? It would fail rule 1 of expressions because it's not followed by a list of parameters. So it would be checked for rule 2 of expressions, and then it would match under rule 3 of values.

Okay, a basic sanity check of the pseudocode works, so let's turn this into code. But first, I'll mention the other problem in your parser which I haven't mentioned yet, which is that you don't realize that the FParsec spaces parser also matches newlines. So when you wrap your expression parser in bws ("between whitespace"), it will also consume newlines after the text it parses. So when you're parsing something like:

foo a b
inc c

The suiteOf expression sees the list a b inc c and turns all of those into parameters for foo. In my code below I've distinguished between FParsec's spaces parser (which includes newlines) and a parser that parses only horizontal whitespace (space and tab but not newline), using each in the appropriate place. The following code implements the design I mentioned in this answer and its output looks right to me for all the test expressions you wrote:

open FParsec

// Ast

type Expression =
    | Number of int
    | Call of string * Expression list

type Program = Expression list

// Tools

let private justSpaces  = skipMany  (pchar ' ' <|> pchar '\t')
let private justSpaces1 = skipMany1 (pchar ' ' <|> pchar '\t')

let private bws p =
    spaces >>? p .>>? spaces

let private suiteOf p =
    sepEndBy1 p (justSpaces1)

let inline private betweenParentheses p label =
    between (pstring "(") (pstring ")") p
    <?> (label + " between parentheses")

let private identifier =
    many1Satisfy2 isLetter (fun c -> isLetter c)

// Expressions

let private expression, expressionImpl = createParserForwardedToRef()

let private betweenParenthesesExpression =
    parse { let! ex = betweenParentheses expression "expression"
            return ex }

let private callFuncWithoutArgs =
    (identifier |>> fun id -> Call(id, []))

let private number = pint32 |>> Number

let private value =
    justSpaces >>? (attempt betweenParenthesesExpression <|>
                    attempt number <|>
                    callFuncWithoutArgs)

let private parameters = suiteOf value

let rec private callImpl = parse {
        let! call = pipe2 (justSpaces >>? identifier) (justSpaces >>? parameters)
                          (fun id parameters -> Call(id, parameters))
        return call }

let call = callImpl

expressionImpl.Value <-
    bws (attempt call <|>
         value)

// -------------------------------

let parse code =
    let parser = many expression .>>? (spaces >>. eof)

    match run parser code with
        | Success(result, _, _) -> result
        | Failure(msg, _, _) ->
            printfn "%s" msg
            []

System.Console.Clear()

parse @"
add 4 7

foo x y

inc x

foo (x) (y)

add (inc x) (dec y)
" |> printfn "%A"

P.S. I used the following operator suggested by http://www.quanttec.com/fparsec/users-guide/debugging-a-parser.html to greatly help me in tracing the problem:

let (<!>) (p: Parser<_,_>) label : Parser<_,_> =
    fun stream ->
        printfn "%A: Entering %s" stream.Position label
        let reply = p stream
        printfn "%A: Leaving %s (%A)" stream.Position label reply.Status
        reply

Usage: turn let parseFoo = ... into let parseFoo = ... <!> "foo". Then you'll get a stream of debugging output in your console that looks something like this:

(Ln: 2, Col: 20): Entering expression
(Ln: 3, Col: 1): Entering call
(Ln: 3, Col: 5): Entering parameters
(Ln: 3, Col: 5): Entering bwParens
(Ln: 3, Col: 5): Leaving bwParens (Error)
(Ln: 3, Col: 5): Entering number
(Ln: 3, Col: 6): Leaving number (Ok)
(Ln: 3, Col: 7): Entering bwParens
(Ln: 3, Col: 7): Leaving bwParens (Error)
(Ln: 3, Col: 7): Entering number
(Ln: 3, Col: 8): Leaving number (Ok)
(Ln: 3, Col: 8): Leaving parameters (Ok)
(Ln: 3, Col: 8): Leaving call (Ok)
(Ln: 3, Col: 8): Leaving expression (Ok)

That helps greatly when you're trying to figure out why your parser isn't doing what you expect.