2
votes

I am trying to parse a simple script language using FsLex and FsYacc, and I have a problem with distinguishing the minus operator from negative numbers.

If I evaluate the term "1 - 2", the parser will return the desired AST: Minus(NumberLiteral(1.0),NumberLiteral(2.0)). But if I evaluate the term "1-2", the lexer will produce the number 1, followed by the number -2, which is not a valid input.

I have made a minimal program to reproduce my problem. Ast defined like this:

module Ast

type Expression =
  | NumberLiteral of double
  | Minus of Expression * Expression

The lexer code looks like this:

{
module Lexer
open Microsoft.FSharp.Text.Lexing
open Parser
}

let whitespace = ' '
let digit = ['0' - '9']
let number = '-'?digit+

rule token = parse
    | whitespace*   { token lexbuf }
    | '-'           { MINUS }
    | number        { lexbuf |> LexBuffer<_>.LexemeString |> System.Double.Parse |> NUMBER }
    | eof           { EOF }

The parser looks like this:

%{
open Ast
%}
%start start
%token EOF MINUS
%token <double> NUMBER
%type < Expression > start
%%

start: 
    | expression EOF    { $1 }

expression:
    | NUMBER            { NumberLiteral $1 }
    | expression
      MINUS expression  { Minus($1, $3) }

My initial thought was to not handle the - as part of the number in the lexer, and let the parser determine if the MINUS token should result in a minus operator or a negative number. Unfortunately that would also result in the input "- 2" to be evaluated as a negative number, because the whitespace would be consumed.

But I assume that this must be a common problem, and a common solution must exist. So how do I best handle this?

1

1 Answers

3
votes

The usual solution is that -2 is actually an expression. You can "constant fold" -- directly evaluate expressions whose arguments are constants -- if you feel that it would be too inefficient to evaluate -2 (or you could just handle that as a special case in the production MINUS expression).