I am trying to parse a simple script language using FsLex and FsYacc, and I have a problem with distinguishing the minus operator from negative numbers.
If I evaluate the term "1 - 2", the parser will return the desired AST: Minus(NumberLiteral(1.0),NumberLiteral(2.0))
. But if I evaluate the term "1-2", the lexer will produce the number 1, followed by the number -2, which is not a valid input.
I have made a minimal program to reproduce my problem. Ast defined like this:
module Ast
type Expression =
| NumberLiteral of double
| Minus of Expression * Expression
The lexer code looks like this:
{
module Lexer
open Microsoft.FSharp.Text.Lexing
open Parser
}
let whitespace = ' '
let digit = ['0' - '9']
let number = '-'?digit+
rule token = parse
| whitespace* { token lexbuf }
| '-' { MINUS }
| number { lexbuf |> LexBuffer<_>.LexemeString |> System.Double.Parse |> NUMBER }
| eof { EOF }
The parser looks like this:
%{
open Ast
%}
%start start
%token EOF MINUS
%token <double> NUMBER
%type < Expression > start
%%
start:
| expression EOF { $1 }
expression:
| NUMBER { NumberLiteral $1 }
| expression
MINUS expression { Minus($1, $3) }
My initial thought was to not handle the -
as part of the number in the lexer, and let the parser determine if the MINUS
token should result in a minus operator or a negative number. Unfortunately that would also result in the input "- 2" to be evaluated as a negative number, because the whitespace would be consumed.
But I assume that this must be a common problem, and a common solution must exist. So how do I best handle this?