
I'm using FParsec to write a small org-mode parser for fun, and I'm having trouble parsing a table row into a list of strings. My current code looks like this:

let parseRowEntries : Parser<RowEntries, unit> =
    let skipInitialPipe = skipChar '|'
    let notaPipe = function
        | '|' -> false
        | _ -> true
    let pipeSep = pchar '|'

    skipInitialPipe >>. sepEndBy (many1Satisfy notaPipe) pipeSep
    |>> RowEntries

This works fine until you parse the string |blah\n|blah\n|blah|, which should fail because of the newline characters. Unfortunately, simply making \n false in the notaPipe condition causes the parser to stop after the first 'blah' and report a successful parse. What I want many1Satisfy to do is parse (almost) any characters, stop at a pipe, and fail on a newline (and probably at end of input too).

I've tried using charsTillString, but that also just halts parsing at the first pipe without reporting an error.

So the rule is that a line must start and end with a pipe character, right? I.e., |foo|\n|bar|\n is valid, but |foo\n|bar\n is invalid because there's no terminating pipe? In that case I think what you want is something using the between combinator. I'll do a few tests and then write up an answer with what I've found. - rmunn

1 Answer


If I've understood your spec correctly, this should work:

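open FParsec

// Parses one table row such as "|a|b|" into its cells: ["a"; "b"].
let parseOneRow : Parser<_, unit> =
    // A cell may contain anything except a pipe or a newline.
    let notaPipe = function
        | '|' -> false
        | '\n' -> false
        | _ -> true
    let pipe = pchar '|'

    // Leading pipe, then "cell|" segments until a newline or end of input.
    pipe >>. manyTill (many1Satisfy notaPipe .>> pipe) (skipNewline <|> eof)

// A table is zero or more rows.
let parseRowEntries : Parser<_, unit> =
    many parseOneRow

run parseRowEntries "|row|with|four|columns|\n|second|row|"
// Success: [["row"; "with"; "four"; "columns"]; ["second"; "row"]]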

The structure is that each row starts with a pipe, and the segments within a row are then conceptually row|, with|, and so on. The .>> combinator discards the trailing pipe of each segment. The "till" part of that line uses skipNewline instead of newline because both alternatives of <|> must return the same type: the eof parser returns unit, so we need a parser that expects a newline and also returns unit. That's the skipNewline parser.
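
To make the type argument concrete, here is a minimal sketch of just the row terminator. The names rowEnd, rowEndAlt, and succeeds are made up for this illustration, and the #r line assumes you are running it as an F# script:

```fsharp
#r "nuget: FParsec"
open FParsec

// Both alternatives of <|> must have the same result type.
// skipNewline and eof both return unit, so this type-checks.
let rowEnd : Parser<unit, unit> = skipNewline <|> eof

// newline returns the char it parsed, so `newline <|> eof` would not compile.
// Replacing its result with () via >>% gives a unit-returning parser
// that can be combined with eof in the same way.
let rowEndAlt : Parser<unit, unit> = (newline >>% ()) <|> eof

// Helper (hypothetical): true if the parser succeeds on the input.
let succeeds p input =
    match run p input with
    | Success _ -> true
    | Failure _ -> false
```

Both versions accept a newline or the end of the input and reject anything else, so which one to use is mostly a matter of taste.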

I've tried throwing newlines in where they don't belong (before the pipes, for example), and that causes this parser to fail exactly as it should. It also fails if a column is empty (that is, two pipe characters occur side by side, as in ||), which I think is also what you want. If you want to allow empty columns, just use manySatisfy instead of many1Satisfy.
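
Here is a small sketch that puts the strict parser and an empty-column variant side by side, so the cases described above can be checked in a script. The names strictRow, lenientRow, notPipeOrNewline, and tryParse are mine, chosen for this example, and the #r line assumes an F# script:

```fsharp
#r "nuget: FParsec"
open FParsec

let notPipeOrNewline c = c <> '|' && c <> '\n'
let pipe : Parser<char, unit> = pchar '|'
let rowEnd : Parser<unit, unit> = skipNewline <|> eof

// Strict version, as in the answer: every cell needs at least one character.
let strictRow : Parser<string list, unit> =
    pipe >>. manyTill (many1Satisfy notPipeOrNewline .>> pipe) rowEnd

// Lenient variant: manySatisfy also accepts an empty cell, as in "||".
let lenientRow : Parser<string list, unit> =
    pipe >>. manyTill (manySatisfy notPipeOrNewline .>> pipe) rowEnd

// Helper (hypothetical): Some rows on success, None on any parse error.
let tryParse row input =
    match run (many row) input with
    | Success (rows, _, _) -> Some rows
    | Failure _ -> None

tryParse strictRow "|a|b|\n|c|"           // Some [["a"; "b"]; ["c"]]
tryParse strictRow "|blah\n|blah\n|blah|" // None: newline before the closing pipe
tryParse strictRow "|a||b|"               // None: empty column
tryParse lenientRow "|a||b|"              // Some [["a"; ""; "b"]]
```

The string from the question is rejected because the first cell is not followed by a pipe before the newline.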