When a Parsec parser, like parse_atom, is run on a particular string, there are four possible results:
- It succeeds, consuming some input.
- It fails, consuming some input.
- It succeeds, consuming no input.
- It fails, consuming no input.
In the Parsec source code, these are referred to as "consumed ok", "consumed err", "empty ok" and "empty err" (sometimes abbreviated cok, cerr, eok, eerr).
When two Parsec parsers are used in an alternative, like p <|> q, here's how it's parsed. First, Parsec tries to parse with p. Then:
- If this results in "consumed ok" or "empty ok", the parse succeeds and this becomes the result of the entire parser
p <|> q.
- If this results in "empty err", Parsec tries the alternative
q, and this becomes the result of the entire p <|> q parser.
- If this results in "consumed err", the entire parser
p <|> q fails with "consumed err" (cerr).
Note the critical difference between p returning cerr (which causes the whole parser to fail) versus returning eerr (which causes the alternative parser q to be tried).
The try function changes the behavior of a parser by converting a "cerr" result to an "eerr" result.
This means that if you are trying to parse the text "s(t)" with different parsers:
- with the parser
parse_atom <|> parse_op, the parser parse_atom returns "cok" consuming "s" and leaving unparseable text "(t)" which causes an error
- with the parser
try parse_atom <|> parse_op, the parser parse_atom still returns "cok" consuming "s", so the try (which only changes cerr to eerr) has no effect, and the unparseable text "(t)" causes the same error
- with the parser
parse_op <|> parse_atom, the parser parse_op successfully parses the string (actually, it doesn't because the recursive call to parse_exp can't parse "t", but let's ignore that); however, if the same parser was used on the text "s", then parse_op would consume the "s" before failing (i.e., cerr), causing the entire parse to fail instead of trying the alternative parse_atom
- with the parser
try parse_op <|> parse_atom, this would parse "s(t)", exactly as the previous example, and the try would have no effect; however, it would also work on the text "s", because parse_op would consume the "s" before failing with cerr, then try would "rescue" the parse by turning the cerr into an eerr, and the alternative parse_atom would be checked, successfully parsing (cok) the atom "s".
That's why the "correct" parser for your problem is try parse_op <|> parse_atom.
Be warned that this behavior isn't a fundamental aspect of monadic parsers. It's a design choice made by Parsec (and compatible parsers like Megaparsec). Other monadic parsers can have different rules for how alternatives with <|> work.
The "general fix" for these kind of Parsec parsing problems is to be aware of the facts that in the expression p <|> q:
p is tried first, and if it succeeds, q will be ignored, even if q would provide a "longer" or "better" or "more sensible" parse or avoid additional parsing errors further down the road. In parse_atom <|> parse_op, because parse_atom can succeed on strings meant for parse_op, this order won't work correctly.
q is only tried if p fails without consuming input. You must arrange for p to not consume anything on failure, possibly by using try, if you expect the alternative q to be checked. So, parse_op <|> parse_atom isn't going to work if parse_op starts to consume something (like an identifier) before realizing it can't continue and returning cerr.
As an alternative to using try, you can also think more carefully about the structure of your parser. An alternative way of writing parse_exp, for example, would be:
parse_exp :: Parser Exp
parse_exp = do
-- there's always an identifier
x <- many1 letter
-- there *might* be an expression in parentheses
y <- optionMaybe (parens parse_exp)
case y of
Nothing -> return (Atom x)
Just y' -> return (Op x y')
where parens = between (char '(') (char ')')
This can be written a little more concisely, but even then it's not as "elegant" as something like try parse_op <|> parse_atom. (It performs better, though, so that might be a consideration in some applications.)