How to make a sub parser with Parsec?

Question

I would like to parse several lists of commands, indented or formatted as an array, with Parsec. For example, my lists are formatted like this:

Command1 arg1 arg2       Command1 arg1 arg2         Command1 arg1 arg2
Command2 arg1                                       Command3 arg1 arg2 arg3
                         Command3 arg1 arg2 arg3
                                                    Command4
Command3 arg1 arg2 arg3  Command2 arg1
                         Command4
Command4
Command5 arg1                                       Command2 arg1

These commands are supposed to be parsed column by column with state changes in the parser.

My idea is to gather the commands into separate lists of strings (one per column) and parse these strings with a subparser (executed inside the main parser).
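Here is a minimal sketch of the first half of this idea. The helper `splitColumns` (a hypothetical name) cuts each line at fixed column start offsets and returns one list of command strings per column. It assumes the columns start at character offsets 0, 25 and 52, which is what the example above appears to use. A real input might need the offsets detected rather than hard-coded.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper: cut each line at the given column start offsets and
-- collect, per column, the non-blank cells in top-to-bottom order.
splitColumns :: [Int] -> [String] -> [[String]]
splitColumns offsets rows =
  [ [ trim cell | row <- rows, let cell = slice start end row, not (all isSpace cell) ]
  | (start, end) <- zip offsets (map Just (drop 1 offsets) ++ [Nothing]) ]
  where
    -- Characters from `start` up to (not including) `end`, or to end of line.
    slice start end row = maybe id take (fmap (subtract start) end) (drop start row)
    trim = reverse . dropWhile isSpace . reverse . dropWhile isSpace

-- Build a row whose cells start at offsets 0, 25 and 52 (assumed layout).
row3 :: String -> String -> String -> String
row3 a b c = pad 25 a ++ pad 27 b ++ c
  where pad n s = take n (s ++ repeat ' ')

main :: IO ()
main = mapM_ print (splitColumns [0, 25, 52] example)
  where
    example =
      [ row3 "Command1 arg1 arg2" "Command1 arg1 arg2" "Command1 arg1 arg2"
      , row3 "Command2 arg1" "" "Command3 arg1 arg2 arg3"
      , row3 "" "Command3 arg1 arg2 arg3" ""
      , row3 "" "" "Command4"
      , row3 "Command3 arg1 arg2 arg3" "Command2 arg1" ""
      , row3 "" "Command4" ""
      , row3 "Command4" "" ""
      , row3 "Command5 arg1" "" "Command2 arg1"
      ]
```

Each printed list is one column, in top-to-bottom order, ready to be handed to a subparser.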

I inspected the API of the Parsec library but I didn't find a function to do that.

I considered using runParser, but this function only extracts the result of the parser, not its state.
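One workaround, also suggested in the comments, is to make the subparser return its final user state together with its result. The subparser calls `getState` at its end, and the main parser then copies that state in with `putState`. A minimal sketch under that assumption follows. `subParse` and the word-counting example are hypothetical names.

```haskell
import Text.Parsec

-- Hypothetical helper: run `p` on a separate input string, starting from the
-- main parser's current user state, and copy the final user state back.
subParse :: SourceName -> Parsec String u a -> String -> Parsec String u a
subParse name p input = do
  st <- getState
  case runParser ((,) <$> p <*> getState) st name input of
    -- The subparser's ParseError can only be re-raised as a message here.
    Left err -> fail (show err)
    Right (x, st') -> putState st' >> return x

-- Example subparser: every word bumps a counter kept in the user state.
word :: Parsec String Int String
word = many1 letter <* spaces <* modifyState (+ 1)

main :: IO ()
main = print (runParser columns 0 "main" "")
  where
    columns = do
      a <- subParse "column 1" (many word <* eof) "foo bar"
      b <- subParse "column 2" (many word <* eof) "baz"
      n <- getState
      return (a, b, n)
```

The weak point is the `Left` branch. The subparser's error is flattened into a text message for `fail`, so the main parser cannot inspect or merge it as a real `ParseError`.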

I also considered writing a function inspired by runParsecT and mkPT to build my own parser, but the ParsecT constructor and initialPos are not available (they are not exported by the library).
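It may not be necessary to touch ParsecT's internals at all. A different technique uses only combinators that Parsec does export. `getInput` and `setInput` swap the stream the parser reads, and `getPosition` and `setPosition` save and restore the source position. Because the column is then parsed by the main parser itself, user-state changes and error messages reach it directly. The sketch below assumes the column text and its start position come from a splitting step like the one described above. `withInput` and `column` are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Hypothetical helper: run `p` on `sub` as if it were the input, starting at
-- `start`, then put the main parser's remaining input and position back.
withInput :: Monad m => SourcePos -> s -> ParsecT s u m a -> ParsecT s u m a
withInput start sub p = do
  savedInput <- getInput
  savedPos <- getPosition
  setInput sub
  setPosition start
  x <- p
  setInput savedInput
  setPosition savedPos
  return x

-- Example subparser: every word bumps a counter kept in the user state.
word :: Parsec String Int String
word = many1 letter <* spaces <* modifyState (+ 1)

-- Parse one column's text as if it started at line 1, column `col`.
column :: Int -> String -> Parsec String Int [String]
column col text = withInput (newPos "columns" 1 col) text (many word <* eof)

main :: IO ()
main = do
  -- Succeeds; the word count ends up in the main parser's user state.
  print (runParser ((,) <$> column 1 "foo bar" <*> getState) 0 "main" "")
  -- Fails; the error position points into the column, not the main input.
  print (runParser (column 26 "foo 42") 0 "main" "")
```

On failure nothing has to be restored by hand. Parsec's `<|>` and `try` retry alternatives from the state saved before `withInput` ran. The user state is rolled back along with the input, however.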

Is it possible to run a subparser inside a parser with Parsec?

If not, can a library such as megaparsec solve my problem?

You could just call getState and include the state in the result of the parser. — Julia Path
I had this idea, but I would like the state of the parser to be returned when the subparser fails (to push the error messages to the main parser). — JeanJouX
megaparsec does have runParser' which returns the state. — Julia Path
@JeanJouX If you only care about simple error messages, maybe the ParseError is enough? The parser's user state is meant to backtrack on failure. If you want to collect state or data that does not backtrack, chances are you should keep it outside of Parsec (e.g. stacking a ParsecT on top of a WriterT). — that other guy
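Here is a minimal sketch of that last suggestion. It assumes the transformers package for `Control.Monad.Trans.Writer`. The parser runs on top of a Writer, so log entries written with `tell` survive when `try` backtracks, whereas changes to the Parsec user state would be undone. The parser and the log messages are made up for the example.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Writer (Writer, runWriter, tell)
import Text.Parsec

-- Parsec on top of a Writer: the log lives in the base monad, not in the
-- Parsec user state, so backtracking does not roll it back.
type P = ParsecT String () (Writer [String])

command :: P String
command = do
  name <- many1 alphaNum
  lift (tell ["parsed " ++ name])
  return name

-- The first alternative parses "abc" and then fails on the missing '!';
-- `try` backtracks, but the log entry it wrote is kept.
example :: P String
example = try (command <* char '!') <|> command

main :: IO ()
main = print (runWriter (runParserT example () "input" "abc"))
```

The log ends up with two entries: one from the abandoned alternative and one from the successful one.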

Hans Krüger · Accepted Answer · 2019-01-29T23:37:53

Not a complete answer, more a question for clarification:

Is it necessary to build a list of strings? I would prefer to parse the input and convert it into a more specific datatype. That way you can use Haskell's type guarantees.

I would begin by defining a datatype for my commands:

data Command = Command1 Argtype1
             | Command2 Argtype2
             | Command3 Argtype1 Argtype2
  deriving (Show, Eq)

data Argtype1 = Arg1 | Arg2 | ArgX deriving (Show, Eq)
data Argtype2 = Arg2_1 | Arg2_2 deriving (Show, Eq)

After that, you can parse the input into values of these datatypes.

At the end of parsing, you can combine the results into a list (with mappend, which for lists is (++), or by adding each command at the front with (:)).

You end up with a value of type [Command], which you can work with further.
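Here is a minimal Parsec sketch of this approach. It repeats the datatypes above, with `Show` and `Eq` derived so the result can be printed and compared. The concrete syntax is an assumption, since the answer does not fix one: one command per line, a command name such as `Command1` followed by space-separated argument keywords such as `arg1`, `argX` or `arg2_1`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Command = Command1 Argtype1
             | Command2 Argtype2
             | Command3 Argtype1 Argtype2
  deriving (Show, Eq)

data Argtype1 = Arg1 | Arg2 | ArgX deriving (Show, Eq)
data Argtype2 = Arg2_1 | Arg2_2 deriving (Show, Eq)

-- One word (letters, digits, underscores), then spaces/tabs but not newlines.
word :: Parser String
word = many1 (alphaNum <|> char '_') <* many (oneOf " \t")

-- Read a word and look it up in a table of (assumed) keywords.
keyword :: String -> [(String, a)] -> Parser a
keyword what table = do
  w <- word
  maybe (fail ("unknown " ++ what ++ " " ++ show w)) return (lookup w table)

argtype1 :: Parser Argtype1
argtype1 = keyword "argument" [("arg1", Arg1), ("arg2", Arg2), ("argX", ArgX)]

argtype2 :: Parser Argtype2
argtype2 = keyword "argument" [("arg2_1", Arg2_1), ("arg2_2", Arg2_2)]

-- Dispatch on the command name, then parse its typed arguments.
command :: Parser Command
command = do
  name <- word
  case name of
    "Command1" -> Command1 <$> argtype1
    "Command2" -> Command2 <$> argtype2
    "Command3" -> Command3 <$> argtype1 <*> argtype2
    _          -> fail ("unknown command " ++ show name)

-- One command per line; the parsed commands are collected into a [Command].
commands :: Parser [Command]
commands = sepEndBy command (many1 newline) <* eof

main :: IO ()
main = print (parse commands "example" "Command1 arg1\nCommand3 argX arg2_2\nCommand2 arg2_1\n")
```

`sepEndBy` already collects the parsed commands into the `[Command]` list, so no explicit mappend is needed. An unknown command or argument is reported as an ordinary parse error with its position.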

For parsing the text, you can follow the megaparsec tutorial on parsing a simple imperative language: https://markkarpov.com/megaparsec/parsing-simple-imperative-language.html

Or do you mean something completely different? Perhaps that each line (containing some commands) as a whole is one input to a state machine, and the state machine changes according to the commands? In that case, I wonder why the state machine should be implemented as a parser.
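If that is the intent, here is a minimal sketch of the alternative the answer hints at: parse each column into a `[Command]` first, then drive an ordinary state machine with a strict left fold. The `Machine` type and the transition rules in `step` are hypothetical. Only the `Command` and `Argtype` definitions come from the answer.

```haskell
import Data.List (foldl')

data Command = Command1 Argtype1
             | Command2 Argtype2
             | Command3 Argtype1 Argtype2
  deriving (Show, Eq)

data Argtype1 = Arg1 | Arg2 | ArgX deriving (Show, Eq)
data Argtype2 = Arg2_1 | Arg2_2 deriving (Show, Eq)

-- Hypothetical machine state: the last Argtype1 seen and a step counter.
data Machine = Machine { lastArg :: Argtype1, steps :: Int }
  deriving (Show, Eq)

-- Hypothetical transition function: one command moves the machine one step.
step :: Machine -> Command -> Machine
step m cmd = case cmd of
  Command1 a   -> m' { lastArg = a }
  Command2 _   -> m'
  Command3 a _ -> m' { lastArg = a }
  where m' = m { steps = steps m + 1 }

-- Run a whole column of already-parsed commands.
runMachine :: Machine -> [Command] -> Machine
runMachine = foldl' step

main :: IO ()
main = print (runMachine (Machine Arg1 0) [Command1 ArgX, Command2 Arg2_1, Command3 Arg2 Arg2_2])
```

Each column could then be run separately, e.g. `map (runMachine start) columns`. This keeps the state machine independent of Parsec.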

