
I would like to use Parsec to parse several lists of commands that are indented or formatted as columns (an array). For example, my lists will be formatted like this:

Command1 arg1 arg2       Command1 arg1 arg2         Command1 arg1 arg2
Command2 arg1                                       Command3 arg1 arg2 arg3
                         Command3 arg1 arg2 arg3
                                                    Command4
Command3 arg1 arg2 arg3  Command2 arg1
                         Command4
Command4
Command5 arg1                                       Command2 arg1

These commands are supposed to be parsed column by column with state changes in the parser.

My idea is to gather the commands into separate lists of strings (one per column) and parse each of these strings with a subparser executed inside the main parser.
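A minimal sketch of the column-gathering step, assuming every column starts at a fixed character offset. The offsets 0, 25 and 52 are read off the example above and, like the helper names cutLine and columns, are assumptions rather than anything Parsec provides:

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, transpose)

-- Hypothetical column start offsets, read off the example layout.
columnStarts :: [Int]
columnStarts = [0, 25, 52]

-- Cut one line into one trimmed cell per column.
-- Each cell runs from its start offset up to the next column's start;
-- the last cell runs to the end of the line.
cutLine :: [Int] -> String -> [String]
cutLine starts line = zipWith cell starts (map Just (drop 1 starts) ++ [Nothing])
  where
    cell from (Just to) = trim (take (to - from) (drop from line))
    cell from Nothing   = trim (drop from line)
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- Gather each column into its own list of command strings, skipping blank cells.
columns :: [Int] -> String -> [[String]]
columns starts = map (filter (not . null)) . transpose . map (cutLine starts) . lines
```

Each inner list could then be handed to a subparser. Fixed offsets are the simplest assumption; if the columns were tab-separated instead, splitting each line on '\t' would take the place of cutLine.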

I inspected the API of the Parsec library but I didn't find a function to do that.

I considered using runParser, but this function only extracts the result of the parser, not its state.

I also considered writing my own function, inspired by runParsecT and mkPT, to build my own parser, but the ParsecT constructor and initialPos are not available (they are not exported by the library).

Is it possible to run a subparser inside a parser with Parsec?

If not, can a library such as megaparsec solve my problem?

You could just call getState and include the state in the result of the parser. - Julia Path
I had this idea, but I would like the state of the parser to be returned when the subparser fails (to pass the error messages on to the main parser). - JeanJouX
megaparsec does have runParser', which returns the state. - Julia Path
@JeanJouX If you only care about simple error messages, maybe the ParseError is enough? The parser's user state is meant to backtrack on failure. If you want some state/output that collects data and does not backtrack, chances are you should keep it outside of Parsec (e.g. by stacking a ParsecT on a WriterT). - that other guy
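A sketch of Julia Path's first suggestion. withState is a hypothetical helper that pairs a parser's result with the user state it ends in; countedWord and wordsWithCount are illustrative only. As JeanJouX points out, this only helps on success: on failure, runParser returns just a ParseError and the user state is gone.

```haskell
import Text.Parsec

-- Pair a parser's result with the user state it finishes in.
withState :: Parsec s u a -> Parsec s u (a, u)
withState p = (,) <$> p <*> getState

-- Example: parse letters-only words, counting them in the user state.
countedWord :: Parsec String Int String
countedWord = many1 letter <* modifyState (+ 1)

-- Space-separated words, returned together with the final count.
wordsWithCount :: Parsec String Int ([String], Int)
wordsWithCount = withState (countedWord `sepBy` char ' ')
```

Running runParser wordsWithCount 0 "input" "ab cd ef" gives Right (["ab","cd","ef"],3).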

2 Answers


Not a complete answer, more a question for clarification:

Is it necessary to build a list of strings? I would prefer to parse the input and convert it into a more specific datatype. That way you can take advantage of Haskell's type guarantees.

I would begin by defining a datatype for my commands:

data Command = Command1 Argtype1 
               | Command2 Argtype2
               | Command3 Argtype1 Argtype2

data Argtype1 = Arg1 | Arg2 | ArgX
data Argtype2 = Arg2_1 | Arg2_2 

After that, you can parse the input into these datatypes.

At the end of parsing, you collect the results into a list (for example by prepending each parsed command with (:), or by letting a combinator such as many or sepBy build the list for you).

You end up with a value of type [Command], which you can then work with further.
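A minimal Parsec sketch of this approach. The concrete syntax is an assumption (one command per line, a name such as Command1 followed by space-separated argument words such as arg1 or arg2_1), and lexeme, word, argtype1, argtype2, command and commands are hypothetical names. deriving (Show, Eq) is added to the datatypes above so the results can be printed and compared:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Command = Command1 Argtype1
             | Command2 Argtype2
             | Command3 Argtype1 Argtype2
             deriving (Show, Eq)

data Argtype1 = Arg1 | Arg2 | ArgX deriving (Show, Eq)
data Argtype2 = Arg2_1 | Arg2_2 deriving (Show, Eq)

-- Run p, then skip any spaces that follow it.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipMany (char ' ')

-- One word of letters, digits or underscores.
word :: Parser String
word = lexeme (many1 (alphaNum <|> char '_'))

argtype1 :: Parser Argtype1
argtype1 = do
  w <- word
  case w of
    "arg1" -> return Arg1
    "arg2" -> return Arg2
    "argX" -> return ArgX
    _      -> fail ("unknown Argtype1 argument: " ++ w)

argtype2 :: Parser Argtype2
argtype2 = do
  w <- word
  case w of
    "arg2_1" -> return Arg2_1
    "arg2_2" -> return Arg2_2
    _        -> fail ("unknown Argtype2 argument: " ++ w)

-- Read the command name, then the arguments that this command expects.
command :: Parser Command
command = do
  name <- word
  case name of
    "Command1" -> Command1 <$> argtype1
    "Command2" -> Command2 <$> argtype2
    "Command3" -> Command3 <$> argtype1 <*> argtype2
    _          -> fail ("unknown command: " ++ name)

-- Newline-separated commands (a trailing newline is allowed), then end of input.
commands :: Parser [Command]
commands = (command `sepEndBy` newline) <* eof
```

With this, parse commands "" "Command1 arg1\nCommand3 argX arg2_2\n" should give Right [Command1 Arg1, Command3 ArgX Arg2_2], while an unknown command name or a badly typed argument is reported as a parse error.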

For parsing the text, you can follow the megaparsec tutorial at https://markkarpov.com/megaparsec/parsing-simple-imperative-language.html


Or do you mean something completely different? Perhaps each line (containing several commands) should, as a whole, be one input to a state machine, and the state machine changes according to the commands? In that case, I wonder why the state machine should be implemented as a parser.


As a starting point, the simplest answer to "How do I make a subparser?" is to combine parsers with monadic bind, applicative <*>, alternative <|>, and the combinators provided by the library. Assuming that each command belongs to a single type (as in Hans Kruger's answer), and with an arbitrary number of columns, the following might make a good template:

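import Text.Parsec
import Text.Parsec.Char      -- already re-exported by Text.Parsec
import Data.List (transpose)

-- CommandType and parseCommand1, parseCommand2, ... stand for your own
-- command type and per-command parsers.

cmdFileParser :: Parsec String u [[CommandType]]
cmdFileParser = sepBy cmdLineParser sepParser   -- sepBy takes the item parser first
   where
     sepParser = newline --From Text.Parsec.Char

cmdLineParser :: Parsec String u [CommandType]
cmdLineParser = sepBy cmdParser sepParser
   where
     sepParser = tab


cmdParser :: Parsec String u CommandType
cmdParser =   try parseCommand1   -- try lets <|> move on if a branch fails
          <|> try parseCommand2   -- after consuming a shared prefix like "Command"
          <|> parseCommand3
          -- <|> ... one alternative per command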

Then, after parsing, transpose the [[CommandType]] to group the commands by column:

main = do
  ...
  let ret = runParser cmdFileParser
                       ()  -- initial user state: runParser takes one
                       "debug string telling what was parsed"
                       stringToParse
  case ret of
    Left e -> putStrLn ("wasn't parsed: " ++ show e)
    Right cmds -> doSomethingWith (transpose cmds)  -- doSomethingWith: your own code
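A runnable version of the template above, as a sketch under explicit assumptions: CommandType is a toy stand-in (a command name plus its arguments), columns are separated by single tabs and arguments by single spaces, and an empty cell is parsed as Nothing so that transpose keeps each command in its own column. word and parseColumns are hypothetical helper names:

```haskell
import Data.List (transpose)
import Data.Maybe (catMaybes)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy stand-in for CommandType (an assumption): a name plus its arguments.
data CommandType = CommandType String [String]
  deriving (Show, Eq)

-- One word: anything except space, tab or newline.
word :: Parser String
word = many1 (noneOf " \t\n")

-- A command: its name followed by zero or more space-separated arguments.
cmdParser :: Parser CommandType
cmdParser = CommandType <$> word <*> many (char ' ' *> word)

-- One line: tab-separated cells; an empty cell becomes Nothing.
cmdLineParser :: Parser [Maybe CommandType]
cmdLineParser = optionMaybe cmdParser `sepBy` tab

-- The whole file: newline-separated lines, then end of input.
cmdFileParser :: Parser [[Maybe CommandType]]
cmdFileParser = cmdLineParser `sepEndBy` newline <* eof

-- Parse, transpose rows into columns, then drop the empty cells.
parseColumns :: String -> Either ParseError [[CommandType]]
parseColumns input = map catMaybes . transpose <$> parse cmdFileParser "columns" input
```

Data.List.transpose skips missing elements in ragged rows, so a line with fewer cells than the others still lines up from the left; the Nothing cells only matter for blanks in the middle of a line.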

I would say that the above is a typical approach. There are variations, of course. For instance, if you know there will be exactly three columns, you might replace the cmdLineParser above with the one below (changing cmdFileParser's type to match):

cmdLineParser :: Parsec String u (CommandType,CommandType,CommandType)
cmdLineParser = (,,) <$> ct <*> ct <*> cmdParser
   where
     ct = cmdParser <* tab   -- a command followed by its tab separator

I would say that using getState is atypical. When I first started using Parsec, I remember getting something like what I think you are after to work, but it wasn't pretty. Of course, if you really just want to return the strings, you can always parse any characters except your newlines and tabs:

cmdParser :: Parsec String u String
cmdParser = many (noneOf "\n\t")   -- everything up to the next tab or newline

Be careful with the above, though. I've been burned by many before, where it takes more input than I wanted or always succeeds (it happily returns an empty result), so I don't have high confidence that this exact formulation will get you the command string. Also, if you parse each command as a string and then reparse it in main, you will be parsing everything twice!
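If you do want to parse each column as a string first and then run a subparser on it (the approach the question asks about), one Parsec idiom is to swap the input temporarily with getInput/setInput and getPosition/setPosition. This needs neither the ParsecT constructor nor runParser; initialPos is exported from Text.Parsec.Pos. The sketch assumes String input, and subParse, countedWord and linesThenWords are hypothetical names. The user state is shared with the outer parser, so changes made by the subparser survive, and a failure inside the subparser becomes an ordinary error of the outer parser, reported at a position in the sub-input:

```haskell
import Text.Parsec
import Text.Parsec.Pos (initialPos)

-- Run parser p on the string s from inside another parser.
-- The outer input and position are saved, replaced by s, and restored
-- afterwards; the user state is never touched, so p's changes persist.
subParse :: Monad m => SourceName -> ParsecT String u m a -> String -> ParsecT String u m a
subParse name p s = do
  savedInput <- getInput
  savedPos   <- getPosition
  setInput s
  setPosition (initialPos name)
  r <- p <* eof            -- the subparser must consume all of s
  setInput savedInput
  setPosition savedPos
  return r

-- Example subparser: letters-only words, counted in the user state.
countedWord :: Parsec String Int String
countedWord = many1 letter <* modifyState (+ 1)

-- Example outer parser: read raw lines, then sub-parse each one.
linesThenWords :: Parsec String Int ([[String]], Int)
linesThenWords = do
  rows   <- many (many1 (noneOf "\n") <* newline)
  parsed <- mapM (subParse "line" (countedWord `sepBy` char ' ')) rows
  n      <- getState
  return (parsed, n)
```

With this, runParser linesThenWords 0 "input" "ab cd\nef\n" gives Right ([["ab","cd"],["ef"]],3), and an input such as "ab 1\n" fails with an error whose position is in the source named "line".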