The indents package for Haskell's Parsec provides a way to parse indentation-style languages (like Haskell and Python). It redefines the Parser type, so how do you use the token parser functions exported by Parsec's Text.Parsec.Token module, which are of the normal Parser type?

Background

Parsec comes with a load of modules. Most of them export a bunch of useful parsers (e.g. newline from Text.Parsec.Char, which parses a newline) or parser combinators (e.g. count n p from Text.Parsec.Combinator, which runs the parser p n times).
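As a minimal sketch of those two pieces working together (the names threeDigits and demo are made up for illustration), the following runs digit three times with count and then expects a newline:

```haskell
import Text.Parsec (ParseError, count, digit, newline, parse)
import Text.Parsec.String (Parser)

-- Hypothetical example parser: exactly three digits, then a newline.
threeDigits :: Parser String
threeDigits = count 3 digit <* newline

-- Runs the parser on a small input; "example" is only a source name
-- used in error messages.
demo :: Either ParseError String
demo = parse threeDigits "example" "123\n"
```

Here demo evaluates to Right "123"; an input with fewer than three digits yields a Left carrying the line and column of the failure.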

However, the module Text.Parsec.Token would like to export functions which are parametrized by the user with features of the language being parsed, so that, for example, the braces p function will run the parser p after parsing a '{' and before parsing a '}', ignoring things like comments, the syntax of which depends on your language.

The way that Text.Parsec.Token achieves this is that it exports a single function makeTokenParser, which you call, giving it the parameters of your specific language (like what a comment looks like) and it returns a record containing all of the functions in Text.Parsec.Token, adapted to your language as specified.
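A minimal sketch of that workflow with the ordinary (non-indentation) Parser type, assuming the stock haskellStyle definition from Text.Parsec.Language stands in for your own language parameters; the names lexer, bracedNumber and demo are illustrative only:

```haskell
import Text.Parsec (ParseError, parse)
import Text.Parsec.Language (haskellStyle)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- The record of token parsers, specialised to Haskell-style comments
-- and identifiers.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellStyle

-- braces runs its argument between '{' and '}', skipping white space
-- and comments after each token.
bracedNumber :: Parser Integer
bracedNumber = P.braces lexer (P.natural lexer)

demo :: Either ParseError Integer
demo = parse bracedNumber "example" "{ 42 {- a comment -} }"
```

Here demo evaluates to Right 42: the comment is consumed as white space because haskellStyle declares {- and -} as comment delimiters.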

Of course, in an indentation-style language, these would need to be adapted further (perhaps? here's where I'm not sure – I'll explain in a moment) so I note that the (presumably obsolete) IndentParser package provides a module Text.ParserCombinators.Parsec.IndentParser.Token which looks to be a drop-in replacement for Text.Parsec.Token.

I should mention at some point that all the Parsec parsers are monadic functions, so they do magic things with state so that error messages can say at what line and column in the source file the error appeared.

My Problem

For a couple of small reasons it appears to me that the indents package is more or less the current version of IndentParser. However, it does not provide a module that looks like Text.ParserCombinators.Parsec.IndentParser.Token; it only provides Text.Parsec.Indent. So I am wondering how one goes about getting all the token parsers from Text.Parsec.Token (like reserved "something", which parses the reserved keyword "something", or like braces, which I mentioned earlier).

It would appear to me that (the new) Text.Parsec.Indent works by some sort of monadic state magic to work out at what column bits of source code are, so that it doesn't need to modify the token parsers like whiteSpace from Text.Parsec.Token, which is probably why it doesn't provide a replacement module. But I am having a problem with types.

You see, without Text.Parsec.Indent, all my parsers are of type Parser Something where Something is the return type and Parser is a type alias defined in Text.Parsec.String as

type Parser = Parsec String ()

but with Text.Parsec.Indent, instead of importing Text.Parsec.String, I use my own definition

type Parser a = IndentParser String () a

which makes all my parsers of type IndentParser String () Something, where IndentParser is defined in Text.Parsec.Indent. But the token parsers that I'm getting from makeTokenParser in Text.Parsec.Token are of the wrong type.

If this isn't making much sense by now, it's because I'm a bit lost. The type issue is discussed a bit here.


I've tried replacing the one definition of Parser above with the other, but then when I try to use one of the token parsers from Text.Parsec.Token, I get the compile error

Couldn't match expected type `Control.Monad.Trans.State.Lazy.State
                                Text.Parsec.Pos.SourcePos'
            with actual type `Data.Functor.Identity.Identity'
Expected type: P.GenTokenParser
                 String
                 ()
                 (Control.Monad.Trans.State.Lazy.State Text.Parsec.Pos.SourcePos)
  Actual type: P.TokenParser ()

Links

Sadly, neither of the examples above use token parsers like those in Text.Parsec.Token.

A Parser Something is a ParsecT String () Identity Something. The wrapped monad is Identity. But an IndentParser wraps State SourcePos. The things you get from a TokenParser are all ParsecT s u m Something, so perhaps it's as easy as generalising your types to ParsecT String () m Something from Parser Something. Then they can be used with m = Identity or m = State SourcePos, as needed. - Daniel Fischer
I'm trying something like that, and getting Not in scope: type variable 'm'. So I try adding the context Monad m =>, and I get Illegal polymorphic or qualified type: forall (m :: * -> *). Monad m => ParsecT s u m - Beetle
Can you give an example of such a signature? - Daniel Fischer
I tried replacing the line type Parser a = IndentParser String () a with type Beetle s u = (Monad m) => ParsecT s u m followed by type Parser = Beetle String () or type Beetle s u a = ParsecT s u m a and type Parser a = Beetle String () a. All my parser functions are still declared as things like foo :: Parser Something. - Beetle
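For readers following the comment thread: a type synonym cannot hide the variable m on its right-hand side, which is what the two error messages are complaining about. A sketch of one way the generalisation could be written instead, keeping m as an explicit parameter of the synonym and putting the Monad m constraint on each signature (ParserIn, word, runPlain and runWithState are hypothetical names, not part of Parsec or indents):

```haskell
import Control.Monad.State (evalState)
import Text.Parsec
import Text.Parsec.Pos (initialPos)

-- The underlying monad stays visible as a parameter of the synonym.
type ParserIn m a = ParsecT String () m a

-- The constraint lives on the signature, not inside the synonym.
word :: Monad m => ParserIn m String
word = many1 letter

-- Used with m = Identity, as with the ordinary Parser type.
runPlain :: String -> Either ParseError String
runPlain = parse word "plain"

-- Used with m = State SourcePos, the monad named in the error message.
runWithState :: String -> Either ParseError String
runWithState input =
  evalState (runParserT word () "stateful" input) (initialPos "stateful")
```

Both runPlain "abc" and runWithState "abc" give Right "abc"; the same definition of word serves either monad.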

1 Answer

What are you trying to do?

It sounds like you want to have your parsers defined everywhere as being of type

Parser Something

(where Something is the return type) and to make this work by hiding and redefining the Parser type which is normally imported from Text.Parsec.String or similar. You still need to import Text.Parsec.String for its instances, so that the Stream instance for String is in scope; an empty import list brings in the instances and nothing else:

import Text.Parsec.String ()

Your definition of Parser is correct. Alternatively and equivalently (for those following the chat in the comments) you can use

import Control.Monad.State
import Text.Parsec (ParsecT)
import Text.Parsec.Pos (SourcePos)

type Parser = ParsecT String () (State SourcePos)

and possibly do away with the import Text.Parsec.Indent (IndentParser) in the file in which this definition appears.

Error, error on the wall

Your problem is that you're looking at the wrong part of the compiler error message. You're focusing on

Couldn't match expected type `State SourcePos' with actual type `Identity'

when you should be focusing on

Expected type: P.GenTokenParser ...
  Actual type: P.TokenParser ...

It compiles!

Where you "import" parsers from Text.Parsec.Token, what you actually do, of course (as you briefly mentioned), is first to define a record of your language parameters and then to pass this to the function makeTokenParser, which returns a record containing the token parsers.

You must therefore have some lines that look something like this:

import Text.Parsec.Language (haskellStyle)
import qualified Text.Parsec.Token as P

beetleDef :: P.LanguageDef st
beetleDef =
    haskellStyle {
        parameters, parameters etc.
        }

lexer :: P.TokenParser ()
lexer = P.makeTokenParser beetleDef

... but a P.LanguageDef st is just a GenLanguageDef String st Identity, and a P.TokenParser () is really a GenTokenParser String () Identity.
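A small sketch that lets the compiler confirm the alias expansion (shortForm, longForm and demo are illustrative names): the second binding type-checks only because the two signatures denote the same type.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec (ParseError, parse)
import Text.Parsec.Language (haskellStyle)
import qualified Text.Parsec.Token as P

shortForm :: P.TokenParser ()
shortForm = P.makeTokenParser haskellStyle

-- Accepted because TokenParser () is an alias for this longer type.
longForm :: P.GenTokenParser String () Identity
longForm = shortForm

demo :: Either ParseError Integer
demo = parse (P.natural longForm) "example" "7"
```

The Identity in the expanded type is the part that clashes with State SourcePos in the error message.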

You must change your type declarations to the following:

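import Control.Monad.State
import Text.Parsec.Language (haskellStyle)
import Text.Parsec.Pos (SourcePos)
import qualified Text.Parsec.Token as P

beetleDef :: P.GenLanguageDef String st (State SourcePos)
beetleDef =
    -- haskellStyle itself is fixed to Identity, so this update only
    -- type-checks if it overrides all four parser-valued fields:
    -- identStart, identLetter, opStart and opLetter.
    haskellStyle {
        parameters, parameters etc.
        }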

lexer :: P.GenTokenParser String () (State SourcePos)
lexer = P.makeTokenParser beetleDef

... and that's it! This will allow your "imported" token parsers to have type ParsecT String () (State SourcePos) Something, instead of Parsec String () Something (which is an alias for ParsecT String () Identity Something) and your code should now compile.
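A self-contained sketch of the whole arrangement, under a few stated assumptions. Since haskellStyle has type LanguageDef st (fixed to Identity), the sketch spells out every field of the language definition rather than updating haskellStyle. It also avoids importing the indents package and runs the parser directly with runParserT and evalState. The names sketchDef, binding and runSketch, and the tiny "let x = 5" grammar, are invented for illustration.

```haskell
import Control.Monad.State (State, evalState)
import Text.Parsec
import Text.Parsec.Pos (initialPos)
import qualified Text.Parsec.Token as P

-- Same shape as the Parser alias discussed above.
type Parser a = ParsecT String () (State SourcePos) a

-- Hypothetical language definition, written out field by field so that
-- its monad parameter is State SourcePos rather than Identity.
sketchDef :: P.GenLanguageDef String () (State SourcePos)
sketchDef = P.LanguageDef
  { P.commentStart    = "{-"
  , P.commentEnd      = "-}"
  , P.commentLine     = "--"
  , P.nestedComments  = True
  , P.identStart      = letter <|> char '_'
  , P.identLetter     = alphaNum <|> oneOf "_'"
  , P.opStart         = oneOf ":!#$%&*+./<=>?@\\^|-~"
  , P.opLetter        = oneOf ":!#$%&*+./<=>?@\\^|-~"
  , P.reservedNames   = ["let", "in"]
  , P.reservedOpNames = ["="]
  , P.caseSensitive   = True
  }

lexer :: P.GenTokenParser String () (State SourcePos)
lexer = P.makeTokenParser sketchDef

-- Parses input of the form: let <identifier> = <natural number>
binding :: Parser (String, Integer)
binding = do
  P.whiteSpace lexer
  P.reserved lexer "let"
  name <- P.identifier lexer
  P.reservedOp lexer "="
  value <- P.natural lexer
  eof
  return (name, value)

-- Runs a parser in the State SourcePos monad, seeding the state with
-- the initial source position.
runSketch :: Parser a -> String -> Either ParseError a
runSketch p input =
  evalState (runParserT p () "sketch" input) (initialPos "sketch")
```

With these definitions, runSketch binding "let x = 5" gives Right ("x", 5), while "let let = 5" is rejected because let is listed in reservedNames.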

(For maximum generality, I'm assuming that you might be defining the Parser type in a file separate from, and imported by, the file in which you define your actual parser functions. Hence the two repeated import statements.)

Thanks

Many thanks to Daniel Fischer for helping me with this.