Attoparsec is pure (`Data.Attoparsec.Internal.Types.Parser` is not a monad transformer and can’t run `IO`), so you’re right that you can’t expand includes from within a parser directly.
Splitting the parser into two passes seems like the right approach. The first pass acts like the C preprocessor: it accepts a file with `include` statements interleaved with other stuff. The “other stuff” only needs to be roughly lexically valid; it doesn’t have to satisfy your full parser, just as the C preprocessor only cares about tokens and matching parentheses, not other brackets or anything semantic. You then replace the includes (recursively, since an included file may contain includes of its own), producing a fully expanded file that you can give to your existing parser.
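A minimal sketch of that preprocessor pass, under assumptions of my own: the directive is a hypothetical C-like `#include "path"` on its own line, paths are relative to the working directory, and there is no cycle detection. Every other line is passed through without being parsed, and the expanded text is then handed to `numbers`, a hypothetical stand-in for “your existing parser” (whitespace-separated integers). It needs the `attoparsec` and `text` packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Two passes: a text-level "preprocessor", then your existing parser.
-- Assumed, hypothetical directive syntax: #include "path" on its own line.
import Control.Applicative ((<|>), many)
import Data.Attoparsec.Text
  ( Parser, char, decimal, endOfInput, endOfLine, isEndOfLine, manyTill
  , parseOnly, skipSpace, skipWhile, string, takeTill )
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- A line is either passed through untouched or names a file to splice in.
data Chunk = Raw T.Text | Inc FilePath

includeLine :: Parser Chunk
includeLine = do
  _ <- string "#include"
  skipWhile (== ' ')
  path <- char '"' *> takeTill (== '"') <* char '"'
  pure (Inc (T.unpack path))

-- Any other line is kept as raw text; its contents are not parsed here.
-- attoparsec's (<|>) backtracks, so a failed includeLine consumes nothing.
chunkP :: Parser Chunk
chunkP = includeLine <|> (Raw <$> takeTill isEndOfLine)

chunks :: Parser [Chunk]
chunks = manyTill (chunkP <* (endOfLine <|> endOfInput)) endOfInput

-- Expand a file into lines, splicing included files in recursively.
-- Paths are relative to the working directory; no cycle detection.
expandLines :: FilePath -> IO [T.Text]
expandLines path = do
  src <- TIO.readFile path
  case parseOnly chunks src of
    Left err -> ioError (userError (path ++ ": " ++ err))
    Right cs -> concat <$> mapM expand cs
  where
    expand (Raw txt) = pure [txt]
    expand (Inc p) = expandLines p

preprocess :: FilePath -> IO T.Text
preprocess path = T.unlines <$> expandLines path

-- Hypothetical stand-in for "your existing parser": whitespace-separated Ints.
numbers :: Parser [Int]
numbers = skipSpace *> many (decimal <* skipSpace) <* endOfInput

main :: IO ()
main = do
  writeFile "demo-inner.txt" "3 4\n"
  writeFile "demo-outer.txt" "1 2\n#include \"demo-inner.txt\"\n5\n"
  expanded <- preprocess "demo-outer.txt"
  print (parseOnly numbers expanded) -- prints Right [1,2,3,4,5]
```

Because the existing parser only ever sees the expanded text, it needs no changes; the price is that error positions refer to the expanded text rather than to the original files.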
If each included file must be syntactically “standalone” in some sense†, then you can fully parse each file first, producing items interleaved with `include`s, and then replace the includes. For instance:
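```haskell
import Data.Attoparsec.Text (Parser)

-- Whatever items you're parsing.
data Item
-- A reference to an included path.
data Include = Include FilePath

parse :: Parser [Either Include Item]
parse = undefined -- your parser, which also recognizes include directives
-- Substitute includes; also calls 'parse'
-- recursively until no includes remain.
substituteIncludes :: [Either Include Item] -> IO [Item]
substituteIncludes = undefined -- to be written
```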
† For example, if you’re only using attoparsec to lex tokens that can’t cross file boundaries anyway, or you’re doing full parsing but want to reject an included file that contains, e.g., unmatched brackets.
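One way to fill in the `parse` / `substituteIncludes` signatures, as a sketch under assumptions of my own: items are whitespace-separated integers and an include is written `include "path"` (both hypothetical syntax), paths are relative to the working directory, and there is no cycle detection. Because `parse` ends with `endOfInput`, every included file has to parse completely on its own, which is the “standalone” property described above. It needs the `attoparsec` and `text` packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Needs the attoparsec and text packages.
import Control.Applicative ((<|>), many)
import Data.Attoparsec.Text
  (Parser, char, decimal, endOfInput, parseOnly, skipSpace, string, takeTill)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical items: whitespace-separated integers.
newtype Item = Item Int deriving (Show, Eq)

-- A reference to an included path, written  include "path"  (assumed syntax).
newtype Include = Include FilePath deriving (Show, Eq)

includeP :: Parser Include
includeP = do
  _ <- string "include"
  skipSpace
  path <- char '"' *> takeTill (== '"') <* char '"'
  pure (Include (T.unpack path))

itemP :: Parser Item
itemP = Item <$> decimal

-- Pass 1 (pure): the whole file, includes left in place. The trailing
-- endOfInput means each file must parse completely on its own.
parse :: Parser [Either Include Item]
parse = skipSpace *> many (entry <* skipSpace) <* endOfInput
  where
    entry = (Left <$> includeP) <|> (Right <$> itemP)

-- Pass 2 (IO): read and parse each included file, recursing until no
-- includes remain. Paths are relative to the working directory; a file
-- that includes itself would recurse forever (no cycle detection).
substituteIncludes :: [Either Include Item] -> IO [Item]
substituteIncludes = fmap concat . mapM expand
  where
    expand (Right i) = pure [i]
    expand (Left (Include path)) = do
      src <- TIO.readFile path
      case parseOnly parse src of
        Left err -> ioError (userError (path ++ ": " ++ err))
        Right parts -> substituteIncludes parts

main :: IO ()
main = do
  writeFile "demo-inner.txt" "3 4"
  case parseOnly parse "1 2 include \"demo-inner.txt\" 5" of
    Left err -> putStrLn ("parse error: " ++ err)
    Right parts -> substituteIncludes parts >>= print
  -- prints [Item 1,Item 2,Item 3,Item 4,Item 5]
```

Passing the stack of files currently being expanded down through `substituteIncludes` would be a straightforward way to reject include cycles.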
The other option is to embed `IO` in your parser directly by using a different parsing library such as megaparsec, which provides a `ParsecT` monad transformer. With `IO` as the base monad, you can use `liftIO` to read included files from inside your parser. I would probably do this for a prototype, but it seems tidier to separate the concerns of parsing and expansion as much as possible.
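A sketch of the megaparsec route, assuming a hypothetical syntax of whitespace-separated integers with `include "path"` directives, paths relative to the working directory, and no cycle detection. Because the parser runs over `IO`, the include directive reads and parses the named file on the spot with `liftIO`; `parseFile` is a name made up for this sketch. It needs the `megaparsec` and `text` packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Needs the megaparsec and text packages.
import Control.Applicative ((<|>), many)
import Control.Monad.IO.Class (liftIO)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Void (Void)
import Text.Megaparsec (ParsecT, eof, errorBundlePretty, runParserT, takeWhileP)
import Text.Megaparsec.Char (char, space, string)
import qualified Text.Megaparsec.Char.Lexer as L

-- IO is the base monad, so the parser itself can perform IO.
type Parser = ParsecT Void T.Text IO

-- Hypothetical items: whitespace-separated integers.
newtype Item = Item Int deriving (Show, Eq)

-- include "path" (assumed syntax): read and parse the named file right
-- here via liftIO, splicing its items in. No cycle detection.
includeP :: Parser [Item]
includeP = do
  _ <- string "include"
  space
  path <- char '"' *> takeWhileP (Just "path") (/= '"') <* char '"'
  liftIO (parseFile (T.unpack path))

itemP :: Parser [Item]
itemP = (\n -> [Item n]) <$> L.decimal

-- 'string' consumes nothing when it fails, so (<|>) can fall back to itemP.
fileP :: Parser [Item]
fileP = concat <$> (space *> many ((includeP <|> itemP) <* space) <* eof)

parseFile :: FilePath -> IO [Item]
parseFile path = do
  src <- TIO.readFile path
  result <- runParserT fileP path src
  either (ioError . userError . errorBundlePretty) pure result

main :: IO ()
main = do
  writeFile "demo-inner.txt" "3 4"
  result <- runParserT fileP "<demo>" "1 2 include \"demo-inner.txt\" 5"
  either (putStr . errorBundlePretty) print result
  -- prints [Item 1,Item 2,Item 3,Item 4,Item 5]
```

This is shorter, but the parser is no longer pure: even testing it on an in-memory string requires `IO`, and file-reading errors surface as exceptions rather than parse errors.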