I'm trying to parse binary data using pipes-attoparsec in Haskell. I'm using pipes (proxies) to interleave reading with parsing, which keeps memory use low for large files. Many binary formats are based on blocks (or chunks) whose sizes are given by a field in the file. I'm not sure what a parser for such a block is called, but that is what I mean by "sub-parser" in the title. My problem is implementing these parsers concisely without a potentially large memory footprint. I've come up with two alternatives, and each fails in some regard.
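To make the block layout concrete, here is a minimal sketch that uses only base and bytestring (not attoparsec or pipes). It assumes the layout the question's code reads with `anyWord32le`: a 4-byte little-endian size field followed by exactly that many payload bytes. It also assumes the whole input is already in memory. `word32le` and `splitBlocks` are hypothetical helper names.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Decode 4 bytes as a little-endian unsigned integer
-- (the same value attoparsec-binary's anyWord32le produces).
word32le :: BS.ByteString -> Word32
word32le bs =
  foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0 (BS.unpack (BS.take 4 bs))

-- Split an in-memory buffer of size-prefixed blocks into their payloads.
-- Fails on a truncated size field or a truncated payload.
splitBlocks :: BS.ByteString -> Either String [BS.ByteString]
splitBlocks input
  | BS.null input = Right []
  | BS.length input < 4 = Left "truncated size field"
  | BS.length payload < size = Left "truncated block"
  | otherwise = (payload :) <$> splitBlocks rest
  where
    size = fromIntegral (word32le input) :: Int
    (payload, rest) = BS.splitAt size (BS.drop 4 input)
```

For example, the bytes `[3,0,0,0, 1,2,3, 0,0,0,0, 1,0,0,0, 9]` split into three payloads: `[1,2,3]`, an empty block, and `[9]`.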
Alternative 1 is to read the block into a separate bytestring and run a separate parser on it. This is concise, but a large block causes high memory use because the whole block has to be held in memory before its parser starts.
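For comparison, the same idea can be sketched without attoparsec. This version uses a hypothetical toy `Parser` over strict ByteStrings, so it needs only base and bytestring. `sizedBlock` binds the whole block to `block` before the sub-parser runs, which is exactly the memory cost described above. `parseAll` is also hypothetical. Unlike attoparsec's `parseOnly`, which ignores leftover input, it rejects a block that is not fully consumed. That stricter check is an assumption about what a block parser should require.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Hypothetical toy parser standing in for attoparsec's Parser.
newtype Parser a = Parser { runParser :: BS.ByteString -> Either String (a, BS.ByteString) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Right (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s1) <- pf s
    (a, s2) <- pa s1
    Right (f a, s2)

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> do
    (a, s1) <- p s
    runParser (k a) s1

failP :: String -> Parser a
failP msg = Parser (const (Left msg))

-- Take exactly n bytes or fail.
takeP :: Int -> Parser BS.ByteString
takeP n = Parser $ \s ->
  if BS.length s < n then Left "unexpected end of input" else Right (BS.splitAt n s)

-- 4-byte little-endian size field.
anyWord32le :: Parser Word32
anyWord32le = foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0 . BS.unpack <$> takeP 4

-- Run a parser on a complete ByteString and require that it consumes all of it.
parseAll :: Parser a -> BS.ByteString -> Either String a
parseAll p s = case runParser p s of
  Left e -> Left e
  Right (a, rest)
    | BS.null rest -> Right a
    | otherwise -> Left "block not fully consumed"

-- Alternative 1: take the whole block as one ByteString, then run
-- the sub-parser on it in isolation.
sizedBlock :: Parser a -> Parser a
sizedBlock sub = do
  size <- fromIntegral <$> anyWord32le
  block <- takeP size
  either failP pure (parseAll sub block)
```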
Alternative 2 is to keep parsing in the same context and track the number of bytes consumed. This tracking is error-prone and seems to infest every parser that composes into the final blockParser. For a malformed input file it can also waste time: the parser may read well past the end indicated by the size field before the tracked size can be compared against it.
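One way to keep the tracking out of the individual parsers is to move it into the parser's own state. The primitives then enforce the limit, and sub-parsers never return byte counts. The sketch below does this for a hypothetical toy parser (again only base and bytestring). `isolate n p` runs `p` on exactly the next `n` bytes in place. `takeP` fails as soon as a read would cross the block boundary, so a malformed size is caught at the first overrunning read rather than after the block has been parsed. The toy works on an in-memory ByteString, so it illustrates the bookkeeping, not the streaming. The idea is similar to `isolate` in the binary package's `Data.Binary.Get`. Whether it can be built on top of attoparsec's incremental `Parser` is not shown here.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Parser state: remaining input and an optional byte budget
-- (Nothing = no enclosing block).
data St = St BS.ByteString (Maybe Int)

newtype Parser a = Parser { runParser :: St -> Either String (a, St) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Right (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s1) <- pf s
    (a, s2) <- pa s1
    Right (f a, s2)

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> do
    (a, s1) <- p s
    runParser (k a) s1

-- The only primitive that consumes input. It checks the budget before
-- touching the bytes and spends n bytes of it on success.
takeP :: Int -> Parser BS.ByteString
takeP n = Parser $ \(St inp b) -> case b of
  Just left | n > left -> Left "read past end of block"
  _ | BS.length inp < n -> Left "unexpected end of input"
    | otherwise ->
        let (chunk, rest) = BS.splitAt n inp
        in Right (chunk, St rest (subtract n <$> b))

anyWord32le :: Parser Word32
anyWord32le = foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0 . BS.unpack <$> takeP 4

-- Run a sub-parser on exactly the next n bytes, without copying them.
isolate :: Int -> Parser a -> Parser a
isolate n (Parser p) = Parser $ \(St inp outer) -> case outer of
  Just left | n > left -> Left "block larger than enclosing block"
  _ -> do
    (a, St rest inner) <- p (St inp (Just n))
    if inner /= Just 0
      then Left "block not fully consumed"
      else Right (a, St rest (subtract n <$> outer))

-- Alternative 2 without manual size tracking in the sub-parsers.
sizedBlock :: Parser a -> Parser a
sizedBlock sub = do
  size <- fromIntegral <$> anyWord32le
  isolate size sub

parse :: Parser a -> BS.ByteString -> Either String a
parse p bs = fst <$> runParser p (St bs Nothing)
```

Because the budget lives in the state, nested blocks compose without extra bookkeeping: an inner `isolate` simply spends part of the outer budget.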
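```haskell
import Control.Applicative ((<$>), (*>))
import Control.Monad (unless, when)
import Control.Monad.Trans.Class (lift)
import Control.Proxy                 -- pipes 3.x: runProxy, runIdentityP, printD, <-<
import Control.Proxy.Attoparsec
import Control.Proxy.Trans.Either
import Data.Attoparsec as P
import Data.Attoparsec.Binary
import qualified Data.ByteString as BS
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Alternative 1 (ignoring the Either for simplicity):
parser1 :: Parser BS.ByteString
parser1 = do
    size <- fromIntegral <$> anyWord32le
    Right result <- parseOnly blockParser <$> P.take size
    return result

-- Alternative 2: the block parser also returns the bytes it consumed.
parser2 :: Parser BS.ByteString
parser2 = do
    size <- fromIntegral <$> anyWord32le
    (result, trackedSize) <- trackedBlockParser
    when (size /= trackedSize) $ fail "size mismatch"
    return result

-- Placeholders; the ByteString result type is arbitrary.
blockParser :: Parser BS.ByteString
blockParser = undefined
trackedBlockParser :: Parser (BS.ByteString, Int)
trackedBlockParser = undefined

main = withBinaryFile "bin" ReadMode go where
    go h = fmap print . runProxy . runEitherK $ session h
    session h = printD <-< parserD parser1 <-< throwParsingErrors <-< parserInputD <-< readChunk h 128

-- Stream the file in chunks of n bytes until EOF.
readChunk h n () = runIdentityP go where
    go = do
        c <- lift $ BS.hGet h n
        unless (BS.null c) $ respond c *> go
```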