First of all, I think that the answers others have supplied will work at least 95% of the time. It's always good practice to code for the problem at hand by using appropriate data types (or tuples in some cases). However, sometimes you really don't know in advance what you're looking for in the list, and in these cases trying to enumerate all possibilities is difficult/time-consuming/error-prone. Or, you're writing multiple variants of the same sort of thing (manually inlining multiple folds into one) and you'd like to capture the abstraction.
Fortunately, there are a few techniques that can help.
The framework solution
(somewhat self-evangelizing)
First, the various "iteratee/enumerator" packages often provide functions to deal with this sort of problem. I'm most familiar with iteratee, which would let you do the following:
import Data.Iteratee as I
import Data.Iteratee.Char
import Data.Maybe
-- first, you'll need some way to process the Atoms/Sheets/etc. you're getting
-- if you want to just return them as a list, you can use the built-in
-- stream2list function
-- next, create stream transformers
-- given at :: B.ByteString -> Maybe Atom
-- create a stream transformer from ByteString lines to Atoms
atIter :: Enumeratee [B.ByteString] [Atom] m a
atIter = I.mapChunks (catMaybes . map at)
otIter :: Enumeratee [B.ByteString] [Sheet] m a
otIter = I.mapChunks (catMaybes . map ot)
-- finally, combine multiple processors into one
-- if you have more than one processor, you can use zip3, zip4, etc.
procFile :: Iteratee [B.ByteString] m ([Atom],[Sheet])
procFile = I.zip (atIter =$ stream2list) (otIter =$ stream2list)
-- and run it on some data
runner :: FilePath -> IO ([Atom],[Sheet])
runner filename = do
resultIter <- enumFile defaultBufSize filename $= enumLinesBS $ procFile
run resultIter
One benefit this gives you is extra composability. You can create transformers as you like, and just combine them with zip. You can even run the consumers in parallel if you like (although only if you're working in the IO
monad, and probably not worth it unless the consumers do a lot of work) by changing to this:
import Data.Iteratee.Parallel
parProcFile = I.zip (parI $ atIter =$ stream2list) (parI $ otIter =$ stream2list)
The result of doing so isn't the same as a single for-loop - this will still perform multiple traversals of the data. However, the traversal pattern has changed. This will load a certain amount of data at once (defaultBufSize
bytes) and traverse that chunk multiple times, storing partial results as necessary. After a chunk has been entirely consumed, the next chunk is loaded and the old one can be garbage collected.
Hopefully this will demonstrate the difference:
Data.List.zip:
x1 x2 x3 .. x_n
x1 x2 x3 .. x_n
Data.Iteratee.zip:
x1 x2 x3 x4 x_n-1 x_n
x1 x2 x3 x4 x_n-1 x_n
If you're doing enough work that parallelism makes sense this isn't a problem at all. Due to memory locality, the performance is much better than multiple traversals over the entire input as Data.List.zip
would make.
The beautiful solution
If a single-traversal solution really does make the most sense, you might be interested in Max Rabkin's Beautiful Folding post, and Conal Elliott's followup work (this too). The essential idea is that you can create data structures to represent folds and zips, and combining these lets you create a new, combined fold/zip function that only needs one traversal. It's maybe a little advanced for a Haskell beginner, but since you're thinking about the problem you may find it interesting or useful. Max's post is probably the best starting point.
atoms
andsheets
as accumulator variables and return as tuple at the end. – Ingoalmost
. If you do not know the number and kind of lists you need to produce, you can't do it in one function. It's that simple. Remember, your function needs to have a type. If, OTOH, you know it will always be [a] lists for one and the same a for all extraction functions, you could return a list of lists or a map of lists. – Ingodata ParseResult = PAtom Atom | PSheet Sheet
, map aB.ByteString -> ParseResult
, then define(patoms, rest) = partition isAtom parseRes
andatoms = fromPAtom patoms
andsheets = fromPSheet rest
? – Ptival