Scenario: I have a ~900 MB text file that is formatted as follows:
...
Id: 109101
ASIN: 0806978473
title: The Beginner's Guide to Tai Chi
group: Book
salesrank: 672264
similar: 0
categories: 3
|Books[283155]|Subjects[1000]|Sports[26]|Individual Sports[16533]|Martial Arts[16571]|General[16575]
|Books[283155]|Subjects[1000]|Sports[26]|Individual Sports[16533]|Martial Arts[16571]|Taichi[16583]
|Books[283155]|Subjects[1000]|Sports[26]|General[11086921]
reviews: total: 2 downloaded: 2 avg rating: 5
2000-4-4 cutomer: A191SV1V1MK490 rating: 5 votes: 0 helpful: 0
2004-7-10 cutomer: AVXBUEPNVLZVC rating: 5 votes: 0 helpful: 0
(----- empty line ------)
Id :
and want to parse the information from it.
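For the simple "key: value" header lines (Id, ASIN, title, group, salesrank), one possible starting point is to split each line at its first colon. This is a minimal sketch that assumes those lines always have that shape. parseField is a hypothetical helper name, and the category lines and review lines would need their own handling.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical helper: split a header line such as "ASIN: 0806978473"
-- at its first ':' into ("ASIN", "0806978473"), trimming whitespace.
-- Lines without ':' (e.g. the "|Books[283155]|..." category lines) give Nothing.
-- Assumption: review lines also contain ':' and would be split
-- meaninglessly here, so they need a separate parser.
parseField :: String -> Maybe (String, String)
parseField line =
  case break (== ':') line of
    (key, ':' : rest) -> Just (trim key, trim rest)
    _                 -> Nothing
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
```

For example, parseField "salesrank: 672264" gives Just ("salesrank", "672264").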
Problem: As a first step (and because I need it in another context), I want to process the file line by line, collect the "chunks" belonging to one product, and then process each chunk separately with other logic.
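Independently of conduit, the grouping rule itself ("a product ends at an empty line") can be written as a plain list function. A minimal sketch, assuming products are separated by one or more blank or whitespace-only lines; splitRecords is a hypothetical name, not part of any library:

```haskell
import Data.Char (isSpace)

-- Hypothetical helper: group lines into per-product chunks.
-- A chunk ends at a blank or whitespace-only line; runs of blank
-- lines are skipped, so no empty chunks are produced.
splitRecords :: [String] -> [[String]]
splitRecords ls =
  case break isBlank (dropWhile isBlank ls) of
    ([], _)       -> []
    (chunk, rest) -> chunk : splitRecords rest
  where
    isBlank = all isSpace
```

Combined with lines <$> readFile path this produces chunks lazily, but lazy I/O gives little control over when the file handle is closed, which is one reason to prefer a streaming library such as conduit for a file of this size.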
So the plan is the following:
- Define a source that represents the text file
- Define a conduit (?) that takes one line at a time from that source and...
- ... passes it to some other components.
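Assuming the file is ASCII-compatible text with '\n' line endings, the plan above could map onto conduit stages roughly like this: sourceFileBS as the source, linesUnboundedAsciiC to turn the byte stream into one ByteString per line, and a hand-written stage (productChunksC, a hypothetical name) that uses await and yield to collect lines into one chunk per product. lengthC only stands in for the "other logic". A sketch, not a definitive implementation:

```haskell
import Conduit
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Char (isSpace)

-- Hypothetical stage: collect consecutive non-blank lines into one chunk
-- (one product) and yield the chunk when a blank line or the end of the
-- input is reached. Only the current product is kept in memory.
productChunksC :: Monad m => ConduitT ByteString [ByteString] m ()
productChunksC = go []
  where
    go acc = do
      next <- await
      case next of
        Nothing -> flush acc
        Just l
          | BC.all isSpace l -> flush acc >> go []
          | otherwise        -> go (l : acc)
    -- lines were accumulated in reverse order
    flush []  = return ()
    flush acc = yield (reverse acc)

-- Source -> lines -> per-product chunks -> "other logic" (here: count them).
countProducts :: FilePath -> IO Int
countProducts path = runConduitRes $
     sourceFileBS path
  .| linesUnboundedAsciiC
  .| productChunksC
  .| lengthC
```

linesUnboundedAsciiC places no limit on line length, so a single very long line would be buffered whole; with this file format that should not matter.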
Now, I am trying to adapt the following example:
doStuff = do
  writeFile "input.txt" "This is a \n test." -- FilePath -> String -> IO ()
  runConduitRes -- m r
    $ sourceFileBS "input.txt" -- ConduitT i ByteString m () -- by "chunk"
    .| sinkFile "output.txt" -- FilePath -> ConduitT ByteString o m ()
  readFile "output.txt"
    >>= putStrLn
So sourceFileBS "input.txt" is of type ConduitT i ByteString m (), that is, a conduit with
- input type i
- output type ByteString
- monad type m
- result type ().
sinkFile streams all incoming data into the given file, so sinkFile "output.txt" is a conduit with input type ByteString, i.e. ConduitT ByteString o m ().
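Writing the types out can show why the two stages fit together. A sketch assuming the conduit package's Conduit module (MonadResource comes from resourcet, which conduit depends on); inputSource, outputSink, copyPipeline and copyInputToOutput are hypothetical names. Fusing a ConduitT i ByteString m () with a ConduitT ByteString o m () via .| leaves the input and output types free, so they can be instantiated to () and Void, which is the shape runConduitRes expects (ConduitT () Void (ResourceT m) r).

```haskell
import Conduit
import Control.Monad.Trans.Resource (MonadResource)
import Data.ByteString (ByteString)
import Data.Void (Void)

-- A producer: its input type i is unconstrained because it never awaits.
inputSource :: MonadResource m => ConduitT i ByteString m ()
inputSource = sourceFileBS "input.txt"

-- A consumer: its output type o is unconstrained because it never yields.
outputSink :: MonadResource m => ConduitT ByteString o m ()
outputSink = sinkFile "output.txt"

-- (.|) matches the producer's output with the consumer's input, leaving
-- a pipeline with input () and output Void: the shape runConduitRes runs.
copyPipeline :: MonadResource m => ConduitT () Void m ()
copyPipeline = inputSource .| outputSink

copyInputToOutput :: IO ()
copyInputToOutput = runConduitRes copyPipeline
```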
What I want now is to process the input source line by line, that is, pass only one line at a time downstream. In pseudocode:
sourceFile "input.txt"
splitIntoLines
yieldMany (?)
other stuff
How do I do that?
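One way to express that pseudocode, assuming ASCII-compatible text: put linesUnboundedAsciiC right after the source, so every value flowing downstream is one line without its trailing newline, and use unlinesAsciiC to add the newlines back before writing. iterMC, which runs an action on each element and passes it on unchanged, is only a placeholder for "other stuff"; copyByLines is a hypothetical name.

```haskell
import Conduit
import Control.Monad.IO.Class (liftIO)
import qualified Data.ByteString.Char8 as BC

-- Copy inp to out, handling one line at a time in the middle stage.
copyByLines :: FilePath -> FilePath -> IO ()
copyByLines inp out = runConduitRes $
     sourceFileBS inp
  .| linesUnboundedAsciiC            -- one ByteString per line, '\n' removed
  .| iterMC (liftIO . BC.putStrLn)   -- placeholder: echo each line to stdout
  .| unlinesAsciiC                   -- append '\n' after every line again
  .| sinkFile out
```

Because unlinesAsciiC appends a newline after every line, an input without a final newline gains one in the output.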
What I currently have is:
copyFile = do
  writeFile "input.txt" "This is a \n test." -- FilePath -> String -> IO ()
  runConduitRes -- m r
    (lineC $ sourceFileBS "input.txt") -- ConduitT i ByteString m () -- by "chunk"
    .| sinkFile "output.txt" -- FilePath -> ConduitT ByteString o m ()
  readFile "output.txt"
    >>= putStrLn
but that gives the following type error:
    * Couldn't match type `bytestring-0.10.8.2:Data.ByteString.Internal.ByteString'
                     with `Void'
      Expected type: ConduitT
                       ()
                       Void
                       (ResourceT
                          (ConduitT
                             a0 bytestring-0.10.8.2:Data.ByteString.Internal.ByteString m0))
                       ()
        Actual type: ConduitT
                       ()
                       bytestring-0.10.8.2:Data.ByteString.Internal.ByteString
                       (ResourceT
                          (ConduitT
                             a0 bytestring-0.10.8.2:Data.ByteString.Internal.ByteString m0))
                       ()
    * In the first argument of `runConduitRes', namely
        `(lineC $ sourceFileBS "input.txt")'
      In the first argument of `(.|)', namely
        `runConduitRes (lineC $ sourceFileBS "input.txt")'
      In a stmt of a 'do' block:
        runConduitRes (lineC $ sourceFileBS "input.txt")
        .| sinkFile "output.txt"
   |
28 |     (lineC $ sourceFileBS "input.txt") -- ConduitT i ByteString m () -- by "chunk"
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This makes me believe that the problem is that the first conduit in the pipeline does not have a type compatible with what runConduitRes expects.
I just can't make sense of it and really need a hint.
Thanks a lot in advance.
runConduitRes $ (lineC $ ...) .| .... Otherwise, you are passing two arguments to runConduitRes, the first one being the function lineC. - chi
runConduitRes (lineC $ sourceFileBS "input.txt") .| sinkFile "output.txt". I think you might be missing a $ after runConduitRes: otherwise you are trying to run (lineC $ sourceFileBS "input.txt") instead of the whole pipeline. - danidiaz
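Adding the suggested $ alone may still not type-check: as far as I understand the conduit documentation, lineC (and lineAsciiC, its variant for ByteString, whose elements are Word8) does not split a producer into lines. It wraps a consumer and feeds it only the first line of its input. A sketch of that intended use, assuming the same input.txt; copyFirstLine is a hypothetical name.

```haskell
import Conduit

-- Copy only the first line of inp to out.
-- The `$` makes the whole `.|` pipeline the argument of runConduitRes.
-- lineAsciiC wraps the *consumer* (sinkFile), so sinkFile only
-- receives the first line of the stream.
copyFirstLine :: FilePath -> FilePath -> IO ()
copyFirstLine inp out = runConduitRes $
     sourceFileBS inp
  .| lineAsciiC (sinkFile out)
```

To process every line rather than just the first, line splitting has to be its own stage between the source and the sink.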