Today I asked GHC to compile an 8MB Haskell source file. GHC thought about it for about 6 minutes, swallowing almost 2GB of RAM, and then finally gave up with an out-of-memory error.
[As an aside, I'm glad GHC had the good sense to abort rather than floor my whole PC.]
Basically I've got a program that reads a text file, does some fancy parsing, builds a data structure and then uses show to dump this into a file. Rather than include the whole parser and the source data in my final application, I'd like to include the generated data as a compile-time constant. By adding some extra stuff to the output from show, you can make it a valid Haskell module. But GHC apparently doesn't enjoy compiling multi-MB source files.
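To make the "extra stuff" concrete, here is a minimal sketch of the kind of generator I mean. The Table type, the module name and the sample value are all stand-ins invented for illustration; the real structure is larger. The one detail that matters is that show on a Data.Map prints a fromList [...] expression, so the generated module has to import fromList unqualified.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for the real parsed structure.
type Table = Map.Map String (Map.Map String Int)

-- Wrap the output of show in enough boilerplate to form a module.
-- show on a Map yields "fromList [...]", so the generated module
-- imports fromList unqualified.
renderModule :: String -> Table -> String
renderModule modName table = unlines
  [ "module " ++ modName ++ " (table) where"
  , ""
  , "import Data.Map (Map, fromList)"
  , ""
  , "table :: Map String (Map String Int)"
  , "table = " ++ show table
  ]

-- Tiny sample value (an assumption, not the real data).
sample :: Table
sample = Map.fromList [("a", Map.fromList [("x", 1)])]

main :: IO ()
main = putStr (renderModule "Generated" sample)
```

With real data the last line of the generated module is the multi-megabyte expression that GHC chokes on.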
(The weirdest part is, if you just read the data back, it actually doesn't take much time or memory. Strange, considering that both String I/O and read are supposedly very inefficient...)
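For comparison, this is roughly what reading the data back at run time looks like. It is only a sketch: the Table type and the sample value are made up, and I use readMaybe from Text.Read instead of plain read so that a malformed dump gives Nothing rather than an exception.

```haskell
import qualified Data.Map as Map
import Text.Read (readMaybe)

-- Hypothetical stand-in for the real parsed structure.
type Table = Map.Map String (Map.Map String Int)

-- Dump with show, exactly as the generator does.
dump :: Table -> String
dump = show

-- Parse the dumped text back; Nothing if the text is malformed.
load :: String -> Maybe Table
load = readMaybe

-- Tiny sample value (an assumption, not the real data).
sample :: Table
sample = Map.fromList [("a", Map.fromList [("x", 1), ("y", 2)])]

main :: IO ()
main = print (load (dump sample) == Just sample)
```

In the real program the string would come from readFile instead of from dump.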
I vaguely recall that other people have had trouble with getting GHC to compile huge files in the past. FWIW, I tried using -O0, which sped up the crash but did not prevent it. So what is the best way to include large compile-time constants in a Haskell program?
(In my case, the constant is just a nested Data.Map with some interesting labels.)
Initially I thought GHC might just be unhappy at reading a module consisting of one line that's eight million characters long. (!!) Something to do with the layout rule or such. Or perhaps that the deeply-nested expressions upset it. But I tried making each subexpression a top-level identifier, and that was no help. (Adding explicit type signatures to each one did appear to make the compiler slightly happier, however.) Is there anything else I might try to make the compiler's job simpler?
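To show what I mean by making each subexpression a top-level identifier, here is a sketch of the shape of the generated module after that change. The names subA, subB and table and all of the values are invented for illustration; the real module has far more bindings.

```haskell
module Main where

import Data.Map (Map, fromList)
import qualified Data.Map as Map

-- Each inner map becomes its own top-level binding with an explicit
-- type signature, so the compiler never has to infer the type of one
-- giant nested expression.
subA :: Map String Int
subA = fromList [("x", 1), ("y", 2)]

subB :: Map String Int
subB = fromList [("z", 3)]

-- The outer map only refers to the named pieces.
table :: Map String (Map String Int)
table = fromList [("a", subA), ("b", subB)]

main :: IO ()
main = print (Map.lookup "b" table >>= Map.lookup "z")
```

This layout did not stop the out-of-memory failure for me; only the signatures seemed to make a small difference.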
In the end, I was able to make the data structure I'm actually trying to store much smaller. (Like, 300KB.) This made GHC far happier. (And the final application much faster.) But for future reference, I'd be interested to know what the best way to approach this is.
read in, with no extra file IO. GHC will compile files with 50MB worth of string on my laptop. – leftaroundabout
Data.Binary would be the way to go, but the datatype is hidden, my ghc-foo isn't strong enough to see how to make Data.Map k v an instance of Data.Binary. What is the type of your key, perhaps Data.IntMap would work. Also, what version of ghc are you using, looks like there's a big re-write of Data.Map.* in 7.6.1 (released tomorrow) haskell.org/pipermail/haskell-cafe/2012-May/101082.html. – ja.
Data.Binary now has an instance for Data.Map. I don't know when that was added. – dfeuer
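A minimal sketch of the Data.Binary route, assuming the binary and containers packages are available and that the key and value types have Binary instances. The Table type and the sample value are invented for illustration.

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map as Map

-- Hypothetical stand-in for the real nested map.
type Table = Map.Map String (Map.Map String Int)

-- Serialise to a lazy ByteString using the Binary instance for Map.
toBytes :: Table -> BL.ByteString
toBytes = encode

-- Deserialise; decode calls error on malformed input.
fromBytes :: BL.ByteString -> Table
fromBytes = decode

-- Tiny sample value (an assumption, not the real data).
sample :: Table
sample = Map.fromList [("a", Map.fromList [("x", 1)])]

main :: IO ()
main = print (fromBytes (toBytes sample) == sample)
```

Data.Binary also exports encodeFile and decodeFile for writing the bytes to disk and loading them at startup. That makes the data a run-time input, not a compile-time constant.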