24
votes

Today I asked GHC to compile an 8MB Haskell source file. GHC thought about it for about 6 minutes, swallowing almost 2GB of RAM, and then finally gave up with an out-of-memory error.

[As an aside, I'm glad GHC had the good sense to abort rather than floor my whole PC.]

Basically I've got a program that reads a text file, does some fancy parsing, builds a data structure and then uses show to dump this into a file. Rather than include the whole parser and the source data in my final application, I'd like to include the generated data as a compile-time constant. By adding some extra stuff to the output from show, you can make it a valid Haskell module. But GHC apparently doesn't enjoy compiling multi-MB source files.
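For illustration, a generator of this kind could look like the sketch below. It is not the actual program: the names renderModule, Table and table are hypothetical, and the sample value stands in for the real parsed data.

```haskell
module Main (main) where

import qualified Data.Map as Map

-- Wrap the output of 'show' in just enough boilerplate to form a module.
-- Arguments: module name, type signature (as text), and the value to dump.
renderModule :: Show a => String -> String -> a -> String
renderModule modName typeSig value = unlines
  [ "module " ++ modName ++ " (table) where"
  , ""
  , "import Data.Map (Map, fromList)"  -- 'show' on a Map prints "fromList [...]"
  , ""
  , "table :: " ++ typeSig
  , "table = " ++ show value
  ]

-- A tiny stand-in for the real nested Data.Map.
sample :: Map.Map String (Map.Map String Int)
sample = Map.fromList
  [ ("a", Map.fromList [("x", 1), ("y", 2)])
  , ("b", Map.fromList [("z", 3)])
  ]

main :: IO ()
main = putStr (renderModule "Table" "Map String (Map String Int)" sample)
```

The generated module is a single huge expression after "table =", which is the kind of input GHC struggles with once the data reaches several MB.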

(The weirdest part is, if you just read the data back, it actually doesn't take much time or memory. Strange, considering that both String I/O and read are supposedly very inefficient...)

I vaguely recall that other people have had trouble getting GHC to compile huge files in the past. FWIW, I tried using -O0, which sped up the crash but did not prevent it. So what is the best way to include large compile-time constants in a Haskell program?

(In my case, the constant is just a nested Data.Map with some interesting labels.)

Initially I thought GHC might just be unhappy at reading a module consisting of one line that's eight million characters long. (!!) Something to do with the layout rule or such. Or perhaps that the deeply-nested expressions upset it. But I tried making each subexpression a top-level identifier, and that was no help. (Adding explicit type signatures to each one did appear to make the compiler slightly happier, however.) Is there anything else I might try to make the compiler's job simpler?

In the end, I was able to make the data structure I'm actually trying to store much smaller. (Like, 300KB.) This made GHC far happier. (And the final application much faster.) But for future reference, I'd be interested to know what the best way to approach this is.

2
I too have been caught out by assuming it would be better to put my data in the source code, only to find it was much faster to read it from file at runtime. - AndrewC
Or, if you'd just like to bundle the data and program in one file, you can include it as a string constant which is then merely read in, with no extra file IO. GHC will compile files with 50MB worth of string on my laptop. - leftaroundabout
I remember that GHC always had problems compiling long literal lists and such. Can't find a recentish ticket or mailing list thread going into any kind of details, though. - Daniel Fischer
Seems like Data.Binary would be the way to go, but the datatype is hidden; my ghc-foo isn't strong enough to see how to make Data.Map k v an instance of Data.Binary. What is the type of your key? Perhaps Data.IntMap would work. Also, what version of GHC are you using? It looks like there's a big rewrite of Data.Map.* in 7.6.1 (released tomorrow): haskell.org/pipermail/haskell-cafe/2012-May/101082.html - ja.
@ja, FYI, Data.Binary now has an instance for Data.Map. I don't know when that was added. - dfeuer

2 Answers

5
votes

Your best bet is probably to compile a string representation of your value into the executable. To do this in a clean manner, please refer to my answer in a previous question.

To use it, simply store your expression in myExpression.exp and do read [litFile|myExpression.exp|] with the QuasiQuotes extension enabled, and the expression will be "stored as a string literal" in the executable.
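The linked answer is not reproduced here, so the following is only a minimal sketch of what such a quasiquoter could look like, assuming the template-haskell package. The module name Literal and the helper names literally and lit are assumptions; only litFile is named in the text above.

```haskell
-- Literal.hs (hypothetical module name)
module Literal (lit, litFile) where

import Language.Haskell.TH (Exp (LitE), Lit (StringL), Q)
import Language.Haskell.TH.Quote (QuasiQuoter (..), quoteFile)

-- Turn the quoted text into a plain string-literal expression.
literally :: String -> Q Exp
literally s = pure (LitE (StringL s))

-- [lit|some text|] expands to the string literal "some text".
lit :: QuasiQuoter
lit = QuasiQuoter
  { quoteExp  = literally
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    -- This quoter only makes sense in expression position.
    unsupported ctx _ = fail ("lit: cannot be used in a " ++ ctx ++ " context")

-- [litFile|path|] reads the file at compile time and embeds its contents.
litFile :: QuasiQuoter
litFile = quoteFile lit
```

Because of Template Haskell's stage restriction, the quasiquoter has to live in a different module from the one that uses it. In the using module, with QuasiQuotes enabled, read [litFile|myExpression.exp|] then parses the embedded string at runtime.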


I tried doing something similar for storing actual constants, but it fails for the same reason that embedding the value in a .hs file would. My attempt was:

Verbatim.hs:

module Verbatim where

import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Language.Haskell.Meta.Parse

readExp :: String -> Q Exp
readExp = either fail return . parseExp

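verbatim :: QuasiQuoter
verbatim = QuasiQuoter
  { quoteExp  = readExp
    -- Only expression contexts are supported.
  , quotePat  = const (fail "verbatim: patterns not supported")
  , quoteType = const (fail "verbatim: types not supported")
  , quoteDec  = const (fail "verbatim: declarations not supported")
  }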

verbatimFile :: QuasiQuoter
verbatimFile = quoteFile verbatim

Test program:

{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import Verbatim

main :: IO ()
main = print [verbatimFile|test.exp|]

This program works for small test.exp files, but already fails at around 2 MiB on this computer.

1
votes

There's a simple solution: your literal should have type ByteString. See https://github.com/litherum/publicsuffixlist/pull/1 for details.
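As a rough sketch of the idea (not the code from the linked pull request), the data can be embedded as a ByteString literal via OverloadedStrings and decoded at runtime. The names Table, rawTable and table are hypothetical, and going back through read is an assumption made to match the question; a binary serialisation format would work the same way.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.Map as Map

type Table = Map.Map String (Map.Map String Int)

-- The literal is typed as a ByteString, so GHC only has to store the
-- bytes rather than compile a large nested expression.
-- Note: the Char8 IsString instance truncates characters above 255.
rawTable :: BC.ByteString
rawTable = "fromList [(\"a\",fromList [(\"x\",1),(\"y\",2)]),(\"b\",fromList [(\"z\",3)])]"

-- Decode once at runtime, on first use.
table :: Table
table = read (BC.unpack rawTable)

main :: IO ()
main = print (Map.lookup "a" table >>= Map.lookup "y")  -- prints: Just 2
```

In a real program the literal would be generated, either by a build step or by a quasiquoter that produces the ByteString from a file.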