3
votes

I have a CSV file with stock trading history; it is 70 megabytes. I want to run my program on it, but I don't want to wait 30 seconds on every start.

1. Just translate CSV file into Haskell source file like this:

From                       | To
---------------------------+--------------------------------
1380567537,122.166,2.30243 | history = [
...                        |       (1380567537,122.166,2.30243)
...                        |     , ...
...                        |     ]

2. Use Template Haskell to parse the file at compile time.
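For reference, a minimal sketch of the second approach might look like the following (the module name, `historyE`, the file name `history.csv`, and the three-column `(Int, Double, Double)` row type are all assumptions for illustration). Note that lifting the whole list still embeds it as one huge literal, so it can hit the same compile-time blowup as approach 1:

```haskell
{-# LANGUAGE TemplateHaskell #-}
module History (historyE) where

import Language.Haskell.TH        (Q, Exp, runIO)
import Language.Haskell.TH.Syntax (lift)

-- Split a string on a single character (CSV without quoting).
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOn c rest

-- Parse one "timestamp,price,volume" row.
parseRow :: String -> (Int, Double, Double)
parseRow line = case splitOn ',' line of
  [t, p, v] -> (read t, read p, read v)
  _         -> error ("bad row: " ++ line)

-- Read and parse the CSV at compile time, then lift the
-- resulting list into the syntax tree of the calling module.
historyE :: Q Exp
historyE = do
  contents <- runIO (readFile "history.csv")
  lift (map parseRow (filter (not . null) (lines contents)))
```

Used from another module as `history = $(historyE)`.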

Trying the first approach, I found that GHC ate up 12 GB of memory after 3 hours of trying to compile a single list (a 70 MB source file).

So is TH the only available approach? Or can I just hard-code a large data structure in a source file? And why can't GHC compile the file? Does it hit a combinatorial explosion because of complex optimizations, or something else?

Using fast libraries like bytestring and attoparsec will reduce the time to much less than 30 seconds. - Satvik
Have you tried cassava? - jtobin
Don, yes, it is related, but the answer to that question was about inserting bytestring literals into the code and then converting them into a structure; I wanted to get an already-compiled structure in my program. - Kirill Taran
jtobin, the question is not about that, but I will try it. Thank you anyway. - Kirill Taran

1 Answer

3
votes

Hard-coding this much data is not a common use case, so it isn't surprising that the compiler doesn't handle it well.

A better solution is to put the data into a format that is faster to read than CSV. For example, write a program that parses your CSV file and serializes the resulting structure with a package like cereal. Your main program can then read the binary file, which should be much faster than parsing the CSV.

This approach has the added benefit that running your program on new data will be easier and won't require recompiling.
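A rough sketch of that pipeline, assuming the cereal and bytestring packages and a three-column `(Int, Double, Double)` row type (the function names, file paths, and the naive CSV parser here are illustrative, not part of the question):

```haskell
module Main where

import qualified Data.ByteString as BS
import Data.Serialize (encode, decode)

type Row = (Int, Double, Double)

-- Split a string on a single character (CSV without quoting).
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOn c rest

-- Parse one "timestamp,price,volume" row.
parseRow :: String -> Row
parseRow line = case splitOn ',' line of
  [t, p, v] -> (read t, read p, read v)
  _         -> error ("bad row: " ++ line)

-- One-time conversion: parse the CSV and write a binary file.
convert :: FilePath -> FilePath -> IO ()
convert csvPath binPath = do
  rows <- map parseRow . filter (not . null) . lines <$> readFile csvPath
  BS.writeFile binPath (encode rows)

-- Fast load in the main program: no CSV parsing, just decoding.
loadHistory :: FilePath -> IO [Row]
loadHistory binPath = do
  bytes <- BS.readFile binPath
  either fail return (decode bytes)

main :: IO ()
main = do
  convert "history.csv" "history.bin"
  history <- loadHistory "history.bin"
  print (length history)
</imports>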