4
votes

I know Data.Text is a much more efficient way of storing string data than String = [Char]. However, many of the library functions I come across expect a String argument. A linked list of Chars seems very inefficient to read, since the pointers take up more space than the characters themselves. Besides list fusion (which may not always be possible), does GHC make any optimizations to the storage of [Char], and does it apply similar principles to other lists?

3
I doubt there are too many string-specific optimizations; it seems anything you could do to improve a list of Chars could also be done to a list of Ints or whatever else you want. – Tikhon Jelvis

3 Answers

5
votes

The reason why all the base library functions use String instead of a more efficient type is that the text library needed for Text is not part of the base library. However, the text library provides its own variants of the various input/output functions. You can find them in Data.Text.IO.
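As a minimal sketch of the Data.Text.IO variants mentioned above (assuming the text package is available; `shout` is a made-up helper, not part of any library), a program can read and write Text without going through String:

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: upper-case a line and append an exclamation mark.
shout :: T.Text -> T.Text
shout line = T.toUpper line `T.append` T.pack "!"

main :: IO ()
main = do
  -- TIO.getContents and TIO.putStrLn mirror the Prelude functions,
  -- but work on Text instead of String.
  input <- TIO.getContents
  mapM_ (TIO.putStrLn . shout) (T.lines input)
```

When a library function only accepts String, T.unpack and T.pack can be used to convert at the call boundary, at the cost of building the list.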

Also note that for efficient I/O you would normally use one of the modern streaming abstractions, such as conduits, iteratees or pipes.

2
votes

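Under GHC, a String uses 5 machine words per code point in the average case: 3 for the list cons cell and 2 for the boxed Char it points to. This is mitigated, however, by the fact that the runtime preallocates the characters in the ASCII range, so those characters are shared and cost only the 3-word cons cell each.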
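To put the 5-word figure in concrete terms, here is a rough back-of-the-envelope estimate. It assumes a 64-bit GHC with 8-byte words and a fully evaluated string; `stringBytes` and the constants are illustrative names, not anything GHC provides:

```haskell
-- Rough heap cost of a fully evaluated String.
-- Assumption: 64-bit GHC, so one machine word is 8 bytes.
wordBytes :: Int
wordBytes = 8

consCellWords, boxedCharWords :: Int
consCellWords  = 3  -- header + pointer to the head + pointer to the tail
boxedCharWords = 2  -- header + the character itself

-- Estimated bytes for n characters. When the characters are shared
-- (preallocated by the runtime), no per-character Char box is needed.
stringBytes :: Bool -> Int -> Int
stringBytes sharedChars n
  | sharedChars = n * consCellWords * wordBytes
  | otherwise   = n * (consCellWords + boxedCharWords) * wordBytes

main :: IO ()
main = do
  print (stringBytes False 1000)  -- 40000
  print (stringBytes True 1000)   -- 24000
```

So even in the favourable case, a 1000-character string costs roughly 24 bytes per character under these assumptions.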

-1
votes

Here is the answer.

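ByteStrings are somewhat like lists in which each element is one byte (8 bits), but they are stored as packed arrays rather than as linked cells. They also handle laziness differently: a strict ByteString is a single, fully evaluated buffer, while a lazy ByteString is a lazy list of strict chunks.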
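A small sketch of the two flavours, assuming the bytestring package is available (`strictGreeting` and `lazyGreeting` are names invented for this example):

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy as BL

-- A strict ByteString: one contiguous buffer of bytes.
strictGreeting :: BS.ByteString
strictGreeting = BS.pack "hello"

-- A lazy ByteString assembled from two strict chunks.
lazyGreeting :: BL.ByteString
lazyGreeting = BL.fromChunks [strictGreeting, BS.pack " world"]

main :: IO ()
main = do
  print (BS.length strictGreeting)            -- 5
  print (BL.length lazyGreeting)              -- 11
  print (length (BL.toChunks lazyGreeting))   -- 2
```

Note that Data.ByteString.Char8 truncates each Char to 8 bits, so it is only appropriate for ASCII or Latin-1 data; for general text, Data.Text is the better fit.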