I know Data.Text is a much more efficient way of storing string data than String = [Char]. However, it seems that a number of functions I see in libraries expect a String passed to them. A linked list of Chars seems very inefficient to read, considering the pointers will take up more space than the string itself. Besides list fusion (which may not always be possible), are there any optimizations that GHC makes to the storage of [Char]s, and does it apply similar principles to other lists?
I doubt there are many string-specific optimizations; it seems anything you could do to improve a list of Chars could also be done to a list of Ints or whatever else you want.
– Tikhon Jelvis
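To illustrate the comment's point: the Prelude declares `type String = [Char]`, so a String is an ordinary list, and the list functions applied to it are the same generic ones used for `[Int]`. Generic list machinery (such as the foldr/build fusion rules on the standard list functions) is therefore written for lists in general rather than for Char specifically. The function names below are illustrative only.

```haskell
import Data.Char (ord, toUpper)

-- Typechecks because String and [Char] are the same type (a type synonym).
sameAsList :: String -> [Char]
sameAsList = id

-- A pipeline over a String: drop spaces, then sum the code points.
codeSum :: String -> Int
codeSum = sum . map ord . filter (/= ' ')

-- The same shape of pipeline over [Int], built from the same list functions.
intSum :: [Int] -> Int
intSum = sum . map (* 2) . filter even

main :: IO ()
main = do
  print (codeSum "ab c")    -- 97 + 98 + 99 = 294
  print (intSum [1 .. 10])  -- 2 * (2 + 4 + 6 + 8 + 10) = 60
  putStrLn (map toUpper (sameAsList "list ops"))
```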
3 Answers
The reason all the base library functions use String instead of a more efficient type is that the text library, which provides Text, is not part of base. However, the text library provides its own variants of the various input/output functions. You can find them in Data.Text.IO.
Also note that for efficient I/O you would normally use one of the modern abstractions like conduits, iteratees or pipes.
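A minimal sketch of the Data.Text.IO route, assuming the text package is available (it ships with GHC but is a separate package from base). The names `greeting` and `shout` are only illustrative; `T.unpack` is how you hand a Text to one of the many functions that still expect a String.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

greeting :: T.Text
greeting = T.pack "hello, text"

-- Text-native processing: no [Char] is built along the way.
shout :: T.Text -> T.Text
shout = T.toUpper

main :: IO ()
main = do
  TIO.putStrLn (shout greeting)  -- output through Data.Text.IO
  putStrLn (T.unpack greeting)   -- convert to String for base's putStrLn
  print (T.length greeting)      -- 11
```

Keeping data as Text and converting only at the boundary with a String-based API limits how much of it ever exists as a linked list.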
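Under GHC, a String uses 5 words per character in the general case: 3 words for each (:) cons cell (an info pointer plus pointers to the head and the tail) and 2 words for the boxed C# Char it points to. This is, however, mitigated by the fact that the runtime preallocates shared Char closures for low code points (0–255, which covers ASCII), so those characters can cost only the 3-word cons cell.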
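To make the arithmetic concrete, here is a rough cost model written in Haskell. It is a sketch, not a measurement: it assumes the 3-word cons cell plus 2-word boxed Char breakdown, assumes shared closures for code points up to a hypothetical `sharedCharLimit` of 255, and ignores the shared `[]` terminator and any unevaluated thunks. The names `estimateStringBytes`, `charWords` and `sharedCharLimit` are illustrative, not a GHC API.

```haskell
import Data.Bits (finiteBitSize)
import Data.Char (ord)

-- Bytes per machine word: 8 on a 64-bit GHC, 4 on a 32-bit one.
wordBytes :: Int
wordBytes = finiteBitSize (0 :: Int) `div` 8

-- (:) cell: info pointer + head pointer + tail pointer.
consCellWords :: Int
consCellWords = 3

-- C# box: info pointer + the unboxed Char# payload.
boxedCharWords :: Int
boxedCharWords = 2

-- Assumption (hypothetical name): code points up to this value reuse the
-- runtime's preallocated Char closures, so they need no box of their own.
sharedCharLimit :: Int
sharedCharLimit = 255

-- Words attributed to one character of a fully evaluated String.
charWords :: Char -> Int
charWords c
  | ord c <= sharedCharLimit = consCellWords
  | otherwise                = consCellWords + boxedCharWords

-- Estimated heap bytes for a fully evaluated String, ignoring the shared []
-- terminator and assuming its cells are not shared with other strings.
estimateStringBytes :: String -> Int
estimateStringBytes s = wordBytes * sum (map charWords s)

main :: IO ()
main = do
  print (estimateStringBytes "hello")     -- 5 * 3 words: 120 bytes on 64-bit
  print (estimateStringBytes "\955\955")  -- two lambdas, 2 * 5 words: 80 bytes on 64-bit
```

Even in the best case that is 24 bytes per ASCII character on a 64-bit machine, versus roughly one or two bytes per character in a packed representation; GHC's heap profiler is the way to check real programs.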
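Consider ByteString instead. A ByteString stores its data as a packed array of bytes (each element is one 8-bit Word8) rather than as a linked list of boxed elements. The way it handles laziness is also different: Data.ByteString is strict, while Data.ByteString.Lazy is a lazily evaluated list of strict chunks.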
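A small sketch of the two flavours, assuming the bytestring package (shipped with GHC alongside base): a strict ByteString is one buffer of Word8 values, and a lazy one is built from strict chunks. The bindings `strictBytes` and `lazyBytes` are illustrative.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Strict ByteString: one contiguous buffer holding the bytes of "hi".
strictBytes :: BS.ByteString
strictBytes = BS.pack ([104, 105] :: [Word8])

-- Lazy ByteString: a lazily evaluated list of strict chunks.
lazyBytes :: BL.ByteString
lazyBytes = BL.fromChunks [BC.pack "hello, ", BC.pack "world"]

main :: IO ()
main = do
  print (BS.unpack strictBytes)           -- [104,105]
  print (BS.length strictBytes)           -- 2
  print (length (BL.toChunks lazyBytes))  -- 2 chunks
  print (BL.length lazyBytes)             -- 12
```

Note that Data.ByteString.Char8 truncates each Char to 8 bits, so it is only safe for ASCII/Latin-1 text; for Unicode, encoding a Text (for example with Data.Text.Encoding.encodeUtf8) is the usual route.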