84
votes

While the general opinion of the Haskell community seems to be that it's always better to use Text instead of String, the fact that the APIs of most maintained libraries are still String-oriented confuses the hell out of me. On the other hand, there are notable projects that consider String a mistake altogether and provide a Prelude in which all String-oriented functions are replaced with their Text counterparts.

So are there any reasons for people to keep writing String-oriented APIs other than backward compatibility, compatibility with the standard Prelude, and the inertia of switching? Are there possibly any other drawbacks to Text compared to String?

In particular, I'm interested in this because I'm designing a library and trying to decide which type to use for error messages.

How hard will it be to support both? - Daniel Wagner
@Vektorweg I'd argue. Since String is just an alias for a list of Chars, it's natural that it has different performance characteristics from a monolithic data structure like Text. Neither type is a concern of the compiler, since they aren't primitive and are defined in libraries. - Nikita Volkov
@DanielWagner Wouldn't that turn out to be a poorly motivated complication? Anyway, the question is about the general approach; if there were a shared pattern of supporting both types throughout libraries, that would be worth considering. - Nikita Volkov
@Vektorweg There's a good blog post about the Sufficiently Smart Compiler. (I only just realised it mentions GHC as well.) - kqr
Go for Text! Future generations may one day benefit from a String-free world if we stop adding String-dependent code. - Titou

5 Answers

33
votes

My unqualified guess is that most library writers don't want to add more dependencies than necessary. Since strings are part of literally every Haskell distribution (they're part of the language standard!), a library is a lot easier to get adopted if it uses strings and doesn't require its users to sort out the text package from Hackage.

It's one of those "design mistakes"* that you just have to live with unless you can convince most of the community to switch overnight. Just look at how long it has taken to get Applicative to be a superclass of Monad – a relatively minor but much wanted change – and imagine how long it would take to replace all the String things with Text.


To answer your more specific question: I would go with String unless you get noticeable performance benefits by using Text. Error messages are usually rather small one-off things so it shouldn't be a big problem to use String.

On the other hand, if you are the kind of ideological purist that eschews pragmatism for idealism, go with Text.
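Whichever type you pick for error messages, one way to avoid forcing the choice on callers is to build the message as a String internally and expose it through `IsString`, so the caller chooses String or Text at the use site. This is only a sketch: `LibError`, `renderError` and `renderErrorText` are hypothetical names, not part of any existing library.

```haskell
import Data.String (IsString (..))
import qualified Data.Text as T

-- Hypothetical error type for an illustrative library.
data LibError
  = ParseError Int String -- line number and offending input
  | MissingField String   -- name of the missing field

-- Build the message as a plain String, then convert it to whatever
-- string type the caller asks for (String, Text, ...) via IsString.
renderError :: IsString s => LibError -> s
renderError err = fromString (renderString err)
  where
    renderString :: LibError -> String
    renderString (ParseError line input) =
      "parse error on line " ++ show line ++ ": " ++ show input
    renderString (MissingField name) =
      "missing field: " ++ name

-- A Text-specialised alias for callers who have already switched to Text.
renderErrorText :: LibError -> T.Text
renderErrorText = renderError
```

The cost is a single conversion per message, which is negligible for small one-off error strings.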


* I put design mistakes in scare quotes because representing strings as a list of chars gives them a neat property: they are easy to reason about and integrate with existing list-operating functions.
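To illustrate that point, here is a small sketch (the function names are hypothetical) showing ordinary list functions from base working on String with no conversion at all:

```haskell
import Data.Char (isSpace, toUpper)
import Data.List (dropWhileEnd, isPrefixOf)

-- String is just [Char], so generic list functions apply unchanged.

-- Upper-case every character with plain 'map'.
shout :: String -> String
shout = map toUpper

-- Strip whitespace at both ends with dropWhile / dropWhileEnd.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Keep only the words that start with a given prefix.
wordsWithPrefix :: String -> String -> [String]
wordsWithPrefix prefix = filter (prefix `isPrefixOf`) . words
```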

23
votes

If your API is targeted at processing large amounts of character oriented data and/or various encodings, then your API should use Text.

If your API is primarily for dealing with small one-off strings, then using the built-in String type should be fine.

Using String for large amounts of text will make applications using your API consume significantly more memory. Using it with foreign encodings could seriously complicate usage depending on how your API works.

String is quite expensive: at least 5N words, where N is the number of Chars in the String. A word has the same number of bits as the processor architecture (e.g. 32 bits or 64 bits): http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.html
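As a rough worked example of the 5N-word figure, assuming a 64-bit machine (8-byte words) and a String of N = 1000 characters:

```latex
\text{size}(\texttt{String}) \;\geq\; 5N \text{ words}
  \;=\; 5N \cdot 8 \text{ bytes}
  \;\overset{N = 1000}{=}\; 5 \cdot 1000 \cdot 8 \text{ bytes}
  \;=\; 40\,000 \text{ bytes} \;\approx\; 40 \text{ kB}
```

That is roughly 40 bytes per character.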

12
votes

There are at least three reasons to use [Char] in small projects.

  1. [Char] does not rely on any arcane stuff, like foreign pointers, raw memory, raw arrays, etc., that may work differently on different platforms or even be unavailable altogether

  2. [Char] is the lingua franca in Haskell. There are at least three "efficient" ways to handle Unicode data in Haskell: utf8-bytestring, Data.Text.Text and Data.Vector.Unboxed.Vector Char, each requiring an extra package.

  3. By using [Char] one gains access to the full power of the [] monad, including many specific functions (alternative string packages do try to help with this, but still)
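A short sketch of point 3, using the list monad directly on String. Both functions are hypothetical examples, not taken from any library:

```haskell
import Data.Char (toUpper)

-- All spellings of a word where each vowel may be upper-cased:
-- mapM in the list monad picks one variant per character.
casings :: String -> [String]
casings = mapM variants
  where
    variants c
      | c `elem` "aeiou" = [c, toUpper c]
      | otherwise        = [c]

-- (>>=) for lists is concatMap: each Char expands to zero or more Chars.
escapeQuotes :: String -> String
escapeQuotes s = s >>= escape
  where
    escape '"' = "\\\""
    escape c   = [c]
```

For example, `casings "hat"` yields `["hat","hAt"]`.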

Personally, I consider the UTF-16-based Data.Text one of the most questionable decisions of the Haskell community, since UTF-16 combines the flaws of both the UTF-8 and UTF-32 encodings while having none of their benefits.

5
votes

I wonder whether Data.Text is always more efficient than Data.String?

"cons", for instance, is O(1) for Strings and O(n) for strict Text. Append is O(n) for Strings and O(n+m) for strict Texts. Likewise,

    let foo = "foo" ++ bigchunk
        bar = "bar" ++ bigchunk

is more space efficient for Strings than for strict Texts.
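The same operations written out for both types, as a minimal sketch (the function names are hypothetical; the complexity notes in the comments follow the figures above for String and strict Data.Text):

```haskell
import qualified Data.Text as T

-- String: (:) allocates one new cons cell and shares the tail, O(1).
consString :: Char -> String -> String
consString = (:)

-- Strict Text: T.cons copies the whole array into a new one, O(n).
consText :: Char -> T.Text -> T.Text
consText = T.cons

-- (++) copies only the short prefix; both results share 'big' as their tail.
sharedPrefixes :: String -> (String, String)
sharedPrefixes big = ("foo" ++ big, "bar" ++ big)

-- Each strict T.append builds a fresh array, so 'big' is copied twice.
sharedPrefixesText :: T.Text -> (T.Text, T.Text)
sharedPrefixesText big =
  (T.pack "foo" `T.append` big, T.pack "bar" `T.append` big)
```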

Other issues not related to efficiency are pattern matching (perspicuous code) and laziness (predictably per-character for Strings, somewhat implementation-dependent, chunk by chunk, in lazy Text).
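To make the pattern-matching point concrete, here is a sketch (hypothetical function names) of the same recursion on both types. String matches directly on `(:)`, while Text has no list constructors, so each step goes through `T.uncons`:

```haskell
import qualified Data.Text as T

-- String: match directly on the list constructors.
countLeadingA :: String -> Int
countLeadingA ('a' : rest) = 1 + countLeadingA rest
countLeadingA _            = 0

-- Text: peel off one character at a time with T.uncons.
countLeadingAText :: T.Text -> Int
countLeadingAText t = case T.uncons t of
  Just ('a', rest) -> 1 + countLeadingAText rest
  _                -> 0
```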

Texts are obviously good for static character sequences and for in-place modification. For other forms of structural editing, Data.String might have advantages.

4
votes

I do not think there is a single technical reason for String to remain, and I can see several reasons for it to go.

Overall, I would argue that in the Text/String case there is only one best solution:

  • String performance is bad; everyone agrees on that

  • Text is not difficult to use. All functions commonly used on String are available for Text, plus some more that are useful in the context of strings (substitution, padding, encoding)

  • Having two solutions creates unnecessary complexity unless all base functions are made polymorphic. Proof: there are SO questions on the subject of automatic conversions, so this is a real problem.
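A sketch of the extra string utilities mentioned above, plus the String/Text conversions that a mixed codebase ends up writing. The wrapper names are hypothetical; `T.replace`, `T.justifyLeft`, `TE.encodeUtf8`, `T.pack` and `T.unpack` are the text package functions they wrap:

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Substitution: replace every occurrence of a (non-empty) needle.
normalizeDashes :: T.Text -> T.Text
normalizeDashes = T.replace (T.pack "--") (T.pack "-")

-- Padding: left-align a cell to a fixed width with spaces.
padCell :: Int -> T.Text -> T.Text
padCell width = T.justifyLeft width ' '

-- Encoding: Text to UTF-8 bytes.
toUtf8 :: T.Text -> BS.ByteString
toUtf8 = TE.encodeUtf8

-- The conversions needed wherever String and Text APIs meet.
fromLegacy :: String -> T.Text
fromLegacy = T.pack

toLegacy :: T.Text -> String
toLegacy = T.unpack
```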

So one solution is less complex than two, and the shortcomings of String will make it disappear eventually. The sooner the better!