16
votes

The two languages where I have used symbols are Ruby and Erlang and I've always found them to be extremely useful.

Haskell does have algebraic datatypes, but I still think symbols would be mighty convenient. An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".

The syntactic sugar for atoms could be minimal: :something or <something> is an atom. All atoms are instances of a type called Atom, which derives Show and Eq. You could then use them for more descriptive error codes, for example:

type ErrorCode = Atom
type Message = String
data Error = Error ErrorCode Message
loginError = Error :redirect "Please login first"

In this case :redirect is more efficient than using a string ("redirect") and easier to understand than an integer (404).

The benefit may seem minor, but I say it is worth adding atoms as a language feature (or at least a GHC extension).

So why have symbols not been added to the language? Or am I thinking about this the wrong way?

5
Wouldn't error codes and such be a situation where you'd want a predefined set of values, rather than allowing arbitrary stuff that might be nonsense, though? Presumably there'd be code elsewhere handling the errors, and you'd want to make sure you only give it things it knows how to deal with. – C. A. McCann
Not necessarily. I might want to use the error codes as I come up with them, without having to define the entire set of errors as a data type first. Handler code can simply handle the cases it wants to handle, while lumping the rest in a default handler. – Anupam Jain
That doesn't seem terribly idiomatic for Haskell. But even so, I'd think the library sclv mentioned would suffice, so I guess I'm still not seeing why it would make much difference. – C. A. McCann

5 Answers

20
votes

I agree with camccann's answer that it's probably missing mainly because it would have to be baked quite deeply into the implementation, and it is of too little use to justify that level of complication. In Erlang (and Prolog and Lisp), symbols (or atoms) usually serve as special markers and fill mostly the same role as a constructor. In Lisp, the dynamic environment includes the compiler, so it's partly also a (useful) compiler concept leaking into the runtime.

The problem is the following: symbol interning is impure (it modifies the symbol table). It is still referentially transparent, because we never modify an existing object, but if implemented naïvely it can lead to space leaks in the runtime. In fact, as currently implemented in Erlang, you can actually crash the VM by interning too many symbols/atoms (the current limit is 2^20, I think), because they can never be garbage collected. It's also difficult to implement in a concurrent setting without a huge lock around the symbol table.

Both problems can be (and have been) solved, however. For example, see Erlang EEP 20. I use this technique in the simple-atom package. It uses unsafePerformIO under the hood, but only in (hopefully) rare cases. It could still use some help from the GC to perform an optimisation similar to indirection shortening. It also uses quite a few IORefs internally, which isn't too great for performance and memory usage.
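
To make the trade-off concrete, here is a minimal sketch of the naive global-table interning approach (this is not the simple-atom implementation; Atom, intern, and symbolTable are illustrative names):

import qualified Data.Map as M
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import System.IO.Unsafe (unsafePerformIO)

-- An atom is a unique Int plus the name it was interned from;
-- equality is a constant-time Int comparison.
data Atom = Atom { atomId :: !Int, atomName :: String }

instance Eq Atom where
  a == b = atomId a == atomId b

instance Show Atom where
  show = atomName

-- The global symbol table: this is the impurity described above. It only
-- ever grows, so naively interned atoms are never garbage collected.
{-# NOINLINE symbolTable #-}
symbolTable :: IORef (M.Map String Int, Int)
symbolTable = unsafePerformIO (newIORef (M.empty, 0))

-- Interning looks referentially transparent from the outside: the same
-- name always yields the same Atom.
{-# NOINLINE intern #-}
intern :: String -> Atom
intern name = unsafePerformIO $
  atomicModifyIORef symbolTable $ \tbl@(m, next) ->
    case M.lookup name m of
      Just i  -> (tbl, Atom i name)
      Nothing -> ((M.insert name next m, next + 1), Atom next name)

With this, intern "redirect" == intern "redirect" evaluates to True via a single Int comparison, but the table only ever grows, which is exactly the space-leak problem described above.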

In summary, it can be done, but implementing it properly is non-trivial. Compiler writers always weigh the power of a feature against the effort of implementing and maintaining it, and it seems that first-class symbols lose out on this one.

14
votes

I think the simplest answer is that, of the things Lisp-style symbols (which is where both Ruby and Erlang got the idea, I believe) are used for, in Haskell most are either:

  • Already done in some other fashion--e.g. a data type with a bunch of nullary constructors, which also behave as "convenient names for integers" (see the sketch after this list).

  • Awkward to fit in--things that exist at the level of language syntax instead of being regular data usually have more type information associated with them, but symbols would have to either be distinct types from each other (nearly useless without some sort of lightweight ad-hoc sum type) or all the same type (in which case they're barely different from just using strings).
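
A minimal sketch of the nullary-constructor approach from the first bullet, with illustrative names:

data Color = Red | Green | Blue
  deriving (Eq, Show, Enum)

-- The constructors behave like interned symbols: Red == Red is True,
-- show Red is "Red", and fromEnum Red is 0, so they also serve as
-- convenient names for integers. Unlike free-form symbols, the set of
-- constructors is closed and known to the compiler.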

Also, keep in mind that Haskell itself is actually a very, very small language. Very little is "baked in", and of the things that are most are just syntactic sugar for other primitives. This is a bit less true if you include a bunch of GHC extensions, but GHC with -XAndTheKitchenSinkToo is not the same language as Haskell proper.

Haskell is also very amenable to pseudo-syntax and metaprogramming, so there's a lot you can do even without having it built in, particularly if you get into TH and scary type metaprogramming and whatever else.

So what it mostly comes down to is that most of the practical utility of symbols is already available from other features, and the stuff that isn't available would be more difficult to add than it's worth.

9
votes

Atoms aren't provided by the language, but can be implemented reasonably as a library:

http://hackage.haskell.org/package/simple-atom

There are a few other libs on hackage, but this one looks the most recent and well-maintained.

3
votes

Haskell uses type constructors* instead of symbols so that the set of symbols a function can take is closed and can be reasoned about by the type system. You could add symbols to the language, but it would put you in the same place that using strings would: you'd have to check all possible symbols against the few with known meanings at runtime, add error handling all over the place, and so on. It would amount to working around all the compile-time checking.

The main difference between strings and symbols is interning: symbols are atomic and can be compared in constant time. Both, however, are types with an essentially infinite number of distinct values, which goes against the grain of Haskell's habit of specifying arguments and results with finite types.

  • I'm more familiar with OCaml than Haskell, so "type constructor" may not be the right term. Things like None or Just 3.
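
As a minimal sketch of the closed-set point above (the type and the names are hypothetical):

data ErrorCode = Redirect | NotFound | Forbidden
  deriving (Eq, Show)

describe :: ErrorCode -> String
describe Redirect  = "Please login first"
describe NotFound  = "Resource not found"
describe Forbidden = "Access denied"

-- Leaving out a case triggers GHC's incomplete-pattern warning at compile
-- time; with arbitrary symbols or strings, a missing case would only show
-- up as a runtime failure.
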
1
votes

An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".

Use Enum instead.

data FileType = GZipped | BZipped | Plain
  deriving Enum

-- fromEnum maps the constructors to 0, 1, 2 in declaration order,
-- which indexes directly into the list of descriptions.
descr :: FileType -> String
descr ft = ["compressed with gzip",
            "compressed with bzip2",
            "uncompressed"] !! fromEnum ft