Semantics and Representation
The two types cannot be used interchangeable.
The type
data Bool = False | True
is a lifted type. This means that a variable x :: Bool
can still be unevaluated (e.g. a thunk). It will therefore be represented as a pointer.
In contrast
(# (
is unlifted, and really can only have the two values, and will be represented as just a machine integer.
Space efficiency
This does, however not mean that the latter is more space efficient: Because True
and False
are nullary constructors (they take no arguments), they exist once and for all in the static code of your program, and the pointers just point to them. So the cost is one machine word in all cases.
Run time efficiency
The unboxed variant may be slightly more efficient than Bool
:
For Bool
, the code first has to check is this already evaluated?, and then it can branch on which constructor it is.
The branching is pretty efficient: The code does not have to follow the pointer for that: The “pointer tag” in the low few bits of the pointer address will indicate which constructor it is. There is a slight cost for masking the other bits, though.
In the unboxed variant, well, it's just 0
or 1
, and no thunk check is needed.
Things that might be true one day
jberryman says: as of Jan 2021 the following don't appear to be implemented yet, but this is how things might one day work.
Unboxing in data types
If you define a data type like this
data Foo = Foo {-
you should get exactly the same result as if you’d write
data Foo = Foo (
as that was the main purpose of the unboxed sum extension.
Unboxing in functions
If you define a function that is strict in a Bool
argument, e.g.
foo True = "Hello"
foo False = "World!"
then I could imagine (but did not check) that the compiler optimizes that into a worker that takes a (# (# #) | (# #) #)
and a wrapper that takes care of the thunk-check. The wrapper might then inline in use-sites where foo
is used, and everything might end up in terms of (# (# #) | (# #) #)
.
Conclusion
Don’t bother.