5
votes

So I have an existential data type with a single strict field:

data Uncurry (a :: i -> j -> *) (z :: (i,j)) =
  forall x y. z ~ '(x,y) => Uncurry !(a x y) 

Experimentation using unsafeSizeof (stolen from this answer) leads me to believe that it can be zero memory-overhead:

λ p = (0, '\0') :: (Int, Char)
λ q = Uncurry p
λ unsafeSizeof p
10
λ unsafeSizeof q
10

So it seems like Uncurry is sort of acting like a newtype, being used only at compile time.

This makes sense to me, as the equality assertion doesn't require a dictionary to be carted about.

Is that a valid interpretation? Do I have any guarantees of that from GHC (or the Haskell report), or did I just luck out?

2
It would be very nice to have this optimization. See ghc.haskell.org/trac/ghc/wiki/NewtypeOptimizationForGADTS. Unfortunately, the experts haven't figured out how to make everything work right yet.dfeuer

2 Answers

6
votes

data is never transformed to newtype. Uncurry does add a new closure, and a pointer for the ~ dictionary is actually also carted around, as of GHC 8.0.2. Hence, Uncurry has a closure with three words.

unsafeSizeof is incorrect, since Array# stores its size in words, while ByteArray# stores its size in bytes, so sizeofByteArray# (unsafeCoerce# ptrs) returns the number of words rather than the intended number of bytes. A correct version would look like this on 64 bit systems:

unsafeSizeof :: a -> Int
unsafeSizeof !a =
  case unpackClosure# a of
    (# x, ptrs, nptrs #) ->
      I# (8# +# sizeofArray# nptrs *# 8# +# sizeofByteArray# ptrs)

But note that unsafeSizeof only gives us the size of the topmost closure. So, the closure size of any boxed tuple will be 24, which coincides with the closure size of Uncurry t, since Uncurry has an info pointer, a useless pointer for ~, and a pointer for the tuple field. This coincidence also holds with the previous buggy unsafeSizeof implementation. But the total size of Uncurry t is greater than that of t.

5
votes

Edited to fix some details re: quads being 8 bytes and explaining the static link field.

I think unsafeSizeOf is inaccurate and you're misinterpreting its output. Note that it is intended to show the memory usage for the top-level closure only, not the total space usage of the object. What you're seeing, I think, is that q requires 10 bytes in addition to the tuple p (while p requires 10 bytes in addition to the boxed Char and boxed Int). Moreover, my tests indicate that the top-level constructors actually require 24 bytes each (on a 64-bit architecture), even though unsafeSizeOf reports 10 for me, too.

In particular, if I compile the following test program with stack ghc -- -fforce-recomp -ddump-asm -dsuppress-all -O2 ZeroMemory.hs using GHC 8.0.2:

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}

module ZeroMemory where

data Uncurry (a :: i -> j -> *) (z :: (i, j)) =
  forall x y . z ~ '(x,y) => Uncurry !(a x y)

q :: Uncurry (,) '(Int, Char)
q = Uncurry (0, '\0')

r :: Uncurry (,) '(Int, Char)
r = Uncurry (1, '\1')

then the memory footprint for the top-level q closure looks like:

q_closure:
    .quad   Uncurry_static_info
    .quad   $s$WUncurry_$d~~_closure+1
    .quad   q1_closure+1
    .quad   3

Note that each .quad here is actually 8 bytes; it's a "quad" of old-style 16-bit "words". I believe the final quad here, with value 3, is the "static link field" described in the GHC implementation commentary and so doesn't apply to "typical" heap allocation objects.

So, ignoring this final field, the total size of the top-level q closure is 24 bytes, and it refers to the q1_closure which represents the contained tuple:

q1_closure:
    .quad   (,)_static_info
    .quad   q3_closure+1
    .quad   q2_closure+1
    .quad   3

for another 24 bytes.

The q2 and q3 closures are the boxed Int and Char and so take up two quads (16 bytes) each. So, q takes up a total of 10 quads, or 80 bytes. (I included r as a sanity check to make sure I wasn't misidentifying any shared info.)

A p tuple by itself would have a memory footprint equivalent to the q1_closure, so 7 quads or 56 bytes.