What's the best way to use a number at compile time and run time?

Question

I am just starting to learn Idris, and I figured a nice little project to start off would be to implement finite sequences as 2-3 finger trees. Each internal node in a tree needs to be annotated at run time with the total number of elements stored below it in order to support fast splitting and indexing. This size information also needs to be managed at compile time in order to (eventually) prove that splitting with appropriate indices and zipping a sequence with another sequence are total operations.

I can think of two ways to deal with this:

What I'm currently doing, having written a tiny fraction of the total necessary code: handle the sizes entirely in the types, and then use something like proof {intros; exact s;} to get at them. I don't know what, if any, horrible efficiency consequences this might have. Among potentials in my mind: a) Unnecessarily storing a size with each leaf node. b) I think this unlikely, but it would be real bad if it insisted on calculating sizes from the bottom up rather than lazily from the top down.
Include an explicit size field in each node constructor, along with a proof that the number in the size field matches the size the type system requires. This approach seems extremely awkward. On the plus side, I should be able to be quite certain that the type-level numbers and the equality proofs get erased, leaving just one number per internal node at run time.

Which of these, if either, is the right way to go?

The current code

Please feel free to give style tips, and maybe to explain how to inline the size code. I could only figure out what to do interactively, but it seems a bit weird to have proofs at the bottom for such simple things.

data Tree23 : Nat -> Nat -> Type -> Type where
    Elem : a -> Tree23 0 1 a
    Node2 : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) ->
            Tree23 (S d) (s1 + s2) a
    Node3 : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) -> Lazy (Tree23 d s3 a) ->
              Tree23 (S d) (s1 + s2 + s3) a

size23 : Tree23 d s a -> Nat
size23 t = ?size23RHS

data Digit : Nat -> Nat -> Type -> Type where
  One : Lazy (Tree23 d s a) -> Digit d s a
  Two : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) -> Digit d (s1+s2) a
  Three : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) ->
          Lazy (Tree23 d s3 a) -> Digit d (s1+s2+s3) a
  Four : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) ->
          Lazy (Tree23 d s3 a) -> Lazy (Tree23 d s4 a) -> Digit d (s1+s2+s3+s4) a

sizeDig : Digit d s a -> Nat
sizeDig t = ?sizeDigRHS

data FingerTree : Nat -> Nat -> Type -> Type where
  Empty : FingerTree d 0 a
  Single : Tree23 d s a -> FingerTree d s a
  Deep : Digit d spr a -> Lazy (FingerTree (S d) sm a) -> Digit d ssf a ->
         FingerTree d (spr + sm + ssf) a

data Seq' : Nat -> Type -> Type where
  MkSeq' : FingerTree 0 n a -> Seq' n a

Seq : Type -> Type
Seq a = (n ** Seq' n a)

---------- Proofs ----------

try.sizeDigRHS = proof
  intros
  exact s

try.size23RHS = proof
  intros
  exact s

Edit

Another option I've explored a bit is to try to separate the data structure from its validity. This leads to the following:

data Tree23 : Nat -> Type -> Type where
    Elem : a -> Tree23 0 a
    Node2 : Nat -> Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
            Tree23 (S d) a
    Node3 : Nat -> Lazy (Tree23 d a) -> Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
              Tree23 (S d) a

size23 : Tree23 d a -> Nat
size23 (Elem x) = 1
size23 (Node2 s _ _) = s
size23 (Node3 s _ _ _) = s

data Valid23 : Tree23 d a -> Type where
  ElemValid : Valid23 (Elem x)
  Node2Valid : Valid23 x -> Valid23 y -> Valid23 (Node2 (size23 x + size23 y) x y)
  Node3Valid : Valid23 x -> Valid23 y -> Valid23 z
    -> Valid23 (Node3 (size23 x + size23 y + size23 z) x y z)

data Digit : Nat -> Type -> Type where
  One : Lazy (Tree23 d a) -> Digit d a
  Two : Lazy (Tree23 d a) -> Lazy (Tree23 d a) -> Digit d a
  Three : Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
          Lazy (Tree23 d a) -> Digit d a
  Four : Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
          Lazy (Tree23 d a) -> Lazy (Tree23 d a) -> Digit d a

data ValidDig : Digit d a -> Type where
  OneValid : Valid23 x -> ValidDig (One x)
  TwoValid : Valid23 x -> Valid23 y -> ValidDig (Two x y)
  ThreeValid : Valid23 x -> Valid23 y -> Valid23 z -> ValidDig (Three x y z)
  FourValid : Valid23 x -> Valid23 y -> Valid23 z -> Valid23 w -> ValidDig (Four x y z w)

sizeDig : Digit d a -> Nat
sizeDig (One x) = size23 x
sizeDig (Two x y) = size23 x + size23 y
sizeDig (Three x y z) = size23 x + size23 y + size23 z
sizeDig (Four x y z w) = (size23 x + size23 y) + (size23 z + size23 w)

data FingerTree : Nat -> Type -> Type where
  Empty : FingerTree d a
  Single : Tree23 d a -> FingerTree d a
  Deep : Nat -> Digit d a -> Lazy (FingerTree (S d) a) -> Digit d a ->
         FingerTree d a

sizeFT : FingerTree d a -> Nat
sizeFT Empty = 0
sizeFT (Single x) = size23 x
sizeFT (Deep k x y z) = k

data ValidFT : FingerTree d a -> Type where
  ValidEmpty : ValidFT Empty
  ValidSingle : Valid23 x -> ValidFT (Single x)
  ValidDeep : ValidDig pr -> ValidFT m -> ValidDig sf ->
              ValidFT (Deep (sizeDig pr + sizeFT m + sizeDig sf) pr m sf)

record Seq : Type -> Type where
  MkSeq : FingerTree 0 a -> Seq a

data ValidSeq : Seq a -> Type where
  MkValidSeq : ValidFT t -> ValidSeq (MkSeq t)

Then each function is accompanied by a (separate) proof of its validity.

I kind of like the way this approach separates "code" from "proofs", but I've run into a couple problems with it:

While the "code" gets easier, the proofs seem to get a good bit harder to construct. I imagine a large part of that is probably a result of my not being familiar with the system.
I haven't actually gotten near the point where I will write this code, but the indexing, splitting, and zipping functions will all have to insist on getting a proof of the validity of their input(s). It seems a bit weird for some functions to work with just sequences, and others to insist on proofs, but maybe that's just me.

Cactus Cactus · Accepted Answer · 2015-03-27T05:07:58

Your problem can be simplified to having a type like

data Steps : Nat -> Type where
  Nil : Steps 0
  Cons : Steps n -> Steps (S n)

and wanting to write

size : Steps n -> Nat

This is very easy to do since the implicitly-quantified arguments (n in this case) are passed to size as implicit arguments! So the above type of size is the same as

size : {n : _} -> Steps n -> Nat

which means it can be defined as

size {n} _ = n

What's the best way to use a number at compile time and run time?

The current code

Edit

1 Answers