I am just starting to learn Idris, and I figured a nice little project to start off would be to implement finite sequences as 2-3 finger trees. Each internal node in a tree needs to be annotated at run time with the total number of elements stored below it in order to support fast splitting and indexing. This size information also needs to be managed at compile time in order to (eventually) prove that splitting with appropriate indices and zipping a sequence with another sequence are total operations.
I can think of two ways to deal with this:
What I'm currently doing, having written a tiny fraction of the total necessary code: handle the sizes entirely in the types, and then use something like
proof {intros; exact s;}
to get at them. I don't know what, if any, horrible efficiency consequences this might have. Among potentials in my mind: a) Unnecessarily storing a size with each leaf node. b) I think this unlikely, but it would be real bad if it insisted on calculating sizes from the bottom up rather than lazily from the top down.Include an explicit size field in each node constructor, along with a proof that the number in the size field matches the size the type system requires. This approach seems extremely awkward. On the plus side, I should be able to be quite certain that the type-level numbers and the equality proofs get erased, leaving just one number per internal node at run time.
Which of these, if either, is the right way to go?
The current code
Please feel free to give style tips, and maybe to explain how to inline the size code. I could only figure out what to do interactively, but it seems a bit weird to have proofs at the bottom for such simple things.
data Tree23 : Nat -> Nat -> Type -> Type where
Elem : a -> Tree23 0 1 a
Node2 : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) ->
Tree23 (S d) (s1 + s2) a
Node3 : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) -> Lazy (Tree23 d s3 a) ->
Tree23 (S d) (s1 + s2 + s3) a
size23 : Tree23 d s a -> Nat
size23 t = ?size23RHS
data Digit : Nat -> Nat -> Type -> Type where
One : Lazy (Tree23 d s a) -> Digit d s a
Two : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) -> Digit d (s1+s2) a
Three : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) ->
Lazy (Tree23 d s3 a) -> Digit d (s1+s2+s3) a
Four : Lazy (Tree23 d s1 a) -> Lazy (Tree23 d s2 a) ->
Lazy (Tree23 d s3 a) -> Lazy (Tree23 d s4 a) -> Digit d (s1+s2+s3+s4) a
sizeDig : Digit d s a -> Nat
sizeDig t = ?sizeDigRHS
data FingerTree : Nat -> Nat -> Type -> Type where
Empty : FingerTree d 0 a
Single : Tree23 d s a -> FingerTree d s a
Deep : Digit d spr a -> Lazy (FingerTree (S d) sm a) -> Digit d ssf a ->
FingerTree d (spr + sm + ssf) a
data Seq' : Nat -> Type -> Type where
MkSeq' : FingerTree 0 n a -> Seq' n a
Seq : Type -> Type
Seq a = (n ** Seq' n a)
---------- Proofs ----------
try.sizeDigRHS = proof
intros
exact s
try.size23RHS = proof
intros
exact s
Edit
Another option I've explored a bit is to try to separate the data structure from its validity. This leads to the following:
data Tree23 : Nat -> Type -> Type where
Elem : a -> Tree23 0 a
Node2 : Nat -> Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
Tree23 (S d) a
Node3 : Nat -> Lazy (Tree23 d a) -> Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
Tree23 (S d) a
size23 : Tree23 d a -> Nat
size23 (Elem x) = 1
size23 (Node2 s _ _) = s
size23 (Node3 s _ _ _) = s
data Valid23 : Tree23 d a -> Type where
ElemValid : Valid23 (Elem x)
Node2Valid : Valid23 x -> Valid23 y -> Valid23 (Node2 (size23 x + size23 y) x y)
Node3Valid : Valid23 x -> Valid23 y -> Valid23 z
-> Valid23 (Node3 (size23 x + size23 y + size23 z) x y z)
data Digit : Nat -> Type -> Type where
One : Lazy (Tree23 d a) -> Digit d a
Two : Lazy (Tree23 d a) -> Lazy (Tree23 d a) -> Digit d a
Three : Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
Lazy (Tree23 d a) -> Digit d a
Four : Lazy (Tree23 d a) -> Lazy (Tree23 d a) ->
Lazy (Tree23 d a) -> Lazy (Tree23 d a) -> Digit d a
data ValidDig : Digit d a -> Type where
OneValid : Valid23 x -> ValidDig (One x)
TwoValid : Valid23 x -> Valid23 y -> ValidDig (Two x y)
ThreeValid : Valid23 x -> Valid23 y -> Valid23 z -> ValidDig (Three x y z)
FourValid : Valid23 x -> Valid23 y -> Valid23 z -> Valid23 w -> ValidDig (Four x y z w)
sizeDig : Digit d a -> Nat
sizeDig (One x) = size23 x
sizeDig (Two x y) = size23 x + size23 y
sizeDig (Three x y z) = size23 x + size23 y + size23 z
sizeDig (Four x y z w) = (size23 x + size23 y) + (size23 z + size23 w)
data FingerTree : Nat -> Type -> Type where
Empty : FingerTree d a
Single : Tree23 d a -> FingerTree d a
Deep : Nat -> Digit d a -> Lazy (FingerTree (S d) a) -> Digit d a ->
FingerTree d a
sizeFT : FingerTree d a -> Nat
sizeFT Empty = 0
sizeFT (Single x) = size23 x
sizeFT (Deep k x y z) = k
data ValidFT : FingerTree d a -> Type where
ValidEmpty : ValidFT Empty
ValidSingle : Valid23 x -> ValidFT (Single x)
ValidDeep : ValidDig pr -> ValidFT m -> ValidDig sf ->
ValidFT (Deep (sizeDig pr + sizeFT m + sizeDig sf) pr m sf)
record Seq : Type -> Type where
MkSeq : FingerTree 0 a -> Seq a
data ValidSeq : Seq a -> Type where
MkValidSeq : ValidFT t -> ValidSeq (MkSeq t)
Then each function is accompanied by a (separate) proof of its validity.
I kind of like the way this approach separates "code" from "proofs", but I've run into a couple problems with it:
While the "code" gets easier, the proofs seem to get a good bit harder to construct. I imagine a large part of that is probably a result of my not being familiar with the system.
I haven't actually gotten near the point where I will write this code, but the indexing, splitting, and zipping functions will all have to insist on getting a proof of the validity of their input(s). It seems a bit weird for some functions to work with just sequences, and others to insist on proofs, but maybe that's just me.