14
votes

To help teach myself C++, I'm working on a red-black tree implementation. Coming from Haskell, I, thought it'd be interesting to see if I could enforce the properties of a red-black tree statically in C++'s type system:

  1. A node is either red or black.
  2. The root is black [...]
  3. All leaves (NIL) are black.
  4. If a node is red, then both its children are black.
  5. Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes. [...]

I figured out how to craft the types for the nodes of the tree to satisfy constraints 1, 3, 4, and 5 using templates:

template <typename Key, typename Value>
class RedBlackTree {
private:
  enum class color { Black, Red };

  // [1. A node is either red or black]
  template <color Color, size_t Height>
  struct Node {};

  // [3. All leaves are black]
  struct Leaf : public Node<color::Black, 0> {};

  template <color Left, color Right, size_t ChildHeight>
  struct Branch {
  public:
    template <color ChildColor>
    using Child = unique_ptr<Node<ChildColor, ChildHeight>>;

    Key key;
    Value value;
    Child<Left> left;
    Child<Right> right;

    Branch(Key&& key, Value&& value, Child<Left> left, Child<Right> right) :
      key { key }, value { value }, left { left }, right { right } {}
  };

  // [4. If a node is red, then both its children are black.]
  // [5. Every path from a given node to any of its descendant NIL nodes contains 
  //     the same number of black nodes.]
  template <size_t Height>
  struct RedBranch: public Node<color::Red, Height>
                  , public Branch<color::Black, color::Black, Height> {
  public:
    using RedBlackTree::Branch;
  };

  // [5. Every path from a given node to any of its descendant NIL nodes contains 
  //     the same number of black nodes.]
  template <size_t Height, color Left, color Right>
  struct BlackBranch: public Node<color::Black, Height>
                    , public Branch<Left, Right, Height-1> {
  public:
    using RedBlackTree::Branch;
  };

  // ...
};

Where I'm stymied is giving the root pointer that will be stored in the RedBlackTree instance a type that satisfied property 2 but is still useful.

What I want is something like:

template <typename Key, typename Value>
class RedBlackTree {
  //...
  unique_ptr<Node<color::Black,?>> root = std::make_unique<Leaf>();
  //...
}

(to borrow syntax from Java), so I can wildcard over the height of the tree. This of course, doesn't work.

I could get my code to compile if I did

template <typename Key, typename Value, size_t TreeHeight>
class RedBlackTree {
  //...
  unique_ptr<Node<color::Black,TreeHeight>> root = std::make_unique<Leaf>();
  //...
}

But this isn't the type I want for the tree - I don't want the type of the tree itself to reflect its height, otherwise the type of my tree might change when I insert a key-value pair. I want to be able to update my root to contain a pointer to a black Node of any height.

Back in haskell-land, I would have solved this problem using existential quantification:

data Color = Black | Red

data Node (color :: Color) (height :: Nat) key value where
  Leaf :: Node 'Black 0 key value
  BlackBranch :: Branch left right height key value -> Node 'Black (height+1) key value
  RedBranch :: Branch 'Black 'Black height key value -> Node 'Red height key value

data Branch (left :: Color) (right :: Color) (childHeight :: Nat) key value = Branch
  { left  :: Node left childHeight key value
  , right :: Node right childHeight key value
  , key   :: key
  , value :: value
  }

data RedBlackTree key value where
  RedBlackTree :: { root :: Node 'Black height key value } -> RedBlackTree key value

Is there an equivalent concept in C++14 (or maybe C++17), or an alternative way I could write my struct defintions to be able to give root a useful and correct type?

2
Just to make sure I understand: do you want TreeHeight to be "erased" from the type of the parent node (to not be part of its type)? If so, you need some sort of run-time dispatchVittorio Romeo
I am not a C++ expert, but I think C++ needs to instantiate templates at compile time in all the forms that will possibly used at runtime. Since the tree height is (pragmatically) unbounded, we can not hope to instantiate all its possible values in the template argument.chi
@chi, it's pragmatically bounded by a few bits more than a pointer, right? Now that may end up trying to generate code for every possible tree shape and run the compiler right out of disk space... I don't know enough about what's going on to say.dfeuer
@dfeuer Yep, I'd regard that as theoretically bounded (it's a finite-size type) but practically the bound is so large that one can not hope to realistically enumerate all the cases.chi
IIRC, in the Okasaki book he doesn't need to use type-level numerals, at the cost of some efficiency (still log-complexity, though). To try something like this in C++, I'd try to see if OOP subtyping can somehow help. Also, we don't have algebraic types (let alone GADTs) in C++, yet we can encode them using a Church-like representation (the "visitor pattern" in OOP-speak).chi

2 Answers

4
votes
template<class K, class T>
struct NodeView {
  virtual NodeView const* left() const = 0;
  virtual NodeView const* right() const = 0;
  virtual K const& key() const = 0;
  virtual T const& value() const = 0;
private:
  ~NodeView() {} // no deleting it!
};

this is an interface.

Have your tree nodes implement this interface. They are permitted and encouraged to return nullptr when they have no child on one side or the other.

In the base structure, take a root node as a template parameter. Check that it is black using template tomfoolery.

Use make_shared to store it in a std::shared_ptr

auto tree = std::make_shared<std::decay_t<decltype(tree)>>(decltype(tree)(tree));
std::shared_ptr<NodeView const> m_tree = std::move(tree);

Where the m_tree member is a member of your root management structure.

You now have read-only access to a generic tree. It is guaranteed to be a balanced red-black tree at compile time by the code that stores it. At run time, it is just guaranteed to be a tree.

You could leak more guarantee information into the interface I wrote above, but that would clutter the interface beyond what a reader usually needs to know. (like, have a different Red and Black interface node type).

Now if the body of one short function is too much to trust, and you'd rather trust a wall of template code, we can do this:

template<template<class...>class Test, class T>
struct restricted_shared_ptr {
  template<class U,
    std::enable_if_t< Test<U>{}, int> = 0
  >
  restricted_shared_ptr( std::shared_ptr<U> pin ):ptr(std::move(pin)) {}
  restricted_shared_ptr(restricted_shared_ptr const&)=default;
  restricted_shared_ptr(restricted_shared_ptr &&)=default;
  restricted_shared_ptr& operator=(restricted_shared_ptr const&)=default;
  restricted_shared_ptr& operator=(restricted_shared_ptr &&)=default;
  restricted_shared_ptr() = default;
  T* get() const { return ptr.get(); }
  explicit operator bool() const { return (bool)ptr; }
  T& operator*() const { return *ptr.get(); }
  T* operator->() const { return ptr.get(); }
private:
  std::shared_ptr<T> ptr;
};

Now we just write an arbitrary template check that says "this is good enough to satisfy me".

And store a restricted_shared_ptr< MyCheck, NodeView<K,T> const >. There is no way to store the a type within this shared pointer that does not pass MyCheck without undefined behavior.

Here, you need to trust MyCheck's constructor to do what it says it does.

template<class T>
struct IsBlackNode:std::false_type{};

template<class K, class V, std::size_t Height, class Left, class Right>
struct IsBlackNode< BlackNode<K, V, Height, Left, Right> >:std::true_type{};

which is a requirement that only BlackNodes can pass.

So a restricted_shared_ptr< IsBlackNode, NodeView<K, T> const > is a shared pointer to something that passes the IsBlackNode test and implements the NodeView<K,T> interface.

4
votes

Note

Yakk's answer is more idiomatic C++ - this answer shows how to write (or at least begin writing) something more similar to the Haskell version.

When you see how much C++ is required to emulate Haskell, you may choose to use the native idioms instead.

tl;dr

most of your Haskell invariants (and properties) aren't really expressed statically in the type at all, but in the (runtime) code of the various constructors. The type system helps by guaranteeing one of those constructors really ran, and tracking which one it was for pattern-matching dispatch.


IIUC, your Haskell Node type doesn't have four type parameters, but two:

data Node (color :: Color) (height :: Nat) key value where

fixes the types of the color and height - only the key and value types are undetermined. All four are records, but two of those records have fixed types.

So, the closest simple translation would be

template <typename Key, typename Value>
struct Node {
    Color color_;
    size_t height_;
    Key key_;
    Value val_;
};

The tricky part is there's no direct support for your different constructors - that's runtime information Haskell tracks for you.

So, a Leaf is a Node, whose constructor initialized the color and height fields for you, but the constructor used is also tracked as a part of the object created.

The closest equivalent to this, and the pattern-matching it gives you, would be a variant type like Boost.Variant, giving you something like:

// underlying record storage
template <typename Key, typename Value>
struct NodeBase {
    Color color_;
    size_t height_;
    Key key_;
    Value val_;
};

template <typename Key, typename Value>
struct Leaf: public NodeBase<Key,Value> {
    Leaf(Key k, Value v) : NodeBase{Color::Black, 0, k, v} {}
    // default other ctors here
};
// similarly for your BlackBranch and RedBranch constructors

template <typename Key, typename Value>
using Node = boost::variant<Leaf<Key,Value>,
                            RedBranch<Key,Value>,
                            BlackBranch<Key,Value>>;

note again that your Branch type has records for leftColor, rightColor, childHeight, and that only the key and value create type parameters.

Finally, where you'd use pattern matching to write a function on your different Node constructors in Haskell, you'd use

template <typename Key, typename Value>
struct myNodeFunction {
    void operator() (Leaf &l) {
        // use l.key_, l.value_, confirm l.height_==0, etc.
    }
    void operator() (RedBranch &rb) {
        // use rb.key_, rb.value_, confirm rb.color_==Color::Red, etc.
    }
    void operator() (BlackBranch &bb) {
        // you get the idea
    }
};

and apply it like:

boost::apply_visitor(myNodeFunction<K,V>(), myNode);

or, if you're using this pattern a lot, you can wrap it up as

template <typename Key, typename Value,
          template Visitor<typename,typename> >
void apply(Node<Key,Value> &node)
{
    boost::apply_visitor(Visitor<Key,Value>{}, node);
}