This is perhaps related to functional data structures, but I found no tags about this topic.
Say I have a syntax tree type Tree
, which is organised as a DAG by simply sharing common sub expressions. For example,
data Tree = Val Int | Plus Tree Tree
example :: Tree
example = let x = Val 42 in Plus x x
Then, on this syntax tree type, I have a pure function simplify :: Tree -> Tree
, which, when given the root node of a Tree
, simplifies the whole tree by first simplifying the children of that root node, and then handle the operation of the root node itself.
Since simplify
is a pure function, and some nodes are shared, we expect not to call simplify
multiple times on those shared nodes.
Here comes the problem. The whole data structure is invariant, and the sharing is transparent to the programmer, so it seems impossible to determine whether or not two nodes are in fact the same nodes.
The same problem happens when handling the so-called “tying-the-knot” structures. By tying the knot, we produce a finite data representation for an otherwise infinite data structure, e.g. let xs = 1 : xs in xs
. Here xs
itself is finite, but calling map succ
on it does not necessarily produce a finite representation.
These problems can be concluded as such: when the data is organised in an invariant directed graph, how do we avoid revisiting the same node, doing duplicated work, or even resulting in non-termination when the graph happened to be cyclic?
Some ideas that I have thought of:
- Extend the
Tree
type toTree a
, making every nodes hold an extraa
. When generating the graph, associate each node with a uniquea
value. The memory address should have worked here, despite that the garbage collector may move any heap object at any time. - For the syntax tree example, we may store a
STRef (Maybe Tree)
in every node for the simplified version, but this might not be extensible, and injects some implementation detail of a specific operation to the whole data structure itself.
fmap f
), while (in general) preserving its shape? Or I would not be able to take aTree ()
with shared nodes in it, and replace the()
s withSTRef
s without changing its shape. Then I would have to carry all thoseSTRef
s ever since the tree is created, which means I will need to wrap the whole stuff in theST
monad. – Ruifeng Xiefmap
in the libraries that, relying on unsafe pointer equality, preserves the sharing (or at leats some of it) in the original tree (and callsf
fewer times). I don't think there is such a thing at the moment. Ditto for other traversals. – chi