130
votes

It's easy enough to represent a tree or list in haskell using algebraic data types. But how would you go about typographically representing a graph? It seems that you need to have pointers. I'm guessing you could have something like

type Nodetag = String
type Neighbours = [Nodetag]
data Node a = Node a Nodetag Neighbours

And that would be workable. However it feels a bit decoupled; The links between different nodes in the structure don't really "feel" as solid as the links between the current previous and next elements in a list, or the parents and children of a node in a tree. I have a hunch that doing algebraic manipulations on the graph as I defined it would be somewhat hindered by the level of indirection introduced through the tag system.

It is primarily this feeling of doubt and perception of inelegance that causes me to ask this question. Is there a better/more mathematically elegant way of defining graphs in Haskell? Or have I stumbled upon something inherently hard/fundamental? Recursive data structures are sweet, but this seems to be something else. A self referential data structure in a different sense to how trees and lists are self referential. It's like lists and trees are self referential at the type level, but graphs are self referential at the value level.

So what's really going on?

8
You might be interested in Martin Erwig's paper on functional graph algorithms: web.engr.oregonstate.edu/~erwig/papers/abstracts.html#JFP01. The fgl package developed out of this.John L
The 99 Haskell problems page shows some examples of graphs used in a problem solving context. It also has a short intro on different representations.dopamane

8 Answers

48
votes

I also find it awkward to try to represent data structures with cycles in a pure language. It's the cycles that are really the problem; because values can be shared any ADT that can contain a member of the type (including lists and trees) is really a DAG (Directed Acyclic Graph). The fundamental issue is that if you have values A and B, with A containing B and B containing A, then neither can be created before the other exists. Because Haskell is lazy you can use a trick known as Tying the Knot to get around this, but that makes my brain hurt (because I haven't done much of it yet). I've done more of my substantial programming in Mercury than Haskell so far, and Mercury is strict so knot-tying doesn't help.

Usually when I've run into this before I've just resorted to additional indirection, as you're suggesting; often by using a map from ids to the actual elements, and having elements contain references to the ids instead of to other elements. The main thing I didn't like about doing that (aside from the obvious inefficiency) is that it felt more fragile, introducing the possible errors of looking up an id that doesn't exist or trying to assign the same id to more than one element. You can write code so that these errors won't occur, of course, and even hide it behind abstractions so that the only places where such errors could occur are bounded. But it's still one more thing to get wrong.

However, a quick google for "Haskell graph" led me to http://www.haskell.org/haskellwiki/The_Monad.Reader/Issue5/Practical_Graph_Handling, which looks like a worthwhile read.

63
votes

In shang's answer you can see how to represent a graph using laziness. The problem with these representations is that they are very difficult to change. The knot-tying trick is useful only if you're going to build a graph once, and afterward it never changes.

In practice, should I actually want to do something with my graph, I use the more pedestrian representations:

  • Edge list
  • Adjacency list
  • Give a unique label to each node, use the label instead of a pointer, and keep a finite map from labels to nodes

If you're going to be changing or editing the graph frequently, I recommend using a representation based on Huet's zipper. This is the representation used internally in GHC for control-flow graphs. You can read about it here:

37
votes

As Ben mentioned, cyclic data in Haskell is constructed by a mechanism called "tying the knot". In practice, it means that we write mutually recursive declarations using let or where clauses, which works because the mutually recursive parts are lazily evaluated.

Here's an example graph type:

import Data.Maybe (fromJust)

data Node a = Node
    { label    :: a
    , adjacent :: [Node a]
    }

data Graph a = Graph [Node a]

As you can see, we use actual Node references instead of indirection. Here's how to implement a function that constructs the graph from a list of label associations.

mkGraph :: Eq a => [(a, [a])] -> Graph a
mkGraph links = Graph $ map snd nodeLookupList where

    mkNode (lbl, adj) = (lbl, Node lbl $ map lookupNode adj)

    nodeLookupList = map mkNode links

    lookupNode lbl = fromJust $ lookup lbl nodeLookupList

We take in a list of (nodeLabel, [adjacentLabel]) pairs and construct the actual Node values via an intermediate lookup-list (which does the actual knot-tying). The trick is that nodeLookupList (which has the type [(a, Node a)]) is constructed using mkNode, which in turn refers back to the nodeLookupList to find the adjacent nodes.

37
votes

It's true, graphs are not algebraic. To deal with this problem, you have a couple of options:

  1. Instead of graphs, consider infinite trees. Represent cycles in the graph as their infinite unfoldings. In some cases, you may use the trick known as "tying the knot" (explained well in some of the other answers here) to even represent these infinite trees in finite space by creating a cycle in the heap; however, you will not be able to observe or detect these cycles from within Haskell, which makes a variety of graph operations difficult or impossible.
  2. There are a variety of graph algebras available in the literature. The one that comes to mind first is the collection of graph constructors described in section two of Bidirectionalizing Graph Transformations. The usual property guaranteed by these algebras is that any graph can be represented algebraically; however, critically, many graphs will not have a canonical representation. So checking equality structurally isn't enough; doing it correctly boils down to finding graph isomorphism -- known to be something of a hard problem.
  3. Give up on algebraic datatypes; explicitly represent node identity by giving them each unique values (say, Ints) and referring to them indirectly rather than algebraically. This can be made significantly more convenient by making the type abstract and providing an interface that juggles the indirection for you. This is the approach taken by, e.g., fgl and other practical graph libraries on Hackage.
  4. Come up with a brand new approach that fits your use case exactly. This is a very difficult thing to do. =)

So there are pros and cons to each of the above choices. Pick the one that seems best for you.

21
votes

A few others have briefly mentioned fgl and Martin Erwig's Inductive Graphs and Functional Graph Algorithms, but it's probably worth writing up an answer that actually gives a sense of the data types behind the inductive representation approach.

In his paper, Erwig presents the following types:

type Node = Int
type Adj b = [(b, Node)]
type Context a b = (Adj b, Node, a, Adj b)
data Graph a b = Empty | Context a b & Graph a b

(The representation in fgl is slightly different, and makes good use of typeclasses - but the idea is essentially the same.)

Erwig is describing a multigraph in which nodes and edges have labels, and in which all edges are directed. A Node has a label of some type a; an edge has a label of some type b. A Context is simply (1) a list of labeled edges pointing to a particular node, (2) the node in question, (3) the node's label, and (4) the list of labeled edges pointing from the node. A Graph can then be conceived of inductively as either Empty, or as a Context merged (with &) into an existing Graph.

As Erwig notes, we can't freely generate a Graph with Empty and &, as we might generate a list with the Cons and Nil constructors, or a Tree with Leaf and Branch. Too, unlike lists (as others have mentioned), there's not going to be any canonical representation of a Graph. These are crucial differences.

Nonetheless, what makes this representation so powerful, and so similar to the typical Haskell representations of lists and trees, is that the Graph datatype here is inductively defined. The fact that a list is inductively defined is what allows us to so succinctly pattern match on it, process a single element, and recursively process the rest of the list; equally, Erwig's inductive representation allows us to recursively process a graph one Context at a time. This representation of a graph lends itself to a simple definition of a way to map over a graph (gmap), as well as a way to perform unordered folds over graphs (ufold).

The other comments on this page are great. The main reason I wrote this answer, however, is that when I read phrases such as "graphs are not algebraic," I fear that some readers will inevitably come away with the (erroneous) impression that no one's found a nice way to represent graphs in Haskell in a way that permits pattern matching on them, mapping over them, folding them, or generally doing the sort of cool, functional stuff that we're used to doing with lists and trees.

14
votes

I always liked Martin Erwig's approach in "Inductive Graphs and Functional Graph Algorithms", which you can read here. FWIW, I once wrote a Scala implementation as well, see https://github.com/nicolast/scalagraphs.

5
votes

Any discussion of representing graphs in Haskell needs a mention of Andy Gill's data-reify library (here is the paper).

The "tying-the-knot" style representation can be used to make very elegant DSLs (see example below). However, the data structure is of limited use. Gill's library allows you the best of both worlds. You can use a "tying the knot" DSL, but then convert the pointer-based graph into a label-based graph so you can run your algorithms of choice on it.

Here is a simple example:

-- Graph we want to represent:
--    .----> a <----.
--   /               \
--  b <------------.  \
--   \              \ / 
--    `----> c ----> d

-- Code for the graph:
a = leaf
b = node2 a c
c = node1 d
d = node2 a b
-- Yes, it's that simple!



-- If you want to convert the graph to a Node-Label format:
main = do
    g <- reifyGraph b   --can't use 'a' because not all nodes are reachable
    print g

To run the above code you will need the following definitions:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Reify
import Control.Applicative
import Data.Traversable

--Pointer-based graph representation
data PtrNode = PtrNode [PtrNode]

--Label-based graph representation
data LblNode lbl = LblNode [lbl] deriving Show

--Convenience functions for our DSL
leaf      = PtrNode []
node1 a   = PtrNode [a]
node2 a b = PtrNode [a, b]


-- This looks scary but we're just telling data-reify where the pointers are
-- in our graph representation so they can be turned to labels
instance MuRef PtrNode where
    type DeRef PtrNode = LblNode
    mapDeRef f (PtrNode as) = LblNode <$> (traverse f as)

I want to stress that this is a simplistic DSL, but the sky's the limit! I designed a very featureful DSL, including a nice tree-like syntax for having a node broadcast an initial value to some of its children, and many convenience functions for constructing specific node types. Of course, the Node data type and mapDeRef definitions were much more involved.

2
votes

I like this implementation of a graph taken from here

import Data.Maybe
import Data.Array

class Enum b => Graph a b | a -> b where
    vertices ::  a -> [b]
    edge :: a -> b -> b -> Maybe Double
    fromInt :: a -> Int -> b