95
votes

There are three ways to store a graph in memory:

  1. Nodes as objects and edges as pointers
  2. A matrix containing all edge weights between numbered node x and node y
  3. A list of edges between numbered nodes

I know how to write all three, but I'm not sure I've thought of all of the advantages and disadvantages of each.

What are the advantages and disadvantages of each of these ways of storing a graph in memory?

7
I'd consider the matrix only if the graph were densely connected or very small. For sparsely connected graphs, the object/pointer or list-of-edges approaches would both use much less memory. I'm curious what besides storage I've overlooked. ;) - sarnold
They differ in time complexity too: the matrix gives O(1) edge lookup, while the other representations can vary widely depending on what you are looking for. - msw
I recall reading an article a while back describing the hardware advantages of implementing a graph as a matrix over a list of pointers. I can't remember much about it except that, as you're dealing with a contiguous block of memory, at any given time much of your working set may very well be in L2 cache. A list of nodes/pointers, on the other hand, may be shotgunned through memory and will likely require a fetch that doesn't hit the cache. I'm not sure I agree, but it's an interesting thought. - nerraga
@Dean J: just a question about the "nodes as objects and edges as pointers" representation. Which data structure do you use to store the pointers in the object? Is it a list? - Timofey
The common names are: (1) equivalent to adjacency list, (2) adjacency matrix, (3) edge list. - Evgeni Sergeev

7 Answers

53
votes

One way to analyze these is in terms of memory and time complexity (which depends on how you want to access the graph).

Storing nodes as objects with pointers to one another

  • The memory complexity for this approach is O(n) for the node objects themselves, since you have as many objects as you have nodes. The pointers add one entry per edge, which is up to O(n^2) in total because each node object may contain pointers to up to n nodes.
  • The time complexity for accessing any given node is O(n) if you have to search for it by following pointers from some starting node.

Storing a matrix of edge weights

  • This would be a memory complexity of O(n^2) for the matrix.
  • The advantage of this data structure is that looking up the edge between any two given nodes takes O(1) time.

Depending on what algorithm you run on the graph and how many nodes there are, you'll have to choose a suitable representation.
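
To make the matrix trade-off concrete, here is a minimal sketch of an adjacency matrix stored in one contiguous block. The names (MatrixGraph, set_edge, weight) are made up for illustration, and it assumes a weight of 0 means "no edge".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adjacency-matrix graph; 0 is assumed to mean "no edge".
struct MatrixGraph {
    std::size_t n;          // number of nodes, numbered 0..n-1
    std::vector<double> w;  // n * n weights in one contiguous block: O(n^2) memory

    explicit MatrixGraph(std::size_t nodes) : n(nodes), w(nodes * nodes, 0.0) {}

    void set_edge(std::size_t x, std::size_t y, double weight) {
        assert(x < n && y < n);
        w[x * n + y] = weight;
    }

    // O(1): a single index computation, independent of n and of the edge count.
    double weight(std::size_t x, std::size_t y) const {
        assert(x < n && y < n);
        return w[x * n + y];
    }
};
```

Note that the O(1) cost applies to looking up one particular pair (x, y); listing all neighbors of x still means scanning a whole row of n cells.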

11
votes

A couple more things to consider:

  1. The matrix model lends itself more easily to graphs with weighted edges, by storing the weights in the matrix. The object/pointer model would need to store edge weights in a parallel array, which requires synchronization with the pointer array.

  2. The object/pointer model works better with directed graphs than with undirected graphs, because an undirected edge has to be stored as a pair of pointers (one in each endpoint), and the two halves of a pair can become unsynchronized.
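
A rough sketch of both points, with hypothetical names (Link, add_directed, add_undirected). It assumes one possible alternative to a parallel weight array: keeping the weight in the same record as the pointer. It also shows why undirected edges need paired updates.

```cpp
#include <cassert>
#include <vector>

struct Node;  // forward declaration so Link can point at a Node

// Hypothetical edge record: the weight travels with the pointer,
// so there is no separate array to keep in step.
struct Link {
    Node* to;
    double weight;
};

struct Node {
    int id;
    std::vector<Link> out;  // outgoing links of this node
};

// Directed edge: one record, nothing else to update.
void add_directed(Node& from, Node& to, double weight) {
    from.out.push_back({&to, weight});
}

// Undirected edge: two mirrored records that must always change together.
// Self-loops are excluded in this sketch to keep the pairing simple.
void add_undirected(Node& a, Node& b, double weight) {
    assert(&a != &b);
    add_directed(a, b, weight);
    add_directed(b, a, weight);
}
```

Any later change to an undirected edge (removing it, or changing its weight) has to touch both records, which is where the synchronization risk comes from.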

9
votes

The objects-and-pointers method suffers from difficulty of search, as some have noted, but it is pretty natural for doing things like building binary search trees, where there's a lot of extra structure.

I personally love adjacency matrices because they make all kinds of problems a lot easier, using tools from algebraic graph theory. (The (i, j) entry of the kth power of the adjacency matrix gives the number of walks of length k from vertex i to vertex j, for example. Add an identity matrix before taking the kth power and the (i, j) entry is nonzero exactly when there is a walk of length <=k from i to j. For an undirected graph, delete any one row and the matching column of the Laplacian and take the determinant to get the number of spanning trees... And so on.)
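
As a rough illustration of the matrix-power idea, the sketch below counts walks (paths that are allowed to revisit vertices) by repeated multiplication. The names Matrix, multiply and walk_counts are made up for this example, and it uses plain O(n^3) multiplication with no overflow protection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Plain O(n^3) product of two square matrices of equal size.
Matrix multiply(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Returns A^k; entry [i][j] counts walks of exactly k edges from i to j.
Matrix walk_counts(const Matrix& adj, unsigned k) {
    const std::size_t n = adj.size();
    Matrix result(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1;  // identity = A^0
    for (unsigned step = 0; step < k; ++step) result = multiply(result, adj);
    return result;
}
```

For a triangle (three mutually connected vertices), walk_counts with k = 2 gives 2 on the diagonal, because each vertex can step to either neighbor and come back.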

But everyone says adjacency matrices are memory expensive! They're only half-right: You can get around this using sparse matrices when your graph has few edges. Sparse matrix data structures do exactly the work of just keeping an adjacency list, but still have the full gamut of standard matrix operations available, giving you the best of both worlds.
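
One common sparse layout is compressed sparse row (CSR). The sketch below is a hand-rolled illustration, not any particular library's API; the names CsrGraph, build_csr and spmv are assumptions, and edges are unweighted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical compressed-sparse-row (CSR) adjacency structure.
struct CsrGraph {
    std::vector<std::size_t> row_start;  // size n + 1; row i spans [row_start[i], row_start[i+1])
    std::vector<std::size_t> col;        // size e; targets grouped by source node
};

CsrGraph build_csr(std::size_t n,
                   const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    CsrGraph g;
    g.row_start.assign(n + 1, 0);
    for (const auto& e : edges) {
        assert(e.first < n && e.second < n);
        ++g.row_start[e.first + 1];  // count the out-degree of each source
    }
    for (std::size_t i = 0; i < n; ++i) g.row_start[i + 1] += g.row_start[i];
    g.col.resize(edges.size());
    // next[i] is the next free slot inside row i while filling `col`.
    std::vector<std::size_t> next(g.row_start.begin(), g.row_start.end() - 1);
    for (const auto& e : edges) g.col[next[e.first]++] = e.second;
    return g;
}

// y = A * x, touching only the e stored entries instead of n * n cells.
std::vector<long long> spmv(const CsrGraph& g, const std::vector<long long>& x) {
    const std::size_t n = g.row_start.size() - 1;
    assert(x.size() == n);
    std::vector<long long> y(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = g.row_start[i]; p < g.row_start[i + 1]; ++p)
            y[i] += x[g.col[p]];
    return y;
}
```

The storage is O(n + e), the same order as an adjacency list, while spmv still behaves like an ordinary matrix-vector product.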

7
votes

I think your first example is a little ambiguous — nodes as objects and edges as pointers. You could keep track of these by storing only a pointer to some root node, in which case accessing a given node may be inefficient (say you want node 4 — if the node object isn't provided, you may have to search for it). In this case, you'd also lose portions of the graph that aren't reachable from the root node. I think this is the case f64 rainbow is assuming when he says the time complexity for accessing a given node is O(n).

Otherwise, you could also keep an array (or hashmap) full of pointers to each node. This allows O(1) access to a given node, but increases memory usage a bit. If n is the number of nodes and e is the number of edges, the space complexity of this approach would be O(n + e).
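
A minimal sketch of that array-of-nodes variant, assuming nodes are numbered 0..n-1 so the node id can double as the array index. ListGraph and its member names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adjacency list: node ids double as indices into `adj`.
struct ListGraph {
    std::vector<std::vector<std::size_t>> adj;

    explicit ListGraph(std::size_t n) : adj(n) {}

    void add_edge(std::size_t from, std::size_t to) {
        assert(from < adj.size() && to < adj.size());
        adj[from].push_back(to);
    }

    // O(1) to reach the node, then O(deg) to walk its neighbors.
    const std::vector<std::size_t>& neighbors(std::size_t node) const {
        assert(node < adj.size());
        return adj[node];
    }

    // n list headers plus one stored entry per edge: O(n + e) in total.
    std::size_t stored_entries() const {
        std::size_t total = adj.size();
        for (const auto& list : adj) total += list.size();
        return total;
    }
};
```

With a hashmap instead of a vector, the same idea works for node ids that are not dense integers, at the cost of some extra memory per node.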

The space complexity for the matrix approach would be along the lines of O(n^2) (assuming edges are unidirectional). If your graph is sparse, you will have a lot of empty cells in your matrix. But if your graph is fully connected (e = n^2), this compares favorably with the first approach. As RG says, you may also have fewer cache misses with this approach if you allocate the matrix as one chunk of memory, which could make following a lot of edges around the graph faster.

The third approach is probably the most space efficient for most cases — O(e) — but would make finding all the edges of a given node an O(e) chore. I can't think of a case where this would be very useful.
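
For comparison, a sketch of the plain edge list, showing where the O(e) neighbor lookup comes from. EdgeRecord, EdgeListGraph and neighbors_of are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct EdgeRecord {
    std::size_t from;
    std::size_t to;
    double weight;
};

// Hypothetical edge-list graph: O(e) memory, nothing stored per node.
struct EdgeListGraph {
    std::size_t n;                  // number of nodes, numbered 0..n-1
    std::vector<EdgeRecord> edges;

    void add_edge(std::size_t from, std::size_t to, double weight) {
        assert(from < n && to < n);
        edges.push_back({from, to, weight});  // amortized O(1)
    }

    // O(e): every edge is inspected, even if `node` has only one neighbor.
    std::vector<std::size_t> neighbors_of(std::size_t node) const {
        std::vector<std::size_t> result;
        for (const EdgeRecord& e : edges)
            if (e.from == node) result.push_back(e.to);
        return result;
    }
};
```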

5
votes

Take a look at the comparison table on Wikipedia. It gives a pretty good understanding of when to use each representation of graphs.

4
votes

There is another option: nodes as objects, edges as objects too, each edge being at the same time in two doubly-linked lists: the list of all edges coming out from the same node and the list of all edges going into the same node.

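struct Edge;  // forward declaration so Node can hold Edge pointers

struct Node {
    // ... node payload ...
    Edge *first_in = nullptr;    // head of the list of all incoming edges
    Edge *first_out = nullptr;   // head of the list of all outgoing edges
};

struct Edge {
    // ... edge payload ...
    Node *from = nullptr, *to = nullptr;
    Edge *prev_in_from = nullptr, *next_in_from = nullptr; // dlist of edges with the same "from"
    Edge *prev_in_to = nullptr, *next_in_to = nullptr;     // dlist of edges with the same "to"
};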

The memory overhead is big (2 pointers per node and 6 pointers per edge) but you get

  • O(1) node insertion
  • O(1) edge insertion (given pointers to "from" and "to" nodes)
  • O(1) edge deletion (given the pointer)
  • O(deg(n)) node deletion (given the pointer)
  • O(deg(n)) finding neighbors of a node

The structure can also represent a rather general graph: a directed multigraph with loops (i.e. you can have multiple distinct edges between the same two nodes, including multiple distinct loops, which are edges going from x to x).

A more detailed explanation of this approach is available here.
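
To show where the O(1) edge insertion and deletion come from, here is a sketch of those two operations on the structure above. The payload fields are left out, and insert_edge and delete_edge are hypothetical helper names; new edges are pushed at the front of each list, which is one possible choice.

```cpp
#include <cassert>

struct Edge;

struct Node {
    Edge* first_in = nullptr;   // head of the list of incoming edges
    Edge* first_out = nullptr;  // head of the list of outgoing edges
};

struct Edge {
    Node* from = nullptr;
    Node* to = nullptr;
    Edge* prev_in_from = nullptr;  // dlist of edges sharing "from"
    Edge* next_in_from = nullptr;
    Edge* prev_in_to = nullptr;    // dlist of edges sharing "to"
    Edge* next_in_to = nullptr;
};

// O(1): push the new edge onto the front of both lists.
Edge* insert_edge(Node* from, Node* to) {
    assert(from && to);
    Edge* e = new Edge;
    e->from = from;
    e->to = to;
    e->next_in_from = from->first_out;
    if (from->first_out) from->first_out->prev_in_from = e;
    from->first_out = e;
    e->next_in_to = to->first_in;
    if (to->first_in) to->first_in->prev_in_to = e;
    to->first_in = e;
    return e;
}

// O(1): unlink the edge from both lists, then free it.
void delete_edge(Edge* e) {
    assert(e);
    if (e->prev_in_from) e->prev_in_from->next_in_from = e->next_in_from;
    else e->from->first_out = e->next_in_from;
    if (e->next_in_from) e->next_in_from->prev_in_from = e->prev_in_from;
    if (e->prev_in_to) e->prev_in_to->next_in_to = e->next_in_to;
    else e->to->first_in = e->next_in_to;
    if (e->next_in_to) e->next_in_to->prev_in_to = e->prev_in_to;
    delete e;
}
```

Deleting a node would repeat delete_edge for every edge in its two lists, which is where the O(deg(n)) bound comes from.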

3
votes

Okay, so if edges don't have weights, the matrix can be a bit array, and using bitwise operators can make things go really, really fast in that case.
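
A small sketch of the bit-matrix idea using std::bitset. BitGraph, kMaxNodes and common_successors are names invented here, and the fixed 64-node limit is only an assumption to keep the example short.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxNodes = 64;  // assumed fixed upper bound for this sketch

struct BitGraph {
    std::vector<std::bitset<kMaxNodes>> row;  // row[x][y] is set when edge x -> y exists

    explicit BitGraph(std::size_t n) : row(n) { assert(n <= kMaxNodes); }

    void add_edge(std::size_t x, std::size_t y) {
        assert(x < row.size() && y < row.size());
        row[x].set(y);
    }

    // A bitwise AND processes a whole machine word of candidate
    // neighbors at once instead of one cell at a time.
    std::size_t common_successors(std::size_t a, std::size_t b) const {
        assert(a < row.size() && b < row.size());
        return (row[a] & row[b]).count();
    }
};
```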

If the graph is sparse, the object/pointer method seems a lot more efficient. Holding the object/pointers in a data structure specifically to coax them into a single chunk of memory might also be a good plan, or any other method of getting them to stay together.

The adjacency list - simply a list of connected nodes - seems by far the most memory efficient, but probably also the slowest.

Reversing a directed graph is easy with the matrix representation, and easy with the adjacency list, but not so great with the object/pointer representation.
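
A rough sketch of the two easy cases, assuming nodes are numbered 0..n-1. The names reverse_matrix and reverse_list are made up: the matrix version is a transpose, and the list version is a single pass over all edges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;
using AdjList = std::vector<std::vector<std::size_t>>;

// Matrix form: reversing every edge is a transpose, O(n^2).
Matrix reverse_matrix(const Matrix& m) {
    const std::size_t n = m.size();
    Matrix r(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(m[i].size() == n);
        for (std::size_t j = 0; j < n; ++j) r[j][i] = m[i][j];
    }
    return r;
}

// Adjacency-list form: one pass over all edges, O(n + e).
AdjList reverse_list(const AdjList& adj) {
    AdjList r(adj.size());
    for (std::size_t from = 0; from < adj.size(); ++from) {
        for (std::size_t to : adj[from]) {
            assert(to < adj.size());
            r[to].push_back(from);
        }
    }
    return r;
}
```

With bare objects and pointers there is no central index of nodes, so a reversal first has to discover every node by traversal before any pointer can be flipped.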