Non-Recursive DFS Implementation

Question

Recently I needed to implement non-recursive DFS as part of a more complicated algorithm, Tarjan's algorithm to be precise. The recursive implementation is very elegant, but not suitable for large graphs. When I implemented the iterative version, I was shocked at how inelegant it finally ended up being, and I was wondering if I had done something wrong.

There are two basic approaches to iterative DFS. First, you can push all the children of a node onto the stack at once (this seems by far the more common). Or you can push just one at a time. I will focus on the first, as that seems to be how everyone does it.

I had various problems with this algorithm, and eventually I realized that to do it efficiently I needed not one, not two, but three boolean flags. (I don't necessarily mean you need three explicit boolean variables; you might store the information indirectly via special values of variables that are usually integers, but you need access to those three pieces of information one way or another.) The three flags were: 1) Visited. This prevents children from being pushed onto the stack redundantly. 2) Done. This prevents redundant processing of the same node. 3) Ascending/descending. This indicates whether the children have already been pushed onto the stack. The pseudocode looks something like this:

while(S)
    v = S.peek()
    if v.done == True
        S.pop()                  # leftover duplicate of a finished node
        continue

    v.visited = True

    if v.descending == True      # assumed to start out True for every node
        v.descending = False
        for c in v.children
            if c.visited == False
                S.push(c)
        doDescendingStuff(v)
    else
        S.pop()                  # v is on top again: all its children are finished
        v.done = True
        doAscendingStuff(v)
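
For readers who want something executable, here is one possible Python rendering of the pseudocode above. It is only a sketch under a few assumptions that are not in the original: the graph is a dict mapping each node to a list of children, the three flags are kept as sets (`visited`, `done`, `expanded`) rather than node attributes, and the two hooks are replaced by appending to `preorder` and `postorder` lists. Children are pushed in reverse so that the first child is processed first, which is a choice made here to mirror the recursive order.

```python
def dfs_three_flags(graph, root):
    """Iterative DFS that pushes all children at once.

    graph: dict mapping node -> list of child nodes (assumed format).
    Returns (preorder, postorder) in place of the descending/ascending hooks.
    """
    visited = set()    # node has been reached on top of the stack
    done = set()       # node has been fully processed (ascended)
    expanded = set()   # node's children were already pushed ("descending" is False)
    preorder, postorder = [], []
    stack = [root]
    while stack:
        node = stack[-1]
        if node in done:
            stack.pop()                # leftover duplicate entry
            continue
        visited.add(node)
        if node not in expanded:
            expanded.add(node)
            preorder.append(node)      # descending hook
            for child in reversed(graph[node]):
                if child not in visited:
                    stack.append(child)
        else:
            stack.pop()
            done.add(node)
            postorder.append(node)     # ascending hook
    return preorder, postorder


if __name__ == "__main__":
    g = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["C"]}
    print(dfs_three_flags(g, "A"))
```

A node can sit on the stack more than once in this version, which is exactly why the `done` check at the top of the loop is there.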

Some notes: 1) You don't technically need ascending/descending, as you could just check whether the children are all done. But that is pretty inefficient in a dense graph.

2) The main kicker: the visited/done distinction might not seem necessary. Here's why (I think) you need it. You can't mark things visited until you visit them on the stack. If you do, you can process things in the wrong order. E.g. suppose A links to B and C, B links to D, and D links to C. Then from A, you will push B and C on the stack. From B you push D on the stack... and then what? If you are marking things visited when you push them on the stack, you won't push C on the stack here. But this is wrong: C should be visited from D, not from A, in this graph (assuming that A visits B before C). So you don't mark things visited until you process them. But then you will have C on the stack twice. So you need another flag to show that you are completely done with it, so that you don't process C a second time.
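
The A/B/C/D example can be checked with a small Python sketch. Everything here is illustrative and not from the original post: the dict-of-lists graph format and both function names are assumptions. Each function records the DFS tree parent of every node, once with plain recursion and once with an explicit stack that marks nodes visited at push time.

```python
def recursive_parents(graph, root):
    """Reference: parent of each node in the recursive DFS tree."""
    parent = {root: None}

    def visit(node):
        for child in graph[node]:
            if child not in parent:
                parent[child] = node
                visit(child)

    visit(root)
    return parent


def mark_on_push_parents(graph, root):
    """Explicit stack, but nodes are marked visited when pushed."""
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        # Reverse so that the first child ends up on top of the stack.
        for child in reversed(graph[node]):
            if child not in parent:
                parent[child] = node   # marked at push time
                stack.append(child)
    return parent


if __name__ == "__main__":
    g = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["C"]}
    print(recursive_parents(g, "A")["C"])      # D
    print(mark_on_push_parents(g, "A")["C"])   # A
```

Both versions reach every node, but the second one attaches C to A rather than to D, so it is no longer the same traversal tree that the recursive DFS produces.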

I don't see how to avoid all this and still have a perfectly correct non-recursive DFS that supports actions on both winding and unwinding. But instinctively it feels crufty. Is there a better way? Almost every place that I have consulted online really glosses over how to actually implement non-recursive DFS, saying that it can be done and providing a very basic algorithm. When the algorithm is correct (in terms of properly supporting multiple paths to the same node), which is rare, it rarely properly supports doing stuff on both winding and unwinding.

I haven't done a lot of them, but I find stack-based solutions to recursive problems to be generally messy. I'd just have been glad to have gotten it working. — Bernhard Barker
I don't really see why you need visited + done (just replace if S.peek().done == True with if S.peek().visited == True). In your example, you wouldn't process C twice, since you'd set visited = True when processing C from D. — Bernhard Barker
Can I ask, why do you want to avoid recursion? Many modern CPUs have optimizations that allow some recursive algorithms to outperform their non-recursive counterparts. — 500 - Internal Server Error
Dukeling, if you only set visited to True at that point, then you can actually get infinite loops. In my algorithm, you only set done to True when ascending, so if you only set visited to True at that point and you have a loop in the graph, you will keep loading things on the stack (because the children will only be marked visited once all their children are ascended, but their children can't ascend until their children ascend... etc). — Nir Friedman
500, the problem is not so much speed as storage. If you have an explicit stack, then without breaking a sweat you can have a depth equal to the size of your RAM. The maximum stack size on recursion is much smaller; I've heard numbers like 8 MB. Languages like Python have a default maximum recursion depth of 1000 (can be changed). For a DFS your max stack size is often the number of nodes in the graph, for which say 10 000 isn't particularly large. Whereas in say merge sort, you are guaranteed a max recursion depth of log(n), where n is the size of the sorted array. So you're safe. — Nir Friedman

Bernhard Barker · Accepted Answer · 2013-02-03T02:36:40

I think the most elegant stack-based implementation would have iterators of children on the stack, rather than nodes. Think of an iterator just as storing a node and a position in its children.

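// Assumed setup: the root is marked visited, doDescendingStuff(root) has run,
// and S holds root.getChildrenIterator().
while (!S.empty)
  Iterator i = S.pop()
  Iterator temp = null
  while (i.hasNext())
    Node n = i.next()
    if (n.visited == false)
      n.visited = true
      doDescendingStuff(n)
      temp = n.getChildrenIterator()
      break
  if (temp != null)
    S.push(i)      // i's node is still in progress; resume it after n's subtree
    S.push(temp)
  else
    doAscendingStuff(i.getNode())  // no unvisited children left, so i's node is finished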
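
As a rough illustration of the idea, here is a Python sketch that keeps (node, child iterator) pairs on the stack. The dict-of-lists graph format, the function name, and the `on_descend`/`on_ascend` callback parameters are assumptions made for this example. A node is left on the stack until its iterator is exhausted, so the ascending hook only fires after all of its descendants are finished.

```python
def dfs_iterators(graph, root, on_descend, on_ascend):
    """Iterative DFS with child iterators on the stack.

    graph: dict mapping node -> iterable of children (assumed format).
    on_descend / on_ascend: hypothetical hooks called on entering
    and on leaving a node.
    """
    visited = {root}
    on_descend(root)
    stack = [(root, iter(graph[root]))]
    while stack:
        node, children = stack[-1]
        # The iterator remembers its position, so this resumes where it left off.
        for child in children:
            if child not in visited:
                visited.add(child)
                on_descend(child)
                stack.append((child, iter(graph[child])))
                break
        else:
            # Loop ended without break: no unvisited children remain.
            stack.pop()
            on_ascend(node)


if __name__ == "__main__":
    g = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["C"]}
    pre, post = [], []
    dfs_iterators(g, "A", pre.append, post.append)
    print(pre)   # ['A', 'B', 'D', 'C']
    print(post)  # ['C', 'D', 'B', 'A']
```

Each node appears on the stack at most once here, so a single visited set is enough and no separate done flag is needed.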

The above could be optimised in terms of storage space by separating the node and position onto two stacks.
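
One way to read the two-stack suggestion, sketched in Python: one stack holds the nodes on the current path and a parallel stack holds, for each of them, the index of the next child to examine. This is an interpretation, not the answerer's code; it assumes a dict mapping each node to an indexable list of children, and it returns preorder and postorder lists instead of calling hooks.

```python
def dfs_two_stacks(graph, root):
    """Iterative DFS with separate node and position stacks.

    graph: dict mapping node -> list of children (assumed format;
    the lists must support indexing).
    """
    visited = {root}
    preorder, postorder = [root], []
    nodes = [root]      # current path from the root
    positions = [0]     # positions[k]: next child index for nodes[k]
    while nodes:
        node = nodes[-1]
        pos = positions[-1]
        children = graph[node]
        if pos < len(children):
            positions[-1] = pos + 1
            child = children[pos]
            if child not in visited:
                visited.add(child)
                preorder.append(child)    # descending step
                nodes.append(child)
                positions.append(0)
        else:
            nodes.pop()
            positions.pop()
            postorder.append(node)        # ascending step
    return preorder, postorder


if __name__ == "__main__":
    g = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["C"]}
    print(dfs_two_stacks(g, "A"))
```

Whether this actually saves space compared with iterator objects depends on the language and on how the child lists are stored.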
