3
votes

What is the easiest way to implement lowest common ancestor in Python? I have a tree which is represented by each node having a pointer to its parent, and I want to be able to find the first common ancestor given two nodes. I've come up with several ideas but none are particularly appealing

  1. Have each node contain a list of its bases, and to perform a join, find the longest common prefix and then take the last element. Unfortunately, I don't know of any builtin way to do longest common prefix so this requires manual looping.

  2. Have each node contain a set of its bases and perform a set intersection, and take the maximum element. But this requires defining custom comparison operators, and I'm not even sure if it would work.

What should I do? I'm looking for something that favors simplicity over performance, so solutions requiring complicated processing are out.

Edit: I discovered that while there isn't a builtin way, you can do longest common prefix in one line using zip, so it's still fairly simple.

common = [x for x in zip(*baselists) if len(set(x)) == 1][-1]
6

6 Answers

6
votes

Under the assumption that you cannot amend your tree to include depth, you can do the following:

For each Node, recursively traverse the tree upwards until you hit the root. At each parent node, insert the node into a list. This should give you list_a and list_b. Iterate over the shortest list, comparing elements from each list. When you find one that doesn't match, the previous entry is your largest parent element.

5
votes

Take the depth of each node (the distance from the root). If one is lower than the other, go up the tree from the lower node until the depths are equal. Then check for identity, moving each up on each side every time the check fails.

You can do this with a single while loop. Once you have chosen ancestors with equal depth:

while (ancestorA !== ancestorB) {
    ancestorA = ancestorA.parent();
    ancestorB = ancestorB.parent();
}

When the while loop terminates, ancestorA and ancestorB will each be your common ancestor.

This should be not only quite simple, but also fairly fast.

2
votes

Python has build-in sets. Why not using something along the lines of (pseudo-code):

a_ancestors = set()
while a:
  a_ancestors.add(a)
  a = a.parent

while b:
  if b in a_ancestors:
    return b
  b = b.parent

return None # <- can't reach that for a *tree* !!!

This will build the (unordered) set of all ancestors of node a (incl. a itself).

Then, in a second time, we loop over all ancestors of b. By definition, the first ancestor of b that is an ancestor of a will be the first common ancestor. This works in O(n) (in space and time)


You can potentially speed up the process (eventually at the expense of the space occupation) by concurrently collecting both the set of the ancestors of a and b -- stopping a soon as you find a common node. The code is a little bit contrived as you have to deal with one of the branch having reached the root before the other :

visited = set()
while a or b:
  if a:
    if a in visited:
      return a
    visited.add(a)
    a = a.parent

  if b:
    if b in visited:
      return b
    visited.add(b)
    b = b.parent

return None # <- can't reach that for a *tree* !!!
0
votes

I suppose it depends on your tree, and how many objects it will contain. If the number will be reasonable memory-wise (probably fewer than a few million nodes, but that's just a wild guess on my part), I'd use your suggestion #2. In the set just keep a string representation of each base, so built-in comparisons will work. Should be very fast, and I'd imagine you could implement it in just a few lines of code. If the string representation isn't practical, or if you need the object itself and can't implement a master dict of all objects, just define the comparison methods in your node objects (eq and neq if i recall).

0
votes

Idea it to maintain parents collection, it's better to use hash map with = , because in this case you won't spend logn on searching in parents list. So, on each iteration check this map and if parent for current node is already present in the map, then this parent is your result. In worst case scenario it gives O(n), but if you'll start analysis for both nodes at the time in some cases your will be able to find it faster.

0
votes

Since you already have a pointer to the parent nodes in both cases why not do: (similar to what weirdcanada said but...)

Create a list of each node's parents, building the list sequentially at each stage. So list_a and list_b grows as you go higher. Compare the last item added of each list with the items of the other. As soon as an item in list_a matches something in list_b, you have the lowest common ancestor.

while (parent_a not in list_b) or (parent_b not in list_a):
    ...

You don't need to re-construct the chain all the way till root. In any case, you will have to check each parent sequentially (forwards or backwards).