0
votes

Given a binary tree that might contain duplicate values, you need to find minimum distance between two given values. Note that the binary tree can be large.

For example:

        5
      /   \
    1       7
   / \     / \
  4   3   8   2
 / \
1   2

The function should return 2 for (1 and 2 as input).
(If duplicates are not present, we can find LCA and then calculate the distance.)

I've written the following code but I couldn't handle cases when the values are present in different subtrees and in the below cases:

  1. root = 1, root.left = 4, root.left.left = 3, root.left.right = 2, root.left.left.left = 1
  2. root = 1, root.left = 4, root.left.left = 3, root.left.left.left = 1, root.left.left.right = 2
void dist(struct node* root,int& min,int n1,int n2,int pos1,int pos2,int level) {
    if(!root)
        return;
    if(root->data==n1){
        pos1 = level;
        if(pos2>=0)
            if(pos1-pos2 < min)
                min = pos1-pos2;
    }
    else if(root->data==n2){
        pos2 = level;
        if(pos1>=0)
            if(pos2-pos1 < min)
                min = pos2-pos1;
    }
    dist(root->left,min,n1,n2,pos1,pos2,level+1);
    dist(root->right,min,n1,n2,pos1,pos2,level+1);
}

I think at each node we can find if that node is the LCA of the values or not. If that node is LCA then find the distance and update min accordingly, but this would take O(n2).

3
This question isn't suitable for SO.theharshest
What have you tried so far? This is an interesting question, but unless you've demonstrated that you've actually tried to solve it yourself it's not appropriate to post the question here.templatetypedef
@templatetypedef The main problem is when the two values are in different subtrees and when root= 1, root.left= 4, root.left.left=3, root.left.right=2, root.left.left.left=1 and root= 1, root.left= 4, root.left.left=3, root.left.left.left=1, root.left.left.right=2. In both the cases answer should be 2.da3m0n
I think identifying an approach (e.g. LCA) and its weaknesses counts as trying to solve it.mcdowella

3 Answers

2
votes

Following is an algorithm to solve the problem:-

traverse all of the tree and calculate paths for each node using binary strings representation and store into hash map

eg. For your tree the hashmap will be

1 => 0,000
2 => 001,11
3 => 01
...

When query for distance between (u,v) check for each pair and calculate distance between them. Remove common prefix from strings and then sum the remaining lengths

eg. u=1 and v=2

distance(0,001) = 2
distance(0,11) = 3
distance(000,001) = 2
distance(000,11) = 5 

min = 2

Note: I think the second step can be made more efficient but need to do more research

1
votes

You can compute the LCA of a set of nodes by computing LCA(x1, LCA(x2, LCA(x3... and all the nodes in the set will be somewhere below this LCA. If you compare the LCAs of two sets of nodes and one is not directly beneath the other then the minimum distance between any two nodes in different sets will be at least the distance between the LCAs. If one LCA is above the other then the minimum distance could be zero.

This allows a sort of branch and bound approach. At each point you have a best minimum distance so far (initialized as infinity). Given two sets of nodes, use their LCAs to work out a lower bound on their minimum distance and discard them if this is no better than the best answer so far. If not discarded, split each set into two plus a possible single depending on whether each node in the set is to the left of the LCA, to the right of the LCA, or is the LCA. Recursively check for the minimum distance in the (up to nine) pairs of split sets. If both splits in a pair are below some minimum size, just work out the LCAs and minimum distances of each pair of nodes across the two sets - at this point may find out that you have a new best answer and can update the best answer so far.

Looking at the example at the top of the question, the LCA of the 2s is the root of the tree, and the LCA of the 1s is the highest 1. So the minimum distance between these two sets could be close to zero. Now split each set in two. The left hand 2 is distance two from both of the two 1s. The LCA of the right hand 2 is itself, on the right hand branch of the tree, and the LCA of each of the two 1s is down on the left hand branch of the tree. So the distance between the two is at least two, and we could tell this even if we had a large number of 2s anywhere below the position of the existing right-hand two, and a large number of 1s anywhere on the left hand subtree.

0
votes

Do a pre-order traversal of the tree (or any traversal should work).

During this process, simply keep track of the closest 1 and 2, and update the distance whenever you find a 2 and the closest 1 is closer than the closest distance so far, or vice versa.

Code (C++, untested first draft): (hardcoded 1 and 2 for simplicity)

int getLeastDistance(Node *n, int *distTo1, int *distTo2)
{
  if (n == NULL)
    return;

  int dist = LARGE_VALUE;

  // process current node
  if (n->data == 1)
  {
    dist = *distTo2;
    *distTo1 = 0;
  }
  else if (n->data == 2)
  {
    dist = *distTo1;
    *distTo2 = 0;
  }

  // go left
  int newDistTo1 = *distTo1 + 1,
      newDistTo2 = *distTo2 + 1;

  dist = min(dist, getLeastDistance(n->left, &newDistTo1, &newDistTo2));

  // update distances
  *distTo1 = min(*distTo1, newDistTo1 + 1);
  *distTo2 = min(*distTo2, newDistTo2 + 1);

  // go right
  newDistTo1 = *distTo1 + 1;
  newDistTo2 = *distTo2 + 1;

  dist = min(dist, getLeastDistance(n->right, &newDistTo1, &newDistTo2));
}

Caller:

Node root = ...;
int distTo1 = LARGE_VALUE, distTo2 = LARGE_VALUE;
int dist = getLeastDistance(&root, &distTo1, &distTo2);

Just be sure to make LARGE_VALUE far enough from the maximum value for int such that it won't overflow if incremented (-1 is probably safer, but it requires more complex code).