2
votes

This excerpt is from Cracking the Coding Interview 5th Edition

Numbers are randomly generated and stored in an (expanding) array. How would you keep track of the median

The author walked us through a solution based on heaps:

A heap is really good at basic ordering and keeping track of max and mins. This is actually interesting - if you could keep track of the bigger half and the smaller half of the elements. The bigger half is kept in a min heap, such that the smallest element in the bigger half is at the root. The smaller half is kept in a max heap such that the biggest element of the smallest half is at the root. Now with these data structures, you have the potential median elements at the root." If the heaps are no longer the same size, you can quickly "rebalance" the the heaps by popping an element off the heap and pushing it to the onto the other

I have a couple questions about this approach by I ask them one at a time to keep it organized. First of all, using this approach, if you iterate through the elements in the array, how would you know a specific element belongs in the min heap or the max heap as described by the algorithm? Say our data elements are [20, 39, 14, 7, 86, 90]. When you iterate the array, what heap would you put, lets say 20?

2

2 Answers

2
votes

Each time you're inserting an element, check the minimum of the min heap and the maximum of the max heap to see which heap each element belongs in.

When you see 20, both heaps are empty, so it doesn't matter -- let's say in case of a tie we'll put elements in the max heap of smaller elements.

[20] []

When you see 39 it's bigger than 20, so it goes in the upper heap

[20] [39]

14 is lower than 20, so it goes in the lower heap

[14, 20] [39]

7 is lower than 20, so it goes in the lower heap, but the lower heap is too big now, so 20 comes out of the lower heap and into the upper heap.

[7, 14] [20, 39]

86 > 20, so it goes into the upper heap

[7, 14] [20, 39, 86]

90 > 20, so it goes into the upper heap, but the upper heap is too big now, so 20 comes out of the upper heap and back into the lower heap

[7, 14, 20] [39, 86, 90]

Which way things get balanced doesn't matter -- perhaps you should stick with the lower heap has size <= the upper heap size, or vice versa -- but you will need to keep it balanced.

0
votes

Here's one way to fill in the details.

We maintain the following invariants.

  1. The multiset union of the contents of the min heap and the max heap are the input so far.

  2. All elements in the max heap are not greater than than all elements in the min heap.

  3. The size of the min heap minus the size of the max heap is zero or one.

To scan a new element, we insert it into the min heap. The difference in sizes is now one or two. If it's two, we pop the min element from the min heap and insert it into the max heap, restoring invariant 3.

The median is either the min element in the min heap or the mean of that element and the max element in the max heap, depending on whether the size difference is one or zero.