2
votes

When I took the Data Structures course at university, I learned the following axioms:

  1. Inserting a new number into the heap takes O(log n) in the worst case (depending on how high in the tree it bubbles up after being inserted as a leaf)

  2. Building a heap of n nodes, using n insertions starting from an empty heap, takes O(n) total time, by amortized analysis

  3. Removing the minimum takes O(log n) in the worst case (depending on how low the new top node sinks after being swapped with the last leaf)

  4. Removing all the minimums one by one, until the heap is empty, takes O(n log n) time
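For a concrete reference point, here is how the four operations look with Python's standard-library heapq (a binary min-heap). This is just an illustrative sketch, not part of the original question:

```python
import heapq

data = [7, 2, 9, 4, 1]

# Points 1-2: build a heap with n pushes, each O(log n) in the worst case.
heap = []
for x in data:
    heapq.heappush(heap, x)

# Points 3-4: remove the minimum n times, each O(log n) in the worst case.
out = []
while heap:
    out.append(heapq.heappop(heap))

print(out)  # the values come out in sorted order
```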


Reminder: the steps of the heapsort algorithm are:

  • Add all the array values to a heap: O(n) total time, using the amortized-analysis trick
  • Pop the minimum out of the heap n times, placing the i-th popped value at the i-th index of the array: O(n log n) time, since the amortized-analysis trick does not work when popping the minimum
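The two steps can be sketched with Python's heapq (an illustration, not the canonical implementation; note that heapq.heapify builds the heap bottom-up in O(n), rather than with n insertions):

```python
import heapq

def heap_sort(values):
    """Heapsort via a binary min-heap: O(n) build + n pops of O(log n) each."""
    heap = list(values)
    heapq.heapify(heap)  # step 1: build the heap, O(n)
    # step 2: pop the minimum n times; the i-th pop yields the i-th smallest
    return [heapq.heappop(heap) for _ in range(len(heap))]
```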


My question is: why does the amortized-analysis trick not work when emptying the heap, causing the heapsort algorithm to take O(n log n) time rather than O(n)?

2
Is your question why the existing algorithm takes time O(n log n), or why there can't be a different way of doing all the dequeues in time O(n)? - templatetypedef
It is actually the same question from my point of view: if I could do all the dequeues in O(n), I could sort an array in O(n), because I could build and empty a heap in O(n). - SomethingSomething

2 Answers

1
votes

Assuming you're only allowed to learn about the relative ranking of two objects by comparing them, then there's no way to dequeue all elements from a binary heap in time O(n). If you could do this, then you could sort a list in time O(n) by building a heap in time O(n) and then dequeuing everything in time O(n). However, the sorting lower bound says that comparison sorts, in order to be correct, must have a runtime of Ω(n log n) on average. In other words, you can't dequeue from a heap too quickly or you'd break the sorting barrier.

There's also the question about why dequeuing n elements from a binary heap takes time O(n log n) and not something faster. This is a bit tricky to show, but here's the basic idea. Consider the first half of the dequeues you make on the heap. Look at the values that actually got dequeued and think about where they were in the heap to begin with. Excluding the ones on the bottom row, everything else that was dequeued had to percolate up to the top of the heap one swap at a time in order to be removed. You can show that there are enough elements in the heap to guarantee that this alone takes time Ω(n log n) because roughly half of those nodes will be deep in the tree. This explains why the amortized argument doesn't work - you're constantly pulling deep nodes up the heap, so the total distance the nodes have to travel is large. Compare this to the heapify operation, where most nodes travel very little distance.
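To make this asymmetry concrete, here is a small experiment of my own (not from the answer) that counts sift-down swaps during bottom-up heap construction versus during the n pops; the function names are invented for illustration:

```python
def sift_down(a, i, n, counter):
    """Sink a[i] within the heap a[0:n], counting each swap in counter[0]."""
    while True:
        smallest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < n and a[l] < a[smallest]:
            smallest = l
        if r < n and a[r] < a[smallest]:
            smallest = r
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        counter[0] += 1
        i = smallest

def count_swaps(n):
    a = list(range(n, 0, -1))
    build, pops = [0], [0]
    # Bottom-up heapify: total swap distance is O(n).
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n, build)
    # n pops: each moves a deep leaf to the root, which then sinks far down.
    for size in range(n, 1, -1):
        a[0], a[size - 1] = a[size - 1], a[0]
        sift_down(a, 0, size - 1, pops)
    return build[0], pops[0]
```

For n = 1023 (a perfect tree of height 9), the build phase performs fewer than n swaps, while the pop phase performs several times that many, on the order of n log n.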

1
votes

Let me show you "mathematically" how we can compute the complexity of transforming an arbitrary array into a heap (let me call this "heap build") and then sorting it with heapsort.

Heap build time analysis

In order to transform the array into a heap, we have to look at each node with children and "heapify" (sink) that node. Ask yourself how many compares we perform; if you think about it, you will see that (with h = tree height):

  • For each node at level i, we make h-i compares: #comparesOneNode(i) = h-i
  • At level i, we have 2^i nodes: #nodes(i) = 2^i
  • So, generally, T(n,i) = #nodes(i) * #comparesOneNode(i) = 2^i * (h-i) is the time spent on compares at level i

Let's make an example. Suppose we have an array of 15 elements, so the height of the tree is h = floor(log2(15)) = 3:

  • At level i=3, we have 2^3=8 nodes and we make 3-3=0 compares for each node: correct, since level 3 contains only nodes without children, i.e., leaves. T(n, 3) = 2^3*(3-3) = 0
  • At level i=2, we have 2^2=4 nodes and we make 3-2=1 compare for each node: correct, since from level 2 we can only compare against level 3. T(n, 2) = 2^2*(3-2) = 4 * 1
  • At level i=1, we have 2^1=2 nodes and we make 3-1 compares for each node: T(n, 1) = 2^1*(3-1) = 2 * 2
  • At level i=0, we have 2^0=1 node, the root, and we make 3-0 compares: T(n, 0) = 2^0*(3-0) = 1 * 3

Ok, generally:

T(n) = sum(i=0 to h) 2^i * (h-i)

but if you remember that h = log2(n), we have

T(n) = sum(i=0 to log2(n)) 2^i * (log2(n) - i) =~ 2n
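The sum above has the exact closed form 2^(h+1) - h - 2 (a standard geometric-series manipulation); taking n = 2^h, this gives 2n - log2(n) - 2 ≈ 2n, matching the estimate. A quick numerical check:

```python
def build_compares(h):
    # T = sum over levels i = 0..h of 2^i * (h - i) compares
    return sum(2 ** i * (h - i) for i in range(h + 1))

# The sum agrees with the closed form 2^(h+1) - h - 2 for every height.
for h in range(0, 15):
    assert build_compares(h) == 2 ** (h + 1) - h - 2
```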

Heapsort time analysis

Now the analysis is very similar. Every time we "remove" the max element (the root), we move the last leaf in the tree to the root, heapify it, and repeat until the heap is empty. So, how many compares do we perform here?

  • At level i, we have 2^i nodes: #nodes(i) = 2^i
  • For each node at level i, heapify will, in the worst case, always do a number of compares exactly equal to the level i (we take one node from level i, move it to the root, call heapify, and in the worst case heapify will bring the node back down to level i, performing i compares): #comparesOneNode(i) = i
  • So, generally, T(n,i) = #nodes(i) * #comparesOneNode(i) = 2^i * i is the time spent removing 2^i roots and sinking the temporary roots back into their correct positions

Let's make an example. Suppose we have an array of 15 elements, so the height of the tree is h = floor(log2(15)) = 3:

  • At level i=3, we have 2^3=8 nodes, and each of them has to be moved to the root and heapified. Each heapify performs, in the worst case, i compares, because the new root can sink back down to the still-existing level i. T(n, 3) = 2^3 * 3 = 8*3
  • At level i=2, we have 2^2=4 nodes and we make 2 compares for each node: T(n, 2) = 2^2*2 = 4 * 2
  • At level i=1, we have 2^1=2 nodes and we make 1 compare for each node: T(n, 1) = 2^1*1 = 2 * 1
  • At level i=0, we have 2^0=1 node, the root, and we make 0 compares: T(n, 0) = 0

Ok, generally:

T(n) = sum(i=0 to h) 2^i * i

but if you remember that h = log2(n), we have

T(n) = sum(i=0 to log2(n)) 2^i * i =~ 2n log n
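This sum also has an exact closed form, (h - 1) * 2^(h+1) + 2; taking n = 2^h, that is about 2n(log2(n) - 1), i.e., Θ(n log n), matching the estimate. A quick numerical check:

```python
def sort_compares(h):
    # T = sum over levels i = 0..h of 2^i * i compares
    return sum(2 ** i * i for i in range(h + 1))

# The sum agrees with the closed form (h - 1) * 2^(h+1) + 2 for every height.
for h in range(0, 15):
    assert sort_compares(h) == (h - 1) * 2 ** (h + 1) + 2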

Heap build VS heapsort

Intuitively, you can see that heapsort is not able to "amortize" its cost: every time the number of nodes grows, the number of compares grows too, whereas in heap build it is exactly the opposite! You can see it here:

  • Heap build: T(n, i) ~ 2^i * (h-i); as i increases, #nodes increases but #compares decreases
  • Heapsort: T(n, i) ~ 2^i * i; as i increases, #nodes increases and #compares increases

So:

  • Level i=3, #nodes(3)=8, Heap build does 0 compares, heapsort does 8*3 = 24 compares
  • Level i=2, #nodes(2)=4, Heap build does 4 compares, heapsort does 4*2 = 8 compares
  • Level i=1, #nodes(1)=2, Heap build does 4 compares, heapsort does 2*1 = 2 compares
  • Level i=0, #nodes(0)=1, Heap build does 3 compares, heapsort does 1*0 = 0 compares
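The table can be reproduced mechanically from the two per-level formulas (a trivial sketch):

```python
# Reproduce the per-level table for h = 3 (n = 15) from the two formulas:
# heap build does 2^i * (h - i) compares at level i, heapsort does 2^i * i.
h = 3
for i in range(h, -1, -1):
    nodes = 2 ** i
    print(f"Level i={i}, #nodes({i})={nodes}, "
          f"heap build: {nodes * (h - i)} compares, "
          f"heapsort: {nodes * i} compares")
```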