Let me show you "mathematically" how we can compute the complexity of transforming an arbitrary array into a heap (let me call this "heap build") and then sorting it with heapsort.
Heap build time analysis
In order to transform the array into a heap, we have to look at each node with children and "heapify" (sink) that node. You should ask yourself how many compares we perform; if you think about it, you can see that (with h = tree height):
- For each node at level i, we make h-i compares: #comparesOneNode(i) = h-i
- At level i, we have 2^i nodes: #nodes(i) = 2^i
- So, generally T(n,i) = #nodes(i) * #comparesOneNode(i) = 2^i * (h-i) is the time spent on compares at level "i"
Let's make an example. Suppose we have an array of 15 elements, i.e., the height of the tree is h = floor(log2(15)) = 3:
- At level i=3, we have 2^3=8 nodes and we make 3-3 compares for each node: correct, since at level 3 we have only nodes without children, i.e., leaves. T(n, 3) = 2^3*(3-3) = 0
- At level i=2, we have 2^2=4 nodes and we make 3-2 compares for each node: correct, since a node at level 2 can only sink down to level 3. T(n, 2) = 2^2*(3-2) = 4 * 1
- At level i=1, we have 2^1=2 nodes and we make 3-1 compares for each node: T(n, 1) = 2^1*(3-1) = 2 * 2
- At level i=0, we have 2^0=1 node, the root, and we make 3-0 compares: T(n, 0) = 2^0*(3-0) = 1 * 3
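A throwaway snippet (mine, not part of the original analysis) to sanity-check these per-level numbers:

```python
# Per-level compare counts 2^i * (h - i) for the 15-element example (h = 3).
h = 3
terms = [2**i * (h - i) for i in range(h + 1)]
print(terms)       # [3, 4, 4, 0] for levels 0..3
print(sum(terms))  # 11 compares in total
```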
Ok, generally:
T(n) = sum(i=0 to h) 2^i * (h-i)
but if you remember that h = log2(n), we have
T(n) = sum(i=0 to log2(n)) 2^i * (log2(n) - i) ≈ 2n (the sum has the closed form 2^(h+1) - h - 2 < 2n), i.e., heap build is O(n)
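To check the bound empirically, here is a minimal sketch (the names `sink` and `build_heap` are my own, not from any particular library): it builds a max-heap bottom-up and counts exactly the parent-vs-larger-child compares that the analysis above counts.

```python
import random

def sink(a, i, n, counter):
    """Sink a[i] within a[0:n], counting one compare per level visited."""
    while 2 * i + 1 < n:
        j = 2 * i + 1                      # left child
        if j + 1 < n and a[j + 1] > a[j]:
            j += 1                         # right child is the larger one
        counter[0] += 1                    # compare parent with larger child
        if a[i] >= a[j]:
            break                          # heap property restored
        a[i], a[j] = a[j], a[i]
        i = j

def build_heap(a):
    """Heapify every node with children, bottom-up; return #compares."""
    counter = [0]
    for i in range(len(a) // 2 - 1, -1, -1):   # last non-leaf down to root
        sink(a, i, len(a), counter)
    return counter[0]

a = [random.randrange(1000) for _ in range(15)]
compares = build_heap(a)
print(compares)  # never exceeds 2n = 30 for n = 15 (in fact at most 11)
```

On a 15-element array this counter can never exceed 11, which matches the per-level total 3 + 4 + 4 + 0 worked out above.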
Heapsort time analysis
Now, here the analysis is really similar. Every time we "remove" the max element (the root), we move the last leaf of the tree to the root, heapify it, and repeat until the heap is empty. So, how many compares do we perform here?
- At level i, we have 2^i nodes: #nodes(i) = 2^i
- For each node at level "i", heapify will in the worst case always do a number of compares exactly equal to the level "i" (we take one node from level i, move it to the root, call heapify, and heapify in the worst case will bring the node back down to level i, performing "i" compares): #comparesOneNode(i) = i
- So, generally T(n,i) = #nodes(i) * #comparesOneNode(i) = 2^i * i is the time spent on compares for removing the 2^i nodes that originally sat at level "i" and sinking the temporary roots back to their correct positions.
Let's make an example. Suppose we have an array of 15 elements, i.e., the height of the tree is h = floor(log2(15)) = 3:
- At level i=3, we have 2^3=8 nodes and we need to move each of them to the root and then heapify it. Each heapify will in the worst case perform "i" compares, because the root can sink back down to the still-existing level "i". T(n, 3) = 2^3 * 3 = 8*3
- At level i=2, we have 2^2=4 nodes and we make 2 compares for each node: T(n, 2) = 2^2*2 = 4 * 2
- At level i=1, we have 2^1=2 nodes and we make 1 compare for each node: T(n, 1) = 2^1*1 = 2 * 1
- At level i=0, we have 2^0=1 node, the root, and we make 0 compares: T(n, 0) = 0
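Again, a throwaway snippet of mine to sanity-check these per-level numbers:

```python
# Per-level compare counts 2^i * i for the 15-element example (h = 3).
h = 3
terms = [2**i * i for i in range(h + 1)]
print(terms)       # [0, 2, 8, 24] for levels 0..3
print(sum(terms))  # 34 compares in total
```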
Ok, generally:
T(n) = sum(i=0 to h) 2^i * i
but if you remember that h = log2(n), we have
T(n) = sum(i=0 to log2(n)) 2^i * i ≈ 2n*log2(n) (the sum has the closed form (h-1)*2^(h+1) + 2), i.e., the sortdown phase is O(n log n)
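Here is a self-contained sketch of the full heapsort (heap build plus sortdown), again with my own compare counter and my own function names, so you can see the two phases' counts separately:

```python
import random

def sink(a, i, n, counter):
    """Sink a[i] within a[0:n], counting one compare per level visited."""
    while 2 * i + 1 < n:
        j = 2 * i + 1                      # left child
        if j + 1 < n and a[j + 1] > a[j]:
            j += 1                         # right child is the larger one
        counter[0] += 1                    # compare parent with larger child
        if a[i] >= a[j]:
            break
        a[i], a[j] = a[j], a[i]
        i = j

def heapsort(a):
    """Sort a in place; return (#compares in build, #compares in sortdown)."""
    counter = [0]
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # heap build phase
        sink(a, i, n, counter)
    build = counter[0]
    for end in range(n - 1, 0, -1):        # sortdown phase
        a[0], a[end] = a[end], a[0]        # "remove" the max to its final slot
        sink(a, 0, end, counter)           # re-heapify the shrunken heap
    return build, counter[0] - build

a = [random.randrange(10**6) for _ in range(1000)]
build, sortdown = heapsort(a)
print(build, sortdown)                     # sortdown dominates as n grows
```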
Heap build VS heapsort
Intuitively, you can see that heapsort is not able to "amortise" its cost: as the level i increases, both the number of nodes and the compares per node increase, while in heap build we have exactly the opposite! You can see it here:
- Heap build: T(n, i) ~ 2^i * (h-i), if i increases, #nodes increases, but #compares decreases
- Heapsort: T(n, i) ~ 2^i * i, if i increases, #nodes increases and #compares increases
So:
- Level i=3, #nodes(3)=8, Heap build does 0 compares, heapsort does 8*3 = 24 compares
- Level i=2, #nodes(2)=4, Heap build does 4 compares, heapsort does 4*2 = 8 compares
- Level i=1, #nodes(1)=2, Heap build does 4 compares, heapsort does 2*1 = 2 compares
- Level i=0, #nodes(0)=1, Heap build does 3 compares, heapsort does 1*0 = 0 compares
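You can also evaluate the two sums for growing trees (plain arithmetic on the formulas above, no heap code involved) to see the build total stay proportional to n while the sortdown total grows like n*log2(n):

```python
# Evaluate both per-phase compare sums for complete trees of height h = 3..10.
for h in range(3, 11):
    n = 2**(h + 1) - 1                                  # nodes in a complete tree
    build = sum(2**i * (h - i) for i in range(h + 1))   # heap build compares
    sortdown = sum(2**i * i for i in range(h + 1))      # sortdown compares
    print(n, build, sortdown)                           # build ~ n, sortdown ~ n*h
```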