2
votes

I'm studying the c++ stl algorithms and am trying to understand the heap ones (make_heap, sort_heap, push_heap, etc). I understand the binary tree it creates and how this would be useful for finding the highest priority item at any given time efficiently. It's not clear to me though how this would be better than maintaining a sorted container in which the highest valued item could simply be popped off the top.

Here's the comparisons, as I see it:

For heaps:

  • Create the heap: make_heap: O(nlogn) ...wrong! O(n)
  • Insertion: push_heap: O(logn)
  • Remove highest: pop_heap: O(logn)
  • Get sorted list: sort_heap: O(nlogn)

For a sorted container:

  • Initial sort: sort() O(nlogn)
  • Insertion: ? O(logn)
  • Remove highest: pop constant
  • Get sorted list: already sorted constant

Insertion would require finding the element with the same value or the two elements on either side of the value. In a sorted list this should be O(logn). I don't know a particular algorithm that would do this yet.

In any case, what am I missing? It seems from my analysis above that just maintaining a sorted container is no worse in all cases and better in some.

Thanks!


Edit:

I want to point this out so no one reads this and becomes confused by my question. As rici pointed out, I was mistaken about the complexity of make_heap. I initially thought it was O(n log n), but instead it's O(n). That difference, for me, explains why make_heap would be better than simply maintaining a sorted list.

3
By "sorted container", do you mean something like a std::vector that has been sorted? Because then insertion is O(n). Or do you mean a container like std::set that internally maintains order by using a tree structure in the background?j_random_hacker

3 Answers

3
votes

make_heap is O(n), not O(n log n). For large datasets, it can be quite a lot faster than sorting.

The O(log n) operations (push and remove_min) are probably only a little faster than a balanced binary tree implementation, but their storage overhead is a lot lower, which also improves their cache friendliness. For small objects, like integers, a heap uses around a quarter of the memory that a std::set uses.

2
votes

Insertion for a sorted list is not O(log n), its worst case O(n) because you have to shift elements in the array.

1
votes
  1. You may not want the entire sort sequence.
  2. You may want to get hold of the first item quickly and spread the computational load for sorting among all the entries on retrieval, instead of paying it all up front.
  3. You may want to process a collection that is larger than can fit into memory. A heap can produce a sorted run of on average 2N its own size. This technique is used in sort-merge programs, when the number of initial runs dominates the time to sort each run.

In these situations, heaps win.