QuickSort and MergeSort performance on Sequential data fit in memory vs Slow to Access Sequential data on disk

Question

The following quote is from "Comparison with other sort algorithms" section from Wikipedia Merge Sort page

On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays.[citation needed] On the other hand, merge sort is a stable sort and is more efficient at handling slow-to-access sequential media.

My questions:

Why does Quicksort outperform Mergesort when the data to be sorted can all fit into memory? If all data needed are cached or in memory wouldn't it be fast for both Quicksort and Mergesort to access?
Why is Mergesort more efficient at handling slow-to-access sequential data (such as from disk in the case where the data to be sorted can't all fit into memory)?
(move from my comments below to here)In an array arr of primitives (data are sequential) of n elements. The pair of elements that has to be read and compared in MergeSort is arr[0] and arr[n/2] (happens in the final merge). Now think the pair of elements that has to be read and compared in QuickSort is arr[1] and arr[n] (happens in the first partition, assume we swap the randomly chosen pivot with the first element). We know data are read in blocks and load into cache, or disk to memory (correct me if I am wrong) then isn't there a better chance for the needed data gets load together in one block when using MergeSort? It just seems to me MergeSort would always have the upperhand because it is likely comparing elements that are closer together. I know this is False (see graph below) because QuickSort is obviously faster...... I know MergeSort is not in place and requires extra memory and that is likely to slow things down. Other than that what pieces am I missing in my analysis?

images are from Princeton CS MergeSort and QuickSort slides

My Motive:

I want to understand these above concepts because they are one of the main reasons of why mergeSort is preferred when sorting LinkedList,or none sequential data and quickSort is preferred when sorting Array, or sequential data. And why mergeSort is used to sort Object in Java and quickSort is used to sort primitive type in java.

update: Java 7 API actually uses TimSort to sort Object, which is a hybrid of MergeSort and InsertionSort. For primitives Dual-Pivot QuickSort. These changes were implemented starting in Java SE 7. This has to do with the stability of the sorting algorithm. Why does Java's Arrays.sort method use two different sorting algorithms for different types?

Edit:

I will appreciate an answer that addresses the following aspects:

I know the two sorting algorithms differ in the number of moves, read, and comparisons. If those are that reasons contribute to the behaviors I see listed in my questions (I suspected it) then a thorough explanation of how the steps and process of the sorting algorithm results it having advantages or disadvantages seeking data from disk or memory will be much appreciated.
Examples are welcome. I learn better with examples.

note: if you are reading @rcgldr's answer. check out our conversation in the chat room it has lots of good explanations and details. https://chat.stackoverflow.com/rooms/161554/discussion-between-rcgldr-and-oliver-koo

On a typical PC, quick sort is not going to be 3 times as fast as merge sort, more like 10% to 20% faster, depending on checks in quick sort to avoid worst case behaviors. — rcgldr

rcgldr rcgldr · Accepted Answer · 2017-12-19T02:28:11

The main difference is that merge sort does more moves, but fewer compares than quick sort. Even in the case of sorting an array of native types, quick sort is only around 15% faster, at least when I've tested it on large arrays of pseudo random 64 bit unsigned integers, which should be quick sort's best case, on my system (Intel 3770K 3.5ghz, Windows 7 Pro 64 bit, Visual Studio 2015, sorting 16 million pseudo random 64 bit unsigned integers, 1.32 seconds for quick sort, 1.55 seconds for merge sort, 1.32/1.55 ~= 0.85, so quick sort was about 15% faster than merge sort). My test was with a quick sort that had no checks to avoid worst case O(n^2) time or O(n) space. As checks are added to quick sort to reduce or prevent worst case behavior (like fall back to heap sort if recursion becomes too deep), the speed advantage decreases to less than 10% (which is the difference I get between VS2015's implementation of std::sort (modified quick sort) versus std::stable_sort (modified merge sort).

If sorting "strings", it's more likely that what is being sorted is an array of pointers (or references) to those strings. This is where merge sort is faster, because the moves involve pointers, while the compares involve a level of indirection and comparison of strings.

The main reason for choosing quick sort over merge sort is not speed, but space requirement. Merge sort normally uses a second array the same size as the original. Quick sort and top down merge sort also need log(n) stack frames for recursion, and for quick sort limiting stack space to log(n) stack frames is done by only recursing on the smaller partition, and looping back to handle the larger partition.

In terms of cache issues, most recent processors have 4 or 8 way associative caches. For merge sort, during a merge, the two input runs will end up in 2 of the cache lines, and the one output run in a 3rd cache line. Quick sort scans the data before doing swaps, so the scanned data will be in cache, although in separate lines if the two elements being compared / swapped are located far enough from each other.

For an external sort, some variation of bottom up merge sort is used. This because merge sort merge operations are sequential (the only random access occurs when starting up a new pair of runs), which is fast in the case of hard drives, or in legacy times, tape drives (a minimum of 3 tapes drives is needed). Each read or write can be for very large blocks of data, reducing average access time per element in the case of a hard drive, since a large number of elements are read or written at a time with each I/O.

It should also be noted that most merge sorts in libraries are also some variation of bottom up merge sort. Top down merge sort is mostly a teaching environment implementation.

If sorting an array of native types on a processor with 16 registers, such as an X86 in 64 bit mode, 8 of the registers used as start + end pointers (or references) for 4 runs, then a 4-way merge sort is often about the same or a bit faster than quick sort, assuming a compiler optimizes the pointers or references to be register based. It's a similar trade off, like quick sort, 4-way merge sort does more compares (1.5 x compares), but fewer moves (0.5 x moves) than traditional 2-way merge sort.

It should be noted that these sorts are cpu bound, not memory bound. I made a multi-threaded version of a bottom up merge sort, and in the case of using 4 threads, the sort was 3 times faster. Link to Windows example code using 4 threads:

https://codereview.stackexchange.com/questions/148025/multithreaded-bottom-up-merge-sort

QuickSort and MergeSort performance on Sequential data fit in memory vs Slow to Access Sequential data on disk

1 Answers