Merge Sort - Analysis

Analysis

In sorting n objects, merge sort has an average and worst-case performance of O(n log n). If the running time of merge sort for a list of length n is T(n), then the recurrence T(n) = 2T(n/2) + n follows from the definition of the algorithm (apply the algorithm to two lists of half the size of the original list, and add the n steps taken to merge the resulting two lists). The closed form follows from the master theorem.

In the worst case, the number of comparisons merge sort makes is equal to or slightly smaller than (n ⌈lg n⌉ - 2⌈lg n⌉ + 1), which is between (n lg n - n + 1) and (n lg n + n + O(lg n)).

For large n and a randomly ordered input list, merge sort's expected (average) number of comparisons approaches α·n fewer than the worst case where

In the worst case, merge sort does about 39% fewer comparisons than quicksort does in the average case; merge sort always makes fewer comparisons than quicksort, except in extremely rare cases, when they tie, where merge sort's worst case is found simultaneously with quicksort's best case. In terms of moves, merge sort's worst case complexity is O(n log n)—the same complexity as quicksort's best case, and merge sort's best case takes about half as many iterations as the worst case.

Recursive implementations of merge sort make 2n − 1 method calls in the worst case, compared to quicksort's n, thus merge sort has roughly twice as much recursive overhead as quicksort. However, iterative, non-recursive implementations of merge sort, avoiding method call overhead, are not difficult to code. Merge sort's most common implementation does not sort in place; therefore, the memory size of the input must be allocated for the sorted output to be stored in (see below for versions that need only n/2 extra spaces).

Stable sorting in-place is possible but is more complicated, and usually a bit slower, even if the algorithm also runs in O(n log n) time (Katajainen, Pasanen & Teuhola 1996). One way to sort in-place is to merge the blocks recursively. Like the standard merge sort, in-place merge sort is also a stable sort. Stable sorting of linked lists is simpler. In this case the algorithm does not use more space than that the already used by the list representation, but the O(log(k)) used for the recursion trace.

Merge sort is more efficient than quicksort for some types of lists if the data to be sorted can only be efficiently accessed sequentially, and is thus popular in languages such as Lisp, where sequentially accessed data structures are very common. Unlike some (efficient) implementations of quicksort, merge sort is a stable sort as long as the merge operation is implemented properly.

Merge sort also has some demerits. One is its use of 2n locations; the additional n locations are commonly used because merging two sorted sets in place is more complicated and would need more comparisons and move operations. But despite the use of this space the algorithm still does a lot of work: The contents of m are first copied into left and right and later into the list result on each invocation of merge_sort (variable names according to the pseudocode above). An alternative to this copying is to associate a new field of information with each key (the elements in m are called keys). This field will be used to link the keys and any associated information together in a sorted list (a key and its related information is called a record). Then the merging of the sorted lists proceeds by changing the link values; no records need to be moved at all. A field which contains only a link will generally be smaller than an entire record so less space will also be used.

Another alternative for reducing the space overhead to n/2 is to maintain left and right as a combined structure, copy only the left part of m into temporary space, and to direct the merge routine to place the merged output into m. With this version it is better to allocate the temporary space outside the merge routine, so that only one allocation is needed. The excessive copying mentioned in the previous paragraph is also mitigated, since the last pair of lines before the return result statement (function merge in the pseudo code above) become superfluous.

Merge sort can also be done with merging more than two sublists at a time, using the n-way merge algorithm. However, the number of operations is approximately the same. Consider merging k sublists at a time, where for simplicity k is a power of 2. The recurrence relation becomes T(n) = k T(n/k) + O(n log k). (The last part comes from the merge algorithm, which when implemented optimally using a heap or self-balancing binary search tree, takes O (log k) time per element.) If you take the recurrence relation for regular merge sort (T(n) = 2T(n/2) + O(n)) and expand it out log2k times, you get the same recurrence relation. This is true even if k is not a constant.

Read more about this topic:  Merge Sort

Famous quotes containing the word analysis:

    ... the big courageous acts of life are those one never hears of and only suspects from having been through like experience. It takes real courage to do battle in the unspectacular task. We always listen for the applause of our co-workers. He is courageous who plods on, unlettered and unknown.... In the last analysis it is this courage, developing between man and his limitations, that brings success.
    Alice Foote MacDougall (1867–1945)

    Whatever else American thinkers do, they psychologize, often brilliantly. The trouble is that psychology only takes us so far. The new interest in families has its merits, but it will have done us all a disservice if it turns us away from public issues to private matters. A vision of things that has no room for the inner life is bankrupt, but a psychology without social analysis or politics is both powerless and very lonely.
    Joseph Featherstone (20th century)