Quicksort - Formal Analysis - Average-case Analysis Using Discrete Probability

Average-case Analysis Using Discrete Probability

Quicksort takes O(n log n) time on average, when the input is a random permutation. Why? For a start, it is not hard to see that the partition operation takes O(n) time.

In the most unbalanced case, each time we perform a partition we divide the list into two sublists of size 0 and (for example, if all elements of the array are equal). This means each recursive call processes a list of size one less than the previous list. Consequently, we can make nested calls before we reach a list of size 1. This means that the call tree is a linear chain of nested calls. The th call does work to do the partition, and, so in that case Quicksort take time. That is the worst case: given knowledge of which comparisons are performed by the sort, there are adaptive algorithms that are effective at generating worst-case input for quicksort on-the-fly, regardless of the pivot selection strategy.

In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, we can make only nested calls before we reach a list of size 1. This means that the depth of the call tree is . But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only O(n) time all together (each call has some constant overhead, but since there are only O(n) calls at each level, this is subsumed in the O(n) factor). The result is that the algorithm uses only O(n log n) time.

In fact, it's not necessary to be perfectly balanced; even if each pivot splits the elements with 75% on one side and 25% on the other side (or any other fixed fraction), the call depth is still limited to, so the total running time is still O(n log n).

So what happens on average? If the pivot has rank somewhere in the middle 50 percent, that is, between the 25th percentile and the 75th percentile, then it splits the elements with at least 25% and at most 75% on each side. If we could consistently choose a pivot from the two middle 50 percent, we would only have to split the list at most times before reaching lists of size 1, yielding an O(n log n) algorithm.

When the input is a random permutation, the pivot has a random rank, and so it is not guaranteed to be in the middle 50 percent. However, when we start from a random permutation, in each recursive call the pivot has a random rank in its list, and so it is in the middle 50 percent about half the time. That is good enough. Imagine that you flip a coin: heads means that the rank of the pivot is in the middle 50 percent, tail means that it isn't. Imagine that you are flipping a coin over and over until you get k heads. Although this could take a long time, on average only 2k flips are required, and the chance that you won't get heads after flips is highly improbable (this can be made rigorous using Chernoff bounds). By the same argument, Quicksort's recursion will terminate on average at a call depth of only . But if its average call depth is O(log n), and each level of the call tree processes at most elements, the total amount of work done on average is the product, O(n log n). Note that the algorithm does not have to verify that the pivot is in the middle half—if we hit it any constant fraction of the times, that is enough for the desired complexity.

Read more about this topic:  Quicksort, Formal Analysis

Famous quotes containing the words analysis, discrete and/or probability:

    The spider-mind acquires a faculty of memory, and, with it, a singular skill of analysis and synthesis, taking apart and putting together in different relations the meshes of its trap. Man had in the beginning no power of analysis or synthesis approaching that of the spider, or even of the honey-bee; but he had acute sensibility to the higher forces.
    Henry Brooks Adams (1838–1918)

    The mastery of one’s phonemes may be compared to the violinist’s mastery of fingering. The violin string lends itself to a continuous gradation of tones, but the musician learns the discrete intervals at which to stop the string in order to play the conventional notes. We sound our phonemes like poor violinists, approximating each time to a fancied norm, and we receive our neighbor’s renderings indulgently, mentally rectifying the more glaring inaccuracies.
    W.V. Quine (b. 1908)

    Crushed to earth and rising again is an author’s gymnastic. Once he fails to struggle to his feet and grab his pen, he will contemplate a fact he should never permit himself to face: that in all probability books have been written, are being written, will be written, better than anything he has done, is doing, or will do.
    Fannie Hurst (1889–1968)