Average-case Analysis Using Discrete Probability
Quicksort takes O(n log n) time on average, when the input is a random permutation. Why? For a start, it is not hard to see that the partition operation takes O(n) time.
In the most unbalanced case, each time we perform a partition we divide the list into two sublists of size 0 and (for example, if all elements of the array are equal). This means each recursive call processes a list of size one less than the previous list. Consequently, we can make nested calls before we reach a list of size 1. This means that the call tree is a linear chain of nested calls. The th call does work to do the partition, and, so in that case Quicksort take time. That is the worst case: given knowledge of which comparisons are performed by the sort, there are adaptive algorithms that are effective at generating worst-case input for quicksort on-the-fly, regardless of the pivot selection strategy.
In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, we can make only nested calls before we reach a list of size 1. This means that the depth of the call tree is . But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only O(n) time all together (each call has some constant overhead, but since there are only O(n) calls at each level, this is subsumed in the O(n) factor). The result is that the algorithm uses only O(n log n) time.
In fact, it's not necessary to be perfectly balanced; even if each pivot splits the elements with 75% on one side and 25% on the other side (or any other fixed fraction), the call depth is still limited to, so the total running time is still O(n log n).
So what happens on average? If the pivot has rank somewhere in the middle 50 percent, that is, between the 25th percentile and the 75th percentile, then it splits the elements with at least 25% and at most 75% on each side. If we could consistently choose a pivot from the two middle 50 percent, we would only have to split the list at most times before reaching lists of size 1, yielding an O(n log n) algorithm.
When the input is a random permutation, the pivot has a random rank, and so it is not guaranteed to be in the middle 50 percent. However, when we start from a random permutation, in each recursive call the pivot has a random rank in its list, and so it is in the middle 50 percent about half the time. That is good enough. Imagine that you flip a coin: heads means that the rank of the pivot is in the middle 50 percent, tail means that it isn't. Imagine that you are flipping a coin over and over until you get k heads. Although this could take a long time, on average only 2k flips are required, and the chance that you won't get heads after flips is highly improbable (this can be made rigorous using Chernoff bounds). By the same argument, Quicksort's recursion will terminate on average at a call depth of only . But if its average call depth is O(log n), and each level of the call tree processes at most elements, the total amount of work done on average is the product, O(n log n). Note that the algorithm does not have to verify that the pivot is in the middle half—if we hit it any constant fraction of the times, that is enough for the desired complexity.
Read more about this topic: Quicksort, Formal Analysis
Famous quotes containing the words analysis, discrete and/or probability:
“... the big courageous acts of life are those one never hears of and only suspects from having been through like experience. It takes real courage to do battle in the unspectacular task. We always listen for the applause of our co-workers. He is courageous who plods on, unlettered and unknown.... In the last analysis it is this courage, developing between man and his limitations, that brings success.”
—Alice Foote MacDougall (18671945)
“We have good reason to believe that memories of early childhood do not persist in consciousness because of the absence or fragmentary character of language covering this period. Words serve as fixatives for mental images. . . . Even at the end of the second year of life when word tags exist for a number of objects in the childs life, these words are discrete and do not yet bind together the parts of an experience or organize them in a way that can produce a coherent memory.”
—Selma H. Fraiberg (20th century)
“Crushed to earth and rising again is an authors gymnastic. Once he fails to struggle to his feet and grab his pen, he will contemplate a fact he should never permit himself to face: that in all probability books have been written, are being written, will be written, better than anything he has done, is doing, or will do.”
—Fannie Hurst (18891968)