Average-case Analysis Using Discrete Probability
Quicksort takes O(n log n) time on average, when the input is a random permutation. Why? For a start, it is not hard to see that the partition operation takes O(n) time.
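To make the linear cost of partitioning concrete, here is a minimal sketch of one common scheme (a Lomuto-style partition that uses the last element as the pivot). The text does not say which partition scheme is meant, so the scheme, the function name `partition` and the comparison counter are assumptions made for illustration only.

```python
def partition(a, lo, hi):
    """Lomuto-style partition of a[lo..hi] around the pivot a[hi] (assumed scheme).

    Returns (final pivot index, number of comparisons made).
    """
    pivot = a[hi]
    i = lo                    # a[lo..i-1] holds the elements found to be < pivot
    comparisons = 0
    for j in range(lo, hi):   # single pass over the other hi - lo elements
        comparisons += 1
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]  # place the pivot between the two sides
    return i, comparisons


data = [5, 2, 9, 1, 7, 3, 6]
pivot_index, comparisons = partition(data, 0, len(data) - 1)
print(pivot_index, comparisons, data)  # 4 6 [5, 2, 1, 3, 6, 9, 7]
```

Each of the other elements is compared with the pivot exactly once, so a sublist of m elements costs m - 1 comparisons, which is linear in m.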
In the most unbalanced case, each time we perform a partition we divide the list into two sublists of size 0 and n - 1 (for example, if all elements of the array are equal). This means each recursive call processes a list of size one less than the previous list. Consequently, we can make n - 1 nested calls before we reach a list of size 1. This means that the call tree is a linear chain of n - 1 nested calls. The ith call does O(n - i) work to do the partition, and the sum of (n - i) for i from 0 to n is O(n^2), so in that case Quicksort takes O(n^2) time. That is the worst case: given knowledge of which comparisons are performed by the sort, there are adaptive algorithms that are effective at generating worst-case input for quicksort on-the-fly, regardless of the pivot selection strategy.
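The unbalanced case can be checked with a small sketch. It assumes a deliberately simple quicksort that takes the first element as pivot and sends elements smaller than the pivot to one side and all others to the other side; `quicksort_stats` is a hypothetical helper that only counts comparisons and call depth rather than returning a sorted list.

```python
def quicksort_stats(items):
    """Return (comparisons, max_depth) for a first-element-pivot quicksort (assumed variant)."""
    comparisons = 0
    max_depth = 0
    # An explicit stack of (sublist, depth) avoids Python's recursion limit.
    stack = [(list(items), 1)]
    while stack:
        sub, depth = stack.pop()
        if len(sub) <= 1:
            continue              # lists of size 0 or 1 need no partitioning
        max_depth = max(max_depth, depth)
        pivot, rest = sub[0], sub[1:]
        comparisons += len(rest)  # every other element is compared with the pivot once
        smaller = [x for x in rest if x < pivot]
        others = [x for x in rest if x >= pivot]
        stack.append((smaller, depth + 1))
        stack.append((others, depth + 1))
    return comparisons, max_depth


n = 100
print(quicksort_stats([7] * n))   # (4950, 99): all elements equal
print(quicksort_stats(range(n)))  # (4950, 99): already sorted input
```

With this pivot rule both inputs produce splits of size 0 and n - 1 at every step, giving n - 1 nested partitioning calls and n(n - 1)/2 comparisons in total, which is quadratic in n.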
In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, we can make only log_2 n nested calls before we reach a list of size 1. This means that the depth of the call tree is log_2 n. But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only O(n) time all together (each call has some constant overhead, but since there are only O(n) calls at each level, this is subsumed in the O(n) factor). The result is that the algorithm uses only O(n log n) time.
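The level-by-level accounting can be sketched without sorting anything, by tracking only the sizes of the sublists. The helper `balanced_level_work` is hypothetical and assumes every pivot is the exact median of its sublist.

```python
import math


def balanced_level_work(n):
    """Comparisons made at each level when every pivot is the exact median (assumption)."""
    work = []     # work[d] = total comparisons made by all calls at depth d
    sizes = [n]   # sizes of the sublists at the current depth
    while any(s > 1 for s in sizes):
        work.append(sum(s - 1 for s in sizes if s > 1))
        next_sizes = []
        for s in sizes:
            if s > 1:
                left = (s - 1) // 2              # elements below the median pivot
                next_sizes += [left, s - 1 - left]
        sizes = next_sizes
    return work


n = 1023
work = balanced_level_work(n)
print(len(work), work)  # 9 levels, each doing fewer than n comparisons
print(sum(work), n * math.log2(n))
```

For n = 1023 there are 9 levels, no level does more than n comparisons, and the total stays below n log_2 n, in line with the O(n log n) bound.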
In fact, it's not necessary to be perfectly balanced; even if each pivot splits the elements with 75% on one side and 25% on the other side (or any other fixed fraction), the call depth is still limited to about log_{4/3} n, which is O(log n), so the total running time is still O(n log n).
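A rough way to see why a fixed 75/25 split still gives logarithmic depth is to follow only the larger side of each split. The sketch below assumes the larger side always receives three quarters of the current sublist (rounded down); `depth_with_fixed_split` is a hypothetical name.

```python
import math


def depth_with_fixed_split(n):
    """Number of splits until the larger (75%) side shrinks to at most one element."""
    depth = 0
    size = n
    while size > 1:
        size = (3 * size) // 4   # assumed: the larger side keeps 75% of the elements
        depth += 1
    return depth


n = 10**6
depth = depth_with_fixed_split(n)
bound = math.ceil(math.log(n, 4 / 3))  # log base 4/3 of n, rounded up
print(depth, bound)
```

After k splits the larger side holds at most n(3/4)^k elements, so the depth cannot exceed log_{4/3} n rounded up. That is a constant factor more than log_2 n, and constant factors vanish inside the O(log n).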
So what happens on average? If the pivot has rank somewhere in the middle 50 percent, that is, between the 25th percentile and the 75th percentile, then it splits the elements with at least 25% and at most 75% on each side. If we could consistently choose a pivot from the middle 50 percent, we would only have to split the list at most log_{4/3} n times before reaching lists of size 1, yielding an O(n log n) algorithm.
When the input is a random permutation, the pivot has a random rank, and so it is not guaranteed to be in the middle 50 percent. However, when we start from a random permutation, in each recursive call the pivot has a random rank in its list, and so it is in the middle 50 percent about half the time. That is good enough. Imagine that you flip a coin: heads means that the rank of the pivot is in the middle 50 percent, tails means that it isn't. Imagine that you are flipping a coin over and over until you get k heads. Although this could take a long time, on average only 2k flips are required, and it is highly improbable that you still won't have k heads after a large constant multiple of k flips (this can be made rigorous using Chernoff bounds). By the same argument, Quicksort's recursion will terminate on average at a call depth of only 2 log_{4/3} n. But if its average call depth is O(log n), and each level of the call tree processes at most n elements, the total amount of work done on average is the product, O(n log n). Note that the algorithm does not have to verify that the pivot is in the middle half; if we hit it any constant fraction of the time, that is enough for the desired complexity.
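The coin-flipping claim is easy to check empirically. The following simulation is only an illustration under stated assumptions: a fair coin modelled with Python's `random` module, a fixed seed for repeatability, and hypothetical helper names `flips_until_k_heads` and `average_flips`.

```python
import random


def flips_until_k_heads(k, rng):
    """Flip a fair coin until k heads have appeared; return the number of flips."""
    flips = heads = 0
    while heads < k:
        flips += 1
        if rng.random() < 0.5:   # heads: the pivot landed in the middle 50 percent
            heads += 1
    return flips


def average_flips(k, trials=2000, seed=0):
    """Average number of flips over many independent experiments."""
    rng = random.Random(seed)    # seeded so that repeated runs give the same result
    return sum(flips_until_k_heads(k, rng) for _ in range(trials)) / trials


k = 20
avg = average_flips(k)
print(avg, 2 * k)  # the empirical average should be close to 2k
```

Taking k to be log_{4/3} n, the number of good pivots needed along any one path of the call tree, the same 2k estimate gives the average call depth of about 2 log_{4/3} n used in the argument.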