Long Tail - Statistical Meaning

Statistical Meaning

The long tail is the name for a long-known feature of some statistical distributions (such as Zipf's law, power laws, Pareto distributions and general Lévy distributions). The feature is also known as heavy tails, fat tails, power-law tails, or Pareto tails. In "long-tailed" distributions a high-frequency or high-amplitude population is followed by a low-frequency or low-amplitude population that gradually "tails off" asymptotically. Each event at the far end of the tail has a very low probability of occurrence.

As a rule of thumb, for such population distributions the majority of occurrences (more than half, and 80% where the Pareto principle applies) are accounted for by the first 20% of items in the distribution. What is unusual about a long-tailed distribution, by contrast, is that the most frequently occurring 20% of items account for less than 50% of occurrences; in other words, the least frequently occurring 80% of items together make up the larger share of the total population.
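The two cases in the preceding paragraph can be made concrete with a small calculation. The sketch below assumes that item frequencies follow a Zipf-like rule, with frequency proportional to 1/rank^s, and computes the share of occurrences held by the top 20% of items. The function name `head_share`, the catalog size of 1,000 items, and the exponents are illustrative assumptions and do not come from the text.

```python
def head_share(n_items: int, exponent: float, head_fraction: float = 0.2) -> float:
    """Share of all occurrences held by the top `head_fraction` of items.

    Assumption: the item at rank r occurs with frequency proportional
    to r ** -exponent (a Zipf-like rank-frequency rule).
    """
    weights = [rank ** -exponent for rank in range(1, n_items + 1)]
    head_count = int(n_items * head_fraction)
    return sum(weights[:head_count]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical exponents: steeper curves concentrate occurrences in the head.
    for s in (0.2, 0.5, 1.0, 1.5):
        print(f"exponent {s}: top 20% share = {head_share(1000, s):.2f}")
```

Under these assumptions, an exponent of 1 puts a little under 80% of occurrences in the top 20% of items, close to the Pareto figure, while an exponent of 0.5 leaves the top 20% with less than half, which is the unusual case described above.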

Power-law distributions or functions characterize a large number of behaviors in nature and human endeavor. This has given rise to keen scientific and social interest in such distributions and in the relationships that create them. Observing such a distribution often points to specific kinds of mechanisms, and can indicate a deep connection with other, seemingly unrelated systems. Examples of behaviors that exhibit long-tailed distributions are the occurrence of certain words in a given language, the income distribution of a business, and the intensity of earthquakes (see: Gutenberg-Richter law).

Chris Anderson's and Clay Shirky's articles highlight special cases in which the underlying relationships can be modified and the impact on the frequency of events evaluated. In those cases the infrequent, low-amplitude (or low-revenue) events – the long tail, represented here by the portion of the curve to the right of the 20th percentile – can become the largest area under the curve. This suggests that a variation in one mechanism (internet access) or relationship (the cost of storage) can significantly shift the frequency of occurrence of certain events in the distribution. The shift has a crucial effect on the probabilities involved and on the customer demographics of businesses such as mass media and online sellers.
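One way to picture such a shift is to hold the "head" fixed and let the catalog grow, as cheaper storage might allow. The sketch below is only an illustration under stated assumptions: demand follows a 1/rank rule, and the head is a fixed set of 100 best-selling items rather than the 20th-percentile cut used above. The function name `tail_share` and all numbers are hypothetical.

```python
def tail_share(catalog_size: int, head_items: int, exponent: float = 1.0) -> float:
    """Share of total demand that falls outside the top `head_items` items.

    Assumption: demand for the item at rank r is proportional to
    r ** -exponent, and the catalog holds `catalog_size` items in total.
    """
    weights = [rank ** -exponent for rank in range(1, catalog_size + 1)]
    return sum(weights[head_items:]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical scenario: the head stays at 100 items while the
    # catalog grows by orders of magnitude.
    for size in (100, 1_000, 10_000, 100_000):
        print(f"catalog of {size} items: tail share = {tail_share(size, 100):.2f}")
```

In this toy model the tail holds less than half of total demand for a catalog of 1,000 items but more than half for a catalog of 100,000 items, so the tail becomes the largest area under the curve purely because more items are available.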

However, the long tails characterizing distributions such as the Gutenberg-Richter law or Zipf's law of word occurrence, and those highlighted by Anderson and Shirky, are of very different, if not opposite, nature: Anderson and Shirky refer to frequency-rank relations, whereas the Gutenberg-Richter law and Zipf's law are probability distributions. In the latter cases, the "tails" therefore correspond to large-intensity events, such as large earthquakes and the most popular words, which dominate the distributions. By contrast, the long tails in the frequency-rank plots highlighted by Anderson and Shirky correspond to the short tails of the associated probability distributions, and therefore illustrate the opposite phenomenon from the Gutenberg-Richter law and Zipf's law.
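The link between the two pictures can be sketched with a short derivation, assuming an idealized rank-frequency relation of power-law form. The symbols f (frequency), r (rank), C (a constant), and s (the exponent) are introduced here for illustration and are not notation from the articles cited.

```latex
\begin{align}
  % Rank-frequency relation: the item at rank r has frequency f
  f(r) &= C\, r^{-s}, \qquad s > 0 \\
  % The rank of an item is the number of items with frequency at least f
  N(\ge f) &= r(f) = \left(\frac{f}{C}\right)^{-1/s} \\
  % Differentiating gives the density of items as a function of frequency
  p(f) &\propto -\frac{\mathrm{d}N(\ge f)}{\mathrm{d}f} \propto f^{-\left(1 + \frac{1}{s}\right)}
\end{align}
```

Under this assumption, items at high rank r have small f and therefore sit at the low end of p(f), whereas the tail of p(f), at large f, is populated by the few lowest-ranked, most frequent items. This is the sense in which the long tail of a frequency-rank plot and the tail of the associated probability distribution point to opposite ends of the same data.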
