Long Tail - Statistical Meaning

Statistical Meaning

The long tail is the name for a long-known feature of some statistical distributions (such as Zipf, power laws, Pareto distributions and general Lévy distributions). The feature is also known as heavy tails, fat tails, power-law tails, or Pareto tails. In "long-tailed" distributions a high-frequency or high-amplitude population is followed by a low-frequency or low-amplitude population which gradually "tails off" asymptotically. The events at the far end of the tail have a very low probability of occurrence.

As a rule of thumb, for such population distributions the majority of occurrences (more than half, and where the Pareto principle applies, 80%) are accounted for by the first 20% of items in the distribution. What is unusual about a long-tailed distribution is that the most frequently occurring 20% of items represent less than 50% of occurrences; or in other words, the least frequently occurring 80% of items are more important as a proportion of the total population.

Power law distributions or functions characterize an important number of behaviors from nature and human endeavor. This fact has given rise to a keen scientific and social interest in such distributions, and the relationships that create them. The observation of such a distribution often points to specific kinds of mechanisms, and can often indicate a deep connection with other, seemingly unrelated systems. Examples of behaviors that exhibit long-tailed distribution are the occurrence of certain words in a given language, the income distribution of a business or the intensity of earthquakes (see: Gutenberg-Richter law).

Chris Anderson's and Clay Shirky's articles highlight special cases in which we are able to modify the underlying relationships and evaluate the impact on the frequency of events. In those cases the infrequent, low-amplitude (or low-revenue) events – the long tail, represented here by the portion of the curve to the right of the 20th percentile – can become the largest area under the line. This suggests that a variation of one mechanism (internet access) or relationship (the cost of storage) can significantly shift the frequency of occurrence of certain events in the distribution. The shift has a crucial effect in probability and in the customer demographics of businesses like mass media and online sellers.

However, the long tails characterizing distributions such as the Gutenberg-Richter law or the words-occurrence Zipf's law, and those highlighted by Anderson and Shirky are of very different, if not opposite, nature: Anderson and Shirky refer to frequency-rank relations, whereas the Gutenberg-Richter law and the Zipf's law are probability distributions. Therefore, in these latter cases "tails" correspond to large-intensity events such as large earthquakes and most popular words, who dominate the distributions. By contrast, the long tails in the frequency-rank plots highlighted by Anderson and Shirky would rather correspond to short tails in the associated probability distributions, and therefore illustrate an opposite phenomenon compared to the Gutenberg-Richter and the Zipf's laws.

Read more about this topic:  Long Tail

Famous quotes containing the word meaning:

    The meaning of the Street in all ways and at all times is the need for sharing life with others and the search for community.
    Virginia Hamilton (b. 1936)