The Heavy-tail Distribution
Heavy-tail distributions have properties that are qualitatively different from those of commonly used (memoryless) distributions such as the exponential distribution.
The Hurst parameter H measures the degree of self-similarity of a time series that exhibits long-range dependence, a setting in which heavy-tail distributions apply. For such a series H takes values from 0.5 to 1. A value of 0.5 indicates that the data are uncorrelated or have only short-range correlations. The closer H is to 1, the greater the degree of persistence, or long-range dependence.
Typical values of the Hurst parameter, H:
- Any pure random process has H = 0.5.
- Phenomena with H > 0.5 typically have a complex process structure.
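To make the meaning of H more concrete, the following is a minimal sketch of one common way to estimate it, the aggregated-variance method. The method is not described in the text above and is used here only as an illustration. It assumes that the variance of block means of a self-similar series scales roughly as m^(2H - 2) with block size m. The function names, block sizes, and seed are hypothetical choices.

```python
import math
import random


def block_means(series, m):
    """Average the series over non-overlapping blocks of length m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def estimate_hurst(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log(variance of block means) vs log(m).

    Assumes the variance of the block means scales roughly as
    m**(2H - 2), so that H = 1 + slope / 2.
    """
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(block_means(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Ordinary least-squares slope of ys against xs.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0


rng = random.Random(0)  # fixed seed, chosen only for repeatability
white_noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
h_white = estimate_hurst(white_noise)
print(f"estimated H for uncorrelated noise: {h_white:.2f}")
```

For uncorrelated Gaussian noise the estimate should land close to 0.5. A series with long-range dependence would be expected to give a value nearer 1, although this simple estimator is noisy on short series.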
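A distribution is said to be heavy-tailed if its complementary distribution function decays as a power law:
$$P[X > x] \sim x^{-\alpha}, \quad \text{as } x \to \infty, \quad 0 < \alpha < 2$$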
This means that regardless of the distribution for small values of the random variable, if the asymptotic shape of the distribution is hyperbolic, it is heavy-tailed. The simplest heavy-tail distribution is the Pareto distribution, which is hyperbolic over its entire range. Complementary distribution functions for the exponential and Pareto distributions are shown below. On the left is a graph of the distributions on linear axes, spanning a large domain. To its right is a graph of the complementary distribution functions over a smaller domain, with a logarithmic range.
If the logarithm of the range (the vertical axis) of an exponential distribution is taken, the resulting plot is linear. In contrast, that of the heavy-tail distribution is still curvilinear. These characteristics can be clearly seen on the graph above to the right. A characteristic of heavy-tail distributions is that if the logarithm of both the range and the domain is taken, the tail of the distribution is approximately linear over many orders of magnitude. In the graph above left, the condition for the existence of a heavy-tail distribution, as previously presented, is not met by the curve labelled "Gamma-Exponential Tail".
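The behaviour on logarithmic axes can be checked numerically without a plot. The sketch below assumes the standard complementary distribution functions exp(-rate * x) for the exponential and (k / x)^alpha for the Pareto. The parameter values and function names are illustrative assumptions, not taken from the graphs.

```python
import math


def exponential_ccdf(x, rate):
    """P[X > x] for an exponential distribution with the given rate."""
    return math.exp(-rate * x)


def pareto_ccdf(x, alpha, k):
    """P[X > x] for a Pareto distribution with shape alpha and minimum k."""
    return (k / x) ** alpha if x >= k else 1.0


def slopes(xs, ys):
    """Slopes of the segments joining consecutive (x, y) points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            for i in range(len(xs) - 1)]


rate, alpha, k = 1.0, 1.5, 1.0  # illustrative values only
xs = [2.0, 4.0, 8.0, 16.0, 32.0]
log_xs = [math.log10(x) for x in xs]
log_exp = [math.log10(exponential_ccdf(x, rate)) for x in xs]
log_par = [math.log10(pareto_ccdf(x, alpha, k)) for x in xs]

# Logarithmic range, linear domain: the exponential has a constant
# slope (a straight line), while the Pareto slope keeps changing.
exp_semilog = slopes(xs, log_exp)
par_semilog = slopes(xs, log_par)

# Logarithmic range and domain: the Pareto tail has constant slope -alpha.
par_loglog = slopes(log_xs, log_par)

print("exponential, semi-log slopes:", [round(s, 4) for s in exp_semilog])
print("Pareto, semi-log slopes:     ", [round(s, 4) for s in par_semilog])
print("Pareto, log-log slopes:      ", [round(s, 4) for s in par_loglog])
```

A constant slope corresponds to a straight line on the matching axes, so the first and third lists should each hold a single repeated value, while the second list should drift toward zero as x grows.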
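The probability density function of the Pareto distribution, the simplest heavy-tail distribution, is given by:
$$p(x) = \alpha k^{\alpha} x^{-\alpha - 1}, \quad \alpha, k > 0, \quad x \ge k$$
and its cumulative distribution function is given by:
$$F(x) = P[X \le x] = 1 - \left(\frac{k}{x}\right)^{\alpha}$$
where k represents the smallest value the random variable can take and α is the shape parameter that controls how slowly the tail decays.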
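As a small worked illustration, the sketch below assumes the standard Pareto form, with density alpha * k^alpha * x^(-alpha - 1) and cumulative distribution function 1 - (k / x)^alpha for x >= k. It draws samples by inverting the cumulative distribution function and compares the empirical fraction of samples below a threshold with the formula. The function names, parameter values, and seed are hypothetical.

```python
import random


def pareto_pdf(x, alpha, k):
    """Density alpha * k**alpha * x**(-alpha - 1) for x >= k, else 0."""
    return alpha * k ** alpha * x ** (-alpha - 1) if x >= k else 0.0


def pareto_cdf(x, alpha, k):
    """P[X <= x] = 1 - (k / x)**alpha for x >= k, else 0."""
    return 1.0 - (k / x) ** alpha if x >= k else 0.0


def pareto_sample(rng, alpha, k):
    """Inverse-transform sampling: solve F(x) = u for x."""
    u = rng.random()  # uniform on [0, 1)
    return k / (1.0 - u) ** (1.0 / alpha)


alpha, k = 1.5, 2.0      # illustrative shape and minimum value
rng = random.Random(1)   # fixed seed, chosen only for repeatability
samples = [pareto_sample(rng, alpha, k) for _ in range(20000)]

x0 = 4.0
empirical = sum(s <= x0 for s in samples) / len(samples)
theoretical = pareto_cdf(x0, alpha, k)
print(f"P[X <= {x0}]: empirical {empirical:.3f}, formula {theoretical:.3f}")
print(f"smallest sample: {min(samples):.3f} (never below k = {k})")
```

Because alpha is below 2 here, the distribution has infinite variance, so the sample mean of such data converges slowly and occasional very large values are expected.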
Readers interested in a more rigorous mathematical treatment of the subject are referred to the external links section.