Power Law - Power-law Probability Distributions - Graphical Methods For Identification

Graphical Methods For Identification

Although more sophisticated and robust methods have been proposed, the graphical methods most frequently used to identify power-law probability distributions from random samples are Pareto quantile-quantile plots (or Pareto Q-Q plots), mean residual life plots and log-log plots. Another, more robust graphical method uses bundles of residual quantile functions. (Power-law distributions are also called Pareto-type distributions.) It is assumed here that a random sample is obtained from a probability distribution, and that we want to know whether the tail of the distribution follows a power law (in other words, whether the distribution has a "Pareto tail"). Here, the random sample is called "the data".
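For reference, a common way to write the "Pareto tail" condition is sketched below. The symbols (the survival function \bar{F}, the tail index \alpha and the slowly varying function L) are notation assumed here for illustration, not taken from the text above.

```latex
% Pareto-type (power-law) tail: the survival function decays like a power of x,
% up to a slowly varying factor L.
\bar{F}(x) = P(X > x) = x^{-\alpha} L(x), \qquad \alpha > 0,
\qquad \lim_{t \to \infty} \frac{L(tx)}{L(t)} = 1 \quad \text{for all } x > 0 .

% After a log transform the tail is approximately exponential with mean 1/\alpha:
P(\log X > y) = e^{-\alpha y} L(e^{y}) .
```

The second line suggests why the graphical methods described here work with log-transformed data: a power-law tail in the data becomes a roughly exponential tail after taking logarithms.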

Pareto Q-Q plots compare the quantiles of the log-transformed data to the corresponding quantiles of an exponential distribution with mean 1 (or to the quantiles of a standard Pareto distribution) by plotting the former versus the latter. If the resultant scatterplot suggests that the plotted points "asymptotically converge" to a straight line, then a power-law distribution should be suspected. A limitation of Pareto Q-Q plots is that they behave poorly when the tail index (also called the Pareto index) is close to 0, because they are not designed to identify distributions with slowly varying tails.
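The construction of a Pareto Q-Q plot can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names, the plotting positions i/(n+1), the choice of the 500 largest points and the simulated sample are all assumptions made for the example.

```python
import math
import random


def pareto_qq_points(data):
    """Return (exponential quantile, log-data order statistic) pairs.

    Assumes all data values are positive. The plotting position i/(n+1)
    is one common convention, chosen here for illustration.
    """
    logs = sorted(math.log(x) for x in data)
    n = len(logs)
    # Quantile of a mean-1 exponential at probability p is -log(1 - p).
    return [(-math.log(1 - i / (n + 1)), logs[i - 1]) for i in range(1, n + 1)]


def upper_tail_slope(points, k):
    """Least-squares slope through the k right-most points of the plot."""
    tail = points[-k:]
    mean_x = sum(p[0] for p in tail) / k
    mean_y = sum(p[1] for p in tail) / k
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in tail)
    sxx = sum((p[0] - mean_x) ** 2 for p in tail)
    return sxy / sxx


# Hypothetical data: an exact Pareto sample with tail index alpha = 2.
random.seed(0)
alpha = 2.0
sample = [random.paretovariate(alpha) for _ in range(5000)]

points = pareto_qq_points(sample)
slope = upper_tail_slope(points, k=500)  # should be near 1 / alpha
```

Plotting the second coordinate of `points` against the first gives the Q-Q plot. For a Pareto tail, the upper end of the plot is roughly linear with slope close to the reciprocal of the tail index, so a fitted slope that keeps drifting as `k` changes is a sign that the straight-line reading is doubtful.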

In its version for identifying power-law probability distributions, the mean residual life plot consists of first log-transforming the data and then plotting, for each i=1,...,n, where n is the size of the random sample, the average excess of the log-transformed data over the i-th order statistic (that is, the mean of the differences between the log-transformed data that are higher than the i-th order statistic and the i-th order statistic itself) versus the i-th order statistic. If the resultant scatterplot suggests that the plotted points tend to "stabilize" about a horizontal straight line, then a power-law distribution should be suspected. Since the mean residual life plot is very sensitive to outliers (it is not robust), it usually produces plots that are difficult to interpret; for this reason, such plots are usually called Hill horror plots.
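A minimal sketch of the mean residual life computation on log-transformed data follows. It plots the mean excess over each threshold (the average amount by which the log-data above a threshold exceed it), which is the quantity that levels off for a Pareto tail. The function name and the simulated sample are assumptions for illustration, and ties are handled by position in the sorted sample, which is only a simplification.

```python
import math
import random


def log_mean_excess_points(data):
    """Return (threshold, mean excess) pairs for the log-transformed data.

    The threshold is the i-th order statistic of the log-data; the mean
    excess is the average of (log-value - threshold) over the log-values
    ranked above it. The largest observation has nothing above it and is
    therefore skipped. Assumes all data values are positive.
    """
    logs = sorted(math.log(x) for x in data)
    n = len(logs)
    points = []
    suffix_sum = 0.0
    # Walk from the second-largest value downwards so that suffix_sum
    # always holds the sum of the values ranked above logs[i].
    for i in range(n - 2, -1, -1):
        suffix_sum += logs[i + 1]
        count_above = n - 1 - i
        points.append((logs[i], suffix_sum / count_above - logs[i]))
    points.reverse()  # order by increasing threshold
    return points


# Hypothetical data: an exact Pareto sample with tail index alpha = 2.
random.seed(1)
sample = [random.paretovariate(2.0) for _ in range(5000)]
mrl_points = log_mean_excess_points(sample)
mid_value = mrl_points[len(mrl_points) // 2][1]  # mean excess at the median threshold
```

For this simulated sample the values hover around 0.5 (the reciprocal of the tail index) over most of the range. The right-most points are averages over only a handful of observations, which is where the erratic behavior described above shows up.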

Log-log plots are an alternative way of graphically examining the tail of a distribution using a random sample. This method consists of plotting the logarithm of an estimator of the probability that a particular value occurs versus the logarithm of that value. Usually, this estimator is the proportion of times that the value occurs in the data set. If the points in the plot tend to "converge" to a straight line for large values on the x axis, then the researcher concludes that the distribution has a power-law tail. Examples of the application of these types of plot have been published. A disadvantage of these plots is that they require huge amounts of data in order to provide reliable results. In addition, they are appropriate only for discrete (or grouped) data.
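The log-log construction for discrete data can be sketched as follows, using the observed proportion of each value as the probability estimator. The function name and the tiny data set are hypothetical and serve only to show the shape of the output.

```python
import math
from collections import Counter


def log_log_points(data):
    """Return (log value, log observed proportion) pairs, sorted by value.

    Assumes discrete (or grouped) data with strictly positive values.
    Values that never occur in the sample are simply absent from the plot.
    """
    counts = Counter(data)
    n = len(data)
    return [(math.log(value), math.log(count / n))
            for value, count in sorted(counts.items())]


# Hypothetical toy sample, far too small for real use.
demo_points = log_log_points([1, 1, 1, 1, 2, 2, 4, 8])
```

In a real sample, the largest values typically occur only once or twice each, so the right-hand end of the plot flattens into a noisy band of points. This is one way to see why the method needs very large data sets before the tail of the plot can be read reliably.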

Another graphical method for the identification of power-law probability distributions using random samples has been proposed. This methodology consists of plotting a bundle for the log-transformed sample. Originally proposed as a tool to explore the existence of moments and of the moment-generating function using random samples, the bundle methodology is based on residual quantile functions (RQFs), also called residual percentile functions, which provide a full characterization of the tail behavior of many well-known probability distributions, including power-law distributions, distributions with other types of heavy tails, and even non-heavy-tailed distributions. Bundle plots do not have the disadvantages of Pareto Q-Q plots, mean residual life plots and log-log plots mentioned above: they are robust to outliers, allow power laws with small values of the tail index to be identified visually, and do not demand the collection of much data. In addition, other types of tail behavior can be identified using bundle plots.
