Data Transformation (statistics) - Common Transformations

Common Transformations

The logarithm and square root transformations are commonly used for positive data, and the multiplicative inverse (reciprocal) transformation can be used for non-zero data. The power transform is a family of transformations parametrized by a non-negative value λ that includes the logarithm, square root, and multiplicative inverse as special cases. To approach data transformation systematically, it is possible to use statistical estimation techniques to estimate the parameter λ in the power transform, thereby identifying the transform that is approximately the most appropriate in a given setting. Since the power transform family also includes the identity transform, this approach can also indicate whether it would be best to analyze the data without a transformation. In regression analysis, this approach is known as the Box-Cox technique.

The reciprocal and some power transformations can be meaningfully applied to data that include both positive and negative values (the power transform is invertible over all real numbers if λ is an odd integer). However when both negative and positive values are observed, it is more common to begin by adding a constant to all values, producing a set of non-negative data to which any power transform can be applied.

A common situation where a data transformation is applied is when a value of interest ranges over several orders of magnitude. Many physical and social phenomena exhibit such behavior — incomes, species populations, galaxy sizes, and rainfall volumes, to name a few. Power transforms, and in particular the logarithm, can often be used to induce symmetry in such data. The logarithm is often favored because it is easy to interpret its result in terms of "fold changes."

The logarithm also has a useful effect on ratios. If we are comparing positive quantities X and Y using the ratio X / Y, then if X < Y, the ratio is in the unit interval (0,1), whereas if X > Y, the ratio is in the half-line (1,∞), where the ratio of 1 corresponds to equality. In an analysis where X and Y are treated symmetrically, the log-ratio log(X / Y) is zero in the case of equality, and it has the property that if X is K times greater than Y, the log-ratio is the equidistant from zero as in the situation where Y is K times greater than X (the log-ratios are log(K) and −log(K) in these two situations).

If values are naturally restricted to be in the range 0 to 1, not including the end-points, then a logit transformation may be appropriate: this yields values in the range (−∞,∞).

Read more about this topic:  Data Transformation (statistics)

Famous quotes containing the word common:

    But genius is religious. It is a larger imbibing of the common heart.
    Ralph Waldo Emerson (1803–1882)