Mahalanobis Distance - Applications

Mahalanobis's definition of the distance was prompted, in 1927, by the problem of identifying similarities among skulls based on their measurements.

Mahalanobis distance is widely used in cluster analysis and classification techniques. It is closely related to Hotelling's T-square distribution, which is used for multivariate statistical testing, and to Fisher's Linear Discriminant Analysis, which is used for supervised classification.
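
The link to Hotelling's T-square statistic can be written out as a short sketch. The symbols here are assumptions introduced for illustration: x is an observation, \mu and \Sigma are a mean vector and covariance matrix, \bar{x} and S are the mean and covariance of a sample of size n, and \mu_0 is a hypothesized mean.

```latex
D_M(x) = \sqrt{(x - \mu)^{\mathsf{T}} \, \Sigma^{-1} \, (x - \mu)}

T^2 = n \, (\bar{x} - \mu_0)^{\mathsf{T}} \, S^{-1} \, (\bar{x} - \mu_0)
```

Under these definitions, T^2 is n times the squared Mahalanobis distance between the sample mean and the hypothesized mean, computed with the sample covariance S in place of \Sigma.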

To use the Mahalanobis distance to classify a test point as belonging to one of N classes, one first estimates the mean and covariance matrix of each class, usually from samples known to belong to that class. Then, given a test sample, one computes the Mahalanobis distance to each class and assigns the test point to the class for which that distance is smallest.
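
The procedure above can be sketched in Python with NumPy. This is a minimal illustration rather than a reference implementation: the function names (fit_classes, mahalanobis, classify), the two synthetic classes "a" and "b", and the random seed are all assumptions made for the example, and each class covariance matrix is assumed to be invertible.

```python
import numpy as np

def fit_classes(samples_by_class):
    """Estimate (mean, inverse covariance) per class from labeled samples."""
    params = {}
    for label, samples in samples_by_class.items():
        X = np.asarray(samples, dtype=float)
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)  # rows are observations
        # Assumes cov is non-singular; np.linalg.pinv is a common fallback.
        params[label] = (mean, np.linalg.inv(cov))
    return params

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of x from a class with the given mean and inverse covariance."""
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify(x, params):
    """Return the label of the class with the smallest Mahalanobis distance to x."""
    return min(params, key=lambda label: mahalanobis(x, *params[label]))

# Hypothetical training data: two well-separated 2-D classes.
rng = np.random.default_rng(0)
training = {
    "a": rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
    "b": rng.normal(loc=[5.0, 5.0], scale=1.0, size=(50, 2)),
}
params = fit_classes(training)

print(classify([0.2, -0.1], params))
print(classify([5.1, 4.8], params))
```

Because each class uses its own covariance matrix, a point is judged relative to the spread and orientation of each class rather than by plain Euclidean distance to the class means.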

Mahalanobis distance and leverage are often used to detect outliers, especially in the development of linear regression models. A point that has a greater Mahalanobis distance from the rest of the sample population of points is said to have higher leverage, since it has a greater influence on the slope or coefficients of the regression equation. Mahalanobis distance is also used to identify multivariate outliers, that is, cases within a sample population that are unusual in their combination of scores on two or more variables. A point can be a multivariate outlier even if it is not a univariate outlier on any single variable (consider a probability density similar to a hollow cube in three dimensions, for example).
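
As a rough illustration of a multivariate outlier that is not a univariate outlier, the sketch below uses two strongly correlated variables. The synthetic data, the candidate point, the function name squared_mahalanobis, and the 97.5% cutoff are all assumptions made for the example. The cutoff treats the squared distance as chi-square distributed, which presumes approximately normal data, and the closed form -2 ln(1 - p) for the quantile holds only for two variables.

```python
import numpy as np

def squared_mahalanobis(points, data):
    """Squared Mahalanobis distance of each point from the sample in `data`."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = np.atleast_2d(np.asarray(points, dtype=float)) - mean
    # Row-wise quadratic form diff_i^T * cov_inv * diff_i.
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Hypothetical sample: two variables with correlation 0.9.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

# The candidate goes against the correlation, but is moderate on each axis.
candidate = np.array([1.5, -1.5])

# Univariate view: z-score of the candidate on each variable separately.
z_scores = (candidate - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Multivariate view: squared distance against a chi-square cutoff.
# For exactly 2 variables the chi-square quantile is -2 * ln(1 - p).
p = 0.975
cutoff = -2.0 * np.log(1.0 - p)
d2 = squared_mahalanobis(candidate, data)[0]

print("z-scores:", z_scores)
print("squared distance:", d2, "cutoff:", cutoff)
print("multivariate outlier:", d2 > cutoff)
```

Neither coordinate of the candidate is extreme on its own, yet the combination contradicts the strong positive correlation in the data, which is what the Mahalanobis distance picks up.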

Mahalanobis distance has also been widely used in biology, for example in predicting protein structural class, membrane protein type, and protein subcellular localization, as well as many other attributes of proteins, through their pseudo amino acid composition.
