In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information.
This definition, given by John Aitchison (1986), has several consequences:
- A compositional data point, or composition for short, can be represented by a positive real vector with as many parts as considered. Sometimes, if the total amount is fixed and known, one component of the vector can be omitted.
- Because compositions carry only relative information, all of that information is contained in the ratios between components. Consequently, a composition multiplied by any positive constant contains the same information as the original. Therefore, proportional positive vectors are equivalent when considered as compositions.
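- As usual in mathematics, equivalence classes are represented by some element of the class, called a representative. Thus, equivalent compositions can be represented by positive vectors whose components add to a given constant $\kappa$. The vector operation assigning the constant-sum representative is called closure and is denoted by $\mathcal{C}[\cdot]$:
$$\mathcal{C}[x_1, x_2, \dots, x_D] = \left[ \frac{\kappa\, x_1}{\sum_{i=1}^{D} x_i}, \frac{\kappa\, x_2}{\sum_{i=1}^{D} x_i}, \dots, \frac{\kappa\, x_D}{\sum_{i=1}^{D} x_i} \right],$$
where $D$ is the number of parts (components) and $[\cdot]$ denotes a row vector.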
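- Compositional data can be represented by constant-sum real vectors with positive components, and these vectors form a simplex, defined as
$$\mathcal{S}^D = \left\{ \mathbf{x} = [x_1, x_2, \dots, x_D] \in \mathbb{R}^D \;\middle|\; x_i > 0,\ i = 1, 2, \dots, D;\ \sum_{i=1}^{D} x_i = \kappa \right\}.$$
This is the reason why $\mathcal{S}^D$ is considered to be the sample space of compositional data. The positive constant $\kappa$ is arbitrary. Frequent values for $\kappa$ are 1 (per unit), 100 (percent, %), 1000, $10^6$ (ppm), $10^9$ (ppb), ...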
- In statistics, compositional data are frequently considered to be data in which each data point is a D-tuple of nonnegative numbers whose sum is 1. Typically each of the D components $x_i$ of each data point says what proportion (or "percentage") of a statistical unit falls into the $i$th category in a list of D categories. Very often ternary plots are used in the analysis of compositional data to represent a three-part composition.
- An alternative name for compositional analysis is simplicial analysis, motivated by the concept of simplicial sets.
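The closure operation and the equivalence of proportional vectors described above can be illustrated with a minimal sketch. The function name `closure` and the parameter name `kappa` (the constant sum) are chosen here for illustration only and are not taken from any particular library; the input vectors are made-up example values.

```python
from math import isclose


def closure(x, kappa=1.0):
    """Rescale a strictly positive vector so that its parts sum to kappa."""
    if any(v <= 0 for v in x):
        raise ValueError("all parts must be strictly positive")
    total = sum(x)
    return [kappa * v / total for v in x]


# Two proportional vectors: b is a multiplied by 5 (illustrative values).
a = [1.0, 2.0, 7.0]
b = [5.0, 10.0, 35.0]

closed_a = closure(a)                  # per-unit representative (kappa = 1)
closed_b = closure(b)
percent_a = closure(a, kappa=100.0)    # same composition expressed in percent

# Proportional vectors are mapped to the same representative.
same_representative = all(isclose(p, q) for p, q in zip(closed_a, closed_b))

# Ratios between parts are unchanged by closure.
ratio_before = a[0] / a[1]
ratio_after = closed_a[0] / closed_a[1]
```

Changing `kappa` only changes the units in which the composition is reported; the ratios between parts, which carry all the information, stay the same.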
Remarks on the definition of the simplex:
- In mathematical frameworks, the superscript of the simplex symbol $\mathcal{S}^D$, accounting for the number of parts, is often changed to $D - 1$, describing the dimension.
- The components of the vector are assumed to be positive. However, in some definitions of the simplex, non-negative components are admitted. Here zero components are excluded, because ratios involving a zero component are meaningless.
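Both remarks can be checked with a short sketch. It assumes the strict definition used here (all parts positive) and a known total; the helper names `in_simplex` and `restore_last_part`, the tolerance, and the example values are hypothetical and only serve the illustration.

```python
from math import isclose


def in_simplex(x, kappa=1.0, tol=1e-9):
    """True if every part is strictly positive and the parts sum to kappa."""
    return all(v > 0 for v in x) and isclose(sum(x), kappa, abs_tol=tol)


def restore_last_part(partial, kappa=1.0):
    """Recover the omitted part from the D - 1 reported parts and the known total."""
    return list(partial) + [kappa - sum(partial)]


# With D = 3 parts and a known total, only D - 1 = 2 parts are free.
full = restore_last_part([0.2, 0.3])
inside = in_simplex(full)

# A vector with a zero part is rejected under the strict definition.
with_zero = in_simplex([0.5, 0.5, 0.0])
```

Because the last part is determined by the others, a composition with D parts has only D − 1 degrees of freedom, which is why the simplex is often labelled by its dimension instead of by the number of parts.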