Compositional Data

In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information.

This definition, given by John Aitchison (1986) has several consequences:

  • A compositional data point, or composition for short, can be represented by a positive real vector with as many parts as considered. Sometimes, if the total amount is fixed and known, one component of the vector can be omitted.
  • As compositions only carry relative information, the only information is given by the ratios between components. Consequently, a composition multiplied by any positive constant contains the same information as the former. Therefore, proportional positive vectors are equivalent when considered as compositions.
  • As usual in mathematics, equivalent classes are represented by some element of the class, called a representative. Thus, equivalent compositions can be represented by positive vectors whose components add to a given constant . The vector operation assigning the constant sum representative is called closure and is denoted by :

where D is the number of parts (components) and denotes a row vector.

  • Compositional data can be represented by constant sum real vectors with positive components, and this vectors span a simplex, defined as

This is the reason why is considered to be the sample space of compositional data. The positive constant is arbitrary. Frequent values for are 1 (per unit), 100 (percent, %), 1000, 106 (ppm), 109 (ppb), ...

  • In statistics, compositional data is frequently considered to be data in which each data point is an D-tuple of nonnegative numbers whose sum is 1. Typically each of the D components xi of each data point says what proportion (or "percentage") of a statistical unit falls into the ith category in a list of D categories. Very often ternary plots are used in analysis of compositional data to represent a three part composition.
  • An alternative nomenclatures for compositional analysis is simplicial analysis, motivated by the concept of simplicial sets.

Remarks on the definition of the simplex:

  • In mathematical frameworks, the superscript of, accounting for the number of parts, is often changed to D − 1, describing the dimension.
  • The components of the vector are assumed to be positive. However, in some definitions of the simplex, non-negative components are admitted. Here null components are avoided, because ratios between components of which some are zero are meaningless.

Read more about Compositional Data:  Examples

Famous quotes containing the word data:

    To write it, it took three months; to conceive it three minutes; to collect the data in it—all my life.
    F. Scott Fitzgerald (1896–1940)