Dirichlet Process

In probability theory, a Dirichlet process is a stochastic process whose realizations are themselves probability distributions; in other words, it is a probability distribution over probability distributions.

Given a Dirichlet process DP(H, α), where H (the base distribution or base measure) is an arbitrary distribution and α (the concentration parameter) is a positive real number, a draw from DP(H, α) returns a random distribution (the output distribution) whose values are drawn from H. That is, every value the output distribution can return lies in the support of the base distribution. The output distribution is discrete with probability one, so individual values drawn from it will repeat even if the base distribution is continuous (i.e., even if two different draws from the base distribution are distinct with probability one). The extent to which values repeat is determined by α, with higher values causing less repetition. If the base distribution is continuous, so that separate draws from it are distinct with probability one, then the infinite set of probabilities that the output distribution assigns to its possible values is distributed according to a stick-breaking process.
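The behaviour described above can be illustrated with a small simulation. The following is a minimal sketch, not a reference implementation: it writes H for the base distribution and α for the concentration parameter, assumes the standard stick-breaking construction with Beta(1, α) break fractions, and truncates the infinite sequence at a finite number of atoms. The function names (stick_breaking_draw, sample_from), the truncation level and the choice of a standard normal base distribution are all assumptions made for illustration.

```python
import random


def stick_breaking_draw(alpha, base_sampler, n_atoms=500, rng=None):
    """Approximate one output distribution G ~ DP(H, alpha).

    Illustrative helper: the infinite stick-breaking sequence is truncated
    at n_atoms, which is an approximation chosen for this sketch.
    """
    rng = rng or random.Random()
    atoms, weights = [], []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(n_atoms):
        frac = rng.betavariate(1.0, alpha)   # Beta(1, alpha) break fraction
        weights.append(remaining * frac)     # probability of this atom
        atoms.append(base_sampler(rng))      # atom location drawn from H
        remaining *= 1.0 - frac
    weights[-1] += remaining  # fold the leftover mass into the last atom
    return atoms, weights


def sample_from(atoms, weights, n, rng):
    """Draw n values from the discrete output distribution."""
    return rng.choices(atoms, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    normal = lambda r: r.gauss(0.0, 1.0)  # assumed continuous base distribution H
    for alpha in (0.5, 5.0, 50.0):
        atoms, weights = stick_breaking_draw(alpha, normal, rng=rng)
        values = sample_from(atoms, weights, 200, rng)
        print(alpha, "distinct values among 200 draws:", len(set(values)))
```

Although the base distribution here is continuous, the sampled values repeat, and the number of distinct values tends to grow as α increases.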

Note that the Dirichlet process is a stochastic process, meaning that technically speaking it is an infinite collection of random variables rather than a single random distribution. The relation between the two is as follows. Consider the Dirichlet process as defined above, as a distribution over random distributions. We can call this the distribution-centered view of the Dirichlet process. First, draw a random output distribution G from this process, and then consider an infinite sequence of random variables X_1, X_2, ... representing values drawn from G. Conditioned on G, the variables are independent and identically distributed. Now consider instead the distribution of X_1, X_2, ... that results from marginalizing out (integrating over) the random output distribution. This makes the variables dependent on each other. However, they are still exchangeable, meaning that their joint distribution is unchanged by any reordering of the variables; in particular, every variable has the same marginal distribution. That is, they are "identically distributed" but not "independent". The resulting infinite sequence of random variables is another view of the Dirichlet process. We can call this the process-centered view of the Dirichlet process. The conditional distribution of one variable given all the others, or given all previous variables, is defined by the Chinese restaurant process.
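The process-centered view can be simulated directly, without ever constructing the output distribution. The sketch below assumes the standard predictive rule associated with the Chinese restaurant process (equivalently, the Pólya urn scheme): with α the concentration parameter and H the base distribution, the (n+1)-th value is a fresh draw from H with probability α / (α + n), and otherwise a copy of one of the n earlier values chosen uniformly at random. The function name polya_urn_sequence and the normal base distribution are illustrative assumptions.

```python
import random


def polya_urn_sequence(alpha, base_sampler, n, rng=None):
    """Generate X_1, ..., X_n with the random output distribution integrated out.

    Illustrative helper implementing the predictive rule stated above.
    """
    rng = rng or random.Random()
    values = []
    for i in range(n):  # i values have been generated so far
        if rng.random() < alpha / (alpha + i):
            values.append(base_sampler(rng))   # new value drawn from H
        else:
            values.append(rng.choice(values))  # repeat a uniformly chosen earlier value
    return values


if __name__ == "__main__":
    normal = lambda r: r.gauss(0.0, 1.0)  # assumed base distribution H
    seq = polya_urn_sequence(2.0, normal, 20, random.Random(0))
    print([round(x, 2) for x in seq])
```

Each new value depends on all earlier ones, so the variables are not independent, yet no particular position in the sequence is privileged, which is the exchangeability property described above.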

Another way to think of a Dirichlet process is as an infinite-dimensional generalization of the Dirichlet distribution. The Dirichlet distribution returns a finite-dimensional set of probabilities (of some size K, specified by the parameters of the distribution), all of which sum to 1. Such a set can be thought of as a finite-dimensional discrete distribution; i.e., a Dirichlet distribution is a distribution over K-dimensional discrete distributions. Imagine generalizing a symmetric Dirichlet distribution, defined by a dimension K and a concentration parameter α, to an infinite set of probabilities; the resulting distribution over infinite-dimensional discrete distributions is called the stick-breaking process. Imagine then using this set of probabilities to create an infinite-dimensional mixture model, with each probability in the set associated with a mixture component and the value of each component drawn separately from a base distribution H; then draw an infinite number of samples from this mixture model. The infinite set of random variables corresponding to the marginal distribution of these samples is a Dirichlet process with parameters H and α.
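The generalization described in this paragraph can be written out symbolically. The notation below is an assumption made for illustration: K is the dimension, α the concentration parameter, H the base distribution, and δ_θ a point mass at θ. The parameterization in which each of the K components receives α/K is one common choice under which the finite model approaches the Dirichlet process as K grows; the text above does not fix a parameterization.

```latex
% Finite case: K weights from a symmetric Dirichlet, component values from H
(\pi_1, \dots, \pi_K) \sim \operatorname{Dir}(\alpha/K, \dots, \alpha/K), \qquad
\theta_k \overset{\text{iid}}{\sim} H, \qquad
G_K = \sum_{k=1}^{K} \pi_k \, \delta_{\theta_k}

% Infinite case: weights from the stick-breaking process
\beta_k \overset{\text{iid}}{\sim} \operatorname{Beta}(1, \alpha), \qquad
\pi_k = \beta_k \prod_{j<k} (1 - \beta_j), \qquad
G = \sum_{k=1}^{\infty} \pi_k \, \delta_{\theta_k} \sim \operatorname{DP}(H, \alpha)

% Samples from the mixture; marginalizing over G gives the process-centered view
X_i \mid G \overset{\text{iid}}{\sim} G, \qquad i = 1, 2, \dots
```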

The Dirichlet process was formally introduced by Thomas Ferguson in 1973.
