Exponential Family - Table of Distributions

The following table shows how to rewrite a number of common distributions as exponential-family distributions with natural parameters.

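For a scalar variable and scalar parameter, the form is as follows:

 f_X(x \mid \theta) = h(x) \exp\left(\eta(\theta)\, T(x) - A(\eta)\right)

For a scalar variable and vector parameter:

 f_X(x \mid \boldsymbol\theta) = h(x) \exp\left(\boldsymbol\eta(\boldsymbol\theta) \cdot \mathbf{T}(x) - A(\boldsymbol\eta)\right)

For a vector variable and vector parameter:

 f_X(\mathbf{x} \mid \boldsymbol\theta) = h(\mathbf{x}) \exp\left(\boldsymbol\eta(\boldsymbol\theta) \cdot \mathbf{T}(\mathbf{x}) - A(\boldsymbol\eta)\right)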


The above formulas choose the functional form of the exponential family with a log-partition function A(η). The reason is that the moments of the sufficient statistics can then be calculated easily, simply by differentiating this function. Alternative forms involve parameterizing this function in terms of the normal (usual) parameter θ instead of the natural parameter η, and/or using a factor g(η) outside of the exponential. The relation between the latter and the former is:

 A(\eta) = -\ln g(\eta), \qquad g(\eta) = e^{-A(\eta)}
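
As a minimal sketch of the differentiation property, assume the Poisson distribution with natural parameter η = ln λ and log-partition A(η) = e^η (standard facts, though not visible in the text above). The first two derivatives of A are approximated by central finite differences and compared with the known mean and variance λ. The helper names and step sizes are illustrative choices, not part of any library.

```python
import math

def log_partition_poisson(eta):
    """Assumed log-partition of the Poisson family: A(eta) = exp(eta), eta = ln(lambda)."""
    return math.exp(eta)

def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

lam = 3.0
eta = math.log(lam)

# dA/d(eta) approximates the mean of T(x) = x; d2A/d(eta)2 approximates its variance.
mean = first_derivative(log_partition_poisson, eta)
variance = second_derivative(log_partition_poisson, eta)
print(mean, variance)  # both should be close to lam
```

Finite differences are used here only to keep the sketch dependency-free; an automatic-differentiation library would serve the same purpose.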

To convert between the representations involving the two types of parameter, use the formulas below for writing one type of parameter in terms of the other.
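
A conversion of this kind can be sketched for the normal distribution with unknown mean and variance. The sketch assumes the common convention η = (μ/σ², −1/(2σ²)) with log-partition A(η) = −η₁²/(4η₂) − ½ ln(−2η₂), equivalently A(θ) = μ²/(2σ²) + ln σ; the function names are hypothetical.

```python
import math

def normal_to_natural(mu, sigma2):
    """Map the usual parameters (mu, sigma^2) to the assumed natural parameters."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def natural_to_normal(eta1, eta2):
    """Inverse mapping; eta2 must be negative for a valid variance."""
    if eta2 >= 0:
        raise ValueError("eta2 must be negative")
    sigma2 = -1.0 / (2.0 * eta2)
    mu = eta1 * sigma2  # same as -eta1 / (2 * eta2)
    return (mu, sigma2)

def log_partition_natural(eta1, eta2):
    """A(eta) in terms of the natural parameters."""
    return -eta1 ** 2 / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)

def log_partition_usual(mu, sigma2):
    """A(theta) in terms of the usual parameters; ln(sigma) = 0.5 * ln(sigma^2)."""
    return mu ** 2 / (2.0 * sigma2) + 0.5 * math.log(sigma2)

mu, sigma2 = 1.5, 4.0
eta1, eta2 = normal_to_natural(mu, sigma2)
mu_back, sigma2_back = natural_to_normal(eta1, eta2)
print(eta1, eta2, mu_back, sigma2_back)
print(log_partition_natural(eta1, eta2), log_partition_usual(mu, sigma2))
```

The round trip returns the original parameters, and the two log-partition expressions agree, which is a quick consistency check on any pair of mappings.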


Table columns: Distribution; Parameter(s) θ; Natural parameter(s) η; Inverse parameter mapping; Base measure h(x); Sufficient statistic T(x); Log-partition A(η); Log-partition A(θ).
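Bernoulli distribution: p
  • Natural parameter: \eta = \ln \frac{p}{1-p}. This is the logit function.
  • Inverse parameter mapping: p = \frac{1}{1+e^{-\eta}}. This is the logistic function.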
binomial distribution (with known number of trials n): p
Poisson distribution: λ
negative binomial distribution (with known number of failures r): p
exponential distribution: λ
Pareto distribution (with known minimum value x_m): α
Weibull distribution (with known shape k): λ
Laplace distribution (with known mean μ): b
chi-squared distribution: ν
normal distribution (known variance): μ
normal distribution: μ, σ²
lognormal distribution: μ, σ²
inverse Gaussian distribution: μ, λ
gamma distribution: α, β; or k, θ
inverse gamma distribution: α, β
scaled inverse chi-squared distribution: ν, σ²
beta distribution: α, β
multivariate normal distribution: μ, Σ
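categorical distribution (variant 1): p_1, ..., p_k, where \sum_{i=1}^{k} p_i = 1
  • Natural parameters: \eta_i = \ln p_i for i = 1, ..., k.
  • Inverse parameter mapping: p_i = e^{\eta_i}, where \sum_{i=1}^{k} e^{\eta_i} = 1.
  • Sufficient statistic: ([x = 1], ..., [x = k]), where [x = i] is the Iverson bracket (1 if x = i, 0 otherwise).
categorical distribution (variant 2): p_1, ..., p_k, where \sum_{i=1}^{k} p_i = 1
  • Natural parameters: \eta_i = \ln p_i + C for i = 1, ..., k, with C an arbitrary constant.
  • Inverse parameter mapping: p_i = e^{\eta_i} / \sum_{j=1}^{k} e^{\eta_j}.
  • Sufficient statistic: ([x = 1], ..., [x = k]), where [x = i] is the Iverson bracket (1 if x = i, 0 otherwise).
categorical distribution (variant 3): p_1, ..., p_k, where \sum_{i=1}^{k} p_i = 1
  • Natural parameters: \eta_i = \ln \frac{p_i}{p_k} for i = 1, ..., k, so that \eta_k = 0. This is the inverse softmax function, a generalization of the logit function.
  • Inverse parameter mapping: p_i = e^{\eta_i} / \sum_{j=1}^{k} e^{\eta_j}. This is the softmax function, a generalization of the logistic function.
  • Sufficient statistic: ([x = 1], ..., [x = k]), where [x = i] is the Iverson bracket (1 if x = i, 0 otherwise).
  • Log-partition A(η): \ln \left(\sum_{i=1}^{k} e^{\eta_i}\right) = \ln \left(1+\sum_{i=1}^{k-1} e^{\eta_i}\right)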
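multinomial distribution (variant 1, with known number of trials n): p_1, ..., p_k, where \sum_{i=1}^{k} p_i = 1
  • Natural parameters: \eta_i = \ln p_i for i = 1, ..., k.
  • Inverse parameter mapping: p_i = e^{\eta_i}, where \sum_{i=1}^{k} e^{\eta_i} = 1.
multinomial distribution (variant 2, with known number of trials n): p_1, ..., p_k, where \sum_{i=1}^{k} p_i = 1
  • Natural parameters: \eta_i = \ln p_i + C for i = 1, ..., k, with C an arbitrary constant.
  • Inverse parameter mapping: p_i = e^{\eta_i} / \sum_{j=1}^{k} e^{\eta_j}.
multinomial distribution (variant 3, with known number of trials n): p_1, ..., p_k, where \sum_{i=1}^{k} p_i = 1
  • Natural parameters: \eta_i = \ln \frac{p_i}{p_k} for i = 1, ..., k, so that \eta_k = 0.
  • Inverse parameter mapping: p_i = e^{\eta_i} / \sum_{j=1}^{k} e^{\eta_j}.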
Dirichlet distribution: α_1, ..., α_k
  • Log-partition A(η): \sum_{i=1}^k \ln \Gamma(\eta_i+1) - \ln \Gamma\left(\sum_{i=1}^k (\eta_i+1)\right)
  • Log-partition A(θ): \sum_{i=1}^k \ln \Gamma(\alpha_i) - \ln \Gamma\left(\sum_{i=1}^k \alpha_i\right)
  • The two expressions agree when \eta_i = \alpha_i - 1.
Wishart distribution V,n




  • Three variants with different parameterizations are given, to facilitate computing moments of the sufficient statistics.
NOTE: Uses the fact that \operatorname{tr}(A^{\mathsf T} B) = \operatorname{vec}(A) \cdot \operatorname{vec}(B), i.e. the trace of a matrix product is much like a dot product. The matrix parameters are assumed to be vectorized (laid out in a vector) when inserted into the exponential form. Also, V and X are symmetric, so e.g. V^{\mathsf T} = V.
inverse Wishart distribution Ψ,m




The three variants of the categorical distribution and multinomial distribution are due to the fact that the parameters p_i are constrained, such that \sum_{i=1}^{k} p_i = 1. Thus, there are only k − 1 independent parameters.

  • Variant 1 uses k natural parameters with a simple relation between the standard and natural parameters; however, only k − 1 of the natural parameters are independent, and the set of k natural parameters is nonidentifiable. The constraint on the usual parameters translates to a similar constraint on the natural parameters.
  • Variant 2 demonstrates the fact that the entire set of natural parameters is nonidentifiable: adding any constant value to the natural parameters has no effect on the resulting distribution. However, by using the constraint on the natural parameters, the formula for the normal parameters in terms of the natural parameters can be written in a way that is independent of the constant that is added.
  • Variant 3 shows how to make the parameters identifiable in a convenient way by setting C = −ln p_k. This effectively "pivots" around p_k and causes the last natural parameter to have the constant value of 0. All the remaining formulas are written in a way that does not access p_k, so that effectively the model has only k − 1 parameters, both of the usual and natural kind.
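
The behaviour of the three variants can be checked numerically. The sketch below assumes the natural parameters η_i = ln p_i (variant 1), η_i = ln p_i + C (variant 2) and η_i = ln(p_i / p_k) (variant 3), and uses a normalized exponential (softmax) as the inverse mapping in every case. The function names and the example probabilities are illustrative.

```python
import math

def softmax(eta):
    """Normalized exponential; subtracting the maximum avoids overflow."""
    m = max(eta)
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]

def inverse_softmax_pivot(p):
    """Variant 3 mapping: eta_i = ln(p_i / p_k), pivoting on the last entry."""
    return [math.log(pi / p[-1]) for pi in p]

p = [0.2, 0.3, 0.5]

eta_v1 = [math.log(pi) for pi in p]   # variant 1: eta_i = ln p_i
eta_v2 = [e + 7.0 for e in eta_v1]    # variant 2: an arbitrary constant C = 7 added
eta_v3 = inverse_softmax_pivot(p)     # variant 3: last natural parameter is 0

# All three parameter vectors map back to the same probabilities.
print(softmax(eta_v1))
print(softmax(eta_v2))
print(softmax(eta_v3), eta_v3[-1])
```

Because the softmax is invariant to adding a constant to every component, the shifted vector of variant 2 yields the same distribution as variant 1, which is the nonidentifiability described above; the pivot in variant 3 removes that freedom.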

Note also that variants 1 and 2 are not actually standard exponential families at all. Rather they are curved exponential families, i.e. there are k − 1 independent parameters embedded in a k-dimensional parameter space. Many of the standard results for exponential families do not apply to curved exponential families. An example is the log-partition function A(η), which has the value of 0 in the curved cases. In standard exponential families, the derivatives of this function correspond to the moments (more technically, the cumulants) of the sufficient statistics, e.g. the mean and variance. However, a value of 0 suggests that the mean and variance of all the sufficient statistics are uniformly 0, whereas in fact the mean of the ith sufficient statistic should be p_i. (This does emerge correctly when using the form of A(η) in variant 3.)
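
The last remark can be illustrated with a small numerical check. The sketch assumes the variant 3 log-partition A(η) = ln(1 + Σ_{i<k} e^{η_i}) with η_i = ln(p_i / p_k), treats only the first k − 1 natural parameters as free, and approximates the partial derivatives by central finite differences. The helper names are hypothetical.

```python
import math

def log_partition_v3(eta_free):
    """A(eta) = ln(1 + sum of exp(eta_i) over the k-1 free parameters); eta_k is fixed at 0."""
    return math.log1p(sum(math.exp(e) for e in eta_free))

def partial_derivative(f, x, i, h=1e-6):
    """Central finite-difference approximation of the i-th partial derivative of f at x."""
    up = list(x)
    down = list(x)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)

p = [0.2, 0.3, 0.5]
eta_free = [math.log(pi / p[-1]) for pi in p[:-1]]  # eta_1, ..., eta_{k-1}

# Each partial derivative of A should approximate the mean of [x = i], i.e. p_i.
means = [partial_derivative(log_partition_v3, eta_free, i)
         for i in range(len(eta_free))]
print(means)
```

By contrast, a log-partition that is identically 0 has zero derivatives everywhere, so it cannot reproduce these means.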
