Variational Bayesian Methods - Mathematical Derivation of the Mean-Field Approximation

In variational inference, the posterior distribution over a set of unobserved variables $\mathbf{Z} = \{Z_1, \dots, Z_n\}$ given some data $\mathbf{X}$ is approximated by a variational distribution, $Q(\mathbf{Z})$:

P(\mathbf{Z}\mid\mathbf{X}) \approx Q(\mathbf{Z}).

The distribution $Q(\mathbf{Z})$ is restricted to belong to a family of distributions of simpler form than $P(\mathbf{Z}\mid\mathbf{X})$, selected with the intention of making $Q(\mathbf{Z})$ similar to the true posterior, $P(\mathbf{Z}\mid\mathbf{X})$. The lack of similarity is measured in terms of a dissimilarity function $d(Q; P)$, and hence inference is performed by selecting the distribution $Q(\mathbf{Z})$ that minimizes $d(Q; P)$.
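As an illustration of such a restricted family (the notation here follows the usual mean-field convention rather than anything stated above), the mean-field approach partitions the latent variables into $M$ disjoint groups $\mathbf{Z}_1, \dots, \mathbf{Z}_M$ and takes $Q$ to factorize over them:

Q(\mathbf{Z}) = \prod_{i=1}^{M} q_i(\mathbf{Z}_i).

The functional form of each factor $q_i$ is not fixed in advance; it falls out of the optimization described below.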

The most common type of variational Bayes, known as mean-field variational Bayes, uses the Kullback–Leibler divergence (KL-divergence) of $P$ from $Q$ as the choice of dissimilarity function. This choice makes the minimization tractable. The KL-divergence is defined as

D_{\mathrm{KL}}(Q||P) = \sum_\mathbf{Z} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z}\mid\mathbf{X})}.
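For concreteness, here is a minimal Python sketch (an illustration of ours, not part of the source) that evaluates this sum for discrete distributions represented as NumPy probability vectors:

import numpy as np

def kl_divergence(q, p):
    """D_KL(Q||P) = sum_z Q(z) * log(Q(z) / P(z)) for discrete
    distributions given as probability vectors of equal length."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0  # terms with Q(z) = 0 contribute nothing (0 log 0 = 0)
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Example: a peaked Q against a bimodal posterior P(Z|X).
p_post = np.array([0.49, 0.02, 0.49])
q = np.array([0.96, 0.02, 0.02])
print(kl_divergence(q, p_post))  # reversed KL, as used in variational Bayes
print(kl_divergence(p_post, q))  # forward KL, for comparison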

Note that $Q$ and $P$ are reversed from what one might expect. This use of reversed KL-divergence is conceptually similar to the expectation-maximization algorithm. (Using the KL-divergence in the other way produces the expectation propagation algorithm.)
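The practical consequence of the reversed direction can be seen in a small numerical experiment (again an illustration, with a made-up bimodal target): when $Q$ is restricted to a unimodal family, minimizing $D_{\mathrm{KL}}(Q||P)$ locks onto a single mode, while minimizing the forward divergence spreads $Q$ across both.

import numpy as np

z = np.arange(21)

# Bimodal target distribution with peaks at z = 5 and z = 15.
p = np.exp(-0.5 * ((z - 5) / 1.5) ** 2) + np.exp(-0.5 * ((z - 15) / 1.5) ** 2)
p /= p.sum()

def unimodal_q(mu, sigma):
    """Discretized Gaussian: the restricted variational family."""
    q = np.exp(-0.5 * ((z - mu) / sigma) ** 2)
    return q / q.sum()

def kl(a, b):
    eps = 1e-12  # numerical guard against log(0)
    return float(np.sum(a * np.log((a + eps) / (b + eps))))

grid = [(mu, s) for mu in np.linspace(0, 20, 81) for s in np.linspace(0.5, 8.0, 31)]
rev = min(grid, key=lambda t: kl(unimodal_q(*t), p))  # D_KL(Q||P), reversed
fwd = min(grid, key=lambda t: kl(p, unimodal_q(*t)))  # D_KL(P||Q), forward
print("reversed KL picks (mu, sigma):", rev)  # hugs one peak
print("forward  KL picks (mu, sigma):", fwd)  # broad, straddles both peaks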

The KL-divergence can be written as

D_{\mathrm{KL}}(Q||P) = \sum_\mathbf{Z} Q(\mathbf{Z}) \left[ \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z},\mathbf{X})} + \log P(\mathbf{X}) \right],

or, rearranging for the log evidence,


\begin{align}
\log P(\mathbf{X}) & = D_{\mathrm{KL}}(Q||P) - \sum_\mathbf{Z} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z},\mathbf{X})} \\
& = D_{\mathrm{KL}}(Q||P) + \mathcal{L}(Q).
\end{align}
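This identity is easy to verify numerically. The sketch below (a made-up four-state toy model, not from the source) builds a joint $P(\mathbf{Z},\mathbf{X})$ for one fixed observation, picks an arbitrary $Q$, and checks that $\log P(\mathbf{X}) = D_{\mathrm{KL}}(Q||P) + \mathcal{L}(Q)$ holds to rounding error:

import numpy as np

# Joint P(Z, X = x0) over four latent states for one observed x0;
# the numbers are arbitrary and need not sum to one.
p_joint = np.array([0.10, 0.30, 0.05, 0.15])
p_x = p_joint.sum()           # evidence P(X = x0)
p_post = p_joint / p_x        # posterior P(Z | X = x0)

q = np.array([0.25, 0.40, 0.10, 0.25])  # any variational distribution

d_kl = np.sum(q * np.log(q / p_post))   # D_KL(Q||P)
elbo = np.sum(q * np.log(p_joint / q))  # L(Q) = -sum_Z Q log(Q / P(Z,X))
print(np.log(p_x), d_kl + elbo)         # both print log(0.6) = -0.5108...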

As the log evidence $\log P(\mathbf{X})$ is fixed with respect to $Q$, maximizing the final term $\mathcal{L}(Q)$ minimizes the KL divergence of $P$ from $Q$. By appropriate choice of $Q$, $\mathcal{L}(Q)$ becomes tractable to compute and to maximize. Hence we have both an analytical approximation $Q(\mathbf{Z})$ for the posterior $P(\mathbf{Z}\mid\mathbf{X})$ and a lower bound $\mathcal{L}(Q)$ for the log evidence $\log P(\mathbf{X})$. The lower bound $\mathcal{L}(Q)$ is known as the (negative) variational free energy because it can also be expressed as an "energy" $\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]$ plus the entropy of $Q$.
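Expanding $\mathcal{L}(Q)$ makes the free-energy reading explicit:

\begin{align}
\mathcal{L}(Q) & = \sum_\mathbf{Z} Q(\mathbf{Z}) \log P(\mathbf{Z},\mathbf{X}) - \sum_\mathbf{Z} Q(\mathbf{Z}) \log Q(\mathbf{Z}) \\
& = \operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})] + H(Q),
\end{align}

where $H(Q)$ is the entropy of $Q$ and the first term plays the role of a (negative) energy.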
