Variational Bayesian Methods - Mathematical Derivation of the Mean-Field Approximation

In variational inference, the posterior distribution over a set of unobserved variables $\mathbf{Z} = \{Z_1, \dots, Z_n\}$ given some data $\mathbf{X}$ is approximated by a variational distribution, $Q(\mathbf{Z})$:

P(\mathbf{Z}\mid\mathbf{X}) \approx Q(\mathbf{Z}).

The distribution $Q(\mathbf{Z})$ is restricted to belong to a family of distributions of simpler form than $P(\mathbf{Z}\mid\mathbf{X})$, selected with the intention of making $Q(\mathbf{Z})$ similar to the true posterior, $P(\mathbf{Z}\mid\mathbf{X})$. The lack of similarity is measured in terms of a dissimilarity function $d(Q; P)$, and hence inference is performed by selecting the distribution $Q(\mathbf{Z})$ that minimizes $d(Q; P)$.
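As an illustration of such a restricted family (the notation here follows the usual mean-field convention rather than anything stated above), the mean-field approach partitions the latent variables into $M$ disjoint groups $\mathbf{Z}_1, \dots, \mathbf{Z}_M$ and takes $Q$ to factorize over them:

Q(\mathbf{Z}) = \prod_{i=1}^{M} q_i(\mathbf{Z}_i).

The functional form of each factor $q_i$ is not fixed in advance; it falls out of the optimization described below.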

The most common type of variational Bayes, known as mean-field variational Bayes, uses the Kullback–Leibler divergence (KL-divergence) of $P$ from $Q$ as the choice of dissimilarity function. This choice makes the minimization tractable. The KL-divergence is defined as

D_{\mathrm{KL}}(Q||P) = \sum_\mathbf{Z} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z}\mid\mathbf{X})}.
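For concreteness, here is a minimal Python sketch (an illustration of ours, not part of the source) that evaluates this sum for discrete distributions represented as NumPy probability vectors:

import numpy as np

def kl_divergence(q, p):
    """D_KL(Q||P) = sum_z Q(z) * log(Q(z) / P(z)) for discrete
    distributions given as probability vectors of equal length."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0  # terms with Q(z) = 0 contribute nothing (0 log 0 = 0)
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Example: a peaked Q against a bimodal posterior P(Z|X).
p_post = np.array([0.49, 0.02, 0.49])
q = np.array([0.96, 0.02, 0.02])
print(kl_divergence(q, p_post))  # reversed KL, as used in variational Bayes
print(kl_divergence(p_post, q))  # forward KL, for comparison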

Note that $Q$ and $P$ are reversed from what one might expect. This use of reversed KL-divergence is conceptually similar to the expectation-maximization algorithm. (Using the KL-divergence in the other way produces the expectation propagation algorithm.)
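The practical consequence of the reversed direction can be seen in a small numerical experiment (again an illustration, with a made-up bimodal target): when $Q$ is restricted to a unimodal family, minimizing $D_{\mathrm{KL}}(Q||P)$ locks onto a single mode, while minimizing the forward divergence spreads $Q$ across both.

import numpy as np

z = np.arange(21)

# Bimodal target distribution with peaks at z = 5 and z = 15.
p = np.exp(-0.5 * ((z - 5) / 1.5) ** 2) + np.exp(-0.5 * ((z - 15) / 1.5) ** 2)
p /= p.sum()

def unimodal_q(mu, sigma):
    """Discretized Gaussian: the restricted variational family."""
    q = np.exp(-0.5 * ((z - mu) / sigma) ** 2)
    return q / q.sum()

def kl(a, b):
    eps = 1e-12  # numerical guard against log(0)
    return float(np.sum(a * np.log((a + eps) / (b + eps))))

grid = [(mu, s) for mu in np.linspace(0, 20, 81) for s in np.linspace(0.5, 8.0, 31)]
rev = min(grid, key=lambda t: kl(unimodal_q(*t), p))  # D_KL(Q||P), reversed
fwd = min(grid, key=lambda t: kl(p, unimodal_q(*t)))  # D_KL(P||Q), forward
print("reversed KL picks (mu, sigma):", rev)  # hugs one peak
print("forward  KL picks (mu, sigma):", fwd)  # broad, straddles both peaks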

The KL-divergence can be written as

D_{\mathrm{KL}}(Q||P) = \sum_\mathbf{Z} Q(\mathbf{Z}) \left[ \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z},\mathbf{X})} + \log P(\mathbf{X}) \right],

or, rearranging for the log evidence,


\begin{align}
\log P(\mathbf{X}) & = D_{\mathrm{KL}}(Q||P) - \sum_\mathbf{Z} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z},\mathbf{X})} \\
& = D_{\mathrm{KL}}(Q||P) + \mathcal{L}(Q).
\end{align}
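This identity is easy to verify numerically. The sketch below (a made-up four-state toy model, not from the source) builds a joint $P(\mathbf{Z},\mathbf{X})$ for one fixed observation, picks an arbitrary $Q$, and checks that $\log P(\mathbf{X}) = D_{\mathrm{KL}}(Q||P) + \mathcal{L}(Q)$ holds to rounding error:

import numpy as np

# Joint P(Z, X = x0) over four latent states for one observed x0;
# the numbers are arbitrary and need not sum to one.
p_joint = np.array([0.10, 0.30, 0.05, 0.15])
p_x = p_joint.sum()           # evidence P(X = x0)
p_post = p_joint / p_x        # posterior P(Z | X = x0)

q = np.array([0.25, 0.40, 0.10, 0.25])  # any variational distribution

d_kl = np.sum(q * np.log(q / p_post))   # D_KL(Q||P)
elbo = np.sum(q * np.log(p_joint / q))  # L(Q) = -sum_Z Q log(Q / P(Z,X))
print(np.log(p_x), d_kl + elbo)         # both print log(0.6) = -0.5108...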

As the log evidence $\log P(\mathbf{X})$ is fixed with respect to $Q$, maximizing the final term $\mathcal{L}(Q)$ minimizes the KL divergence of $P$ from $Q$. By appropriate choice of $Q$, $\mathcal{L}(Q)$ becomes tractable to compute and to maximize. Hence we have both an analytical approximation $Q(\mathbf{Z})$ for the posterior $P(\mathbf{Z}\mid\mathbf{X})$ and a lower bound $\mathcal{L}(Q)$ for the log evidence $\log P(\mathbf{X})$. The lower bound $\mathcal{L}(Q)$ is known as the (negative) variational free energy because it can also be expressed as an "energy" $\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]$ plus the entropy of $Q$.
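Expanding $\mathcal{L}(Q)$ makes the free-energy reading explicit:

\begin{align}
\mathcal{L}(Q) & = \sum_\mathbf{Z} Q(\mathbf{Z}) \log P(\mathbf{Z},\mathbf{X}) - \sum_\mathbf{Z} Q(\mathbf{Z}) \log Q(\mathbf{Z}) \\
& = \operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})] + H(Q),
\end{align}

where $H(Q)$ is the entropy of $Q$ and the first term plays the role of a (negative) energy.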
