Variational Bayesian Methods

In Practice

The variational distribution is usually assumed to factorize over some partition of the latent variables, i.e. for some partition of the latent variables into ,

It can be shown using the calculus of variations (hence the name "variational Bayes") that the "best" distribution for each of the factors (in terms of the distribution minimizing the KL divergence, as described above) can be expressed as:

where is the expectation of the joint probability of the data and latent variables, taken over all variables not in the partition.

In practice, we usually work in terms of logarithms, i.e.:

The constant in the above expression is related to the normalizing constant (the denominator in the expression above for ) and is usually reinstated by inspection, as the rest of the expression can usually be recognized as being a known type of distribution (e.g. Gaussian, gamma, etc.).

Using the properties of expectations, the expression can usually be simplified into a function of the fixed hyperparameters of the prior distributions over the latent variables and of expectations (and sometimes higher moments such as the variance) of latent variables not in the current partition (i.e. latent variables not included in ). This creates circular dependencies between the parameters of the distributions over variables in one partition and the expectations of variables in the other partitions. This naturally suggests an iterative algorithm, much like EM (the expectation-maximization algorithm), in which the expectations (and possibly higher moments) of the latent variables are initialized in some fashion (perhaps randomly), and then the parameters of each distribution are computed in turn using the current values of the expectations, after which the expectation of the newly computed distribution is set appropriately according to the computed parameters. An algorithm of this sort is guaranteed to converge. Furthermore, if the distributions in question are part of the exponential family, which is usually the case, convergence will be to a global maximum, since the exponential family is convex.

In other words, for each of the partitions of variables, by simplifying the expression for the distribution over the partition's variables and examining the distribution's functional dependency on the variables in question, the family of the distribution can usually be determined (which in turn determines the value of the constant). The formula for the distribution's parameters will be expressed in terms of the prior distributions' hyperparameters (which are known constants), but also in terms of expectations of functions of variables in other partitions. Usually these expectations can be simplified into functions of expectations of the variables themselves (i.e. the means); sometimes expectations of squared variables (which can be related to the variance of the variables), or expectations of higher powers (i.e. higher moments) also appear. In most cases, the other variables' distributions will be from known families, and the formulas for the relevant expectations can be looked up. However, those formulas depend on those distributions' parameters, which depend in turn on the expectations about other variables. The result is that the formulas for the parameters of each variable's distributions can be expressed as a series of equations with mutual, nonlinear dependencies among the variables. Usually, it is not possible to solve this system of equations directly. However, as described above, the dependencies suggest a simple iterative algorithm, which in most cases is guaranteed to converge. An example will make this process clearer.

Read more about this topic: Variational Bayesian Methods

Famous quotes containing the word practice:

“Toddlers who don’t learn gradually about disappointment lose their resilience through lack of practice in give-and-take with other people’s needs. They can become self-centered, demanding, and difficult to like or to be with.”
—Alicia F. Lieberman (20th century)

“My paternal grandmother would not light a fire on the Sabbath and piled all Sunday’s washing-up in a bucket, to be dealt with on Monday morning, because the Sabbath was a day of rest—a practice that made my paternal grandfather, the village atheist, as mad as fire. Nevertheless, he willed five quid to the minister, just to be on the safe side.”
—Angela Carter (1940–1992)

Related Phrases

Multivariate Gaussian Distribution

Parameters Compute

Point Estimates

Posterior Distributions

Prior Distributions

Variational Bayes

Related Words