Boltzmann Machine - Training

The units in the Boltzmann machine are divided into "visible" units, $V$, and "hidden" units, $H$. The visible units are those which receive information from the "environment", i.e. the training set is a set of binary vectors over the set $V$. The distribution over the training set is denoted $P^{+}(V)$.

As is discussed above, the distribution over global states converges as the Boltzmann machine reaches thermal equilibrium. We denote this distribution, after we marginalize it over the hidden units, as $P^{-}(V)$.

Our goal is to approximate the "real" distribution $P^{+}(V)$ using the $P^{-}(V)$ which will (eventually) be produced by the machine. To measure how similar the two distributions are, we use the Kullback–Leibler divergence, $G$:

$$G = \sum_{v} P^{+}(v) \ln\!\left(\frac{P^{+}(v)}{P^{-}(v)}\right)$$

where the sum is over all the possible states of $V$. $G$ is a function of the weights, since they determine the energy of a state, and the energy determines $P^{-}(v)$, as promised by the Boltzmann distribution. Hence, we can use a gradient descent algorithm over $G$, so a given weight, $w_{ij}$, is changed by subtracting the partial derivative of $G$ with respect to the weight.
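As a concrete illustration, here is a minimal numerical sketch of $G$ for a toy model with two visible units; the function name and the example distributions are hypothetical, not part of the original formulation:

```python
import numpy as np

def kl_divergence(p_plus, p_minus):
    """Kullback-Leibler divergence G = sum_v P+(v) ln(P+(v)/P-(v)).

    p_plus, p_minus: probabilities over the same enumeration of visible
    states; states with P+(v) = 0 contribute nothing to the sum.
    """
    mask = p_plus > 0
    return np.sum(p_plus[mask] * np.log(p_plus[mask] / p_minus[mask]))

# Toy example: 4 visible states (2 binary visible units).
p_plus = np.array([0.5, 0.25, 0.25, 0.0])   # empirical training distribution
p_minus = np.array([0.4, 0.3, 0.2, 0.1])    # equilibrium model distribution
print(kl_divergence(p_plus, p_minus))       # G >= 0; G = 0 when they agree
```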

There are two phases to Boltzmann machine training, and we switch iteratively between them. One is the "positive" phase, where the visible units' states are clamped to a particular binary state vector sampled from the training set (according to $P^{+}$). The other is the "negative" phase, where the network is allowed to run freely, i.e. no units have their state determined by external data. Surprisingly enough, the gradient with respect to a given weight, $w_{ij}$, is given by the very simple equation (proved in Ackley et al.):

$$\frac{\partial G}{\partial w_{ij}} = -\frac{1}{R}\left[p_{ij}^{+} - p_{ij}^{-}\right]$$

where:

  • $p_{ij}^{+}$ is the probability of units $i$ and $j$ both being on when the machine is at equilibrium in the positive phase.
  • $p_{ij}^{-}$ is the probability of units $i$ and $j$ both being on when the machine is at equilibrium in the negative phase.
  • $R$ denotes the learning rate.

This result follows from the fact that at thermal equilibrium the probability of any global state when the network is free-running is given by the Boltzmann distribution (hence the name "Boltzmann machine").
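To make the two-phase procedure concrete, the following is a minimal sketch of one training step, assuming binary 0/1 units and unit temperature; all names here (`gibbs_step`, `train_step`, etc.) are illustrative inventions, and the naive sampling schedule ignores the burn-in needed to actually reach thermal equilibrium:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(s, W, b, clamped=()):
    """One Gibbs sweep at T = 1: resample each unclamped unit given the rest."""
    for i in range(len(s)):
        if i in clamped:
            continue  # visible units keep their state during the positive phase
        gap = W[i] @ s + b[i]  # energy gap for turning unit i on
        s[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-gap)) else 0.0
    return s

def pair_stats(samples):
    """Estimate p_ij = P(units i and j both on) from sampled global states."""
    S = np.array(samples)
    return S.T @ S / len(S)

def train_step(W, b, data, n_visible, n_sweeps=100, lr=0.1):
    """One gradient step: positive (clamped) phase, negative (free) phase, update."""
    n = W.shape[0]

    pos = []                                       # positive phase: clamp visibles
    for v in data:
        s = rng.integers(0, 2, n).astype(float)
        s[:n_visible] = v
        for _ in range(n_sweeps):
            pos.append(gibbs_step(s, W, b, clamped=range(n_visible)).copy())

    neg = []                                       # negative phase: run freely
    s = rng.integers(0, 2, n).astype(float)
    for _ in range(n_sweeps * len(data)):
        neg.append(gibbs_step(s, W, b).copy())

    p_plus, p_minus = pair_stats(pos), pair_stats(neg)
    dW = lr * (p_plus - p_minus)                   # descend the gradient of G
    np.fill_diagonal(dW, 0.0)                      # no self-connections
    W += dW                                        # pair stats are symmetric, so W stays symmetric
    b += lr * (np.diag(p_plus) - np.diag(p_minus)) # bias update: single-unit activity
    return W, b

# Usage: 3 visible + 2 hidden units, two training patterns.
n_v, n_h = 3, 2
W = np.zeros((n_v + n_h, n_v + n_h))
b = np.zeros(n_v + n_h)
data = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
for _ in range(20):
    W, b = train_step(W, b, data, n_visible=n_v)
```

Note that the update raises $w_{ij}$ when $p_{ij}^{+} > p_{ij}^{-}$, i.e. it moves against the gradient of $G$, and that each weight update uses only the statistics of the two units it connects.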

Remarkably, this learning rule is fairly biologically plausible because the only information needed to change the weights is provided by "local" information. That is, the connection (or synapse, in biological terms) does not need information about anything other than the two neurons it connects. This is far more biologically realistic than the information needed by a connection in many other neural network training algorithms, such as backpropagation.

The training of a Boltzmann machine does not use the EM algorithm, which is heavily used in machine learning. Minimizing the KL-divergence is equivalent to maximizing the log-likelihood of the data, so the training procedure performs gradient ascent on the log-likelihood of the observed data. This is in contrast to the EM algorithm, where the posterior distribution of the hidden nodes must be calculated before maximizing the expected value of the complete-data likelihood in the M-step.
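The equivalence between the two objectives follows directly from the definition of $G$ above: splitting the logarithm gives

$$G = \sum_{v} P^{+}(v)\ln P^{+}(v) \;-\; \sum_{v} P^{+}(v)\ln P^{-}(v)$$

The first term is the (negative) entropy of the training distribution and does not depend on the weights, so minimizing $G$ is the same as maximizing $\sum_{v} P^{+}(v)\ln P^{-}(v)$, the expected log-likelihood of the model on the training data.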

Training the biases is similar, but uses only single node activity:

$$\frac{\partial G}{\partial \theta_{i}} = -\frac{1}{R}\left[p_{i}^{+} - p_{i}^{-}\right]$$

where $p_{i}^{+}$ and $p_{i}^{-}$ are the probabilities of unit $i$ being on at equilibrium in the positive and negative phases, respectively.
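In the training sketch above, this is the line `b += lr * (np.diag(p_plus) - np.diag(p_minus))`: for binary units, $p_{ii}$ is simply the probability that unit $i$ is on, so the diagonal of the pair-statistics matrix already carries the single-unit activities $p_{i}^{+}$ and $p_{i}^{-}$.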
