Multi-armed Bandit - Variations

Variations

A common formulation is the binary multi-armed bandit or Bernoulli multi-armed bandit, in which each arm issues a reward of one with probability p and a reward of zero otherwise.
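
A minimal sketch of such a Bernoulli bandit environment, written in Python, might look like the following; the class and variable names are illustrative choices for this sketch, not taken from any particular library:

    import random

    class BernoulliBandit:
        """K-armed Bernoulli bandit: arm i pays 1 with probability p[i], else 0."""

        def __init__(self, p):
            self.p = list(p)  # per-arm success probabilities, hidden from the learner

        def pull(self, arm):
            # Reward is 1 with probability p[arm], 0 otherwise.
            return 1 if random.random() < self.p[arm] else 0

    # Example: three arms with different (hidden) success probabilities.
    bandit = BernoulliBandit([0.2, 0.5, 0.7])
    print([bandit.pull(2) for _ in range(10)])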

Another formulation of the multi-armed bandit has each arm representing an independent Markov machine. Each time a particular arm is played, the state of that machine advances to a new one, chosen according to the Markov state evolution probabilities. Playing an arm also yields a reward that depends on the machine's current state. In a generalisation called the "restless bandit problem", the states of non-played arms can also evolve over time. There has also been discussion of systems where the number of choices (about which arm to play) increases over time. A rough sketch of a single Markovian arm is given below.
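
As a rough illustration, one arm modelled as a Markov machine can be simulated as follows; the transition table, reward table, and class name are assumptions made for this sketch. In the restless variant, the same transition step would also be applied to arms that are not played:

    import random

    class MarkovArm:
        """An arm whose internal state follows a Markov chain when played.

        transitions[s] is a list of (next_state, probability) pairs, and
        rewards[s] is the reward received when the arm is played in state s.
        """

        def __init__(self, transitions, rewards, start_state=0):
            self.transitions = transitions
            self.rewards = rewards
            self.state = start_state

        def play(self):
            reward = self.rewards[self.state]
            # Advance the state according to the Markov transition probabilities.
            next_states, probs = zip(*self.transitions[self.state])
            self.state = random.choices(next_states, weights=probs)[0]
            return reward

    # Two-state arm: state 1 pays more but tends to fall back to state 0.
    arm = MarkovArm(
        transitions={0: [(0, 0.7), (1, 0.3)], 1: [(0, 0.6), (1, 0.4)]},
        rewards={0: 0.0, 1: 1.0},
    )
    print([arm.play() for _ in range(5)])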

Computer science researchers have studied multi-armed bandits under worst-case assumptions, obtaining positive results for finite numbers of trials with both stochastic and nonstochastic arm payoffs.
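
A standard algorithm for the nonstochastic (worst-case) setting is Exp3. The sketch below is a minimal Python version; the parameter names and the toy Bernoulli reward function used in the demo are assumptions made for illustration, since any bounded reward sequence would do:

    import math
    import random

    def exp3(pull, n_arms, horizon, gamma=0.1):
        """Exp3 for nonstochastic (adversarial) bandits with rewards in [0, 1].

        pull(arm) returns the reward of the chosen arm; gamma controls the
        amount of uniform exploration mixed into the sampling distribution.
        """
        weights = [1.0] * n_arms
        total_reward = 0.0
        for _ in range(horizon):
            w_sum = sum(weights)
            probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
            arm = random.choices(range(n_arms), weights=probs)[0]
            reward = pull(arm)
            total_reward += reward
            # Importance-weighted estimate, so unplayed arms are not penalised.
            estimate = reward / probs[arm]
            weights[arm] *= math.exp(gamma * estimate / n_arms)
            # Rescale to avoid overflow; only the ratios of the weights matter.
            top = max(weights)
            weights = [w / top for w in weights]
        return total_reward

    # Toy usage against a fixed Bernoulli environment.
    p = [0.2, 0.5, 0.7]
    print(exp3(lambda a: 1 if random.random() < p[a] else 0, n_arms=3, horizon=1000))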

