Markov Decision Process - Problem

The core problem of MDPs is to find a "policy" for the decision maker: a function \pi that specifies the action \pi(s) that the decision maker will choose when in state s. Note that once a Markov decision process is combined with a policy in this way, the action for each state is fixed and the resulting combination behaves like a Markov chain.
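As a small illustration of the last point, fixing a policy collapses the MDP's per-action transition probabilities into a single Markov chain transition matrix. The transition probabilities and policy below are hypothetical example values, not taken from the text:

```python
# Fixing a policy pi turns an MDP into a Markov chain: for each state, the
# chain's transition row is the row of the action that pi selects there.
# P[s][a][s2] gives the probability of moving to state s2 when taking
# action a in state s (hypothetical 2-state, 2-action MDP).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: rows for action 0 and action 1
    [[0.5, 0.5], [1.0, 0.0]],  # state 1: rows for action 0 and action 1
]
pi = [1, 0]  # policy: take action 1 in state 0, action 0 in state 1

# Resulting Markov chain transition matrix under pi
chain = [P[s][pi[s]] for s in range(len(P))]
# chain == [[0.2, 0.8], [0.5, 0.5]]
```

Once the policy is fixed, questions about long-run behaviour (stationary distribution, expected return) reduce to standard Markov chain analysis on `chain`.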

The goal is to choose a policy \pi that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:

E\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{a_t}(s_t, s_{t+1})\right] \quad \text{(where we choose } a_t = \pi(s_t)\text{)}

where \gamma is the discount factor and satisfies 0 \le \gamma \le 1. (For example, \gamma = 1/(1+r) when the discount rate is r.) \gamma is typically close to 1.
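The discounted sum can be evaluated along any sampled trajectory by folding from the last step backwards, since G_t = r_t + \gamma G_{t+1}. A minimal sketch, with a hypothetical reward sequence and discount factor:

```python
# Discounted return G = sum over t of gamma**t * r_t for one trajectory.
def discounted_return(rewards, gamma=0.95):
    g = 0.0
    for r in reversed(rewards):  # fold from the end: g_t = r_t + gamma * g_{t+1}
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]  # hypothetical per-step rewards r_0, r_1, r_2
g = discounted_return(rewards, gamma=0.95)
# g == 1.0 + 0.95 * 0.0 + 0.95**2 * 2.0 == 2.805
```

The backward fold avoids computing gamma**t explicitly and is numerically well behaved for long trajectories.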

Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of the current state s only, as assumed above.
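One standard way to compute such a state-dependent optimal policy is value iteration, which repeatedly applies the Bellman optimality update and then reads off the greedy action in each state. The sketch below uses a hypothetical 2-state, 2-action MDP; the dynamics and rewards are invented for illustration:

```python
# Value iteration on a small MDP. P[s][a] is a list of
# (probability, next_state, reward) triples; gamma is the discount factor.
def value_iteration(P, gamma=0.9, tol=1e-8):
    n = len(P)
    V = [0.0] * n
    while True:
        # Bellman optimality update: V(s) = max_a sum_s' p * (r + gamma * V(s'))
        V_new = [max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            V = V_new
            break
        V = V_new
    # The optimal policy is a function of the state alone: the greedy action.
    policy = [max(range(len(P[s])),
                  key=lambda a: sum(p * (r + gamma * V[s2])
                                    for p, s2, r in P[s][a]))
              for s in range(n)]
    return V, policy

# Hypothetical dynamics: in state 0, action 1 usually reaches the rewarding
# state 1 but sometimes incurs a small penalty; state 1 can loop or reset.
P = [
    [[(1.0, 0, 0.0)], [(0.8, 1, 5.0), (0.2, 0, -1.0)]],  # state 0
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],                  # state 1
]
V, policy = value_iteration(P)
```

Because `policy` depends only on the state index, fixing it reduces the MDP to a Markov chain, exactly as described above.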
