Markov Decision Process - Problem

Problem

The core problem of MDPs is to find a "policy" for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ that the decision maker will choose when in state $s$. Once a Markov decision process is combined with a policy in this way, the action for each state is fixed and the resulting combination behaves like a Markov chain.
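To make the last point concrete, the following is a minimal sketch of how fixing a policy collapses an MDP's per-action transition probabilities into a single Markov chain. The two-state, two-action MDP, the action names, and the helper `induced_chain` are all hypothetical and chosen only for illustration.

```python
# Hypothetical two-state, two-action MDP; all numbers are made up for illustration.
# P[a][s][s2] is the probability of moving from state s to state s2 under action a.
P = {
    "stay": [[0.9, 0.1],
             [0.2, 0.8]],
    "move": [[0.3, 0.7],
             [0.6, 0.4]],
}

# A deterministic policy: the action chosen in each state (list index = state).
policy = ["move", "stay"]


def induced_chain(P, policy):
    """Return the transition matrix of the Markov chain obtained by following `policy`."""
    # Row s of the chain is row s of the transition matrix of the action chosen in s.
    return [list(P[policy[s]][s]) for s in range(len(policy))]


chain = induced_chain(P, policy)
for row in chain:
    print(row)
```

Each row of `chain` is still a probability distribution over next states, but it no longer depends on an action: the policy has already made that choice.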

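The goal is to choose a policy $\pi$ that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:

$$\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_{a_t}(s_t, s_{t+1})\right] \quad \text{(where we choose } a_t = \pi(s_t)\text{)}$$

Here $R_{a_t}(s_t, s_{t+1})$ is the reward received when action $a_t$ takes the process from state $s_t$ to state $s_{t+1}$, and $\gamma$ is the discount factor, which satisfies $0 \le \gamma < 1$. (For example, $\gamma = 1/(1+r)$ when the discount rate is $r$.) $\gamma$ is typically close to 1.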
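As a rough illustration of the discounted sum, the sketch below computes it for a single, finite sequence of rewards. The function names and the reward values are assumptions made for this example; the objective itself is the expectation of this quantity over random trajectories, which the sketch does not attempt to estimate.

```python
def discount_factor(r):
    """Discount factor corresponding to a discount rate r, i.e. 1 / (1 + r)."""
    return 1.0 / (1.0 + r)


def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over a finite list of rewards (t starts at 0)."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# Hypothetical example: a discount rate of 0.25 gives a discount factor of 0.8.
gamma = discount_factor(0.25)

# One sampled reward sequence (made-up values): 1 + 0.8 + 0.64, up to rounding.
print(discounted_return([1.0, 1.0, 1.0], gamma))

# With a constant reward of 1 and a discount factor below 1, longer horizons
# approach the geometric-series limit 1 / (1 - gamma).
print(discounted_return([1.0] * 200, gamma), 1.0 / (1.0 - gamma))
```

A discount factor strictly below 1 keeps the infinite sum finite whenever the rewards are bounded, which is why the infinite-horizon objective is well defined.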

Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of the current state $s$ only, as assumed above.
