Markov Decision Process - Problem

Problem

The core problem of MDPs is to find a "policy" for the decision maker: a function π that specifies the action π(s) that the decision maker will choose when in state s. Note that once a Markov decision process is combined with a policy in this way, the action is fixed for each state and the resulting combination behaves like a Markov chain.
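The collapse from MDP to Markov chain can be sketched in a few lines. The transition probabilities and policy below are hypothetical, chosen only to illustrate that selecting one action per state turns the three-dimensional table P(s' | s, a) into an ordinary transition matrix:

```python
# Hypothetical 2-state MDP: P[s][a] is the distribution over next
# states when action a is taken in state s.
P = {
    0: {"stay": [0.9, 0.1], "move": [0.2, 0.8]},
    1: {"stay": [0.1, 0.9], "move": [0.7, 0.3]},
}

# A policy assigns exactly one action to each state.
policy = {0: "move", 1: "stay"}

# Fixing the action in every state leaves a plain Markov chain:
# row s of the chain is P[s][policy(s)].
chain = [P[s][policy[s]] for s in sorted(P)]
print(chain)  # [[0.2, 0.8], [0.1, 0.9]]
```

Any question about the behaviour of the policy (long-run state frequencies, expected rewards) can then be answered with standard Markov-chain tools applied to `chain`.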

The goal is to choose a policy π that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:

    E[ Σ_{t=0}^{∞} γ^t R_{a_t}(s_t, s_{t+1}) ]   (where we choose a_t = π(s_t))

where γ is the discount factor and satisfies 0 ≤ γ ≤ 1 (for example, γ = 1/(1 + r) when the discount rate is r). γ is typically close to 1.

Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of s only, as assumed above.
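The expected discounted sum above can be evaluated for a fixed policy by iterating the Bellman expectation update V(s) ← R(s) + γ Σ_{s'} P(s' | s) V(s'). A minimal sketch, using a hypothetical 2-state chain induced by some policy (the transition matrix and rewards are illustrative assumptions, not from the text):

```python
# Iterative policy evaluation on the Markov chain induced by a fixed
# policy: V converges to the expected discounted sum of rewards from
# each state, because each update contracts toward the fixed point
# V = R + gamma * P @ V.
gamma = 0.9                     # discount factor, 0 <= gamma <= 1
P = [[0.2, 0.8], [0.1, 0.9]]    # hypothetical transitions under the policy
R = [1.0, 0.0]                  # expected one-step reward in each state

V = [0.0, 0.0]
for _ in range(1000):           # enough iterations for gamma = 0.9
    V = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(2))
         for s in range(2)]

print([round(v, 3) for v in V])
```

Because γ < 1, each sweep shrinks the error by a factor of γ, so 1000 iterations are far more than needed here; in practice one stops when successive sweeps differ by less than a tolerance.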
