Problem
The core problem of MDPs is to find a "policy" for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ that the decision maker will choose when in state $s$. Note that once a Markov decision process is combined with a policy in this way, this fixes the action for each state and the resulting combination behaves like a Markov chain.
The goal is to choose a policy $\pi$ that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:
- $E\left[\sum_{t=0}^{\infty} \gamma^t R_{a_t}(s_t, s_{t+1})\right]$ (where we choose $a_t = \pi(s_t)$)
where $\gamma$ is the discount factor and satisfies $0 \le \gamma \le 1$ (for example, $\gamma = 1/(1+r)$ when the discount rate is $r$). $\gamma$ is typically close to 1.
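As a concrete illustration of this objective, the short Python sketch below computes the discounted sum of rewards for one sampled reward sequence; the reward values and the discount factor (0.95) are made-up illustrative numbers, not taken from the text.

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of gamma**t * r_t over a finite sampled reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Hypothetical reward sequence: 1.0 + 0.95*0.0 + 0.95**2*2.0 + 0.95**3*1.0
print(discounted_return([1.0, 0.0, 2.0, 1.0]))
```

The expectation in the formula is taken over the randomness of the transitions; this sketch only evaluates the discounted sum for a single realized trajectory.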
Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of $s$ only, as assumed above.
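One standard way to compute such a state-only optimal policy (not spelled out in this excerpt) is value iteration. The sketch below is a minimal implementation on a hypothetical two-state, two-action MDP; the transition tensor `P[a, s, s']` and reward tensor `R[a, s, s']` are toy data invented for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value iteration on a finite MDP with transition tensor P[a, s, s']
    and reward tensor R[a, s, s']; returns a greedy policy and state values."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = expected immediate reward + discounted value of next state
        Q = np.einsum("ast,ast->as", P, R) + gamma * P @ V
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # The optimal policy depends on the state only (Markov property).
    policy = Q.argmax(axis=0)
    return policy, V

# Toy 2-state, 2-action MDP (hypothetical numbers; rows of P sum to 1).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 2.0], [0.0, 1.0]]])
policy, V = value_iteration(P, R)
print(policy, V)
```

Once the policy array is fixed, following it turns the MDP into an ordinary Markov chain over the states, as noted above.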