Problem
The core problem of MDPs is to find a "policy" for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ that the decision maker will choose when in state $s$. Note that once a Markov decision process is combined with a policy in this way, the action is fixed for each state and the resulting combination behaves like a Markov chain.
The goal is to choose a policy $\pi$ that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:
- $\sum_{t=0}^{\infty} \gamma^t R_{a_t}(s_t, s_{t+1})$ (where we choose $a_t = \pi(s_t)$)
where $\gamma$ is the discount factor and satisfies $0 \le \gamma \le 1$. (For example, $\gamma = 1/(1+r)$ when the discount rate is $r$.) $\gamma$ is typically close to 1.
Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of the current state $s$ only, as assumed above.
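The point that a fixed policy reduces the MDP to a Markov chain can be made concrete with policy evaluation: solving $V(s) = \sum_{s'} P(s' \mid s, \pi(s))\,[R(s, \pi(s), s') + \gamma V(s')]$ by iteration gives the expected discounted return from each state. The following is a minimal sketch on a made-up two-state, two-action MDP (the transition and reward tables are hypothetical, chosen only for illustration):

```python
# Toy MDP (hypothetical, for illustration only).
# P[s][a] lists (next_state, probability) pairs for action a in state s;
# R[s][a][s2] is the reward for that transition.
P = {
    0: {"stay": [(0, 0.9), (1, 0.1)], "go": [(1, 1.0)]},
    1: {"stay": [(1, 0.8), (0, 0.2)], "go": [(0, 1.0)]},
}
R = {
    0: {"stay": {0: 0.0, 1: 1.0}, "go": {1: 2.0}},
    1: {"stay": {1: 1.0, 0: 0.0}, "go": {0: 0.5}},
}

def evaluate_policy(policy, gamma=0.9, tol=1e-10):
    """Expected discounted return V(s) under a fixed policy.

    Fixing the policy turns the MDP into a Markov chain, so V is the
    fixed point of V(s) = sum_{s2} P(s2|s,pi(s)) * (R + gamma * V(s2)),
    found here by repeated sweeps until the updates stop changing.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            a = policy[s]
            v = sum(p * (R[s][a][s2] + gamma * V[s2]) for s2, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# Evaluate one particular (arbitrary) policy: "go" in state 0, "stay" in 1.
V = evaluate_policy({0: "go", 1: "stay"})
```

Because $\gamma < 1$, the update is a contraction and the sweep converges; comparing the resulting $V$ across all deterministic policies (or running policy iteration) would then recover the optimal policy the text refers to.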