Problem
The core problem of MDPs is to find a "policy" for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ that the decision maker will choose when in state $s$. Once a Markov decision process is combined with a policy in this way, the action for each state is fixed, and the resulting combination behaves like a Markov chain.
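As a minimal illustration of that last point, the Python sketch below assumes a hypothetical two-state, two-action MDP stored as nested lists `P[a][s][s2]`. The names `P`, `induced_chain` and `policy` are illustrative and not taken from any library. Fixing a deterministic policy selects one row of transition probabilities per state, which leaves an ordinary Markov chain transition matrix.

```python
# A minimal sketch: fixing a deterministic policy collapses an MDP's
# action-indexed transition probabilities into a single Markov chain.
# The 2-state, 2-action MDP below is a hypothetical toy example.

# P[a][s][s2] = probability of moving from state s to s2 under action a
P = [
    [[0.9, 0.1],   # action 0 from state 0
     [0.2, 0.8]],  # action 0 from state 1
    [[0.5, 0.5],   # action 1 from state 0
     [0.0, 1.0]],  # action 1 from state 1
]

def induced_chain(P, policy):
    """Return the chain's transition matrix P_pi[s][s2] = P[policy[s]][s][s2]."""
    n_states = len(P[0])
    return [list(P[policy[s]][s]) for s in range(n_states)]

policy = [1, 0]  # pi(0) = 1, pi(1) = 0
P_pi = induced_chain(P, policy)
print(P_pi)  # [[0.5, 0.5], [0.2, 0.8]]
```

Each row of `P_pi` still sums to 1, so it is a valid stochastic matrix that no longer refers to actions at all.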
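The goal is to choose a policy $\pi$ that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:

$$E\left[\sum_{t=0}^{\infty} \gamma^t R_{a_t}(s_t, s_{t+1})\right] \quad \text{(where we choose } a_t = \pi(s_t)\text{)}$$

where $\gamma$ is the discount factor and satisfies $0 \le \gamma < 1$. (For example, $\gamma = 1/(1+r)$ when the discount rate is $r$.) In practice, $\gamma$ is typically close to 1.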
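The inner sum can be evaluated directly for a single realized reward sequence; the objective itself is the expectation of this quantity over the trajectories that the policy generates. The sketch below uses hypothetical helper names `discount_factor` and `discounted_return`, truncates the infinite horizon to a finite list, and checks the geometric-series identity: a constant reward of 1 at every step sums to $1/(1-\gamma)$, which equals $(1+r)/r$ when $\gamma = 1/(1+r)$.

```python
# A minimal sketch of the discounted return for ONE sampled trajectory.
# The MDP objective is the expectation of this quantity over trajectories
# generated by following the policy; here the rewards are supplied directly.

def discount_factor(rate):
    """Convert a per-step discount rate r into gamma = 1 / (1 + r)."""
    return 1.0 / (1.0 + rate)

def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a finite (truncated) reward list."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.0 + 0.5 + 0.25 = 1.75

# With a constant reward of 1 forever, the sum is 1 / (1 - gamma) = (1 + r) / r.
gamma = discount_factor(0.05)                    # r = 5%  ->  gamma ~ 0.952
approx = discounted_return([1.0] * 2000, gamma)  # horizon truncated at 2000 steps
print(round(approx, 6))  # 21.0
```

Truncation is harmless here because $\gamma^{2000}$ is vanishingly small; with $\gamma$ closer to 1, a much longer horizon would be needed for the same accuracy.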
Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of $s$ only, as assumed above.
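To illustrate that a policy depending on the current state alone is enough, the sketch below solves a tiny MDP and returns the optimal actions as a per-state lookup table. It uses value iteration, a standard solution method that this section does not itself describe, on a hypothetical two-state "stay or switch" MDP; the function `value_iteration` and the arrays `P` and `R` (indexed as action, state, next state) are illustrative assumptions.

```python
# A minimal sketch using value iteration (one standard solution method; the
# text does not prescribe one). P[a][s][s2] and R[a][s][s2] are hypothetical
# toy arrays: transition probability and reward for s -> s2 under action a.

def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Return (V, policy), where policy[s] is a greedy action for state s."""
    n_actions, n_states = len(P), len(P[0])

    def q_value(V, s, a):
        # Expected immediate reward plus discounted value of the next state.
        return sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2])
                   for s2 in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(q_value(V, s, a) for a in range(n_actions))
                 for s in range(n_states)]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break

    # The resulting policy is a plain lookup table indexed by the current state.
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy

# Toy MDP: action 0 = stay, action 1 = switch; reward 1 for landing in state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],   # action 0 (stay)
     [[0.0, 1.0], [1.0, 0.0]]]   # action 1 (switch)
R = [[[0.0, 1.0], [0.0, 1.0]],   # reward depends only on the destination s2
     [[0.0, 1.0], [0.0, 1.0]]]

V, policy = value_iteration(P, R, gamma=0.9)
print(policy)  # [1, 0]: switch out of state 0, stay in state 1
```

With $\gamma = 0.9$, both state values converge to $1/(1-0.9) = 10$, and no history beyond the current state is needed to choose the optimal action.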