Problem
The core problem of MDPs is to find a "policy" for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ that the decision maker will choose when in state $s$. Once a Markov decision process is combined with a policy in this way, the action in each state is fixed, and the resulting combination behaves like a Markov chain.
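To make the reduction to a Markov chain concrete, here is a minimal Python sketch. The two-state, two-action MDP, the action names, and the helper `induced_chain` are all hypothetical and chosen only for illustration. Fixing a deterministic policy selects one row of transition probabilities per state, and those rows together form an ordinary Markov chain transition matrix.

```python
# Hypothetical two-state, two-action MDP; all numbers are made up for illustration.
# P[a][s][s2] is the probability of moving from state s to state s2 under action a.
P = {
    "stay": [[0.9, 0.1], [0.2, 0.8]],
    "move": [[0.3, 0.7], [0.6, 0.4]],
}

# A deterministic policy: the action chosen in each state.
policy = {0: "move", 1: "stay"}


def induced_chain(P, policy):
    """Return the transition matrix of the Markov chain obtained by following policy."""
    n_states = len(policy)
    # Row s of the chain is row s of the transition table for the action policy[s].
    return [list(P[policy[s]][s]) for s in range(n_states)]


chain = induced_chain(P, policy)
for row in chain:
    print(row)
```

Each row of `chain` is a probability distribution over next states, so the result no longer depends on any action choice.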
The goal is to choose a policy $\pi$ that maximizes some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:
- $\sum_{t=0}^{\infty} \gamma^t R_{a_t}(s_t, s_{t+1})$ (where we choose $a_t = \pi(s_t)$)
where $\gamma$ is the discount factor and satisfies $0 \le \gamma < 1$. (For example, $\gamma = 1/(1+r)$ when the discount rate is $r$.) The discount factor is typically close to 1.
Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of $s$ only, as assumed above.
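The expected discounted sum can be approximated by simulation. The sketch below is one possible illustration, not a standard implementation: the toy MDP, the reward table `R`, and the function names are assumptions, and the infinite horizon is truncated at a finite number of steps, which is reasonable because the discounted terms shrink geometrically.

```python
import random

# Hypothetical two-state, two-action MDP; all numbers are made up for illustration.
# P[a][s][s2]: transition probability; R[a][s][s2]: reward for that transition.
P = {
    "stay": [[0.9, 0.1], [0.2, 0.8]],
    "move": [[0.3, 0.7], [0.6, 0.4]],
}
R = {
    "stay": [[1.0, 0.0], [0.0, 2.0]],
    "move": [[0.0, 0.5], [0.5, 0.0]],
}
policy = {0: "move", 1: "stay"}


def discount_factor(rate):
    """Convert a discount rate r into a discount factor 1 / (1 + r)."""
    return 1.0 / (1.0 + rate)


def sample_discounted_return(P, R, policy, start, gamma, horizon, rng):
    """Simulate one trajectory under policy and return its truncated discounted reward sum."""
    state = start
    total = 0.0
    for t in range(horizon):
        action = policy[state]
        probs = P[action][state]
        next_state = rng.choices(range(len(probs)), weights=probs)[0]
        total += (gamma ** t) * R[action][state][next_state]
        state = next_state
    return total


def estimate_expected_return(P, R, policy, start, gamma, horizon, episodes, seed=0):
    """Average many sampled returns to approximate the expected discounted sum."""
    rng = random.Random(seed)
    samples = [
        sample_discounted_return(P, R, policy, start, gamma, horizon, rng)
        for _ in range(episodes)
    ]
    return sum(samples) / episodes


gamma = discount_factor(0.05)  # a 5% discount rate gives a factor close to 1
estimate = estimate_expected_return(P, R, policy, 0, gamma, horizon=200, episodes=1000)
print(estimate)
```

Comparing such estimates across different policies gives a rough, simulation-based way to see which policy achieves the larger expected discounted reward from a given start state.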