Problem
The core problem of MDPs is to find a "policy" for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ that the decision maker will choose when in state $s$. Once a Markov decision process is combined with a policy in this way, the action for each state is fixed and the resulting combination behaves like a Markov chain.
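To make the reduction to a Markov chain concrete, here is a minimal sketch. The two-state, two-action MDP, its action names, and all probabilities are hypothetical and chosen only for illustration; the helper `induced_chain` is likewise an assumed name, not part of any library.

```python
# Hypothetical MDP: P[a][s][s2] is the probability of moving from
# state s to state s2 when action a is taken. Numbers are made up.
P = {
    "stay": [[0.9, 0.1],
             [0.2, 0.8]],
    "move": [[0.3, 0.7],
             [0.6, 0.4]],
}

# A deterministic policy: the action chosen in each state (index = state).
policy = ["move", "stay"]


def induced_chain(P, policy):
    """Return the transition matrix of the Markov chain obtained by
    fixing the action in every state according to the policy."""
    # Row s of the chain is row s of the matrix for action policy[s].
    return [list(P[policy[s]][s]) for s in range(len(policy))]


chain = induced_chain(P, policy)
for row in chain:
    print(row)
```

Each row of `chain` is copied from the transition matrix of the action the policy selects in that state, so the result no longer depends on any action choice and every row still sums to 1.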
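The goal is to choose a policy $\pi$ that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:
- $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R_{a_t}(s_t, s_{t+1})\right]$ (where we choose $a_t = \pi(s_t)$)
where $\gamma$ is the discount factor and satisfies $0 \le \gamma < 1$. (For example, $\gamma = 1/(1+r)$ when the discount rate is $r$.) $\gamma$ is typically close to 1.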
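As a rough illustration of the discounted sum, the sketch below computes it for one finite, hypothetical reward sequence. The function names and the reward values are assumptions made for the example; the quantity in the text is the expectation of this sum over random trajectories and over an infinite horizon, which this truncated single-trajectory version only approximates.

```python
def discount_factor(r):
    """Discount factor corresponding to a discount rate r: 1 / (1 + r)."""
    return 1.0 / (1.0 + r)


def discounted_return(rewards, gamma):
    """Sum of gamma**t * reward_t over one finite reward sequence."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


gamma = discount_factor(0.25)      # discount rate of 25% per step
rewards = [1.0, 0.0, 2.0]          # made-up rewards at t = 0, 1, 2
value = discounted_return(rewards, gamma)
print(gamma, value)
```

With a factor strictly below 1, later rewards contribute geometrically less, which is what keeps the infinite sum finite when rewards are bounded.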
Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of the state $s$ only, as assumed above.