Reinforcement Learning

Theory

The theory for small, finite MDPs is quite mature. Both the asymptotic and the finite-sample behavior of most algorithms are well understood. As noted earlier, algorithms with provably good online performance (addressing the exploration issue) are known. The theory of large MDPs needs more work. Efficient exploration in this setting remains largely unaddressed (except for the case of bandit problems). Although finite-time performance bounds have appeared for many algorithms in recent years, these bounds are expected to be rather loose, so more work is needed to better understand the relative advantages, as well as the limitations, of these algorithms. For incremental algorithms, asymptotic convergence issues have been settled. Recently, new incremental, temporal-difference-based algorithms have appeared that converge under a much wider set of conditions than was previously possible (for example, when used with arbitrary, smooth function approximation).
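To make "incremental, temporal-difference-based algorithm" concrete, the sketch below shows tabular TD(0) value estimation on a small, finite MDP, the setting where the theory is described as mature. The text does not name a specific algorithm, so the choice of TD(0) is an assumption made for illustration, and the names td0, chain_step, and random_walk_step are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: step(state, rng) returns (reward, next_state),
# where next_state is None when the episode terminates.
Step = Callable[[int, random.Random], Tuple[float, Optional[int]]]


def td0(step: Step, n_states: int, start: int, episodes: int,
        gamma: float = 1.0, alpha: float = 0.1, seed: int = 0) -> List[float]:
    """Estimate state values under a fixed policy with tabular TD(0)."""
    rng = random.Random(seed)
    v = [0.0] * n_states  # one value estimate per state
    for _ in range(episodes):
        s: Optional[int] = start
        while s is not None:
            r, s_next = step(s, rng)
            # Bootstrapped target: reward plus discounted estimate of the
            # next state (terminal states contribute zero).
            bootstrap = gamma * v[s_next] if s_next is not None else 0.0
            # Incremental update: move v[s] a fraction alpha toward the target.
            v[s] += alpha * (r + bootstrap - v[s])
            s = s_next
    return v


def chain_step(s: int, rng: random.Random) -> Tuple[float, Optional[int]]:
    """Deterministic chain 0 -> 1 -> 2 -> end, reward 1 on the last move."""
    if s == 2:
        return 1.0, None
    return 0.0, s + 1


def random_walk_step(s: int, rng: random.Random) -> Tuple[float, Optional[int]]:
    """Five-state random walk; leaving on the right pays 1, on the left 0."""
    s_next = s + rng.choice((-1, 1))
    if s_next < 0:
        return 0.0, None
    if s_next > 4:
        return 1.0, None
    return 0.0, s_next
```

On the deterministic chain with gamma = 0.9, td0(chain_step, 3, 0, episodes=200, gamma=0.9, alpha=0.5) approaches the exact values 0.81, 0.9, and 1.0. On the random walk the estimates fluctuate around the true values when the step size is constant; a suitably decaying step size is the usual assumption behind asymptotic convergence results.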


