Reinforcement Learning

Exploration

The reinforcement learning problem as described requires clever exploration mechanisms. Selecting actions purely at random is known to give rise to very poor performance. The case of (small) finite MDPs is relatively well understood by now. However, due to the lack of algorithms that provably scale well with the number of states (or scale to problems with infinite state spaces), in practice people resort to simple exploration methods. One such method is ε-greedy: with probability 1 − ε the agent chooses the action that it believes has the best long-term effect, and otherwise it chooses an action uniformly at random. Here, 0 < ε < 1 is a tuning parameter, which is sometimes changed, either according to a fixed schedule (making the agent explore less as time goes by) or adaptively based on some heuristics (Tokic & Palm, 2011).
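The ε-greedy rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `epsilon_greedy` and `linear_decay`, the list-of-floats representation of the action-value estimates, and the particular linear schedule with its default values are all assumptions made for the example, not part of the source.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given estimated action values.

    q_values: list of floats, one estimate per action (hypothetical layout).
    epsilon:  exploration probability, expected to lie in [0, 1].
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate
    # (the first such action if several are tied).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def linear_decay(step, start=1.0, end=0.05, decay_steps=1000):
    """One possible fixed schedule: move epsilon linearly from start to end.

    The default values are arbitrary and chosen only for illustration.
    """
    frac = min(step / decay_steps, 1.0)
    return (1.0 - frac) * start + frac * end


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]
    for step in (0, 500, 1000):
        eps = linear_decay(step)
        action = epsilon_greedy(estimates, eps)
        print(f"step={step} epsilon={eps:.3f} action={action}")
```

Because the random branch draws from all actions, the greedy action can also be picked during exploration, so in this sketch it is selected slightly more often than a fraction 1 − ε of the time. An adaptive scheme such as the one cited above would replace `linear_decay` with a rule that depends on what the agent has observed.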

