Reinforcement Learning - Introduction

Introduction

The basic reinforcement learning model consists of:

a set of environment states ;
a set of actions ;
rules of transitioning between states;
rules that determine the scalar immediate reward of a transition; and
rules that describe what the agent observes.

The rules are often stochastic. The observation typically involves the scalar immediate reward associated with the last transition. In many works, the agent is also assumed to observe the current environmental state, in which case we talk about full observability, whereas in the opposing case we talk about partial observability. Sometimes the set of actions available to the agent is restricted (e.g., you cannot spend more money than what you possess).

A reinforcement learning agent interacts with its environment in discrete time steps. At each time, the agent receives an observation, which typically includes the reward . It then chooses an action from the set of actions available, which is subsequently sent to the environment. The environment moves to a new state and the reward associated with the transition is determined. The goal of a reinforcement learning agent is to collect as much reward as possible. The agent can choose any action as a function of the history and it can even randomize its action selection.

When the agent's performance is compared to that of an agent which acts optimally from the beginning, the difference in performance gives rise to the notion of regret. Note that in order to act near optimally, the agent must reason about the long term consequences of its actions: In order to maximize my future income I better go to school now, although the immediate monetary reward associated with this might be negative.

Thus, reinforcement learning is particularly well suited to problems which include a long-term versus short-term reward trade-off. It has been applied successfully to various problems, including robot control, elevator scheduling, telecommunications, backgammon and checkers (Sutton and Barto 1998, Chapter 11).

Two components make reinforcement learning powerful: The use of samples to optimize performance and the use of function approximation to deal with large environments. Thanks to these two key components, reinforcement learning can be used in large environments in any of the following situations:

A model of the environment is known, but an analytic solution is not available;
Only a simulation model of the environment is given (the subject of simulation-based optimization);
The only way to collect information about the environment is by interacting with it.

The first two of these problems could be considered planning problems (since some form of the model is available), while the last one could be considered as a genuine learning problem. However, under a reinforcement learning methodology both planning problems would be converted to machine learning problems.

Read more about this topic: Reinforcement Learning

Famous quotes containing the word introduction:

“Do you suppose I could buy back my introduction to you?”
—S.J. Perelman, U.S. screenwriter, Arthur Sheekman, Will Johnstone, and Norman Z. McLeod. Groucho Marx, Monkey Business, a wisecrack made to his fellow stowaway Chico Marx (1931)

“For the introduction of a new kind of music must be shunned as imperiling the whole state; since styles of music are never disturbed without affecting the most important political institutions.”
—Plato (c. 427–347 B.C.)

“We used chamber-pots a good deal.... My mother ... loved to repeat: “When did the queen reign over China?” This whimsical and harmless scatological pun was my first introduction to the wonderful world of verbal transformations, and also a first perception that a joke need not be funny to give pleasure.”
—Angela Carter (1940–1992)

Related Phrases

Action Values

Called Optimal

Function Approximation

Temporal Difference Methods

Related Words