Partially Observable Markov Decision Process - Belief MDP

Belief MDP

The policy maps the belief state space into the action space. The optimal policy can be understood as the solution of a continuous-space Markov decision process, the so-called belief MDP. It is defined as a tuple $(B, A, \tau, r)$ where

  • $B$ is the set of belief states over the POMDP states,
  • $A$ is the same finite set of actions as for the original POMDP,
  • $\tau(b, a, b')$ is the belief state transition function,
  • $r(b, a)$ is the reward function on belief states. It writes:

$r(b, a) = \sum_{s \in S} b(s) R(s, a)$.

Note that this MDP is defined over a continuous state space.
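The belief reward above, together with the standard Bayesian belief update that underlies the transition function $\tau$, can be sketched numerically. The following is a minimal illustration for a toy two-state POMDP; the particular matrices `T`, `O`, and `R` are invented for the example and are not from the text.

```python
import numpy as np

# Toy POMDP (all numbers illustrative): 2 states, 2 actions, 2 observations.
# T[a, s, s'] = Pr(s' | s, a)   state transition probabilities
# O[a, s', o] = Pr(o | s', a)   observation probabilities
# R[s, a]                       state-action reward
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def belief_reward(b, a, R):
    # r(b, a) = sum_s b(s) R(s, a): expected reward under the belief b
    return b @ R[:, a]

def belief_update(b, a, o, T, O):
    # Bayesian update: b'(s') proportional to Pr(o | s', a) * sum_s Pr(s' | s, a) b(s)
    unnorm = O[a, :, o] * (b @ T[a])
    return unnorm / unnorm.sum()

b = np.array([0.5, 0.5])          # uniform initial belief
r = belief_reward(b, 0, R)        # 0.5 * 1.0 + 0.5 * 0.0 = 0.5
b_next = belief_update(b, 0, 0, T, O)  # a normalized belief vector
```

Enumerating `belief_update(b, a, o, ...)` over all observations `o`, weighted by their probabilities, is exactly how the belief transition function $\tau(b, a, b')$ puts mass on successor beliefs.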
