Belief MDP
The policy maps the belief state space into the action space. The optimal policy can be understood as the solution of a continuous-space Markov decision process, the so-called belief MDP. It is defined as a tuple $(B, A, \tau, r)$ where
- $B$ is the set of belief states over the POMDP states,
- $A$ is the same finite set of actions as for the original POMDP,
- $\tau$ is the belief state transition function,
- $r: B \times A \to \mathbb{R}$ is the reward function on belief states. It is given by
$r(b, a) = \sum_{s \in S} b(s) R(s, a)$.
Note that this MDP is defined over a continuous state space.
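The belief-state reward above, and the Bayesian belief update that the transition function $\tau$ relies on, can be sketched numerically. The following is a minimal illustration, not a reference implementation: the tiny two-state POMDP (the tables `T`, `O`, `R` and all names) is a made-up example for demonstration only.

```python
# Hypothetical two-state POMDP used only to illustrate the formulas.
S = [0, 1]      # POMDP states
A = [0, 1]      # actions
Obs = [0, 1]    # observations

# T[s][a][s']: transition probabilities Pr(s' | s, a)
T = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.3, 0.7], [0.6, 0.4]]]
# O[a][s'][o]: observation probabilities Pr(o | s', a)
O = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.5, 0.5], [0.1, 0.9]]]
# R[s][a]: reward for taking action a in state s
R = [[1.0, 0.0],
     [0.0, 2.0]]

def belief_reward(b, a):
    """r(b, a) = sum over states s of b(s) * R(s, a)."""
    return sum(b[s] * R[s][a] for s in S)

def belief_update(b, a, o):
    """Bayes update: b'(s') is proportional to
    O(o | s', a) * sum_s T(s' | s, a) * b(s), normalized over s'."""
    unnorm = [O[a][sp][o] * sum(T[s][a][sp] * b[s] for s in S)
              for sp in S]
    z = sum(unnorm)  # Pr(o | b, a), the normalizing constant
    return [u / z for u in unnorm]

b = [0.5, 0.5]                    # uniform initial belief
print(belief_reward(b, 0))        # expected immediate reward of action 0
print(belief_update(b, 0, 1))     # posterior belief after action 0, observation 1
```

Note how the updated belief is again a probability distribution over the two POMDP states; the belief MDP's transition function $\tau$ is built from exactly this update, weighted by the probability of each observation.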