Belief MDP
The policy maps a belief state space into the action space. The optimal policy can be understood as the solution of a continuous space Markov Decision Process (so-called belief MDP). It is defined as a tuple where
- is the set of belief states over the POMDP states,
- is the same finite set of action as for the original POMDP,
- is the belief state transition function,
- is the reward function on belief states. It writes :
.
Note that this MDP is defined over a continuous state space.
Read more about this topic: Partially Observable Markov Decision Process
Famous quotes containing the word belief:
“Power-worship blurs political judgement because it leads, almost unavoidably, to the belief that present trends will continue. Whoever is winning at the moment will always seem to be invincible.”
—George Orwell (19031950)