Belief MDP
The policy maps the belief space into the action space. The optimal policy can be understood as the solution of a continuous-space Markov decision process, the so-called belief MDP. It is defined as a tuple $\langle B, A, \tau, r \rangle$ where
- $B$ is the set of belief states over the POMDP states,
- $A$ is the same finite set of actions as for the original POMDP,
- $\tau$ is the belief state transition function,
- $r : B \times A \to \mathbb{R}$ is the reward function on belief states. It writes:
$r(b, a) = \sum_{s \in S} b(s) R(s, a)$.
Note that this MDP is defined over a continuous state space.
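As a sketch of the definitions above, the snippet below computes the belief-state reward $r(b, a) = \sum_s b(s) R(s, a)$ and the Bayesian belief update that underlies the transition function $\tau$. The POMDP itself (the arrays `T`, `Z`, `R` and their numbers) is a hypothetical two-state example invented for illustration, not part of the original text.

```python
import numpy as np

# Hypothetical toy POMDP: 2 states, 2 actions, 2 observations (illustrative only).
T = np.array([[[0.9, 0.1],   # T[a, s, s'] = Pr(s' | s, a)
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2],   # Z[a, s', o] = Pr(o | s', a)
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
R = np.array([[1.0, 0.0],    # R[s, a] = immediate reward in state s for action a
              [0.0, 1.0]])

def belief_reward(b, a):
    # r(b, a) = sum over states s of b(s) * R(s, a)
    return float(b @ R[:, a])

def belief_update(b, a, o):
    # Unnormalized posterior: b'(s') proportional to Z[a, s', o] * sum_s T[a, s, s'] b(s)
    unnormalized = Z[a, :, o] * (b @ T[a])
    # Normalize so the updated belief is a probability distribution over states.
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])         # uniform initial belief
reward = belief_reward(b, 0)     # expected immediate reward under belief b
b_next = belief_update(b, 0, 0)  # posterior belief after acting and observing
```

The transition function $\tau(b, a, b')$ of the belief MDP is then the probability, summed over observations $o$, that `belief_update(b, a, o)` yields $b'$, weighted by the probability of observing $o$ after taking $a$ in $b$.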