Markov Decision Process
Markov decision processes (MDPs), named after Andrey Markov, provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's book published in 1960, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics, and manufacturing.
More precisely, a Markov decision process is a discrete-time stochastic control process. At each time step, the process is in some state $s$, and the decision maker may choose any action $a$ that is available in state $s$. The process responds at the next time step by randomly moving into a new state $s'$ and giving the decision maker a corresponding reward $R_a(s, s')$.
The probability that the process moves into its new state $s'$ is influenced by the chosen action. Specifically, it is given by the state transition function $P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$. Thus, the next state $s'$ depends on the current state $s$ and the decision maker's action $a$. But given $s$ and $a$, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP possess the Markov property.
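The definition can be made concrete with a small sketch. It assumes a hypothetical two-state MDP (states "low" and "high", actions "stay" and "push") whose probabilities and rewards are invented purely for illustration. P and R are plain nested dictionaries holding $P_a(s, s')$ and $R_a(s, s')$, and step and check_distributions are hypothetical helpers, not part of any library.

```python
import random

# Hypothetical two-state, two-action MDP used only for illustration.
# P[a][s][s2] holds P_a(s, s'): probability of moving from s to s2 under action a.
# R[a][s][s2] holds R_a(s, s'): reward received for that transition.
P = {
    "stay": {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.2, "high": 0.8}},
    "push": {"low": {"low": 0.4, "high": 0.6}, "high": {"low": 0.1, "high": 0.9}},
}
R = {
    "stay": {"low": {"low": 0.0, "high": 1.0}, "high": {"low": 0.0, "high": 1.0}},
    "push": {"low": {"low": -1.0, "high": 0.5}, "high": {"low": -1.0, "high": 0.5}},
}


def check_distributions(P):
    """Each row P_a(s, .) must be a probability distribution over next states."""
    for a, rows in P.items():
        for s, dist in rows.items():
            assert abs(sum(dist.values()) - 1.0) < 1e-9, (a, s)


def step(s, a, rng):
    """Sample s' from P_a(s, .) and return (s', R_a(s, s')).

    Only the current state s and the chosen action a are consulted, never the
    earlier history, which is the Markov property described in the text.
    """
    dist = P[a][s]
    next_states = list(dist)
    weights = [dist[s2] for s2 in next_states]
    s2 = rng.choices(next_states, weights=weights, k=1)[0]
    return s2, R[a][s][s2]


check_distributions(P)
rng = random.Random(0)
s = "low"
for t in range(5):
    a = "push" if s == "low" else "stay"  # a simple hand-written policy
    s_next, r = step(s, a, rng)
    print(t, s, a, s_next, r)
    s = s_next
```

Because step looks up only P[a][s], the sampled next state cannot depend on anything earlier in the trajectory; a process whose transitions needed older history would not fit this signature.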
Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state and all rewards are zero, a Markov decision process reduces to a Markov chain.
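The reduction can be sketched as follows, assuming a hypothetical three-state MDP whose only action is called "wait" and whose rewards are all zero. The helper names to_markov_chain and propagate are illustrative, not from any library: the first drops the action index from $P_a(s, s')$ to obtain an ordinary transition matrix, and the second advances a probability distribution over states by one step of the resulting chain.

```python
# Hypothetical MDP in which every state offers exactly one action, "wait",
# and every reward is zero. P[a][s][s2] = P_a(s, s'), R[a][s][s2] = R_a(s, s').
states = ["A", "B", "C"]
P = {"wait": {"A": {"A": 0.5, "B": 0.5, "C": 0.0},
              "B": {"A": 0.0, "B": 0.5, "C": 0.5},
              "C": {"A": 1.0, "B": 0.0, "C": 0.0}}}
R = {"wait": {s: {s2: 0.0 for s2 in states} for s in states}}


def to_markov_chain(P, R, states):
    """Collapse a single-action, zero-reward MDP into a transition matrix T[i][j]."""
    actions = list(P)
    if len(actions) != 1:
        raise ValueError("more than one action: the process still involves a choice")
    (a,) = actions
    if any(r != 0.0 for row in R[a].values() for r in row.values()):
        raise ValueError("reduction assumes all rewards are zero")
    return [[P[a][s][s2] for s2 in states] for s in states]


def propagate(dist, T):
    """One step of the chain: new_dist[j] = sum over i of dist[i] * T[i][j]."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]


T = to_markov_chain(P, R, states)
dist = [1.0, 0.0, 0.0]  # start in state "A" with certainty
for _ in range(3):
    dist = propagate(dist, T)
print(dict(zip(states, dist)))
```

Once the action index is removed, nothing is left to decide, and the evolution of the state distribution is governed by the fixed matrix T alone, as for any Markov chain.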