Feature Markov Decision Processes
- DOI
- 10.2991/agi.2009.30
- Abstract
General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in the companion article [Hut09].
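The setup the abstract describes can be sketched roughly as follows: a feature map `phi` compresses the full (non-MDP) observation-action-reward history into a small finite state, after which standard tabular Q-learning applies. This is a minimal illustrative sketch only; the toy `phi` (keep the latest observation), the two-action environment, and all names are assumptions for illustration, not the paper's criterion or algorithm.

```python
import random
from collections import defaultdict

def phi(history):
    """Toy feature map: reduce the whole history to the latest observation."""
    return history[-1][0] if history else 0

def q_learning(env_step, n_steps=5000, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning on the states produced by phi (two actions: 0, 1)."""
    Q = defaultdict(float)              # Q[(state, action)] -> value estimate
    history, s = [], phi([])
    for _ in range(n_steps):
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice([0, 1])
        else:
            a = max([0, 1], key=lambda b: Q[(s, b)])
        obs, r = env_step(s, a)         # environment returns (observation, reward)
        history.append((obs, a, r))
        s2 = phi(history)               # reduced state via the feature map
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in [0, 1]) - Q[(s, a)])
        s = s2
    return Q

def env_step(s, a):
    """Toy environment: observation echoes the action; action 1 in state 1 pays 1."""
    return (a, 1.0 if (s == 1 and a == 1) else 0.0)
```

Under this reduction the learner discovers the self-loop of taking action 1 from state 1; the interesting (and here hand-waved) part, which the paper addresses, is how to choose `phi` itself rather than fixing it by hand.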
- Copyright
- © 2009, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Marcus Hutter
PY - 2009/06
DA - 2009/06
TI - Feature Markov Decision Processes
BT - Proceedings of the 2nd Conference on Artificial General Intelligence (2009)
PB - Atlantis Press
SP - 138
EP - 143
SN - 1951-6851
UR - https://doi.org/10.2991/agi.2009.30
DO - 10.2991/agi.2009.30
ID - Hutter2009/06
ER -