Bound Action Policy for Better Sample Efficiency
Authors
Junning Huang, Zhifeng Hao
Corresponding Author
Junning Huang
Available Online May 2018.
- DOI
- 10.2991/ncce-18.2018.131
- Keywords
- Reinforcement learning; policy gradient; action output; locomotion policy; Gaussian distribution
- Abstract
Reinforcement learning algorithms for robotic locomotion control have achieved great progress. A common approach is to represent the robot's locomotion policy, i.e., the distribution over action outputs, with a Gaussian distribution. However, in real-world control problems the actions are bounded by physical constraints, which introduces a bias when an unbounded Gaussian distribution is used as the policy. This paper proposes the logistic Gaussian policy, which reduces both the bias introduced by the Gaussian distribution and the variance between policy gradient samples.
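The paper's full construction is in the PDF; as a minimal sketch of the general idea (squashing an unbounded Gaussian through a logistic function so samples respect actuator bounds, with the log-density corrected by the change of variables), one might write the following in PyTorch. The library choice and the bounds `low`/`high` are assumptions for illustration, not taken from the paper.

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import SigmoidTransform, AffineTransform

# Hypothetical actuator bounds; the paper's experiments may use different ranges.
low, high = -1.0, 1.0

# Learnable Gaussian parameters for a 2-dimensional action space.
mean = torch.zeros(2, requires_grad=True)
log_std = torch.zeros(2, requires_grad=True)

base = Normal(mean, log_std.exp())
# Squash the unbounded Gaussian through a logistic (sigmoid) function,
# then rescale from (0, 1) to the actuator range (low, high).
policy = TransformedDistribution(
    base, [SigmoidTransform(), AffineTransform(loc=low, scale=high - low)]
)

action = policy.rsample()           # always inside (low, high), no clipping bias
log_prob = policy.log_prob(action)  # includes the change-of-variables correction
```

Because `log_prob` is differentiable with respect to `mean` and `log_std`, it can be plugged directly into a standard policy gradient objective. The same squashed-Gaussian idea appears elsewhere in the literature, e.g., the tanh-squashed Gaussian used by Soft Actor-Critic.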
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Junning Huang
AU  - Zhifeng Hao
PY  - 2018/05
DA  - 2018/05
TI  - Bound Action Policy for Better Sample Efficiency
BT  - Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
PB  - Atlantis Press
SP  - 794
EP  - 799
SN  - 1951-6851
UR  - https://doi.org/10.2991/ncce-18.2018.131
DO  - 10.2991/ncce-18.2018.131
ID  - Huang2018/05
ER  -