Stable Conservative Q-Learning for Offline Reinforcement Learning
- DOI
- 10.2991/978-94-6463-300-9_18
- Keywords
- Stable Conservative Q-Learning; Offline Reinforcement Learning
- Abstract
Offline reinforcement learning (RL) trains agents on previously collected data without further interaction with the environment. However, because the distribution of the dataset is inconsistent with that of the real world, training samples collected in the real environment cannot be applied well in offline RL. In this paper, a model-free offline RL method named Stable Conservative Q-Learning (SCQL) is designed. Like Conservative Q-Learning (CQL), SCQL limits the Q-value estimates of out-of-distribution (OOD) actions; however, it removes the need to evaluate OOD actions directly by combining a Variational Autoencoder (VAE) with an estimation network that is trained without OOD actions. SCQL adopts a value-constrained approach to estimate the Q-value conservatively, ensuring the stability of the algorithm's results without harming its generalization ability. Experimental results demonstrate that SCQL achieves conservative Q-function estimation while maintaining superior stability and generalization compared to baseline offline RL algorithms, including CQL. The proposed method effectively mitigates the negative impact of data distribution mismatch in offline RL, leading to improved performance and robustness.
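The abstract gives only a high-level description of SCQL, so the sketch below is a rough illustration of the two ingredients it names: a CQL-style penalty that suppresses Q-values of actions outside the data distribution, and a VAE whose decoder proposes in-distribution candidate actions. All class and function names (QNetwork, sample_vae_actions, conservative_q_loss) and the specific loss form are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (PyTorch), assuming a CQL-style conservative penalty and a
# pre-trained VAE decoder; this is NOT the authors' implementation of SCQL.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Simple state-action value network Q(s, a)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def sample_vae_actions(decoder: nn.Module, state: torch.Tensor,
                       n: int, latent_dim: int) -> torch.Tensor:
    """Sample n candidate actions per state from a VAE decoder fit to the
    dataset's (state, action) pairs, so candidates stay near the data."""
    b = state.shape[0]
    z = torch.randn(b * n, latent_dim)
    s_rep = state.unsqueeze(1).expand(-1, n, -1).reshape(b * n, -1)
    return decoder(torch.cat([s_rep, z], dim=-1)).reshape(b, n, -1)

def conservative_q_loss(q_net: QNetwork, state, action, target_q,
                        candidate_actions, alpha: float = 1.0):
    """TD regression plus a CQL-style penalty: push down Q-values of sampled
    candidate actions, push up Q-values of actions seen in the dataset."""
    q_data = q_net(state, action)                      # (b, 1)
    td_loss = F.mse_loss(q_data, target_q)
    b, n, a_dim = candidate_actions.shape
    s_rep = state.unsqueeze(1).expand(-1, n, -1).reshape(b * n, -1)
    q_cand = q_net(s_rep, candidate_actions.reshape(b * n, a_dim)).reshape(b, n)
    penalty = torch.logsumexp(q_cand, dim=1).mean() - q_data.mean()
    return td_loss + alpha * penalty

# Toy usage with random tensors (shapes only; no training loop shown).
q = QNetwork(state_dim=17, action_dim=6)
s, a, y = torch.randn(32, 17), torch.randn(32, 6), torch.randn(32, 1)
decoder = nn.Sequential(nn.Linear(17 + 8, 64), nn.ReLU(), nn.Linear(64, 6))
cand = sample_vae_actions(decoder, s, n=10, latent_dim=8)
loss = conservative_q_loss(q, s, a, y, cand)
```

Because the candidate actions come from a VAE trained on the dataset, the penalty term is computed without querying Q on arbitrary OOD actions, which is consistent with the abstract's claim of constraining Q-values while preserving generalization.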
- Copyright
- © 2023 The Author(s)
- Open Access
- This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
- Cite this article
TY - CONF
AU - Zhenyuan Ji
PY - 2023
DA - 2023/11/27
TI - Stable Conservative Q-Learning for Offline Reinforcement Learning
BT - Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
PB - Atlantis Press
SP - 175
EP - 184
SN - 2352-538X
UR - https://doi.org/10.2991/978-94-6463-300-9_18
DO - 10.2991/978-94-6463-300-9_18
ID - Ji2023
ER -