Proceedings of the 2023 International Conference on Data Science, Advanced Algorithm and Intelligent Computing (DAI 2023)

Comparison of Q-learning and SARSA Reinforcement Learning Models on Cliff Walking Problem

Authors
Lv Zhong¹,*
¹Brunel London School, North China University of Technology, Beijing, 100144, China
*Corresponding author. Email: 21190010330@mail.ncut.edu.cn
Corresponding Author
Lv Zhong
Available Online 14 February 2024.
DOI
10.2991/978-94-6463-370-2_23
Keywords
SARSA; Q-learning; Reinforcement learning
Abstract

This work evaluates two value-based reinforcement learning algorithms, State-Action-Reward-State-Action (SARSA) and Q-learning, on the cliff walking problem. Both algorithms are implemented in Python with the NumPy library, and their policy graphs and reward curves are compared and analyzed. The experimental results show that SARSA behaves conservatively: it tends to choose a path away from the cliff, which reduces risk but increases the number of steps and the time taken. Q-learning behaves greedily: it tends to choose a path close to the cliff, which increases the reward but also increases fluctuation and instability. The paper discusses how the two algorithms balance exploration and exploitation, how they perform under different parameter settings, and how well they adapt and generalize in complex environments. It also points out limitations and directions for future work: the experiments use only a simple grid world rather than more complex or realistic environments, only two value-based algorithms rather than other types of reinforcement learning algorithms, and only one exploration policy, the ε-greedy policy, rather than other exploration policies. The paper thus contributes a direct comparison of the two algorithms to the reinforcement learning field.
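
The difference between the two algorithms compared in the abstract comes down to one line in the update rule, which the following minimal Python/NumPy sketch tries to illustrate. It is not the author's implementation. The 4×12 grid, the rewards of -1 per step and -100 for stepping into the cliff, the reset to the start after a fall, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions borrowed from the usual textbook formulation of cliff walking. The names `step`, `epsilon_greedy`, and `train` are hypothetical.

```python
import numpy as np

# Assumed textbook layout: 4x12 grid, start bottom-left, goal bottom-right,
# cliff cells along the bottom row between them.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Return (next_state, reward, done) for one move, clipped to the grid."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:
        # Fell off the cliff: large penalty, back to start, episode continues.
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q[state]))


def train(method, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular SARSA or Q-learning; returns (Q-table, per-episode returns)."""
    if method not in ("sarsa", "q_learning"):
        raise ValueError(f"unknown method: {method}")
    rng = np.random.default_rng(seed)
    q = np.zeros((ROWS, COLS, len(ACTIONS)))
    returns = np.zeros(episodes)
    for ep in range(episodes):
        state = START
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            if method == "sarsa":
                # On-policy: bootstrap from the action actually taken next.
                next_action = epsilon_greedy(q, next_state, epsilon, rng)
                bootstrap = q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the greedy action.
                bootstrap = np.max(q[next_state])
            target = reward + (0.0 if done else gamma * bootstrap)
            q[state][action] += alpha * (target - q[state][action])
            if method == "q_learning":
                # Behaviour action is chosen after the update.
                next_action = epsilon_greedy(q, next_state, epsilon, rng)
            state, action = next_state, next_action
            returns[ep] += reward
    return q, returns
```

Under these assumptions, calling `train("sarsa")` and `train("q_learning")` and plotting the two `returns` arrays would give reward curves of the kind the abstract refers to, and `np.argmax(q, axis=-1)` gives a greedy-policy grid that can be drawn as a policy graph.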

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2023 International Conference on Data Science, Advanced Algorithm and Intelligent Computing (DAI 2023)
Series
Advances in Intelligent Systems Research
Publication Date
14 February 2024
ISBN
978-94-6463-370-2
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Lv Zhong
PY  - 2024
DA  - 2024/02/14
TI  - Comparison of Q-learning and SARSA Reinforcement Learning Models on Cliff Walking Problem
BT  - Proceedings of the 2023 International Conference on Data Science, Advanced Algorithm and Intelligent Computing (DAI 2023)
PB  - Atlantis Press
SP  - 207
EP  - 213
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-370-2_23
DO  - 10.2991/978-94-6463-370-2_23
ID  - Zhong2024
ER  -