Investigating the Potential of Large Language Models for Automated Writing Scoring
- DOI
- 10.2991/978-94-6463-502-7_116
- Keywords
- Automated Writing Scoring; Large Language Models; GPT-4; Feedback Generation; Writing Assessment
- Abstract
This study investigates the potential of large language models (LLMs), specifically GPT-4, for automated writing scoring and feedback generation. Employing a mixed-methods approach, the research evaluates the accuracy and reliability of GPT-4 in predicting essay scores and the quality of its generated feedback. The results demonstrate a high level of agreement between GPT-4 scores and human raters, as evidenced by the confusion matrix and the Quadratic Weighted Kappa metric. Qualitative analysis of GPT-4's feedback suggests that it can provide constructive and comprehensive suggestions for improving student writing. However, limitations remain in LLM-based automated scoring and feedback. The study therefore proposes using LLM-based systems as formative assessment tools that complement human judgment.
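The agreement metric cited above, Quadratic Weighted Kappa (QWK), can be computed directly from paired human and model scores. The following is a minimal sketch using scikit-learn; the variable names and sample scores are illustrative placeholders, not data from the study.

```python
# Minimal sketch of the Quadratic Weighted Kappa (QWK) agreement check
# mentioned in the abstract. The scores below are hypothetical examples,
# not data from the study; any integer rating scale works.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

human_scores = [3, 4, 2, 5, 4, 3, 1, 4]   # hypothetical human-rater scores
gpt4_scores  = [3, 4, 3, 5, 4, 2, 1, 4]   # hypothetical GPT-4 scores

# QWK is Cohen's kappa with quadratic weights: large disagreements are
# penalized more heavily than near-misses (1.0 = perfect agreement,
# 0 = chance-level agreement).
qwk = cohen_kappa_score(human_scores, gpt4_scores, weights="quadratic")

# Confusion matrix of human vs. model scores, the other agreement view
# reported in the study.
cm = confusion_matrix(human_scores, gpt4_scores)

print(f"QWK: {qwk:.3f}")
print(cm)
```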
- Copyright
- © 2024 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Shan Wang
PY  - 2024
DA  - 2024/08/31
TI  - Investigating the Potential of Large Language Models for Automated Writing Scoring
BT  - Proceedings of the 2024 5th International Conference on Education, Knowledge and Information Management (ICEKIM 2024)
PB  - Atlantis Press
SP  - 1091
EP  - 1098
SN  - 2589-4900
UR  - https://doi.org/10.2991/978-94-6463-502-7_116
DO  - 10.2991/978-94-6463-502-7_116
ID  - Wang2024
ER  -