Improvement of Naive Bayes Text Classifier Based on Ensemble Technology and Feature Engineering
- DOI
- 10.2991/978-94-6463-300-9_57
- Keywords
- Ensemble Model; Naive Bayes Classifier; Feature Engineering
- Abstract
The performance of the Naive Bayes model in text classification is constrained in two ways. First, it assumes that features are independent, which does not hold for textual data. Second, it relies solely on word frequency information and disregards word order and the relationships between words, which hinders its ability to capture text semantics. This study therefore adopts ensemble learning and feature engineering to compensate for these deficiencies and improve the model's text classification accuracy. The proposed method combines a Naive Bayes classifier with two other classifiers, Random Forest and Support Vector Machine (SVM), in an ensemble model. The IMDB movie review dataset is used for training and evaluation. The dataset is preprocessed by converting the integer sequences back to text and then tokenizing and vectorizing the text with a CountVectorizer. Various performance indicators, such as accuracy, precision, and F1-score, are calculated for each individual classifier and for the ensemble model. The results show that the ensemble model achieves the highest accuracy of the four models: the Naive Bayes classifier achieves 78.19%, Random Forest 81.49%, SVM 84.60%, and the ensemble model 84.89%. These findings highlight the effectiveness of ensemble learning and feature engineering in improving the performance of a Naive Bayes text classifier.
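The two limitations named in the abstract, and the idea of combining classifiers, can be illustrated with a minimal standard-library sketch. This is not the author's implementation: the class `TinyMultinomialNB`, the helper `majority_vote`, the whitespace tokenizer, the Laplace smoothing value, and the four-sentence toy corpus are all hypothetical stand-ins chosen for illustration. The abstract also does not state how the ensemble combines its members, so hard (majority) voting is assumed here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for CountVectorizer)."""
    return text.lower().split()


class TinyMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        docs_per_class = Counter(labels)
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(labels)) for c in self.classes
        }
        return self

    def log_score(self, text, c):
        """log P(c) plus a sum of per-word log-likelihoods.

        The plain sum is the feature-independence assumption, and because
        addition is commutative, word order cannot affect the result.
        """
        denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


def majority_vote(predictions):
    """Hard voting: return the most common label (ties go to the first seen)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy corpus, purely illustrative; the study itself uses IMDB reviews.
train_texts = [
    "a great and moving film",
    "great acting and a great story",
    "a dull and boring film",
    "boring story and terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

nb = TinyMultinomialNB().fit(train_texts, train_labels)
print(nb.predict("great story"), nb.predict("boring film"))

# Reordering the words leaves the class score unchanged (up to rounding).
print(nb.log_score("great story", "pos"), nb.log_score("story great", "pos"))

# Combining three hypothetical member predictions for one review.
print(majority_vote(["pos", "neg", "pos"]))
```

In this sketch a single dissenting member is outvoted by the other two, which is one plausible reading of how an ensemble can correct individual errors of the Naive Bayes classifier; weighted or probability-based (soft) voting would be equally consistent with the abstract.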
- Copyright
- © 2023 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Dongyang Liu
PY  - 2023
DA  - 2023/11/27
TI  - Improvement of Naive Bayes Text Classifier Based on Ensemble Technology and Feature Engineering
BT  - Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
PB  - Atlantis Press
SP  - 557
EP  - 563
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-300-9_57
DO  - 10.2991/978-94-6463-300-9_57
ID  - Liu2023
ER  -