Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)

Improvement of Naive Bayes Text Classifier Based on Ensemble Technology and Feature Engineering

Authors
Dongyang Liu1, *
1Department of Computer Science and Technology, Beijing Institute of Technology, Beijing, 102488, China
*Corresponding author. Email: toyoliu@bit.edu.cn
Corresponding Author
Dongyang Liu
Available Online 27 November 2023.
DOI
10.2991/978-94-6463-300-9_57
Keywords
Ensemble Model; Naive Bayes Classifier; Feature Engineering
Abstract

The performance of the Naive Bayes model in text classification is constrained by two factors. First, it assumes feature independence, which does not hold for textual data. Second, it relies solely on word frequency information and disregards word order and word relationships, which hinders its ability to capture text semantics. This study therefore adopts ensemble learning and feature engineering to compensate for these deficiencies and improve text classification accuracy. The proposed method combines the Naive Bayes classifier with two other classifiers, Random Forest and Support Vector Machine (SVM), in an ensemble model. The IMDB movie review dataset is used for training and evaluation. The dataset is preprocessed by converting the integer sequences to text and then tokenizing and vectorizing the text with a CountVectorizer. Various performance indicators, such as accuracy, precision, and F1-score, are calculated for each classifier and for the ensemble model. The results show that the ensemble model achieves higher accuracy than any of the individual classifiers: the Naive Bayes classifier achieves 78.19%, Random Forest 81.49%, SVM 84.60%, and the ensemble model 84.89%. These findings highlight the effectiveness of ensemble learning and feature engineering in improving the performance of a Naive Bayes text classifier.
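The pipeline summarized in the abstract (word-count features, a Naive Bayes classifier, and a vote across several classifiers) can be illustrated with a minimal, standard-library-only sketch. This is not the author's implementation. The whitespace tokenizer, the four-sentence toy corpus, the hard (majority) voting rule, and the hard-coded `rf_preds` and `svm_preds` lists standing in for Random Forest and SVM outputs are all assumptions made for illustration; the abstract does not state the voting scheme or hyperparameters.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplification of a real vectorizer)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Per-class word frequency tables: the only information the model uses.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        docs_per_class = Counter(labels)
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(docs)) for c in self.classes
        }
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(doc):
                if word in self.vocab:  # words unseen in training are ignored
                    # Independence assumption: per-word log-likelihoods simply add up.
                    score += math.log(
                        (self.word_counts[c][word] + 1) / (self.total[c] + vocab_size)
                    )
            scores[c] = score
        return max(self.classes, key=lambda c: scores[c])

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models agree on."""
    return [
        Counter(sample_preds).most_common(1)[0][0]
        for sample_preds in zip(*predictions_per_model)
    ]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy corpus (hypothetical, not the IMDB data): 1 = positive, 0 = negative.
train_docs = [
    "a great and moving film",
    "great acting and a great story",
    "a dull and boring film",
    "boring plot and terrible acting",
]
train_labels = [1, 1, 0, 0]
test_docs = ["a great story", "terrible and dull"]
test_labels = [1, 0]

nb_preds = MultinomialNaiveBayes().fit(train_docs, train_labels).predict(test_docs)

# Hypothetical stand-ins for the other two ensemble members' predictions.
rf_preds = [1, 1]
svm_preds = [1, 0]

ensemble_preds = majority_vote([nb_preds, rf_preds, svm_preds])
ensemble_accuracy = accuracy(test_labels, ensemble_preds)
```

In this toy run the stand-in Random Forest predictions contain one error that the vote corrects, which is the kind of effect an ensemble is meant to provide. With a library such as scikit-learn, the same structure would typically be built from its own vectorizer, classifier, and voting classes rather than written by hand.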

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
Series
Advances in Computer Science Research
Publication Date
27 November 2023
ISBN
978-94-6463-300-9
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Dongyang Liu
PY  - 2023
DA  - 2023/11/27
TI  - Improvement of Naive Bayes Text Classifier Based on Ensemble Technology and Feature Engineering
BT  - Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
PB  - Atlantis Press
SP  - 557
EP  - 563
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-300-9_57
DO  - 10.2991/978-94-6463-300-9_57
ID  - Liu2023
ER  -