Research on Text Classification Based on Improved TF-IDF Algorithm
- DOI
- 10.2991/ncce-18.2018.79
- Keywords
- TF-IDF; text classification; Bayesian; evaluation index.
- Abstract
Feature weight calculation for automatic text classification most commonly relies on the TF-IDF algorithm. Although the algorithm is widely used, its weight calculation does not reflect that a feature carries different weight in different categories. This paper proposes an improved TF-IDF algorithm (TF-IDCRF) that takes the relationships between classes into account when classifying texts. The IDF formula is modified to correct the insufficient category discrimination of features, and the naive Bayes classification algorithm is then used to complete the classification. Finally, the proposed algorithm is compared with two other improved TF-IDF algorithms. The results on three text classification evaluation indicators show that the proposed algorithm has certain advantages in text classification.
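For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the classic TF-IDF weighting, tf(t, d) · log(N / df(t)), and not the authors' TF-IDCRF formula, which is not given on this page. The function name `tf_idf` and the toy corpus are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Classic formulation: tf(t, d) * log(N / df(t)).
    The paper's modified IDF (TF-IDCRF) is NOT reproduced here.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized (assumption for illustration).
corpus = [
    ["stock", "market", "falls"],
    ["stock", "prices", "rise"],
    ["team", "wins", "match"],
]
weights = tf_idf(corpus)
```

In this sketch a term such as "market", which occurs in a single document, receives a higher weight than "stock", which occurs in two. Note that the weight depends only on document counts and ignores class labels, which is the kind of limitation a class-aware IDF is meant to address.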
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Huilong Fan
AU - Yongbin Qin
PY - 2018/05
DA - 2018/05
TI - Research on Text Classification Based on Improved TF-IDF Algorithm
BT - Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
PB - Atlantis Press
SP - 501
EP - 506
SN - 1951-6851
UR - https://doi.org/10.2991/ncce-18.2018.79
DO - 10.2991/ncce-18.2018.79
ID - Fan2018/05
ER -