Study on Term Weight Calculation Based on Information Gain and Entropy
- DOI
- 10.2991/icmmcce-15.2015.416
- Keywords
- TF-IDF, text classification, term weight calculation, information entropy, information gain
- Abstract
This paper first analyzes the advantages and disadvantages of TF-IDF, a traditional algorithm for term weight calculation. To overcome the disadvantages of that algorithm, it then proposes a new term weight calculation method based on information gain and information entropy, which makes the calculated term weights more precise and improves the accuracy of text classification. Finally, text data are collected from the internet with a web crawler, and 7700 texts are selected at random as the experimental data set. The experimental results show that the proposed method overcomes the disadvantages of traditional TF-IDF and outperforms the other two methods in the comparison in the precision, recall, and F-measure of text classification.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
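The abstract names three quantities without defining them: TF-IDF, information entropy, and information gain. The sketch below uses their common textbook definitions on a toy corpus. It is not the weighting formula proposed in the paper, which the abstract does not state. The function names, the tokenized documents, and the class labels are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc, docs):
    """Textbook TF-IDF: relative term frequency times log inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(term, docs, labels):
    """Reduction in class entropy from knowing whether a document contains the term."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional

# Hypothetical toy corpus: four tokenized documents in two classes.
docs = [["ball", "goal"], ["ball", "team"], ["vote", "law"], ["vote", "ball"]]
labels = ["sport", "sport", "politics", "politics"]

for term in ("ball", "goal", "vote"):
    print(term,
          round(tf_idf(term, docs[0], docs), 3),
          round(information_gain(term, docs, labels), 3))
```

In this toy corpus, TF-IDF is computed without any reference to the class labels, whereas information gain depends on how a term is distributed across classes: "vote" separates the two classes perfectly and receives the highest information gain. Combining the two kinds of signal is the general idea the abstract describes.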
Cite this article
TY  - CONF
AU  - Ying Hong
AU  - Chao Lv
PY  - 2015/12
DA  - 2015/12
TI  - Study on Term Weight Calculation Based on Information Gain and Entropy
BT  - Proceedings of the 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering 2015
PB  - Atlantis Press
SN  - 2352-538X
UR  - https://doi.org/10.2991/icmmcce-15.2015.416
DO  - 10.2991/icmmcce-15.2015.416
ID  - Hong2015/12
ER  -