Proceedings of the 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering 2015

Study on Term Weight Calculation Based on Information Gain and Entropy

Authors
Ying Hong, Chao Lv
Corresponding Author
Ying Hong
Available Online December 2015.
DOI
10.2991/icmmcce-15.2015.416
Keywords
TF-IDF, text classification, term weight calculation, information entropy, information gain
Abstract

This paper first analyzes the advantages and disadvantages of TF-IDF, a traditional term weight calculation algorithm. To overcome its disadvantages, the paper then proposes a new term weight calculation method based on information gain and information entropy, which makes the calculated term weights more precise and improves the accuracy of text classification. Finally, text data are collected from the Internet with a web crawler, and 7700 texts are randomly selected as the experimental data set. The experimental results show that the proposed method overcomes the disadvantages of traditional TF-IDF and outperforms the other two methods in the precision, recall, and F-measure of text classification.
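For orientation, the quantities named in the abstract can be sketched with their textbook definitions. The abstract does not state the paper's weighting formula, so the code below is not the authors' method: it only shows classic TF-IDF next to the standard information gain of a term over class labels. The function names and the four-document toy corpus are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF: tf(t, d) * log(N / df(t)) for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)."""
    present = [y for doc, y in zip(docs, labels) if term in doc]
    absent = [y for doc, y in zip(docs, labels) if term not in doc]
    gain = entropy(labels)
    for subset in (present, absent):
        if subset:  # skip an empty split to avoid dividing by zero
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


# Hypothetical toy corpus: two classes, two documents each.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "team"],
    ["market", "bank", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tf_idf(docs)
for term in ("goal", "team"):
    print(term, round(weights[0][term], 4), round(information_gain(term, docs, labels), 4))
```

In this toy corpus "goal" appears only in the sports documents, so its information gain is higher than that of "team", which appears in both classes. TF-IDF has no access to the class labels, which is the kind of limitation a class-aware weighting aims to address.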

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering 2015
Series
Advances in Computer Science Research
Publication Date
December 2015
ISBN
978-94-6252-133-9
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Ying Hong
AU  - Chao Lv
PY  - 2015/12
DA  - 2015/12
TI  - Study on Term Weight Calculation Based on Information Gain and Entropy
BT  - Proceedings of the 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering 2015
PB  - Atlantis Press
SN  - 2352-538X
UR  - https://doi.org/10.2991/icmmcce-15.2015.416
DO  - 10.2991/icmmcce-15.2015.416
ID  - Hong2015/12
ER  -