Improved term selection algorithm based on variance in text categorization
- DOI
- 10.2991/icsem.2013.157
- Keywords
- variance, text classification, term selection
- Abstract
This article improves the term weighting algorithm used in automated text classification. The traditional TFIDF algorithm is a common method for measuring term weights in text classification. However, the algorithm does not take into account how terms are distributed across classes. To solve this problem, variance, which describes the distribution of terms between classes (inter-class) and within classes (intra-class), is used to revise the TFIDF algorithm. The article focuses on the construction of LFHW term sets and on new approaches to term weighting. These new approaches are also applied to a hierarchical classification system. Simulation experiments demonstrate that the improved TFIDF algorithm achieves better classification results than the traditional TFIDF algorithm.
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
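The abstract describes revising TFIDF with a variance term but does not give the formula. The sketch below is only an illustration of that idea under stated assumptions, not the authors' method: the function names, the use of relative term frequency, and the scaling factor (1 + inter-class variance) / (1 + intra-class variance) are all hypothetical choices made for this example. Documents are assumed to be non-empty token lists with one class label each.

```python
import math
from collections import Counter
from statistics import pvariance


def tfidf(docs):
    """Classic TF-IDF. docs is a list of token lists; returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def class_variances(term, docs, labels):
    """Return (inter-class, intra-class) variance of a term's relative frequency.

    Assumed definitions for this sketch:
    inter-class = variance of the per-class mean frequencies,
    intra-class = mean of the per-class variances.
    """
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc.count(term) / len(doc))
    class_means = [sum(v) / len(v) for v in by_class.values()]
    inter = pvariance(class_means)
    intra = sum(pvariance(v) for v in by_class.values()) / len(by_class)
    return inter, intra


def variance_revised_tfidf(docs, labels):
    """Scale each TF-IDF weight by (1 + inter) / (1 + intra); a hypothetical revision."""
    base = tfidf(docs)
    vocab = {t for doc in docs for t in doc}
    factor = {}
    for t in vocab:
        inter, intra = class_variances(t, docs, labels)
        factor[t] = (1.0 + inter) / (1.0 + intra)
    return [{t: w * factor[t] for t, w in doc_w.items()} for doc_w in base]
```

With this assumed factor, a term that occurs evenly within one class and never in the others is boosted, while a term whose frequency fluctuates inside its own class is damped.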
Cite this article
TY - CONF
AU - Ran Li
AU - Xianjiu Guo
PY - 2013/04
DA - 2013/04
TI - Improved term selection algorithm based on variance in text categorization
BT - Proceedings of the 2nd International Conference On Systems Engineering and Modeling (ICSEM 2013)
PB - Atlantis Press
SP - 765
EP - 768
SN - 1951-6851
UR - https://doi.org/10.2991/icsem.2013.157
DO - 10.2991/icsem.2013.157
ID - Li2013/04
ER -