A Method of Text Dimension Reduction Based on CHI and TF-IDF
- DOI
- 10.2991/icmmcce-15.2015.356
- Keywords
- Dimension extraction; CHI; TF-IDF; Text classification
- Abstract
To improve the results of text dimension extraction, a text dimension reduction method based on CHI and TF-IDF is designed and implemented. News articles are collected from news websites on the Internet through SVM-based page analysis, and the method based on CHI and TF-IDF is then applied to extract text dimensions from them, with good results. In experiments on news articles from NetEase and ChinaNews, the proposed text dimension extraction model achieved a text classification accuracy of 81.2%, which provides data support for the method. The method compensates for CHI's weakness in handling low-frequency words and gives good text classification results.
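The abstract does not state how CHI and TF-IDF are combined, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It computes the standard chi-square statistic of a term against a class from a 2x2 document-count table, which looks only at whether a term is present in a document, and weights it by the term's mean TF-IDF so that within-document frequency also contributes. The product rule, the smoothed IDF `log(1 + N/df)`, the function names, and the toy corpus are all illustrative assumptions.

```python
import math


def chi_square(docs, labels, term, cls):
    """Chi-square statistic of `term` vs. class `cls` (2x2 document counts)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == cls:
            a += 1  # term present, document in class
        elif present:
            b += 1  # term present, document not in class
        elif label == cls:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def mean_tfidf(docs, term):
    """Mean TF-IDF of `term` over all documents (smoothed IDF, an assumption)."""
    df = sum(1 for tokens in docs if term in tokens)
    if df == 0:
        return 0.0
    idf = math.log(1 + len(docs) / df)
    tf_sum = sum(tokens.count(term) / len(tokens) for tokens in docs if tokens)
    return tf_sum / len(docs) * idf


def select_terms(docs, labels, k):
    """Keep the k terms with the highest (max-over-classes CHI) * mean TF-IDF."""
    vocab = {t for tokens in docs for t in tokens}
    classes = set(labels)
    scores = {
        t: max(chi_square(docs, labels, t, c) for c in classes) * mean_tfidf(docs, t)
        for t in vocab
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["goal", "match", "team", "goal"],
    ["team", "goal", "coach"],
    ["stock", "market", "bank"],
    ["bank", "market", "stock", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

print(select_terms(docs, labels, 2))
```

In this toy corpus every term that appears in both documents of one class gets the same CHI value, so the TF-IDF factor alone decides their ranking. That is the kind of frequency information plain CHI ignores.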
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - HaiBo Tang
AU - Lei Zhou
AU - Xu Chengjie
AU - Quanyin Zhu
PY - 2015/12
DA - 2015/12
TI - A Method of Text Dimension Reduction Based on CHI and TF-IDF
BT - Proceedings of the 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering 2015
PB - Atlantis Press
SN - 2352-538X
UR - https://doi.org/10.2991/icmmcce-15.2015.356
DO - 10.2991/icmmcce-15.2015.356
ID - Tang2015/12
ER -