Text Categorization Based on a Similarity Approach
Authors
Cha Yang1, Jun Wen
1School of Computer Science & Engineering, University of Electronic Science and Technology of China
Corresponding Author
Cha Yang
Available Online October 2007.
- DOI
- 10.2991/iske.2007.138
- Keywords
- text classification; Term Frequency/Inverse Document Frequency (TFIDF); feature selection; vector space model; word frequency; similarity
- Abstract
Text classification can efficiently enhance text processing capability by automatically sorting documents into a defined collection of categories. This paper uses the TFIDF method to represent documents and sets the NGramSize value to 6. A word frequency vector is used to measure and distinguish different features of documents. The similarity approach uses the cosine function to construct the classifier. The experimental results indicate that the proposed algorithm yields good performance, with accuracy of up to 98%.
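The pipeline named in the abstract (TFIDF representation, an n-gram size of 6, cosine similarity as the classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how NGramSize is applied, so the sketch assumes character n-grams of length 6, a smoothed IDF, and a nearest-class rule that compares a document against the summed TFIDF vector of each category. All function names are hypothetical.

```python
import math
from collections import Counter

NGRAM_SIZE = 6  # value from the abstract; treating it as a character n-gram length is an assumption


def ngrams(text, n=NGRAM_SIZE):
    """Return the overlapping character n-grams of a lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs):
    """Build one sparse TFIDF vector (a dict) per text and return them with the IDF table."""
    counts = [Counter(ngrams(d)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n_docs = len(docs)
    # Smoothed IDF (an assumption): n-grams found in every document keep a positive weight.
    idf = {g: math.log((1 + n_docs) / (1 + k)) + 1.0 for g, k in doc_freq.items()}
    vectors = [{g: tf * idf[g] for g, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train(docs, labels):
    """Sum the TFIDF vectors of each category; cosine ignores scale, so no averaging is needed."""
    vectors, idf = tfidf_vectors(docs)
    centroids = {}
    for vec, label in zip(vectors, labels):
        centroid = centroids.setdefault(label, {})
        for g, w in vec.items():
            centroid[g] = centroid.get(g, 0.0) + w
    return centroids, idf


def classify(text, centroids, idf):
    """Assign the category whose summed vector is most similar to the text."""
    counts = Counter(ngrams(text))
    vec = {g: tf * idf[g] for g, tf in counts.items() if g in idf}
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


train_docs = [
    "the stock market rallied as investors bought shares",
    "the football team won the championship match",
    "shares and bonds fell on the stock exchange",
    "the striker scored twice in the football match",
]
train_labels = ["finance", "sports", "finance", "sports"]
centroids, idf = train(train_docs, train_labels)
print(classify("investors sold shares on the stock market", centroids, idf))
```

The toy corpus is invented purely for illustration and says nothing about the accuracy reported in the abstract.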
- Copyright
- © 2007, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Cha Yang
AU - Jun Wen
PY - 2007/10
DA - 2007/10
TI - Text Categorization Based on a Similarity Approach
BT - Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
PB - Atlantis Press
SP - 807
EP - 811
SN - 1951-6851
UR - https://doi.org/10.2991/iske.2007.138
DO - 10.2991/iske.2007.138
ID - Yang2007/10
ER -