Proceedings of the 2015 International Conference on Electrical, Automation and Mechanical Engineering

An Improved Feature Selection Algorithm Utilizing the within Category Variance

Authors
P.J. Zhang, S.C. Gan
Corresponding Author
P.J. Zhang
Available Online July 2015.
DOI
10.2991/eame-15.2015.217
Keywords
text classification; feature selection; χ² statistics
Abstract

The χ² statistic is a commonly used and effective method of feature selection for text corpora. However, it suffers from several deficiencies. First, it considers only the document frequency of each feature. Second, it does not distinguish among features that have different frequency distributions within a category. To overcome these shortcomings, two indices, namely the within-category frequency and the within-category variance, are introduced. Experiments compare the traditional χ² statistic, existing improvements, and the improved χ² statistic proposed in this paper, using either a naive Bayes or an SVM classifier on the corpora collected by Fudan University and Sogou. The experimental results show that the proposed improvement is effective and robust across the classifiers and corpora tested.
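
The two deficiencies named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook χ² score computed from a 2×2 table of document counts, not the improved statistic proposed in the paper, whose formula the abstract does not give. The helper names (chi_square, doc_counts), the toy term counts, and the use of the population variance of per-document term counts as a stand-in for "within-category variance" are all assumptions made for illustration only.

```python
from statistics import pvariance


def chi_square(a, b, c, d):
    """Textbook chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or category is empty or universal
    return n * (a * d - c * b) ** 2 / denom


def doc_counts(counts_in, counts_out):
    """Reduce per-document term counts to the 2x2 document-count table."""
    a = sum(1 for k in counts_in if k > 0)
    b = sum(1 for k in counts_out if k > 0)
    return a, b, len(counts_in) - a, len(counts_out) - b


# Hypothetical toy data: occurrences of two terms in five documents of one
# category, plus the occurrences of either term in five documents outside it.
even_term = [2, 2, 2, 2, 0]    # spread evenly over the category
bursty_term = [5, 1, 1, 1, 0]  # concentrated in a single document
outside = [0, 0, 0, 1, 0]

score_even = chi_square(*doc_counts(even_term, outside))
score_bursty = chi_square(*doc_counts(bursty_term, outside))

# Both terms occur in the same number of documents, so the scores coincide.
print(score_even == score_bursty)

# An assumed reading of "within-category variance": the variance of the
# per-document counts inside the category, which does separate the two terms.
print(pvariance(even_term), pvariance(bursty_term))
```

In this toy case both terms receive the same χ² score because the score sees only which documents contain a term, while the variance of the in-category counts differs. How the paper actually combines such quantities with χ² is not stated in the abstract.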

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International Conference on Electrical, Automation and Mechanical Engineering
Series
Advances in Engineering Research
Publication Date
July 2015
ISBN
978-94-62520-71-4
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - P.J. Zhang
AU  - S.C. Gan
PY  - 2015/07
DA  - 2015/07
TI  - An Improved Feature Selection Algorithm Utilizing the within Category Variance
BT  - Proceedings of the 2015 International Conference on Electrical, Automation and Mechanical Engineering
PB  - Atlantis Press
SP  - 808
EP  - 810
SN  - 2352-5401
UR  - https://doi.org/10.2991/eame-15.2015.217
DO  - 10.2991/eame-15.2015.217
ID  - Zhang2015/07
ER  -