A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization
- DOI
- 10.2991/bep-16.2017.14
- Keywords
- feature selection; samples localization; cancer classification
- Abstract
Developing an efficient and robust feature selection method for gene expression profile data, which typically contain thousands of genes but only a small number of samples, is an active research topic. Most existing feature selection methods build their models on all samples of a gene expression dataset and do not consider the influence of outlier samples or the distribution of samples. Moreover, cancer is well known to be a heterogeneous disease: cancer tissue samples from the same organ can belong to many different subtypes in terms of molecular characteristics. Models should therefore be constructed from samples that share the same genetic characteristics. In this article, we propose a novel and efficient feature selection approach based on localized samples to extract gene signatures more accurately. For each target sample, we pick out the nearest samples within a certain range and obtain the best localized samples in three steps. First, we construct a sample-sample similarity network by calculating the Euclidean distance between the central sample and the other samples from their gene expression values. Second, we establish co-expression networks by selecting the top nearest samples and form a steady-state probability network by applying the Random Walk with Restart (RWR) method. Finally, by dividing this network and comparing five selection strategies, we obtain the localized samples that give the best cancer classification. We applied our method to six datasets across different cancer types. The average accuracies of the top 100 genes selected by the method, evaluated with SVM classifiers in leave-one-out cross-validation (LOOCV), are 95.46%, 94.01%, 96.20%, 99.79%, 99.08% and 99.37%, respectively. These results show that the proposed method performs well on these datasets and indicate that it is effective and applicable.
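The abstract gives no implementation details, so the following is only a minimal sketch of the first two steps (Euclidean sample-sample network, then RWR from a central sample), not the authors' code. The function names, the edge weight `1 / (1 + distance)`, the neighbor count `k`, the restart probability `0.3`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def pairwise_euclidean(X):
    """Euclidean distance between every pair of rows (samples) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_similarity_network(X, k):
    """Symmetric weighted adjacency linking each sample to its k nearest samples.

    The weight 1 / (1 + distance) is an assumption made for this sketch.
    """
    dist = pairwise_euclidean(X)
    n = dist.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        # Nearest samples first, skipping the sample itself.
        neighbors = [j for j in np.argsort(dist[i]) if j != i][:k]
        for j in neighbors:
            w = 1.0 / (1.0 + dist[i, j])
            adj[i, j] = w
            adj[j, i] = w
    return adj

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0       # guard against isolated nodes
    W = adj / col_sums                  # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0                      # the walk restarts at the central sample
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy data (not from the paper): two well-separated groups of 5 samples, 20 genes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 20)), rng.normal(5, 0.1, (5, 20))])

adj = knn_similarity_network(X, k=3)
p = random_walk_with_restart(adj, seed=0)

# Samples with the highest steady-state probability are treated as "localized"
# around sample 0; the cut-off of 5 is arbitrary here.
localized = np.argsort(-p)[:5]
```

In this toy setting the steady-state probability stays within the group that contains the central sample, which is the intuition behind building the model only on samples near the target. How the network is subsequently divided, and what the five selection strategies are, is not specified in the abstract and is not modeled here.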
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Mingyue SHENG
AU - Wei DU
AU - Yuan TIAN
AU - Yanchun LIANG
PY - 2016/12
DA - 2016/12
TI - A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization
BT - Proceedings of the 2016 International Conference on Biological Engineering and Pharmacy (BEP 2016)
PB - Atlantis Press
SP - 63
EP - 68
SN - 2468-5747
UR - https://doi.org/10.2991/bep-16.2017.14
DO - 10.2991/bep-16.2017.14
ID - SHENG2016/12
ER -