Data Preprocessing and Classification for Taproot site data sets of Panax notoginseng
- DOI
- 10.2991/mic-15.2015.29
- Keywords
- AdaBoost.M1, authentic-region herbs, Random Forest, data preprocessing
- Abstract
Herbs from different producing regions differ in their active constituents and efficacy, and herbs from the authentic region are of better quality than those from other producing regions. Because many peddlers now substitute non-authentic herbs for authentic-region herbs to make more money, it is important to distinguish herbs by producing region. This paper studies data preprocessing and classification for taproot site data sets of Panax notoginseng from three different producing regions. We compare the effects of data preprocessing steps, including data standardization, instance selection, and attribute selection, to find the best methods and parameter settings for the data sets. We then apply different classification algorithms to the preprocessed data and compare their performance to find the optimal classification algorithm for the data sets. Classification performance in the experiment was evaluated by Percent Correct (PC), Mean Squared Error (MSE), Kappa Statistic (KS), Area Under ROC (AUR), and Mean Absolute Error (MAE). The results show that standardizing the data with decimal scaling and selecting the attribute subset {1,2,4,6,7,8} suits the data, and that the Random Forest and AdaBoost.M1 algorithms give the best classification performance on these data sets.
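The two preprocessing choices the abstract reports as suitable, decimal scaling and the attribute subset {1,2,4,6,7,8}, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the attribute indices are assumed to be 1-based, the data are assumed to have at least eight numeric attributes, and the sample rows are made-up toy values rather than measurements from the paper. Decimal scaling here follows the usual textbook definition, dividing each attribute by the smallest power of ten that brings every absolute value below 1.

```python
def decimal_scale_column(values):
    """Scale one attribute by 10**j, where j is the smallest
    non-negative integer such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


def preprocess(rows, keep=(1, 2, 4, 6, 7, 8)):
    """Decimal-scale every attribute, then keep the chosen subset.

    `keep` holds 1-based attribute indices (an assumption about how
    the abstract numbers its attributes).
    """
    columns = list(zip(*rows))                      # rows -> columns
    scaled = [decimal_scale_column(col) for col in columns]
    selected = [scaled[i - 1] for i in keep]        # 1-based -> 0-based
    return [list(row) for row in zip(*selected)]    # columns -> rows


if __name__ == "__main__":
    # Hypothetical toy samples with eight attributes each.
    toy_rows = [
        [1.0, 20.0, 300.0, 4.0, 50.0, 6.0, 70.0, 8.0],
        [2.0, 10.0, 100.0, 9.0, 25.0, 3.0, 35.0, 4.0],
    ]
    for row in preprocess(toy_rows):
        print(row)
```

Because each attribute is scaled independently, selecting the subset before or after scaling gives the same result. The preprocessed rows could then be passed to any Random Forest or AdaBoost.M1 implementation for classification.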
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Huang Dao
AU  - He Jin
PY  - 2015/08
DA  - 2015/08
TI  - Data Preprocessing and Classification for Taproot site data sets of Panax notoginseng
BT  - Proceedings of the 2nd International Conference on Modelling, Identification and Control
PB  - Atlantis Press
SP  - 131
EP  - 134
SN  - 1951-6851
UR  - https://doi.org/10.2991/mic-15.2015.29
DO  - 10.2991/mic-15.2015.29
ID  - Dao2015/08
ER  -