A Combined-Learning Based Framework for Improved Software Fault Prediction
- DOI
- 10.2991/ijcis.2017.10.1.43
- Keywords
- Software Fault Prediction; Software Metrics; Feature Selection; Data Balancing; Machine Learning
- Abstract
Software Fault Prediction (SFP) predicts the fault-proneness of software modules, allowing software engineers to focus development activities on fault-prone modules and thereby prioritize and optimize tests, improve software quality, and make better use of resources. Machine learning has been successfully applied to the classification problems that arise in SFP. Nevertheless, the variety of software metrics, the presence of redundant and irrelevant features, and the imbalanced nature of software datasets make these classification problems increasingly challenging. The objective of this study is therefore to examine software metrics independently, using multiple Feature Selection (FS) techniques combined with Data Balancing (DB) based on the Synthetic Minority Oversampling Technique, in order to improve classification performance. Accordingly, a new framework is proposed that handles these challenges in combination on both Object Oriented Metrics (OOM) and Static Code Metrics (SCM) datasets. The experimental results confirm that prediction performance can be compromised without suitable Feature Selection Techniques (FST). To mitigate that, the data must be balanced; the combined technique thus yields robust performance. Furthermore, the combination of Random Forest (RF) with Information Gain (IG) FS yields the highest Receiver Operating Characteristic (ROC) value (0.993) and is the best combination when SCM are used, whereas the combination of RF with Correlation-based Feature Selection (CFS) yields the highest ROC value (0.909) and is the best choice when OOM are used. As this study shows, the software metrics used to predict the fault-proneness of software modules must be carefully examined, and a suitable FST for those metrics must be carefully selected. Moreover, DB must be applied to obtain robust performance.
In addition to addressing the challenges mentioned above, the proposed framework achieves remarkable classification performance and paves the way toward software quality assurance.
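The data-balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function names `smote` and `balance`, the parameter defaults, and the toy metric values are assumptions made only for illustration, and the sketch uses just the Python standard library.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority rows.

    Hypothetical helper: each new row lies on the segment between a randomly
    chosen minority row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base` (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy data (invented): two metric values per module, label 1 = fault-prone.
X = [[10, 1], [12, 2], [11, 1], [50, 9], [55, 8], [9, 1], [13, 2], [52, 9]]
y = [0, 0, 0, 1, 1, 0, 0, 1]
Xb, yb = balance(X, y, k=2)
print(yb.count(0), yb.count(1))
```

In a pipeline of the kind the abstract describes, oversampling like this would normally be applied to the training split only, so that synthetic rows never leak into the data used to compute the ROC value.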
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).
Cite this article
TY  - JOUR
AU  - Chubato Wondaferaw Yohannese
AU  - Tianrui Li
PY  - 2017
DA  - 2017/01/25
TI  - A Combined-Learning Based Framework for Improved Software Fault Prediction
JO  - International Journal of Computational Intelligence Systems
SP  - 647
EP  - 662
VL  - 10
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.2017.10.1.43
DO  - 10.2991/ijcis.2017.10.1.43
ID  - Yohannese2017
ER  -