A Combined-Learning Based Framework for Improved Software Fault Prediction

Chubato Wondaferaw Yohannese; Tianrui Li

doi:10.2991/ijcis.2017.10.1.43

Volume 10, Issue 1, 2017, Pages 647 - 662

Authors

Chubato Wondaferaw Yohannese (freewwin@yahoo.com), Tianrui Li (trli@swjtu.edu.cn)

School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

Received 21 August 2016, Accepted 10 January 2017, Available Online 25 January 2017.

DOI: 10.2991/ijcis.2017.10.1.43
Keywords: Software Fault Prediction; Software Metrics; Feature Selection; Data Balancing; Machine Learning
Abstract: Software Fault Prediction (SFP) predicts the fault-proneness of software modules, which allows software engineers to focus development activities on fault-prone modules and thereby prioritize and optimize tests, improve software quality, and make better use of resources. Machine learning has been successfully applied to the classification problems that arise in SFP. Nevertheless, the variety of software metrics, the presence of redundant and irrelevant features, and the imbalanced nature of software datasets make these classification problems increasingly challenging. The objective of this study is therefore to independently examine software metrics with multiple Feature Selection (FS) techniques combined with Data Balancing (DB) using the Synthetic Minority Oversampling Technique (SMOTE) in order to improve classification performance. Accordingly, a new framework is proposed that handles these challenges in combined form on both Object Oriented Metrics (OOM) and Static Code Metrics (SCM) datasets. The experimental results confirm that prediction performance can be compromised without suitable Feature Selection Techniques (FST), and that the data must be balanced to mitigate this; the combined technique therefore delivers robust performance. Furthermore, the combination of Random Forest (RF) with Information Gain (IG) FS yields the highest Receiver Operating Characteristic (ROC) curve value (0.993) and is the best combination when SCM are used, whereas the combination of RF with Correlation-based Feature Selection (CFS) yields the highest ROC value (0.909) and is the best choice when OOM are used. As this study shows, the software metrics used to predict the fault-proneness of software modules must be carefully examined, a suitable FST must be selected for each kind of metric, and DB must be applied in order to obtain robust performance. By addressing the challenges mentioned above, the proposed framework achieves strong classification performance and lays a pathway to software quality assurance.
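The two preprocessing ideas named in the abstract, Information Gain feature scoring and SMOTE-style data balancing, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy metrics (`high_complexity`, `has_comments`) and the toy data are all hypothetical, the features are assumed to be discrete for the Information Gain step, and the classifier (Random Forest) and CFS are omitted.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: one entry per module, label 1 = faulty (minority class).
labels = [1, 1, 0, 0, 0, 0, 0, 0]
high_complexity = [1, 1, 0, 0, 0, 0, 0, 0]  # tracks the label exactly
has_comments = [1, 0, 1, 0, 1, 0, 1, 0]     # carries no information about the label

ig_complexity = information_gain(high_complexity, labels)
ig_comments = information_gain(has_comments, labels)

# Hypothetical numeric metric vectors for the faulty modules.
faulty_modules = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]
new_faulty = smote_like_oversample(faulty_modules, n_new=3)
```

In this toy setting the informative metric scores a higher gain than the uninformative one, so a gain-ranked selection would keep it, and the synthetic faulty modules lie on line segments between existing faulty modules. A real study would more likely rely on established library implementations of SMOTE and feature selection than on a hand-rolled version like this one.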
Copyright: © 2017, the Authors. Published by Atlantis Press.
Open Access: This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

Journal: International Journal of Computational Intelligence Systems
Volume-Issue: 10 - 1
Pages: 647 - 662
Publication Date: 2017/01/25
ISSN (Online): 1875-6883
ISSN (Print): 1875-6891

Cite this article

TY  - JOUR
AU  - Chubato Wondaferaw Yohannese
AU  - Tianrui Li
PY  - 2017
DA  - 2017/01/25
TI  - A Combined-Learning Based Framework for Improved Software Fault Prediction
JO  - International Journal of Computational Intelligence Systems
SP  - 647
EP  - 662
VL  - 10
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.2017.10.1.43
DO  - 10.2991/ijcis.2017.10.1.43
ID  - Yohannese2017
ER  -