2014, International Journal of Advanced Science and Technology
Software is a complex entity composed of various modules, each with a different likelihood of containing defects. Efficient and timely prediction of defect occurrence allows software project managers to use people, cost, and time effectively for better quality assurance. The presence of defects leads to poor software quality and can be responsible for the failure of a software project. It is not always possible to identify and fix defects during development, so they must be handled whenever team members notice them. It is therefore important to predict defect-prone software modules before a software project is deployed, in order to plan a better maintenance strategy. Early knowledge of defect-prone modules also helps in making an efficient process improvement plan within a justified period of time and cost, which in turn leads to better software releases and higher customer satisfaction. Accurate measurement and prediction of defects is a crucial issue in any software because it is an indirect measurement based on several metrics. Therefore, instead of considering all metrics, it is more appropriate to find a suitable set of metrics that are relevant and significant for predicting defects in software modules. This paper proposes a feature selection based Linear Twin Support Vector Machine (LSTSVM) model to predict defect-prone software modules. F-score, a feature selection technique, is used to determine the set of metrics that most strongly affect defect prediction in software modules. The reduced metrics set obtained after feature selection enhances the efficiency of the predictive model, which is then used to identify defective modules for a given set of inputs. The paper evaluates the performance of the proposed model and compares it against other existing machine learning models. The experiments were performed on four PROMISE software engineering repository datasets, and the results indicate the effectiveness of the proposed feature selection based LSTSVM model on the basis of standard performance evaluation parameters.
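The F-score criterion mentioned above has a simple closed form. The sketch below is an illustration rather than the authors' implementation: it ranks metrics by F-score and then trains an ordinary linear SVM as a stand-in for the Linear Twin SVM, which has no scikit-learn estimator. The synthetic data, the number of metrics, and the top-8 cut-off are assumptions.

```python
# Minimal F-score feature ranking sketch (not the paper's code); LinearSVC stands in for LSTSVM.
import numpy as np
from sklearn.svm import LinearSVC

def f_score(X, y):
    """Per-column F-score for binary labels y (1 = defective module)."""
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / (den + 1e-12)                    # small constant avoids division by zero

# toy data: 200 modules described by 21 software metrics (an assumed shape)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 21))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

scores = f_score(X, y)
top_k = np.argsort(scores)[::-1][:8]              # keep the 8 highest-scoring metrics (assumed k)
clf = LinearSVC().fit(X[:, top_k], y)
print("selected metric indices:", top_k, "training accuracy:", clf.score(X[:, top_k], y))
```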
International Journal of Embedded Systems
Software is a collection of computer programs written in a programming language. Software consists of various modules, which makes it a complex entity and increases the probability of defects being introduced while the modules are developed. In turn, the cost and time needed to develop the software can increase. Sometimes these defects lead to the failure of the entire software and to late delivery to the customer, which can in turn be responsible for the withdrawal or cancellation of the project. Hence, in this research work, several machine learning algorithms are applied to support timely delivery and defect prediction, and several feature selection techniques are adopted to determine the features that are relevant for defect prediction.
2020
Feature selection is a technique used to select an optimal feature subset from the original input features according to a specific criterion, often formulated as an objective function that identifies which features are most appropriate for the task at hand. Finding a subset of features is attractive because it is usually easier to solve a problem in a lower dimension, and it helps in understanding the nonlinear mapping between input and output variables. This paper reviews basic feature selection techniques for software defect prediction models and their domain applications. Subset selection methods are categorized into three distinct models and discussed concisely to provide young researchers with the general methods of subset selection. Support Vector Machine with Recursive Feature Elimination, together with Logistic Regression and Random Forest, was introduced to evaluate the performance of filter, wrapper, and embedded feature select...
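As a hedged illustration of the wrapper and embedded families this review compares, the sketch below runs recursive feature elimination around a logistic regression and contrasts it with a random forest's built-in importances. The dataset, estimator settings, and the target of 5 features are assumptions, not values from the paper.

```python
# Wrapper (RFE) vs. embedded (random forest importances) feature selection, using scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=1)

# wrapper: recursively drop the features with the weakest logistic-regression coefficients
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("RFE keeps features:", np.where(rfe.support_)[0])

# embedded: the random forest ranks features as a side effect of training
forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
print("RF top features:  ", np.argsort(forest.feature_importances_)[::-1][:5])
```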
Journal of Physics: Conference Series, 2020
Advances in technology have increased the use and complexity of software, and this complexity increases the possibility of defects. Defective software can cause large losses, and fixing it is expensive because it can consume up to 50% of the project schedule. Most software developers do not document their work properly, which makes it difficult to analyse software development history data. The software metrics used in cross-project software defect prediction have many features. Because software metrics are usually derived from a variety of measurement techniques, some features may be similar or irrelevant, which can degrade classifier performance. In this study, several feature selection techniques were proposed to select the relevant features, with Naive Bayes as the classification algorithm. Based on an analysis using ANOVA, the SBS and SBFS models can significantly improve the performance of the Naïve Bayes model.
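For readers unfamiliar with SBS, the sketch below shows sequential backward selection wrapped around a Naive Bayes classifier, assuming scikit-learn; the floating variant (SBFS) is available in the mlxtend package but is omitted here. The synthetic data and the choice of 8 retained features are illustrative assumptions.

```python
# Sequential backward selection (SBS) around Gaussian Naive Bayes.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=400, n_features=20, n_informative=6, random_state=0)

# greedily remove the feature whose removal hurts cross-validated AUC the least
sbs = SequentialFeatureSelector(GaussianNB(), n_features_to_select=8,
                                direction="backward", cv=5, scoring="roc_auc")
sbs.fit(X, y)
print("metrics kept by SBS:", sbs.get_support(indices=True))
```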
Symmetry
Feature selection (FS) is a feasible solution for mitigating high dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). Moreover, many empirical studies on the impact and effectiveness of FS methods on SDP models often lead to contradictory experimental results and inconsistent findings. These contradictions can be attributed to relative study limitations such as small datasets, limited FS search methods, and unsuitable prediction models in the respective scope of studies. It is hence critical to conduct an extensive empirical study to address these contradictions to guide researchers and buttress the scientific tenacity of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The ensuing prediction models were evaluated based on accuracy and AUC va...
Inteligencia Artificial, 2021
The ongoing development of computer systems requires massive software projects. Running the components of these huge projects for testing purposes might be a costly process; therefore, parameter estimation can be used instead. Software defect prediction models are crucial for software quality assurance. This study investigates the impact of dataset size and feature selection algorithms on software defect prediction models. We use two approaches to build software defect prediction models: a statistical approach and a machine learning approach with support vector machines (SVMs). The fault prediction model was built based on four datasets of different sizes. Additionally, four feature selection algorithms were used. We found that applying the SVM defect prediction model on datasets with a reduced number of measures as features may enhance the accuracy of the fault prediction model. Also, it directs the test effort to maintain the most influential set of metrics. We also found that the...
International Journal of Computer Applications, 2014
Feature subset selection is the process of choosing a subset of good features with respect to the target concept. A clustering-based feature subset selection algorithm has been applied to software defect prediction data sets. The software defect prediction domain was chosen because of the growing importance of maintaining high reliability and high quality for any software being developed. A software quality prediction model is built using software metrics and defect data collected from a previously developed system release or from similar software projects; once validated, such a model can be used to predict the fault-proneness of program modules currently under development. The proposed clustering-based feature selection algorithm uses a minimum-spanning-tree-based method to cluster features. The algorithm is then applied to four different data sets and its impact is analyzed.
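The following is a rough sketch of the minimum-spanning-tree idea described above, under assumptions of my own (correlation distance as the edge weight, a 0.5 edge-cut threshold, and correlation with the label for picking representatives); it is not the paper's algorithm verbatim.

```python
# MST-based feature clustering: features are nodes, long MST edges are cut,
# and one representative feature is kept per resulting cluster.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 12))
X[:, 6:] = X[:, :6] + 0.1 * rng.normal(size=(300, 6))    # redundant copies of the first 6 metrics
y = (X[:, 0] + X[:, 3] > 0).astype(int)

dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))         # correlation distance between features
mst = minimum_spanning_tree(dist).toarray()
mst[mst > 0.5] = 0.0                                       # cut long edges (0.5 is an assumed threshold)
_, labels = connected_components(mst, directed=False)

# from every cluster, keep the feature most correlated with the defect label
relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
selected = [int(np.argmax(np.where(labels == c, relevance, -1))) for c in np.unique(labels)]
print("clusters:", labels, "representatives:", selected)
```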
International Journal on Electrical Engineering and Informatics
Software engineering comprises several activities intended to ensure that a quality product is achieved in the end, including software testing, inspection, formal verification, and software defect prediction. Many researchers have developed models for defect prediction based on machine learning and statistical analysis techniques. The main objective of these models is to identify defects before the software is delivered to the end user. This prediction helps project managers utilize resources effectively for better quality assurance. A single defect can sometimes cause the failure of an entire system, and defects often degrade the quality of a software system drastically. Early identification of defects also supports a better process plan that can handle defects effectively and increase customer satisfaction. However, accurate prediction of defects in software is not an easy task because it is an indirect measure. It is therefore important to find suitable and significant measures that are most relevant for finding defects in the software system. This paper presents a feature selection based model to predict defects in a given software module. The most relevant features are extracted with the help of seven feature selection techniques, and eight classifiers are used to classify the modules. Six NASA software engineering defect prediction data sets are used in this work. Several performance parameters are calculated to measure and validate the work, and the experimental results reveal that the proposed model has a strong capability to predict software defects.
Applied Sciences
Software Defect Prediction (SDP) models are built using software metrics derived from software systems, and their quality depends largely on the quality of the metrics (dataset) used to build them. High dimensionality is one of the data quality problems that affects the performance of SDP models. Feature selection (FS) is a proven method for addressing the dimensionality problem. However, the choice of FS method for SDP is still an open problem, as most empirical studies on FS methods for SDP produce contradictory and inconsistent outcomes. FS methods behave differently because of their different underlying computational characteristics, and this could be due to the choice of search method, since the impact of FS depends on the search method used. It is hence imperative to comparatively analyze the performance of FS methods based on different search methods in SDP. In this paper, four filter feature ranking (FFR) and fourteen filter feature su...
Software fault prediction plays a vital role in software quality assurance: identifying the faulty modules helps testers concentrate on those modules and improves the quality of the software. With the increasing complexity of today's software, feature selection is important for removing redundant, irrelevant, and erroneous data from the dataset. In general, feature selection is carried out using filter and wrapper methods. In this paper, a hybrid feature selection method is proposed which gives better prediction than the traditional methods, as sketched below. NASA's public dataset KC1, available in the PROMISE software engineering repository, is used. Accuracy, mean absolute error (MAE), and root mean squared error (RMSE) are used to evaluate the performance of the software fault prediction models.
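The sketch below illustrates a generic filter-plus-wrapper hybrid of the kind this abstract describes, not the paper's specific method: an ANOVA F-test prunes the metric set, then greedy forward selection around the final classifier refines it, and the accuracy/MAE/RMSE measures listed above are reported. The synthetic data, decision-tree classifier, and feature counts are assumptions (the paper uses KC1).

```python
# Hybrid feature selection sketch: filter stage (SelectKBest) then wrapper stage (forward selection).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

X, y = make_classification(n_samples=500, n_features=21, n_informative=6, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

kbest = SelectKBest(f_classif, k=12).fit(X_tr, y_tr)                  # filter: cheap ranking
wrapper = SequentialFeatureSelector(DecisionTreeClassifier(random_state=3),
                                    n_features_to_select=6, cv=5)     # wrapper: greedy search
wrapper.fit(kbest.transform(X_tr), y_tr)

clf = DecisionTreeClassifier(random_state=3).fit(wrapper.transform(kbest.transform(X_tr)), y_tr)
pred = clf.predict(wrapper.transform(kbest.transform(X_te)))
print("accuracy:", accuracy_score(y_te, pred),
      "MAE:", mean_absolute_error(y_te, pred),
      "RMSE:", mean_squared_error(y_te, pred) ** 0.5)
```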
Software: Practice and Experience, 2011
The selection of software metrics for building software quality prediction models is a search-based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially if the number of available metrics is large. Defect prediction models are necessary in aiding project managers for better utilizing valuable project resources for software quality improvement. The efficacy and usefulness of a fault-proneness prediction model is only as good as the quality of the software measurement data. This study focuses on the problem of attribute selection in the context of software quality estimation. A comparative investigation is presented for evaluating our proposed hybrid attribute selection approach, in which feature ranking is first used to reduce the search space, followed by a feature subset selection. A total of seven different feature ranking techniques are evaluated, while four different feature subset selection approaches are considered. The models are trained using five commonly used classification algorithms. The case study is based on software metrics and defect data collected from multiple releases of a large real-world software system. The results demonstrate that while some feature ranking techniques performed similarly, the automatic hybrid search algorithm performed the best among the feature subset selection methods. Moreover, performances of the defect prediction models either improved or remained unchanged when over 85% of the software metrics were eliminated.
Journal of Systems Engineering and Electronics, 2021
Software defect prediction (SDP) performs statistical analysis of historical defect data to find the distribution rule of historical defects, so as to effectively predict defects in new software. However, software defect datasets contain redundant and irrelevant features that affect the performance of defect predictors. To identify and remove these redundant and irrelevant features, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. The correlation between features is calculated based on symmetric uncertainty; according to the correlation degree, RFC partitions the features into k clusters with the k-medoids algorithm and finally selects representative features from each cluster to form the final feature subset. In the experiments, we compare the proposed RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
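The symmetric uncertainty measure that RFC clusters on has a standard definition, SU(X, Y) = 2·I(X;Y) / (H(X) + H(Y)). The sketch below computes it on equal-width discretised features; the k-medoids clustering step itself is not reproduced (scikit-learn has no built-in k-medoids), and the bin count and toy data are assumptions.

```python
# Symmetric uncertainty between two continuous features via histogram discretisation.
import numpy as np

def entropy(labels):
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y, bins=10):
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    joint = entropy(xd * (yd.max() + 1) + yd)                 # H(X, Y) via a joint code
    mi = entropy(xd) + entropy(yd) - joint                    # I(X; Y)
    return 2.0 * mi / (entropy(xd) + entropy(yd) + 1e-12)

rng = np.random.default_rng(4)
a = rng.normal(size=500)
b = a + 0.2 * rng.normal(size=500)        # near-duplicate metric: high SU
c = rng.normal(size=500)                  # unrelated metric: low SU
print("SU(a, b) =", round(symmetric_uncertainty(a, b), 3),
      " SU(a, c) =", round(symmetric_uncertainty(a, c), 3))
```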
IEEE Access
In the traditional software defect prediction methodology, the historical record (dataset) of the same project is partitioned into training and testing data. In a practical situation where the project to be predicted is new, traditional software defect prediction cannot be employed. An alternative method is cross-project defect prediction, where the historical record of one project (source) is used to predict the defect status of another project (target). The cross-project defect prediction method overcomes the lack of historical records that limits the traditional method. However, the performance of cross-project defect prediction is relatively low because of the distribution differences between the source and target projects. Furthermore, the software defect datasets used for cross-project defect prediction are characterized by high-dimensional features, some of which are irrelevant and contribute to low performance. To resolve these two issues, this study proposes a transformation and feature selection approach to reduce the distribution difference and the high dimensionality in cross-project defect prediction. A comparative experiment was conducted on publicly available datasets from AEEEM. Analysis of the results shows that the proposed approach in conjunction with random forest as the classification model outperformed four other state-of-the-art cross-project defect prediction methods on the commonly used performance evaluation metric F1-score.
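The sketch below is not the paper's pipeline but a simplified stand-in for the two ideas it combines: a per-project transformation (z-scoring source and target with their own statistics) to shrink distribution differences, and a filter to drop weakly relevant features, before training random forest on the source project and scoring F1 on the target project. The synthetic "projects" and all parameter values are assumptions.

```python
# Cross-project defect prediction sketch: per-project z-scoring + feature filter + random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=700, n_features=30, n_informative=8, random_state=5)
Xs, ys, Xt, yt = X[:400], y[:400], X[400:], y[400:]
Xt = Xt * 1.5 + 0.7               # simulate the target project reporting the same metrics on a shifted scale

zs = (Xs - Xs.mean(0)) / (Xs.std(0) + 1e-12)     # transform each project with its own statistics
zt = (Xt - Xt.mean(0)) / (Xt.std(0) + 1e-12)

selector = SelectKBest(mutual_info_classif, k=10).fit(zs, ys)          # feature selection on the source only
model = RandomForestClassifier(n_estimators=200, random_state=5).fit(selector.transform(zs), ys)
print("cross-project F1:", f1_score(yt, model.predict(selector.transform(zt))))
```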
International Journal of Intelligent Systems and Applications, 2019
Machine learning is a branch of Artificial Intelligence that builds systems which learn from data. Machine learning can take raw data from a repository, perform computation on it, and predict software bugs. It is always desirable to detect a software bug as early as possible so that time and cost can be reduced. Wrapper and filter feature selection techniques are used to find the most relevant software metrics. The main aim of the paper is to find the best model for software bug prediction. In this paper, the machine learning techniques Linear Regression, Random Forest, Neural Network, Support Vector Machine, Decision Tree, and Decision Stump are used, and a comparative analysis is carried out using performance parameters such as correlation, R-squared, mean square error, and accuracy for the software modules ant, ivy, tomcat, berek, camel, lucene, poi, synapse, and velocity. The Support Vector Machine outperforms the other machine learning models.
2009 21st IEEE International Conference on Tools with Artificial Intelligence, 2009
Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen to learn which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of them. Minimal attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set; we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) and our proposed hybrid feature selection (HFS) technique. Our case study consists of a very high-dimensional (42 software attributes) software measurement data set obtained from a large telecommunications system. The empirical analysis indicates that HFS performs better than FRT; however, the Kolmogorov-Smirnov feature ranking technique demonstrates competitive performance. For the telecommunications system, only 10% of the software attributes are found to be sufficient for effective software quality prediction.
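As a hedged illustration of the Kolmogorov-Smirnov ranking mentioned above, the sketch below scores each attribute by the KS statistic between its distributions in defective and non-defective modules and keeps only the top slice of the ranking. The 10% cut mirrors the paper's finding, but the data are synthetic and the pipeline is not the authors' HFS technique.

```python
# Kolmogorov-Smirnov feature ranking: score each metric by class-conditional distribution difference.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=42, n_informative=5, random_state=7)

ks = np.array([ks_2samp(X[y == 1, j], X[y == 0, j]).statistic for j in range(X.shape[1])])
keep = np.argsort(ks)[::-1][:max(1, int(0.10 * X.shape[1]))]    # keep roughly 10% of the 42 attributes
print("KS-ranked attributes kept:", sorted(keep.tolist()))
```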
2009 International Conference on Machine Learning and Applications, 2009
Software metrics datasets are in general not balanced. An imbalanced class distribution and irrelevant attributes may decrease the performance of a software defect prediction model, because predictions tend to favour the majority class over the minority class. This research uses a public dataset from the NASA (National Aeronautics and Space Administration) MDP (Metrics Data Program) repository and aims to reduce the influence of class imbalance in the dataset so that classification performance for defect prediction can be improved. The model proposed in this research applies feature selection with Particle Swarm Optimization (PSO), data-level approaches using Random Under Sampling (RUS) and SMOTE (Synthetic Minority Over-sampling Technique), and ensemble Bagging with a Naive Bayes classifier. The results show that the proposed model can improve the performance of Naive Bayes, with an overall AUC (Area Under Curve) value above 0.8. Statistical tests indicate a significant difference between the baseline Naive Bayes model and the proposed model, with a p-value (0.043) smaller than alpha (0.05), which means the two models differ significantly.
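The partial sketch below covers only the data-level and ensemble pieces of the approach described above, assuming the imbalanced-learn package; the PSO-driven feature selection step is not reproduced (a simple filter could take its place), and the synthetic imbalanced data and all parameter values are illustrative.

```python
# SMOTE oversampling followed by bagged Naive Bayes, evaluated with AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=8)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=8)

X_bal, y_bal = SMOTE(random_state=8).fit_resample(X_tr, y_tr)          # oversample the defective class
bagged_nb = BaggingClassifier(GaussianNB(), n_estimators=25, random_state=8).fit(X_bal, y_bal)
print("AUC:", roc_auc_score(y_te, bagged_nb.predict_proba(X_te)[:, 1]))
```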
International Journal of Recent Technology and Engineering (IJRTE), 2019
Software defect prediction (SDP) techniques were proposed to allocate testing resources rationally, decide the testing needs of the various modules of a software system, and improve software quality. By using the results of SDP, practitioners can judge which software modules are likely to be defective, estimate the possible number of defects in a module, or obtain other information related to software defects before testing the software [1]. Existing SDP studies can be divided into four types: (1) classification, (2) regression, (3) mining association rules, and (4) ranking. The primary aim of the first type is to classify software entities such as functions, classes, and files into different levels of severity with the help of statistical techniques such as logistic regression [2] and discriminant analysis [3], and machine learning techniques such as SVM [4] and ANN [5]. The second k...
Software defect prediction has recently become an important research topic in the software engineering field. Accurate prediction of defect-prone software modules can help direct the software testing effort, reduce costs, and improve the software testing process by focusing on fault-prone modules. Software defect data sets have an imbalanced nature, with very few defective modules compared to defect-free ones, and prediction performance also decreases significantly when the dataset contains noisy attributes. In this research, we propose a combination of a genetic algorithm and the bagging technique to improve the performance of software defect prediction: the genetic algorithm handles feature selection, while bagging deals with the class imbalance problem. The proposed method is evaluated on data sets from the NASA metrics data repository. The results indicate that the proposed method yields an impressive improvement in prediction performance for most classifiers.
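The sketch below shows one way a genetic algorithm can drive feature selection for a bagged classifier: bit-mask chromosomes, cross-validated AUC as fitness, truncation selection, one-point crossover, and bit-flip mutation. The population size, generation count, mutation rate, and Naive Bayes base learner are my assumptions, not the paper's settings.

```python
# Compact genetic-algorithm feature selection around bagged Naive Bayes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X, y = make_classification(n_samples=400, n_features=21, n_informative=6,
                           weights=[0.85, 0.15], random_state=9)

def fitness(mask):
    """Cross-validated AUC of bagged NB on the metrics selected by the bit mask."""
    if mask.sum() == 0:
        return 0.0
    model = BaggingClassifier(GaussianNB(), n_estimators=10, random_state=9)
    return cross_val_score(model, X[:, mask.astype(bool)], y, cv=3, scoring="roc_auc").mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))                   # 20 random bit-mask chromosomes
for _ in range(15):                                               # 15 generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]                  # truncation selection
    cut = rng.integers(1, X.shape[1], size=10)
    children = np.array([np.concatenate((parents[i][:c], parents[(i + 1) % 10][c:]))
                         for i, c in enumerate(cut)])             # one-point crossover
    children ^= (rng.random(children.shape) < 0.05).astype(int)   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected metrics:", np.where(best)[0], "AUC:", round(fitness(best), 3))
```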
Electronics, 2022
By building an effective prediction model, software defect prediction seeks to identify potential flaws in new software modules in advance. However, unnecessary and duplicated features can degrade a model's performance. Furthermore, past research has primarily used standard machine learning techniques for fault prediction, and the accuracy of the predictions has not been satisfactory. Extreme learning machines (ELM) and support vector machines (SVM) have been shown to be viable in a variety of fields, although their use in software reliability prediction is still uncommon. We present an SVM- and ELM-based algorithm for software reliability prediction in this research and investigate the factors that influence prediction accuracy, including, first, whether all previous failure data should be used and, second, which type of failure data is more appropriate for prediction accuracy. We also examine the accuracy and runtime of the SVM- and ELM-based software reliability prediction models. The comparison yields experimental results showing that the ELM-based reliability prediction model can achieve higher prediction accuracy on other parameters as well, such as specificity, recall, precision, and F1-measure. We also propose a model for how feature selection can be used with ELM and SVM. For testing, we used NASA metrics datasets, and in both techniques we apply feature selection to obtain the best result in our experiment. Because of the imbalance in our dataset, we first applied a resampling method before implementing feature selection techniques to obtain the highest accuracy.
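For readers unfamiliar with extreme learning machines, the sketch below is a minimal ELM classifier of the general kind compared against SVM above: a random hidden layer followed by a least-squares output layer. The hidden-layer size, synthetic data, and 0.5 decision threshold are assumptions; a study like this one would use the NASA metrics sets.

```python
# Minimal extreme learning machine (ELM) classifier: random hidden layer + least-squares readout.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=600, n_features=21, n_informative=6, random_state=10)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=10)

rng = np.random.default_rng(10)
W = rng.normal(size=(X.shape[1], 100))                    # random input-to-hidden weights (fixed)
b = rng.normal(size=100)                                  # random hidden biases (fixed)

def hidden(A):
    return 1.0 / (1.0 + np.exp(-(A @ W + b)))             # sigmoid hidden activations

beta = np.linalg.pinv(hidden(X_tr)) @ y_tr                # output weights by least squares
pred = (hidden(X_te) @ beta > 0.5).astype(int)
print("ELM F1 on held-out modules:", round(f1_score(y_te, pred), 3))
```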
Information Systems Frontiers, 2013
Two important problems which can affect the performance of classification models are high dimensionality (an overabundance of independent features in the dataset) and imbalanced data (a skewed class distribution which creates at least one class with many fewer instances than the other classes). To resolve these problems concurrently, we propose an iterative feature selection approach, which repeatedly applies data sampling (to address class imbalance) followed by feature selection (to address high dimensionality), and finally performs an aggregation step which combines the ranked feature lists from the separate iterations of sampling. This approach is designed to find a ranked feature list which is particularly effective on the more balanced dataset resulting from sampling, while minimizing the risk of losing data through the sampling step and missing important features. To demonstrate this technique, we employ 18 different feature selection algorithms and Random Undersampling with two post-sampling class distributions. We also investigate
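The sketch below illustrates the iterative idea just described: repeatedly undersample the majority class, rank features on each balanced sample, then aggregate the per-iteration rankings by mean rank. The ANOVA F-score ranker, 10 iterations, and synthetic imbalanced data are assumptions standing in for the 18 rankers and settings used in the study.

```python
# Iterative feature selection: repeated random undersampling + ranking + rank aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(11)
X, y = make_classification(n_samples=1000, n_features=25, n_informative=6,
                           weights=[0.9, 0.1], random_state=11)
minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]

rank_sum = np.zeros(X.shape[1])
for _ in range(10):                                        # 10 sampling iterations (assumed)
    sampled = np.concatenate([minority, rng.choice(majority, size=len(minority), replace=False)])
    scores, _ = f_classif(X[sampled], y[sampled])
    rank_sum += np.argsort(np.argsort(-scores))            # rank 0 = best feature on this iteration

final_order = np.argsort(rank_sum)                         # aggregate: lowest mean rank first
print("aggregated top-5 features:", final_order[:5])
```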