2017, Journal of Systems and Software
Background: Co-change prediction makes developers aware of which artifacts will change together with the artifact they are working on. In the past, researchers relied on structural analysis to build prediction models. More recently, hybrid approaches relying on historical information and textual analysis have been proposed. Despite the advances in the area, software developers still do not use these approaches widely, presumably because of the number of false recommendations. We conjecture that the contextual information of software changes collected from issues, developers' communication, and commit metadata captures the change patterns of software artifacts and can improve the prediction models. Objective: Our goal is to develop more accurate co-change prediction models by using contextual information from software changes. Method: We selected pairs of files based on relevant association rules and built a prediction model for each pair relying on their associated contextual information. We evaluated our approach on two open source projects, namely Apache CXF and Derby. Besides calculating model accuracy metrics, we also performed a feature selection analysis to identify the best predictors when characterizing co-changes and to reduce overfitting. Results: Our models presented low rates of false negatives (∼8% average rate) and false positives (∼11% average rate). We obtained prediction models with AUC values ranging from 0.89 to 1.00 and our models outperformed association rules, our baseline model, when we compared their precision values. Commit-related metrics were the most frequently selected ones for both projects. On average, 6 out of 23 metrics were necessary to build the classifiers. Conclusions: Prediction models based on contextual information from software changes are accurate and, consequently, they can be used to support software maintenance and evolution, warning developers when they miss relevant artifacts while performing a software change.
Software Quality Journal, 2019
Models that predict software artifact co-changes have been proposed to assist developers in altering a software system and they often rely on coupling. However, developers have not yet widely adopted these approaches, presumably because of the high number of false recommendations. In this work, we conjecture that the contextual information related to software changes, which is collected from issues (e.g., issue type and reporter), developers' communication (e.g., number of issue comments, issue discussants and words in the discussion), and commit metadata (e.g., number of lines added, removed, and modified), improves the accuracy of co-change prediction. We built customized prediction models for each co-change and evaluated the approach on 129 releases from a curated set of 10 Apache Software Foundation projects. Comparing our approach with the widely used association rules as a baseline, we found that contextual information models and association rules provide a similar number of co-change recommendations, but our models achieved a significantly higher F-measure. In particular, we found that contextual information significantly reduces the number of false recommendations compared to the baseline model. We conclude that contextual information is an important source for supporting change prediction and may be used to warn developers when they are about to miss relevant artifacts while performing a software change.
2019
Background: The importance of Software Change Prediction (SCP) has been emphasized by several studies. Numerous prediction models in the literature claim to effectively predict change-prone classes in software products. These models help software managers in optimizing resource usage and in developing good quality, easily maintainable products. Aim: There is an urgent need to compare and assess these numerous SCP models in order to evaluate their effectiveness. Moreover, one also needs to assess the advancements and pitfalls in the domain of SCP to guide researchers and practitioners. Method: In order to fulfill the above stated aims, we conduct an extensive literature review of 38 primary SCP studies from January 2000 to June 2019. Results: The review analyzes the different sets of predictors, experimental settings, data analysis techniques, statistical tests and threats involved in the studies that develop SCP models. Conclusion: The review also provides future guidelines to researchers in the SCP domain, some of which include exploring methods for dealing with imbalanced training data and evaluating search-based algorithms and ensembles of algorithms for SCP, amongst others.
IEEE Transactions on Software Engineering, 2005
We apply data mining to version histories in order to guide programmers along related changes: "Programmers who changed these functions also changed...." Given a set of existing changes, the mined association rules 1) suggest and predict likely further changes, 2) show up item coupling that is undetectable by program analysis, and 3) can prevent errors due to incomplete changes. After an initial change, our ROSE prototype can correctly predict further locations to be changed; the best predictive power is obtained for changes to existing software. In our evaluation based on the history of eight popular open source projects, ROSE's topmost three suggestions contained a correct location with a likelihood of more than 70 percent.
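The rule-mining idea in this abstract can be illustrated with a minimal sketch. It is not the ROSE implementation: it assumes each commit is a transaction listing the files changed together, mines only single-antecedent rules `a -> b`, and uses hypothetical thresholds (`min_support`, `min_confidence`) and hypothetical file names.

```python
from collections import Counter
from itertools import permutations


def mine_cochange_rules(transactions, min_support=2, min_confidence=0.5):
    """Mine rules a -> b from commits (assumption: one commit = one transaction).

    support(a -> b)    = number of commits that changed both a and b
    confidence(a -> b) = support(a -> b) / number of commits that changed a
    """
    item_count = Counter()
    pair_count = Counter()
    for files in transactions:
        unique = set(files)
        item_count.update(unique)
        # Ordered pairs, so a -> b and b -> a are counted as separate rules.
        pair_count.update(permutations(sorted(unique), 2))

    rules = {}
    for (a, b), support in pair_count.items():
        confidence = support / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


def recommend(rules, changed_file, top_n=3):
    """Return up to top_n files suggested after changed_file is modified."""
    candidates = [
        (b, conf, sup) for (a, b), (sup, conf) in rules.items() if a == changed_file
    ]
    # Highest confidence first, then highest support, then name for determinism.
    candidates.sort(key=lambda c: (-c[1], -c[2], c[0]))
    return [b for b, _, _ in candidates[:top_n]]


# Hypothetical change history, for illustration only.
history = [
    ["parser.c", "parser.h"],
    ["parser.c", "parser.h", "lexer.c"],
    ["parser.c", "docs.txt"],
    ["lexer.c", "lexer.h"],
]
rules = mine_cochange_rules(history)
suggestions = recommend(rules, "parser.c")
```

With this toy history, only the `parser.c`/`parser.h` pair co-occurs often enough to pass the support threshold, so it is the only recommendation produced in either direction.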
IEEE Transactions on Software Engineering, 2004
Software developers are often faced with modification tasks that involve source which is spread across a code base. Some dependencies between source code, such as those between source code written in different languages, are difficult to determine using existing static and dynamic analyses. To augment existing analyses and to help developers identify relevant source code during a modification task, we have developed an approach that applies data mining techniques to determine change patterns (sets of files that were changed together frequently in the past) from the change history of the code base. Our hypothesis is that the change patterns can be used to recommend potentially relevant source code to a developer performing a modification task. We show that this approach can reveal valuable dependencies by applying the approach to the Eclipse and Mozilla open source projects and by evaluating the predictability and interestingness of the recommendations produced for actual modification tasks on these systems.
Journal of Software, 2008
Estimating the change-proneness of parts of a software system is an active topic in the area of software engineering. Such estimates can be used to predict changes to different classes of a system from one release to the next. They can also be used to estimate and possibly reduce the effort required during the development and maintenance phase by balancing the amount of developers' time assigned to each part of a software system. This research work proposes a novel approach to predict changes in an object-oriented software system. The rationale behind this approach is that in a well-designed software system, feature enhancement or corrective maintenance should affect a limited amount of existing code. Our goal is to quantify this aspect of quality by assessing the probability that each class will change in a future generation. Our proposed probabilistic approach uses the dependencies obtained from the UML diagrams, as well as other code metrics extracted from the source code of several releases of a software system using reverse engineering techniques. These measures, combined with the change log of the software system and the expected time of the next release, are used in an automated manner to predict whether a class will change in the next release of the software system. The proposed systematic approach has been evaluated on a multi-version, medium-sized open source project, namely JFlex, the Fast Scanner Generator for Java. The obtained results indicate the simplicity and accuracy of our approach in comparison with existing methods reported in the literature.
Proceedings - International Conference on Software Engineering, 2010
Change prediction helps developers by recommending program entities that will have to be changed alongside the entities currently being changed. To evaluate their accuracy, current change prediction approaches use data from versioning systems such as CVS or SVN. These data sources provide a coarse-grained view of the development history that flattens the sequence of changes in a single commit. They are thus not a valid basis for evaluation in the case of development-style prediction, where the order of the predictions has to match the order of the changes a developer makes.
2007
We present a tool that predicts whether the software under development inside an IDE has a bug. An IDE plugin performs this prediction, using the Change Classification technique to classify source code changes as buggy or clean during the editing session. Change Classification uses Support Vector Machines (SVM), a machine learning classifier algorithm, to classify changes to projects mined from their configuration management repository.
Journal of Information Technology Research, 2020
When software applications are changed frequently, defects can be introduced that eventually lead to expensive operational faults. Comprehensive testing is not feasible with the limited time and resources available. There is a need for test case selection and prioritization so that testing can be completed with maximum confidence in minimum time. Advance knowledge of co-changed classes in software applications can be very useful during the software maintenance phase. In this article, the authors propose a co-change prediction model based on the combination of structural code measures and dynamic revision history from the change repository. Univariate analysis is applied to identify the measures that are useful in co-change identification. The proposed model is validated using eight open source software applications. The results are promising and indicate that the model can be very beneficial in the maintenance of software applications.
ACM Symposium on Applied Computing, 2008
Source control systems permit developers to attach a free-form message to every committed change. The content of these change messages can support software maintenance activities. We present an automated approach to classify a change message as either a bug fix, a feature introduction, or a general maintenance change. Researchers can study the evolution of a project using our classification. For example, researchers...
The importance of human-related factors in the introduction of bugs has recently been the subject of a number of empirical studies. However, such factors have not been captured yet in bug prediction models which simply exploit product metrics or process metrics based on the number and type of changes or on the number of developers working on a software component. Previous studies have demonstrated that focused developers are less prone to introduce defects than non focused developers. According to this observation, software components changed by focused developers should also be less error prone than software components changed by less focused developers. In this paper we capture this observation by measuring the structural and semantic scattering of changes performed by the developers working on a software component and use these two measures to build a bug prediction model. Such a model has been evaluated on five open source systems and compared with two competitive prediction mod...
The goal of Software Change Impact Analysis is to identify artifacts (typically source-code files) potentially affected by a change. Recently, there is an increased interest in mining software change impact based on evolutionary coupling. A particularly promising approach uses association rule mining to uncover potentially affected artifacts from patterns in the system's change history. Two main considerations when using this approach are the history length, the number of transactions from the change history used to identify the impact of a change, and history age, the number of transactions that have occurred since patterns were last mined from the history. Although history length and age can significantly affect the quality of mining results, few guidelines exist on how to best select appropriate values for these two parameters. In this paper, we empirically investigate the effects of history length and age on the quality of change impact analysis using mined evolutionary couplings. Specifically, we report on a series of systematic experiments involving the change histories of two large industrial systems and 17 large open source systems. In these experiments, we vary the length and age of the history used to mine software change impact, and assess how this affects precision and applicability. Results from the study are used to derive practical guidelines for choosing history length and age when applying association rule mining to conduct software change impact analysis.
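The two parameters defined in this abstract can be made concrete with a small sketch. It assumes the change history is a list of transactions ordered oldest to newest; the function name `history_window` and the sample data are hypothetical and are not taken from the study.

```python
def history_window(transactions, length, age):
    """Select the transactions used for mining.

    length: number of transactions used to identify the impact of a change.
    age:    number of transactions that have occurred since patterns were
            last mined, so the window ends `age` transactions before the
            most recent one.
    Assumption: `transactions` is ordered from oldest to newest.
    """
    if length < 0 or age < 0:
        raise ValueError("length and age must be non-negative")
    end = len(transactions) - age
    if end <= 0:
        return []  # the whole history is newer than the last mining point
    start = max(0, end - length)
    return transactions[start:end]


# Hypothetical history of ten commits, labelled t0 (oldest) to t9 (newest).
commits = [f"t{i}" for i in range(10)]
window = history_window(commits, length=3, age=2)
```

Here the two newest commits (`t8`, `t9`) are treated as having arrived after the last mining run, so the rules would be mined from `t5`, `t6` and `t7`. Varying `length` and `age` in a loop over such windows is one way to set up the kind of experiment the abstract describes.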
2014 IEEE Biennial Congress of Argentina (ARGENCON), 2014
Determining the critical parts of a system is key to effectively conduct preventive software maintenance. To accomplish this task, information from the system history (i.e., past versions) can be helpful for identifying those software elements that are more likely to receive modifications in the near future. However, interpreting the large amount of data usually present in the history can be difficult. In this work, we propose an approach adapted from financial markets for analyzing the history of an object-oriented system and predicting the classes that might change. We have evaluated our approach by comparing it with existing approaches. The results, although preliminary, show that our approach makes more accurate predictions for class changes.
2004
Software systems contain entities, such as functions and variables, which are related to each other. As a software system evolves to accommodate new features and repair bugs, changes occur to these entities. Developers must ensure that related entities are updated to be consistent with these changes.
Software change impact analysis aims to find artifacts potentially affected by a change. Typical approaches apply language-specific static or dynamic dependence analysis, and are thus restricted to homogeneous systems. This restriction is a major drawback given today's increasingly heterogeneous software. Evolutionary coupling has been proposed as a language-agnostic alternative that mines relations between source-code entities from the system's change history. Unfortunately, existing evolutionary coupling based techniques fall short. For example, using Singular Value Decomposition (SVD) quickly becomes computationally expensive. An efficient alternative applies targeted association rule mining, but the most widely known approach (ROSE) has restricted applicability: experiments on two large industrial systems, and four large open source systems, show that ROSE can only identify dependencies about 25% of the time. To overcome this limitation, we introduce TARMAQ, a new algorithm for mining evolutionary coupling. Empirically evaluated on the same six systems, TARMAQ performs consistently better than ROSE and SVD, is applicable 100% of the time, and runs orders of magnitude faster than SVD. We conclude that the proposed algorithm is a significant step forward towards achieving robust change impact analysis for heterogeneous systems.
2010 17th Working Conference on Reverse Engineering, 2010
The paper presents an approach that combines conceptual and evolutionary techniques to support change impact analysis in source code. Information Retrieval (IR) is used to derive conceptual couplings from the source code in a single version (release) of a software system. Evolutionary couplings are mined from source code commits. The premise is that such combined methods provide improvements to the accuracy of impact sets. A rigorous empirical assessment on the changes of the open source systems Apache httpd, ArgoUML, iBatis, and KOffice is also reported. The results show that a combination of these two techniques, across several cut points, provides statistically significant improvements in accuracy over either of the two techniques used independently. Improvements in recall values of up to 20% over the conceptual technique in KOffice and up to 45% over the evolutionary technique in iBatis were reported.
Data mining algorithms have been recently applied to software repositories to help on the maintenance of evolving software systems. In the past, information about what classes changed together, obtained by mining software repositories, were used to guide future changes. We use this information to measure the possible impacts of a proposed change. In this paper we propose and compare two approaches for sorting impact analysis results that use two different data mining algorithms: Apriori and DAR. Even though Apriori is a classic and largely used algorithm, the case study shows that the approach with DAR is less complex and more suitable for measuring the impacts of a change.
IEEE Transactions on Software Engineering, 2019
As new requirements are introduced and implemented in a software system, developers must identify the set of source code classes which need to be changed. Therefore, past effort has focused on predicting the set of classes impacted by a requirement. In this paper, we introduce and evaluate a new type of information based on the intuition that the set of requirements which are associated with historical changes to a specific class are likely to exhibit semantic similarity to new requirements which impact that class. This new Requirements to Requirements Set (R2RS) family of metrics captures the semantic similarity between a new requirement and the set of existing requirements previously associated with a class. The aim of this paper is to present and evaluate the usefulness of R2RS metrics in predicting the set of classes impacted by a requirement. We consider 18 different R2RS metrics by combining six natural language processing techniques to measure the semantic similarity among texts (e.g., VSM) and three distribution scores to compute overall similarity (e.g., average among similarity scores). We evaluate if R2RS is useful for predicting impacted classes in combination and against four other families of metrics that are based upon temporal locality of changes, direct similarity to code, complexity metrics, and code smells. Our evaluation features five classifiers and 78 releases belonging to four large open-source projects, which result in over 700,000 candidate impacted classes. Experimental results show that leveraging R2RS information increases the accuracy of predicting impacted classes practically by an average of more than 60% across the various classifiers and projects.
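One possible reading of an R2RS-style metric is sketched below. It assumes a plain term-frequency vector space model with cosine similarity as the text-similarity technique and uses the average as the distribution score, as mentioned in the abstract. The `max` option, the whitespace tokenization, and the sample requirements are assumptions added for illustration and are not confirmed by the source.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (a simple VSM variant)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def r2rs_score(new_requirement, past_requirements, aggregate="average"):
    """Similarity between a new requirement and the set of requirements
    previously associated with a class, reduced to one score.

    aggregate: "average" (named in the abstract) or "max" (an assumption).
    """
    sims = [cosine_similarity(new_requirement, p) for p in past_requirements]
    if not sims:
        return 0.0  # class with no requirement history
    if aggregate == "average":
        return sum(sims) / len(sims)
    if aggregate == "max":
        return max(sims)
    raise ValueError(f"unknown aggregate: {aggregate}")


# Hypothetical requirements previously linked to one class.
past = ["export report as pdf", "delete user account"]
avg_score = r2rs_score("export report as pdf", past, aggregate="average")
max_score = r2rs_score("export report as pdf", past, aggregate="max")
```

A score like this would be computed per candidate class and fed to a classifier as one feature among others. The abstract's evaluation combines it with further metric families, which this sketch does not cover.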
2007
The analysis of the evolution of software systems is a useful source of information for a variety of activities, such as reverse engineering, maintenance, and predicting the future evolution of these systems. Current software evolution research is mainly based on the information contained in versioning systems such as CVS and SubVersion. But the evolutionary information contained therein is incomplete and of low quality, hence limiting the scope of evolution research. It is incomplete because the historical information is only recorded at the explicit request of the developers (a commit in the classical checkin/checkout model). It is of low quality because the file-based nature of versioning systems leads to a view of software as being a set of files. In this paper we present a novel approach to software evolution analysis which is based on the recording of all semantic changes performed on a system, such as refactorings. We describe our approach in detail, and demonstrate how it can be used to perform fine-grained software evolution analysis.