2019, 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
Software systems in this big data era are growing larger and becoming more intricate. Tracking and managing code clones in such evolving software systems are challenging tasks. To understand how clone fragments are evolving, programmers often analyze the co-evolution of clone fragments manually to decide about refactoring, tracking, and bug removal. Such manual analysis is infeasible for a large number of clones evolving over hundreds of software revisions. We propose a visual analytics framework that leverages big data visualization techniques to manage code clones in large software systems. Our framework combines multiple information-linked zoomable views, where users can explore and analyze clones interactively in real time. We discuss several scenarios where our framework may assist developers in real-life software development and clone maintenance. Expert reviews point to considerable future potential for our framework.
Visual Informatics, 2019
With the era of big data approaching, the number of software systems and their dependencies, as well as the complexity of individual systems, are growing larger and more intricate. Understanding these evolving software systems is thus a primary challenge for cost-effective software management and maintenance. In this paper we perform a case study with evolving clones. We propose an interactive visualization system, Clone-World, that leverages big data visualization frameworks to manage code clones in large software systems. We believe that Clone-World will not only ease the management and maintenance of clones, but also inspire future innovation to adapt visual analytics to manage big software systems. Visual investigation of how clone fragments are evolving together or in a group is important for clone refactoring, tracking, and clone-related bug analysis. Programmers often need to manually analyze the co-evolution of clone fragments to decide about refactoring, tracking, and bug removal. However, manual analysis is time-consuming and nearly infeasible for a large number of clones, e.g., with millions of similarity pairs, where clones are evolving over hundreds of software revisions. A few clone visualization techniques are already available in the literature, but they do not scale well with the number of software versions. In addition, a single visualization is not sufficient to capture the intangible and complex evolution of the clones. Our clone analytic system, Clone-World, gives an intuitive yet powerful solution to these problems. Clone-World combines multiple information-linked zoomable views, where users can explore and analyze clones interactively in real time. User studies and expert reviews suggest that Clone-World may assist developers in many real-life software development and maintenance scenarios.
2007
Clones are code segments that have been created by copying and pasting from other code segments. Clones occur often in large software systems. It is reported that 5 to 50% of the source code of a large software system is cloned. A major challenge when studying code cloning in large software systems is handling the large number of clone candidates produced by leading-edge clone detection tools. For example, the CCFinder clone detection tool produces over 7 million pairs of clone candidates for the Linux kernel (which consists of over 4 MLOC). Moreover, the output of clone detection tools grows rapidly as a software system evolves. Researchers and developers need tools to help them study the large amount of clone data in order to better understand the clone phenomena in large systems. In this paper, we propose a data mining framework to help researchers cope with the large amount of data produced by clone detection tools. We propose techniques to reduce, abstract, and highlight the most interesting data produced by clone detection tools. Our framework also introduces a visualization tool which allows users to query and explore clone data at various abstraction levels. We demonstrate our framework on a case study of the clone phenomena in the Linux kernel.
… Maintenance, 2009. ICSM …, 2009
Code cloning is widely recognized as a threat to the maintainability of source code. As such, many clone detection and removal strategies have been proposed. However, some clones often cannot be removed easily, so other strategies, based on clone management, need to be developed. In this paper we describe a clone management strategy based on dynamically inferring clone relations by monitoring clipboard activity. We introduce CLONEBOARD, our Eclipse plug-in implementation that is able to track live changes to clones and offers several resolution strategies for inconsistently modified clones. We perform a user study with seven subjects to assess the adequacy, usability, and effectiveness of CLONEBOARD. The results show that developers see the added value of such a tool but have strict requirements with respect to its usability. Source: De Wit et al., "Managing Code Clones Using Dynamic Change Tracking and Resolution" (SERG).
… of the 2006 OOPSLA workshop on …, 2006
2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation, 2014
Code cloning is a controversial software engineering practice due to contradictory claims regarding its impacts on software evolution and maintenance. While a number of studies identify some positive aspects of code clones, there is strong empirical evidence of some negative impacts of clones too. Focusing on the issues related to clones, researchers suggest managing code clones through detection, refactoring, and tracking. However, not all clones in a software system are suitable for refactoring or tracking. Thus, it is important to identify which clones we should consider for refactoring and which clones should be considered for tracking. In this research work we apply the concept of evolutionary coupling to identify clones that are important for refactoring or tracking. By mining software evolution history, we determine and analyze constrained association rules of clone fragments that evolved following a particular change pattern called Similarity Preserving Change Pattern and are important from the perspective of refactoring and tracking. According to our investigation with rigorous manual analysis on thousands of revisions of six diverse subject systems covering two programming languages, overall 13.20% of all clones in a software system are important candidates for refactoring, and overall 10.27% of all clones are important candidates for tracking. Our implemented system can automatically identify these important candidates and thus can help us better maintain code clones in terms of refactoring and tracking.
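As a rough illustration of the evolutionary-coupling idea mentioned in this abstract, the sketch below mines plain support/confidence association rules from sets of clone fragments that changed in the same revision. It is a generic co-change computation, not the paper's constrained rules or its Similarity Preserving Change Pattern; the function name `co_change_rules`, the thresholds, and the fragment ids are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_rules(revisions, min_support=2, min_confidence=0.5):
    """Mine simple co-change rules between clone fragments.

    revisions: iterable of sets; each set holds the (hypothetical) ids of
    the clone fragments that were modified in one revision.
    Returns a sorted list of (antecedent, consequent, support, confidence).
    """
    single = Counter()  # number of revisions in which each fragment changed
    pair = Counter()    # number of revisions in which two fragments changed together
    for changed in revisions:
        single.update(changed)
        pair.update(frozenset(p) for p in combinations(sorted(changed), 2))

    rules = []
    for fragments, support in pair.items():
        if support < min_support:
            continue
        a, b = sorted(fragments)
        # Evaluate the rule in both directions: x changed => y changed.
        for x, y in ((a, b), (b, a)):
            confidence = support / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy history (made up for illustration): c1 and c2 usually change together.
history = [{"c1", "c2"}, {"c1", "c2"}, {"c1"}, {"c3"}]
for rule in co_change_rules(history):
    print(rule)
```

Fragments that appear in high-confidence rules in both directions tend to co-evolve, which is the kind of signal one might use when shortlisting clones for refactoring or tracking.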
2005
The source code of software systems changes many times during the system lifecycle. We study how developers can get insight in these changes in order to understand the project context and the product artifacts. For this we propose new techniques for code evolution representation and visualization interaction from a version-centric perspective. Central to our approach is a line-based display of the changing code, where each file version is shown as a column and the horizontal axis shows time. We propose a version centric layout of line representations and a constrained interaction scheme that makes it easy to navigate. Additionally, we describe a cushion based technique to enhance visualization with information about stable evolution areas. We demonstrate the usefulness of our approach on real-life data sets.
… of the 4th International Workshop on …, 2010
Tool support for code clones can improve software quality and maintainability. While significant research has been done in locating clones in existing source code, there has been less of a research focus on proactively tracking and supporting copy-paste-modify operations, even though copying and pasting is a major source of clone formation and the resulting clones are then often modified. We designed and implemented a programming editor, based on the Eclipse integrated development environment, named CSeR (Code Segment Reuse), which keeps a record of copy-and-paste-induced clones and then tracks and visualizes the changes made to a clone with distinct colors. The core of CSeR is an algorithm that actively compares two clones for detailed differences as a programmer edits either one of them. This edit-based comparison algorithm is unique to CSeR and produces more immediate, accurate, and natural results than other differencing tools.
3rd IEEE International Workshop on Visualizing Software for Understanding and Analysis, 2005
The Visual Code Navigator (VCN) is an ongoing effort to build a visual environment for interactive visualization of large source code bases. We present two techniques that extend the previous work done on the VCN. We propose an efficient and effective mechanism for specifying and visualizing queries on the source code. Next, we show a new project evolution view that offers global insight into change correlations that span several files, and thus lets users spot possible inconsistencies, problems, or undesired project structuring. We illustrate both mechanisms using a real-life C++ source code base.
Several techniques have been proposed to identify similar code fragments in software, so-called simple clones. Identification and subsequent unification of simple clones is beneficial in software maintenance. Even further gains can be obtained by elevating the level of code clone analysis. We observed that recurring patterns of simple clones often indicate the presence of interesting higher-level similarities that we call structural clones. Structural clones show a bigger picture of the similarity situation than simple clones alone. Being logical groups of simple clones, structural clones alleviate the problem of the huge number of clones typically reported by simple clone detection tools, a problem that is often dealt with using post-detection visualization techniques. Detection of structural clones can help in understanding the design of the system for better maintenance and in reengineering for reuse, among other uses. In this paper, we propose a technique to detect some useful types of structural clones. The novelty of our approach includes the formulation of the structural clone concept and the application of data mining techniques to detect these higher-level similarities. We describe a tool called Clone Miner that implements our proposed technique. We assess the usefulness and scalability of the proposed techniques via several case studies. We discuss various usage scenarios to demonstrate in what ways the knowledge of structural clones adds value to the analysis based on simple clones alone.
… in conjunction with the 6th ESEC …, 2007
Software evolution plays a key role in the overall lifecycle of a software system. In this phase, software developers extend the capabilities and functionality of the system to meet new user requirements. However, the maintenance process could rapidly lead to phenomena of "source code deterioration". The ability to detect bad software evolution patterns early represents a paramount opportunity to keep the application maintainable. In this paper we propose a combined visualization to identify software evolution patterns ...
… , 2008. ICSM 2008. …, 2008
Code clones are similar program structures recurring in software systems. Clone detectors produce much information, and a challenge is to identify useful clones depending on the goals of clone analysis. To do so, further abstraction, filtering, and visualization of cloning information, with the involvement of a human expert, is required. In this paper, we describe a technique for filtering and visualization of cloning information generated by Clone Miner, a clone detection tool presented in our earlier work. A unique benefit and contribution of our approach is that a human expert can define a wide range of filters to extract abstract views of the cloning data using a clone-query system to suit specific needs of clone analysis. We then produce standardized graphical presentations of those views for various types of clone queries. We implemented the technique in an Eclipse plug-in called Clone Visualizer. Clone Visualizer works closely with Clone Miner, which not only finds similar code fragments (simple clones) but also finds higher-level abstractions of the cloning information. Our method is the first attempt to address filtering and visualization of those higher-level cloning abstractions. We illustrate the application of our technique with examples from a clone analysis project with Clone Miner and Clone Visualizer.
2003
Code cloning, that is, the gratuitous duplication of source code within a software system, is an endemic problem in large, industrial systems [9, 7]. While there has been much research into techniques for clone detection and analysis, there has been relatively little empirical study on characterizing how, where, and why clones occur in industrial software systems.
A code clone is defined as a pair of similar code fragments within a software system. While code clones are not always harmful, they can have a detrimental effect on the overall quality of a software system due to the propagation of bugs and other maintenance implications. Because of this, software developers need to analyse the code clones that exist in a software system. However, despite the availability of several clone detection systems, the adoption of such tools outside of the clone community remains low. A possible reason for this is the difficulty and complexity involved in setting up and using these tools. In this paper, we present Clone Swarm, a code clone analytics tool that identifies clones in a project and presents the information in an easily accessible manner. Clone Swarm is publicly available and can mine any open-source Git repository. Clone Swarm internally runs NiCad, a popular clone detection tool, in the cloud and lets users interactively explore code clones using a web-based interface at multiple granularity levels (function and block level). Clone results are visualized in multiple overviews, all the way from a high-level plot down to an individual line-by-line comparison view of cloned fragments. Also, to facilitate future research in the area of clone detection and analysis, users can directly download the clone detection results for their projects. Clone Swarm is available online at clone-swarm.usask.ca. The source code for Clone Swarm is freely available under the MIT license on GitHub.
2006
Code duplication is a well-documented problem in industrial software systems. There has been considerable research into techniques for detecting duplication in software, and there are several effective tools to perform this task. However, there have been few detailed qualitative studies into how cloning actually manifests itself within software systems.
2014 IEEE International Conference on Software Maintenance and Evolution, 2014
Recently, new applications of code clone detection and search have emerged that rely upon clones detected across thousands of software systems. Big data clone detection and search algorithms have been proposed as an embedded part of these new applications. However, there exists no previous benchmark data for evaluating the recall and precision of these emerging techniques. In this paper, we present a big data clone detection benchmark that consists of known true and false positive clones in a big data inter-project Java repository. The benchmark was built by mining and then manually checking clones of ten common functionalities. The benchmark contains six million true positive clones of different clone types: Type-1, Type-2, Type-3 and Type-4, including various strengths of Type-3 similarity (strong, moderate, weak). These clones were found by three judges over 216 hours of manual validation efforts. We show how the benchmark can be used to measure the recall and precision of clone detection techniques.
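To make the closing sentence concrete, here is a minimal sketch of how recall and precision might be computed for a detector's output against a benchmark of judged clone pairs. It is an assumption-laden illustration rather than the benchmark's own evaluation procedure: the function name `recall_precision`, the pair representation, and the decision to ignore reported pairs that the benchmark never judged are all hypothetical choices.

```python
def recall_precision(reported, true_clones, false_clones):
    """Estimate recall and precision of a clone detector against a benchmark.

    reported:     clone pairs reported by the detector, as (id_a, id_b) tuples
    true_clones:  benchmark pairs judged to be real clones
    false_clones: benchmark pairs judged NOT to be clones
    Pairs are unordered, so (a, b) and (b, a) count as the same pair.
    Reported pairs that the benchmark never judged are ignored (an assumption).
    """
    reported = {frozenset(p) for p in reported}
    true_clones = {frozenset(p) for p in true_clones}
    false_clones = {frozenset(p) for p in false_clones}

    true_positives = len(reported & true_clones)
    false_positives = len(reported & false_clones)

    # Recall: share of known true clones that the detector found.
    recall = true_positives / len(true_clones) if true_clones else None
    # Precision: share of judged reported pairs that are true clones.
    judged = true_positives + false_positives
    precision = true_positives / judged if judged else None
    return recall, precision


# Toy data (made up): one hit, one miss, one false alarm, one unjudged pair.
detector_output = [("a", "b"), ("c", "d"), ("e", "f")]
known_true = [("b", "a"), ("x", "y")]
known_false = [("c", "d")]
print(recall_precision(detector_output, known_true, known_false))
```

Because precision here is computed only over pairs the benchmark has judged, it is an estimate; a detector may report many unjudged pairs whose correctness this sketch cannot assess.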
IEEE Transactions on Software Engineering, 2009
Code clones are similar program structures recurring in variant forms in software system(s). Several techniques have been proposed to detect similar code fragments in software, so-called simple clones. Identification and subsequent unification of simple clones is beneficial in software maintenance. Even further gains can be obtained by elevating the level of code clone analysis. We observed that recurring patterns of simple clones often indicate the presence of interesting higher-level similarities that we call structural clones. Structural clones show a bigger picture of the similarity situation than simple clones alone. Being logical groups of simple clones, structural clones alleviate the problem of the huge number of clones typically reported by simple clone detection tools, a problem that is often dealt with using post-detection visualization techniques. Detection of structural clones can help in understanding the design of the system for better maintenance and in reengineering for reuse, among other uses. In this paper, we propose a technique to detect some useful types of structural clones. The novelty of our approach includes the formulation of the structural clone concept and the application of data mining techniques to detect these higher-level similarities. We describe a tool called Clone Miner that implements our proposed technique. We assess the usefulness and scalability of the proposed techniques via several case studies. We discuss various usage scenarios to demonstrate in what ways the knowledge of structural clones adds value to the analysis based on simple clones alone.
Software development has become a complex phenomenon as clients' expectations increase and keep changing. Development teams often feel the pressure of releases and resort to less than ideal approaches to produce code. Sometimes they cut and paste code, causing code duplicates or code clones. Clones can lead to propagation of bugs and cause maintenance issues. Detection of code clones has a plethora of advantages, including copyright protection, elimination of duplicates by refactoring, exploration of design patterns for industry best practices, and so on. Analyzing big software projects and finding duplicates is a tedious task. Many researchers have contributed towards identifying different kinds of clones and detection techniques. However, we felt that a comprehensive and extendable framework that supports not only clone detection but also visualization techniques for easy comprehension is lacking. In this paper, we propose such a framework, named eXtensible Software Clone Detection Framework using ontology concept (XSCDF), which is generic and supports clone detection for different languages. It provides placeholders for future techniques. We built a prototype application using the Java programming language to demonstrate the proof of concept. The ontology concept is used to visualize clone detection results. The empirical results reveal that the framework has multi-language support for duplicate code detection.
Empirical Software …, 2010
2017
Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system’s code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution h...
2013
Code cloning is an inevitable phenomenon in the evolution of software systems. To reduce the harmful effects of clones in software evolution, they need to be identified correctly as well as in a time-efficient way. There might be various types of clones in a software system. Earlier research shows that detection of near-miss clones in large datasets appears to be costly in terms of time and memory. Among the clone detection tools available in practice, not many are found to be effective in that regard. In this paper we present a standalone clone detection tool, SimCad. It is based on a highly scalable and fast clone detection algorithm designed to detect both exact and near-miss clones in large-scale software systems. One of the potential aspects of SimCad is that its clone detection function is made more portable by packaging it into a library called SimLib. Thus, SimLib can now be used as an off-the-shelf clone detection library that can be easily integrated into other applications that are designed to work based on detected clones. For example, a standalone tool or an Integrated Development Environment (IDE) plugin can use SimLib for real-time clone detection while providing its own services such as clone visualization and/or clone management functionalities. We hope that both researchers and developers will benefit from using these tools in different aspects of detection and management of clones in software.