2007
Teamwork is a typical characteristic of software development, because tasks can be split and parallelized. Independently working developers use Software Configuration Management (SCM) systems to apply version control to their files and to keep them consistent. Several SCM systems allow working on the same files concurrently and attempt to auto-merge the files in order to facilitate the reconciliation of parallel modifications. The merge should produce syntactically and semantically correct source code files; therefore, developers are often involved in the resolution of conflicts. However, even when a general text-based approach reports a successful merge, the output can still fail at compile time, because semantic correctness cannot be ensured trivially. Renaming an identifier consists of many changes and can cause semantic errors in the output of the merge, which subsequently have to be corrected manually. This paper shows that matching identifier declarations (e.g., classes, fields, methods, local variables) with their corresponding references in the abstract syntax trees of the revisions, and considering the detected renamings during the merge, brings the result closer to semantic correctness. The problem is illustrated and a solution is elaborated in this work.
Multidiszciplináris tudományok, 2020
During software development, when developers change the same part of the code concurrently, this may lead to merge conflicts. Resolving these conflicts can be costly and time-consuming. Three types of conflicts may arise during merge processes: textual, syntactic, and semantic. Textual conflicts occur when concurrent operations, such as additions, removals, or edits, take place over the same parts of the code. Syntactic conflicts occur when concurrent operations break the syntactic structure of the source code files when merged. Finally, a semantic conflict occurs when the merged code compiles without error but malfunctions. Version management systems usually use a textual merging technique: users synchronize their modifications with those of other users working in parallel, and in this process a merge is performed between local and remote modifications. Previous work has examined different mechanisms to detect and resolve conflicts and has proposed different approaches for resolving merge conflicts, such as two-way merging, three-way merging, state-based merging, and operation-based merging. This paper discusses and investigates concepts related to merge conflicts by asking and answering these questions: which factors most affect merge conflicts, how to avoid and reduce merge conflicts, how to detect them, and how to resolve them.
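The three-way merging mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the method of any surveyed tool: the function name `merge3` and the conflict-marker format are hypothetical, and the sketch assumes that all three versions have the same number of lines (in-place edits only), whereas real tools first align the versions with a differencing algorithm.

```python
def merge3(base, left, right):
    """Line-wise three-way merge (sketch).

    Assumes all three versions have the same number of lines, i.e. only
    in-place edits; real tools align the versions with a diff first.
    Returns (merged_lines, indices_of_textual_conflicts).
    """
    if not (len(base) == len(left) == len(right)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, l, r) in enumerate(zip(base, left, right)):
        if l == r:          # both sides agree (unchanged or identical edit)
            merged.append(l)
        elif l == b:        # only the right side changed this line
            merged.append(r)
        elif r == b:        # only the left side changed this line
            merged.append(l)
        else:               # both sides changed it differently: textual conflict
            conflicts.append(i)
            merged.append(f"<<<<<<< {l} ||| {r} >>>>>>>")
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]

# Edits to different lines merge cleanly.
clean, no_conflicts = merge3(base,
                             ["a = 10", "b = 2", "c = 3"],
                             ["a = 1", "b = 2", "c = 30"])

# Different edits to the same line are reported as a textual conflict.
_, clash = merge3(base,
                  ["a = 10", "b = 2", "c = 3"],
                  ["a = 20", "b = 2", "c = 3"])

print(clean, no_conflicts, clash)
```

The common ancestor `base` is what lets the merge decide which side changed a line. A two-way merge sees only `left` and `right`, so it cannot tell an edit on one side from an untouched line on the other.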
2002
Parallel development has become standard practice in software development and maintenance. Though almost every revision control and configuration management system provides some form of merging for combining changes made in parallel, these mechanisms often yield unsatisfactory results. The authors present a new merging algorithm that uses a fast differencing algorithm and renaming analysis to provide better merge results. The system is language aware, but not language dependent, and does not require a special editor, so it can be easily integrated into current development environments.
2003
Finding changed identifiers in programs is important for program comparison and merging. Comparing two versions of a program is complicated if renaming has occurred. Textual merging is highly unreliable if, in one version, identifiers were renamed, while in the other version, code using the old identifiers was added or modified.
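The failure mode described here can be made concrete with a small Python sketch. Everything in it is an illustrative assumption: the sample sources, the helper name `undefined_calls`, and the `MERGED` text, which is written by hand to mirror what a line-based merge could produce when it accepts both changes without a textual conflict. The check itself only looks at calls to plain module-level function names, which is far cruder than real name resolution.

```python
import ast
import builtins

# Branch A renamed total() to compute_total().
LEFT = """\
def compute_total(items):
    return sum(items)
"""

# Branch B kept the old name and added a new caller of it.
RIGHT = """\
def total(items):
    return sum(items)

def average(items):
    return total(items) / len(items)
"""

# Hand-written stand-in for a textually "successful" merge of both branches.
MERGED = """\
def compute_total(items):
    return sum(items)

def average(items):
    return total(items) / len(items)
"""


def undefined_calls(source):
    """Return names that are called but neither defined in `source` nor built in."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return {name for name in called
            if name not in defined and not hasattr(builtins, name)}


print(undefined_calls(LEFT))    # each branch is consistent on its own
print(undefined_calls(RIGHT))
print(undefined_calls(MERGED))  # the merge still calls the old name
```

Both branches are internally consistent, yet the merged text calls a function that no longer exists. A renaming-aware merge would propagate the new name to the call added on the other branch.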
Empirical studies show that merge conflicts frequently occur, impairing developers' productivity, since merging conflicting contributions might be a demanding and tedious task. However, the structure of changes that lead to conflicts has not been studied yet. Understanding the underlying structure of conflicts, and the syntactic language elements involved, might shed light on how to better avoid merge conflicts. To this end, in this paper we derive a catalog of conflict patterns expressed in terms of the structure of code changes that lead to merge conflicts. We focus on conflicts reported by a semi-structured merge tool that exploits knowledge about the underlying syntax of the artifacts. This way, we avoid analyzing a large number of spurious conflicts often reported by typical line-based merge tools. To assess the occurrence of such patterns in different systems, we conduct an empirical study reproducing 70,047 merges from 123 GitHub Java projects. Our results show that most semi-structured merge conflicts in our sample happen because developers independently edit the same or consecutive lines of the same method. However, the probability of creating a merge conflict is approximately the same when editing methods, class fields, and modifier lists. Furthermore, we noticed that most conflicting merge scenarios, and most merge conflicts, involve more than two developers. We also found that copying and pasting pieces of code, or even entire files, across different repositories is a common practice and a cause of conflicts. Finally, we discuss how our results reveal the need for new research studies and suggest potential improvements to tools supporting collaborative software development.
… 2012 International Conference on Software …, 2012
Merge conflicts cause software defects which if detected late may require expensive resolution. This is especially true when developers work too long without integrating concurrent changes, which in practice is common as integration generally occurs at check-in. Awareness of others' activities was proposed to help developers detect conflicts earlier. However, it requires developers to detect conflicts by themselves and may overload them with notifications, thus making detection harder. This paper presents a novel solution that continuously merges uncommitted and committed changes to create a background system that is analyzed, compiled, and tested to precisely and accurately detect conflicts on behalf of developers, before check-in. An empirical study confirms that our solution avoids overloading developers and improves early detection of conflicts over existing approaches. Similarly to what happened with continuous compilation, this introduces the case for continuous merging inside the IDE.
Workshop on Variability …
Revision control systems are a major means to manage versions and variants of today's software systems. An ongoing problem in these systems is how to resolve conflicts when merging independently developed revisions. Unstructured revision control systems are purely text-based and solve conflicts based on textual similarity. Structured revision control systems are tailored to specific languages and use language-specific knowledge for conflict resolution. We propose semistructured revision control systems to inherit the strengths of both classes of systems: generality and expressiveness. The idea is to provide structural information of the underlying software artifacts in the form of annotated grammars, which is motivated by recent work on software product lines. This way, a wide variety of languages can be supported and the information provided can assist the resolution of conflicts. We have implemented a preliminary tool and report on our experience with merging Java artifacts. We believe that drawing a connection between revision control systems and product lines has benefits for both fields.
Global software development intensifies parallel changes. Although parallel changes can improve performance, their interference contributes to faults. Current Software Configuration Management (SCM) systems can detect interference between changes at the textual level. However, our empirical study shows that, compared with textual interference detection, a semantic approach is more effective and efficient in detecting interference in high-degree parallel changes. We propose to integrate semantic interference checking into SCM systems. Semantic interference detected during check-in can alert developers to potential faults.
2011
Identifiers play an important role in source code understandability, maintainability, and fault-proneness. This paper reports a study of identifier renamings in software systems, studying how terms (identifier atomic components) change in source code identifiers.
ACM SIGSOFT Software Engineering Notes, 2006
Refactoring tools allow programmers to change their source code more quickly than before. However, the complexity of these changes causes versioning tools that operate at the file level to lose the history of entities and to be unable to merge refactored entities. This problem can be solved by semantic, operation-based SCM with persistent IDs. MolhadoRef, our prototype, can successfully merge edit and refactoring operations that were performed on different development branches, preserves program history better, and makes it easier to understand program evolution.
2014
The Envision project aims to develop an integrated development environment (IDE) for object-oriented languages that features a visual structured code editor and is used for large-scale software development. To achieve this goal, Envision works directly on an abstract syntax tree instead of a text-based source code representation. This thesis presents the design of a version control system based on an abstract syntax tree, designed for Envision. The resulting system is more fine-grained than traditional text-based systems, since changes are tracked on the basis of nodes in the abstract syntax tree and not lines as in a text file. Nevertheless, traditional text-based systems contain functionality which can be used as a foundation for a version control system for abstract syntax trees. Therefore, Envision's version control system is built on top of a Git back-end. The system features a comparison algorithm which is able to detect move operations, a history functionality which is able to track substructures, and a merge algorithm which automatically resolves conflicts on list types. The major benefit of a version control system based on an abstract syntax tree is its syntax awareness. As a consequence, the comparison algorithm is not affected by formatting changes and the merge algorithm always produces a syntactically correct program. In addition, a more fine-grained change categorization in terms of granularity and change types positively affects the quality of the comparison, history, and merge algorithms. Further visual improvements in the presentation provide the user with more essential information.
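The claim that a tree-based comparison ignores formatting changes can be illustrated with Python's standard `ast` module. This is only a sketch: `ast` stands in here for Envision's own tree representation, the helper name `same_structure` and the sample sources are hypothetical, and the comparison is a plain equality check rather than the thesis's comparison algorithm.

```python
import ast


def same_structure(src_a, src_b):
    """Compare two sources by their syntax trees, ignoring layout and comments."""
    # ast.dump() omits line and column attributes by default, so only the
    # tree shape and node contents are compared.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


original = "def area(w, h):\n    return w * h\n"
reformatted = "def area( w,h ):\n\n    return (w*h)  # reformatted only\n"
changed = "def area(w, h):\n    return w + h\n"

print(same_structure(original, reformatted))  # layout differs, tree does not
print(same_structure(original, changed))      # the operator actually changed
```

A line-based diff would flag every line of `reformatted` as modified, while the tree comparison reports a difference only for `changed`.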
ArXiv, 2018
Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of conflict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also show how to verify this property using a novel, compositional algorithm that combines lightweight dependence analysis for shared program fragments and precise relational reasoning for the modifications. We evaluate our tool, called SafeMerge, on 52 real-world merge scenarios obtained from GitHub and compare the results against a textual merge tool. The experimental results demonstrate the benefits of our approach over syntactic conflict-freedom and indicate that SafeMerge is both precise and practical.
Context: To reduce the integration effort arising from conflicting changes resulting from collaborative software development tasks, unstructured merge tools try to automatically solve part of the conflicts via textual similarity, whereas structured and semistructured merge tools try to go further by exploiting the syntactic structure of the involved artifacts. Objective: In this study, aiming at increasing the existing body of evidence and assessing results for systems developed under an alternative version control paradigm, we replicate an experiment conducted by Apel et al. to compare the unstructured and semistructured approaches with respect to the occurrence of conflicts reported by both approaches. Method: We used both semistructured and unstructured merge in a sample 2.5 times bigger than that of the original study regarding the number of projects and 18 times bigger regarding the number of merge scenarios, and we compared the occurrence of conflicts. Results: Similar to the original study, we observed that semistructured merge reduces the number of conflicts in 55% of the scenarios of the new sample. However, the observed average conflict reduction of 62% in these scenarios is far higher than what has been observed before. We also bring new evidence that the use of semistructured merge can reduce the occurrence of conflicting merge scenarios by half. Conclusions: Our findings reinforce the benefits of exploiting the syntactic structure of the artifacts involved in code integration. Besides, the reductions observed in the number and size of conflicts suggest that the use of semistructured merge, when compared to the unstructured approach, might decrease integration effort without compromising correctness.
2000
We report on a prototype tool that automates the time-consuming and error-prone process of software merging. Our tool is significantly more flexible than existing merge techniques, as it can detect syntactic, structural as well as semantic conflicts. It is implemented as a general framework for software evolution that can be customised to many different domains. Because of this, it can be used to support evolution of any kind of software artifact, independent of the target language or the considered phase in the software life ...
IEEE Transactions on Software Engineering, 2014
Source code lexicon plays a paramount role in software quality: poor lexicon can lead to poor comprehensibility and even increase software fault-proneness. For this reason, renaming a program entity, i.e., altering the entity identifier, is an important activity during software evolution. Developers rename when they feel that the name of an entity is not (anymore) consistent with its functionality, or when such a name may be misleading. A survey that we performed with 71 developers suggests that 39% perform renaming from a few times per week to almost every day and that 92% of the participants consider that renaming is not straightforward. However, despite the cost that is associated with renaming, renamings are seldom if ever documented: for example, less than 1% of the renamings in the five programs that we studied were documented. This explains why participants largely agree on the usefulness of automatically documenting renamings. In this paper we propose REPENT (RENAMING PROGRAM ENTITIES), an approach to automatically document (detect and classify) identifier renamings in source code. REPENT detects renamings based on a combination of source code differencing and data flow analyses. Using a set of natural language tools, REPENT classifies renamings into the different dimensions of a taxonomy that we defined. Using the documented renamings, developers will be able to, for example, look up methods that are part of the public API (as they impact client applications), or look for inconsistencies between the name and the implementation of an entity that underwent a high-risk renaming (e.g., towards the opposite meaning). We evaluate the accuracy and completeness of REPENT on the evolution history of five open-source Java programs. The study indicates a precision of 88% and a recall of 92%. In addition, we report an exploratory study investigating and discussing how identifiers are renamed in the five programs, according to our taxonomy.
Journal of Software Engineering and Applications
Software projects are becoming larger and more complicated. Managing those projects is based on several software development methodologies. One of those methodologies is software version control, which is used in the majority of worldwide software projects. Although existing version control systems provide sufficient functionality in many situations, they are lacking in terms of semantics and structure for source code. It is commonly believed that improving software version control can contribute substantially to the development of software. We present a solution that considers a structural model for matching source code that can be used in version control.
Proceedings of the ACM on Programming Languages, 2019
In modern software development, developers rely on version control systems like Git to collaborate in a branch-based development workflow. One downside of this workflow is the conflicts that occur when merging contributions from different developers: these conflicts are tedious and error-prone to resolve correctly, reducing the efficiency of collaboration and introducing potential bugs. The situation becomes even worse with the popularity of refactorings in software development and evolution, because current merging tools (usually based on the text or tree structures of source code) are unaware of refactorings. In this paper, we present IntelliMerge, a graph-based refactoring-aware merging algorithm for Java programs. We explicitly enhance this algorithm's ability to detect and resolve refactoring-related conflicts. Through the evaluation on 1,070 merge scenarios from 10 popular open-source Java projects, we show that IntelliMerge reduces the number of merge conflicts ...
Proceedings of the 3rd international workshop on Software configuration management -, 1991
Software maintenance is the process of designing and integrating consistent changes to an existing software system. It is difficult for the maintainer to ascertain the complete effect of a code change; the maintainer may make a change to a program that is syntactically and semantically legal, but has ripples into the parts of the program that were to remain unchanged.
2005
Merging and splitting source code entities is a common activity during the lifespan of a software system; as developers rethink the essential structure of a system or plan for a new evolutionary direction, so must they be able to reorganize the design artifacts at various abstraction levels as seems appropriate. However, while the raw effects of such changes may be plainly evident in the new artifacts, the original context of the design changes is often lost.
2014
Analysing a software system presupposes two preliminary tasks: parsing the source code and resolving the names (identifiers) it contains. The parsing results in an Abstract Syntax Tree (AST) representing the source code. Name resolution maps all the identifiers found in the code to the software entities they refer to (variables, functions, classes, ...). While there are solutions for some popular programming languages (e.g., JDT for the Java language), these two tasks can impose a significant burden on multi-language platforms (e.g., Cast, Eclipse, Rascal, Spoofax, Synectique), where a parser with name resolution must be implemented for each language analysed. For the parser, one may use a grammar of the language and a parser generator tool. For name resolution, solutions are ad hoc and one must develop them by hand. We work with a company that had to create parsers and name resolvers for five languages in the past 18 months. As a solution, we describe in this paper, an infrastructure tha...
Empirical Software Engineering, 2015
Consider the semantic inconsistency between "fetch" and "get" on page 44. Why is it a semantic inconsistency? There is a subtle difference in meaning here. "Get" means to simply return the value of a field, as in the following example: public int getLength() {return length;}. Typically, "get" methods are one-line methods, as shown in my example. However, "fetch" may be a more involved operation, as in "fetching" records from a database. This will not be a one-line method; many lines of code will be involved, such as establishing a connection to the database, issuing an SQL query, reading results, and so on. "Selector" and "Chooser" may be semantically inconsistent, but is it a problem?
⇒ These are the same issues as in the abovementioned problems. Although some developers use those words ("get" and "fetch") in a certain way, there is no clear consensus that "get" should be used in this way and "fetch" in that way. Even if the author of the program defined his/her own way to use words in identifiers, (new) program readers can still be confused by the two similar synonyms.
The "wordpos" inconsistency is even more baffling. Yes, DrawApplet (page 44) is probably misnamed; it would have been better named "AppletDrawer", but is it a problem for a human? It may pose a problem for tools such as code summarizers, but for humans, I am not sure it is even a problem.
⇒ The wordPOS inconsistency is defined for detecting violations of the Java naming convention. As described in Section 4.3.4, developers often point out that naming convention violations may impose a maintenance burden if several developers work together as a team on writing the program. Suppose that each developer uses words to name identifiers in different ways. Then, these developers may not clearly understand the identifiers created by other developers. Consequently, this can be a disaster for a newly arriving developer. In addition, to follow the convention, "DrawApplet" should be revised to "DrawerApplet".
The "readerIndex" on page 43 is another example of a "poor example". The code that uses this identifier would probably look like this (at the call site): int ri=readerIndex(...); The code context makes it abundantly clear that this method is a "getter". Yes, it would have been better to name this method "get"ReaderIndex, but "get" has been omitted for brevity. So, this may be an inconsistency, but not a problem.