Papers by Michael L Collard
Method-Naming-Standards-Survey-Dataset
This dataset includes the following files: 1. A PDF file containing the method naming standards survey questions we used in Qualtrics for surveying professional developers. The file contains the Likert scale questions and source code examples used in the survey. 2. A CSV file containing professional developers' responses to the Likert scale questions and their feedback about each method naming standard, as well as their answers to the demographic questions. 3. A PDF copy of the survey paper (preprint). Survey Paper Citation: Alsuhaibani, R., Newman, C., Decker, M., Collard, M.L., Maletic, J.I., "On the Naming of Methods: A Survey of Professional Developers", in the Proceedings of the 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, May 25 - 28, 2021, 12 pages
A Survey on Method Naming Standards: Questions and Responses Artifact
2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2021
The artifacts of a large (1100+ responses) survey of professional software developers concerning standards for naming source code methods are presented. The artifact consists of the survey questions along with all the responses from participants. The artifact allows other researchers to examine and study the responses to the survey.

Program slicing is used as a basis for an approach to estimate maintenance effort. A case study of the GNU Linux kernel with over 900 revisions spanning 17 years of history is presented. For each revision, a system dictionary is built using a lightweight slicing approach; it encodes the forward decomposition static slice profiles for all variables in all the files in the system. Changes to the system are then modeled at the behavioral level using the difference between the system dictionaries of two revisions. Three granularities of slice (i.e., line, function, and file) are analyzed. We use an XML formatted document to represent computed function change information. The retrieved information reflects the fact that additional knowledge of the differences can be automatically derived to help maintainers understand code changes. We consider the hypotheses: (1) that the structured format helps create traceability links between the changes and other software devel...

2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC)
A lightweight pointer-analysis framework, srcPtr, is presented to support the implementation and comparison of points-to analysis algorithms. It differentiates itself from existing tools by performing the analysis directly on the abstract syntax tree, as opposed to an intermediate representation (e.g., LLVM IR), by using srcML, an XML representation of source code. Working with srcML and the abstract syntax allows easy access to the actual source code as the programmer views it, thus better supporting comprehension. Currently the framework provides example implementations for both Andersen's and Steensgaard's pointer-analysis algorithms. It also allows for easy integration of other points-to algorithms for comparison of accuracy/speed. The approach is very scalable and can generate pointer dependencies for a 750 KLOC program in less than a minute.
2018 IEEE Third International Workshop on Dynamic Software Documentation (DySDoc3), 2018
A tool to automatically generate natural language documentation summaries for methods is presented. The approach uses prior work by the authors on stereotyping methods along with the source code analysis framework srcML. First, each method is automatically assigned a stereotype(s) based on static analysis and a set of heuristics. Then, the approach uses the stereotype information, static analysis, and predefined templates to generate a natural-language summary for each method. This summary is automatically added to the code base as a comment for each method. The predefined templates are designed to produce a generic summary for specific method stereotypes.

2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017
Two studies are conducted to evaluate an approach to automatically generate natural language documentation summaries for C++ methods. The documentation approach relies on a method's stereotype information. First, each method is automatically assigned a stereotype(s) based on static analysis and a set of heuristics. Then, the approach uses the stereotype information, static analysis, and predefined templates to generate a natural-language summary/documentation for each method. This documentation is automatically added to the code base as a comment for each method. The result of the first study reveals that the generated documentation is accurate, does not include unnecessary information, and does a reasonable job describing what the method does. Based on statistical analysis of the second study, the most important part of the documentation is the short description as it describes the intended behavior of a method.
Poster: A Taxonomy of how Method Stereotypes Change
2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion), 2018
The role of a well-designed method should not change frequently or significantly over its lifetime. As such, changes to the role of a method can be an indicator of design improvement or degradation. To measure this, we use method stereotypes. Method stereotypes provide a high-level description of a method's behavior and role, giving insight into how a method interacts with its environment and carries out tasks. When a method's stereotype changes, so has its role. This work presents a taxonomy of how method stereotypes change and why the categories of changes are significant.
2018 IEEE Third International Workshop on Dynamic Software Documentation (DySDoc3), 2018
A syntactic differencing tool (srcDiff) is used to present a summarization of the changes to a class occurring over a timeline. An outline of the class is presented with the ability to drill down to individual members (methods and variables). The information is presented so that one can move to the next, or previous, version of the code and examine the changes that occur. The class summary view gives basic information such as the added, removed, or modified members. At the member level, a more detailed summarization of the changes is provided. At all levels, the version number, date, and author are provided.

2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016
An approach to automatically recover the name of the branch where a given commit was originally made within a GitHub repository is presented and evaluated. This is a difficult task because in Git, the commit object does not store the name of the branch when it is created. Here this is termed the commit's branch of origin. Developers typically use branches in Git to group sets of changes that are related by task or concern. The approach recovers the branch of origin only within the scope of a single repository. The recovery process first uses Git's default merge commit messages and then examines the relationships between neighboring commits. The evaluation includes a simulation, an empirical examination of 40 repositories of open-source systems, and a manual verification. The evaluations show that the average accuracy exceeds 97% of all commits and the average precision exceeds 80%.
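As a rough illustration of the first step only (reading a branch name out of a default merge commit message), a minimal Python sketch follows. The function name `merged_branch_name` and the two patterns are assumptions made for this example; they cover the common `Merge branch '...'` and GitHub `Merge pull request #N from ...` subject lines and are not taken from the paper's implementation. The second step, examining neighboring commits, is not shown.

```python
import re
from typing import Optional

# Subject lines Git writes by default for merges, e.g.
#   Merge branch 'feature-x' into main
#   Merge remote-tracking branch 'origin/dev'
_MERGE_BRANCH = re.compile(r"^Merge (?:remote-tracking )?branch '([^']+)'")

# Subject line GitHub writes when a pull request is merged, e.g.
#   Merge pull request #12 from alice/fix-typo
_MERGE_PR = re.compile(r"^Merge pull request #\d+ from (\S+)")


def merged_branch_name(message: str) -> Optional[str]:
    """Return the branch named in a default merge message, or None.

    Hypothetical helper: only the first line of the message is inspected,
    and hand-edited merge messages will not match.
    """
    first_line = message.splitlines()[0] if message else ""
    for pattern in (_MERGE_BRANCH, _MERGE_PR):
        match = pattern.match(first_line)
        if match:
            return match.group(1)
    return None
```

A commit whose message does not match either pattern yields `None`, which is where a relationship-based step such as the one the abstract mentions would have to take over.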

2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016
A tool that reverse engineers UML class diagrams from C++ source code is presented. The tool takes srcML as input and produces yUML as output. srcML is an XML representation of the abstract syntactic information of source code. The srcML parser (srcML.org) is highly scalable, efficient, and robust. yUML is a textual format for UML class diagrams that can be easily rendered into a graphical diagram via a web service (yUML.me) or a tool such as Graphviz. The approach utilizes efficient SAX (Simple API for XML) parsing to collect the information needed to construct the class diagram. Currently it supports the following UML features: differentiating between class, data type, or interface; identifying design level attributes, multiplicity and type; determining parameter direction; and identification of the relationships aggregation, composition, generalization, and realization. The tool produces yUML for all of Calligra (~1,144 KLOC) in under 20 seconds (including translation into srcML). The tool is open source under a GPL license and available for download at srcML.org.

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021
This paper describes the results of a large (1100+ responses) survey of professional software developers concerning standards for naming source code methods. The various standards for source code method names are derived from and supported in the software engineering literature. The goal of the survey is to determine if there is a general consensus among developers that the standards are accepted and used in practice. Additionally, the paper examines factors such as years of experience and programming language knowledge in the context of survey responses. The survey results show that participants very much agree about the importance of various standards and how they apply to names. Additionally, the survey shows that years of experience and the programming language the participants use have almost no effect on their responses.

Proceedings of the 2nd International Workshop on Refactoring, 2018
Although there is much research advancing the state of the art of program transformation tools, their application to industry source code change problems has not yet been gauged. In this context, the purpose of this paper is to better understand developer familiarity and comfort with these languages by conducting a survey. It poses, and answers, four research questions to understand how frequently source code transformation languages are applied to refactoring tasks, how well-known these languages are in industry, what developers think are obstacles to adoption, and what developer refactoring habits tell us about their current use, or underuse, of transformation languages. The results show that while source code transformation languages can fill a needed niche in refactoring, research must motivate their application. We provide explanations and insights based on data, aimed at the program transformation and refactoring communities, with a goal to motivate future research and ultimately improve industry adoption of transformation languages for refactoring tasks.
2011 27th IEEE International Conference on Software Maintenance (ICSM), 2011
Individual commits to a version control system are automatically characterized based on the stereotypes of added and deleted methods. The stereotype of each method is automatically reverse engineered using a previously defined taxonomy. Method stereotypes reflect intrinsic atomic behavior of a method and its role in the class. The stereotypes of the added and deleted methods form a descriptor of the change embodied by a given commit. These descriptors are then used to categorize commits, into types, based on the impact of the changes to a class (or classes). The goal is to gain a better understanding of the design changes to a system over its history and provide a means for documenting the commit.

2010 17th Working Conference on Reverse Engineering, 2010
The paper presents an approach that combines conceptual and evolutionary techniques to support change impact analysis in source code. Information Retrieval (IR) is used to derive conceptual couplings from the source code in a single version (release) of a software system. Evolutionary couplings are mined from source code commits. The premise is that such combined methods provide improvements to the accuracy of impact sets. A rigorous empirical assessment on the changes of the open source systems Apache httpd, ArgoUML, iBatis, and KOffice is also reported. The results show that a combination of these two techniques, across several cut points, provides statistically significant improvements in accuracy over either of the two techniques used independently. Improvements in recall values of up to 20% over the conceptual technique in KOffice and up to 45% over the evolutionary technique in iBatis were reported.
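To make the notions of a cut point and a combined impact set concrete, here is a minimal Python sketch. It assumes each technique yields a ranked list of candidate files and that the two lists are combined by taking the union of their top entries; the abstract does not state how the combination is formed, so the union, the function names, and the file names in the usage example are all illustrative assumptions.

```python
from typing import List, Set


def combined_impact_set(conceptual_ranked: List[str],
                        evolutionary_ranked: List[str],
                        cut_point: int) -> Set[str]:
    """Union of the top `cut_point` entries of each ranked list.

    Assumption: a plain union is used here for illustration only.
    """
    return set(conceptual_ranked[:cut_point]) | set(evolutionary_ranked[:cut_point])


def recall(estimated: Set[str], actual: Set[str]) -> float:
    """Fraction of the actually changed entities that were estimated."""
    if not actual:
        return 0.0
    return len(estimated & actual) / len(actual)


def precision(estimated: Set[str], actual: Set[str]) -> float:
    """Fraction of the estimated entities that actually changed."""
    if not estimated:
        return 0.0
    return len(estimated & actual) / len(estimated)
```

For example, with hypothetical rankings `["a.c", "b.c", "c.c"]` and `["d.c", "a.c", "e.c"]` and a cut point of 2, the combined set is `{"a.c", "b.c", "d.c"}`. Raising the cut point tends to trade precision for recall, which is one reason an evaluation would report results across several cut points.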

2013 IEEE International Conference on Software Maintenance, 2013
A novel approach to improve feature location by enhancing the corpus (i.e., source code) with static information is presented. An information retrieval method, namely Latent Semantic Indexing (LSI), is used for feature location. Adding stereotype information to each method/function enhances the corpus. Stereotypes are terms that describe the abstract role of a method; for example, get, set, and predicate are well-known method stereotypes. Each method in the system is automatically stereotyped via a static-analysis approach. Experimental comparisons of using LSI for feature location with, and without, stereotype information are conducted on a set of open-source systems. The results show that the added information improves the recall and precision in the context of feature location. Moreover, the use of stereotype information decreases the total effort that a developer would need to expend to locate relevant methods of the feature.

2013 IEEE International Conference on Software Maintenance, 2013
A case study of three open source systems undergoing large adaptive maintenance tasks is presented. The adaptive maintenance task involves migrating each system to a new version of a third-party API. The changes to support the migration were spread out over multiple years for each system. The first two systems are both part of KDE, namely KOffice and Extragear/graphics. The adaptive maintenance task, for both systems, involves migrating to a new version of Qt. The third system is OpenSceneGraph, which underwent a migration to a new version of OpenGL. The case study involves sifting through tens of thousands of commits to identify only those commits involved in the specific adaptive maintenance task. The objective is to develop a data set that will be used for developing automated methods to identify/characterize adaptive maintenance commits.
2018 IEEE Third International Workshop on Dynamic Software Documentation (DySDoc3), 2018
The tool implements an approach that automatically derives and redocuments source code with the corresponding method and class stereotypes. The stereotype of each method is first computed via static analysis and a set of definitions. Then the class stereotype is computed based on the distribution of method stereotypes. The approach is fully automatic and highly scalable. It uses the srcML infrastructure to do the analysis and insertion of the stereotype information into the code.

Journal of Software: Evolution and Process, 2014
A highly efficient lightweight forward static slicing approach is presented and evaluated. The approach does not compute the program/system dependence graph but instead dependence and control information is computed as needed while computing the slice on a variable. The result is a list of line numbers, dependent variables, aliases, and function calls that are part of the slice for all variables (both local and global) for the entire system. The method is implemented as a tool, called srcSlice, on top of srcML, an XML representation of source code. The approach is highly scalable and can generate the slices for all variables of the Linux kernel in approximately 20 min on a typical desktop. Benchmark results are compared with the CodeSurfer slicing tool from GrammaTech Inc., and the approach compares well with regard to accuracy of slices.
Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, 2018
The role of a well-designed method should not change frequently or significantly over its lifetime. As such, changes to the role of a method can be an indicator of design improvement or degradation. To measure this, we use method stereotypes. Method stereotypes provide a high-level description of a method's behavior and role, giving insight into how a method interacts with its environment and carries out tasks. When a method's stereotype changes, so has its role. This work presents a taxonomy of how method stereotypes change and why the categories of changes are significant.