Gerardo Canfora

Università degli Studi del Sannio, Dept. of Engineering, Faculty Member

Papers by Gerardo Canfora

Predicting issue types on GitHub

Science of Computer Programming, May 1, 2021

Software maintenance and evolution involve critical activities for the success of software projects. To support such activities and keep code up-to-date and error-free, software communities make use of issue trackers, i.e., tools for signaling, handling, and addressing the issues occurring in software systems. However, in popular projects, tens or hundreds of issue reports are submitted daily. In this context, identifying the type of each submitted report (e.g., bug report, feature request, etc.) would facilitate the management and prioritization of the issues to address. To support issue handling activities, in this paper, we propose Ticket Tagger, a GitHub app that analyzes the issue title and description through machine learning techniques to automatically recognize the types of reports submitted on GitHub and assign labels to each issue accordingly. We empirically evaluated the tool's prediction performance on about 30,000 GitHub issues. Our results show that Ticket Tagger can identify the correct labels to assign to GitHub issues with reasonably high effectiveness. Considering these results and the fact that the tool is designed to be easily integrated into the GitHub issue management process, Ticket Tagger is a useful solution for developers.

A comprehensive characterization of NLP techniques for identifying equivalent requirements

Though very important in software engineering, linking artifacts of the same type (clone detection) or of different types (traceability recovery) is extremely tedious and error-prone, and requires significant effort. Past research focused on supporting analysts with mechanisms based on Natural Language Processing (NLP) to identify candidate links. Because a plethora of NLP techniques exists, and their performance varies among contexts, it is important to characterize them according to the provided level of support. The aim of ...

Ticket Tagger: Machine Learning Driven Issue Classification

Software maintenance is crucial for the evolution and success of software projects: code should be kept up-to-date and error-free, with little effort and continuous updates for end-users. In this context, issue trackers are essential tools for creating, managing and addressing the several (often hundreds of) issues that occur in software systems. A critical aspect of handling and prioritizing issues involves the assignment of labels to them (e.g., for projects hosted on GitHub), in order to determine the type (e.g., bug report, feature request and so on) of each specific issue. Although this labeling process has a positive impact on the effectiveness of issue processing, the current labeling mechanism is scarcely used on GitHub. In this demo, we introduce a tool, called Ticket Tagger, which leverages machine learning strategies on issue titles and descriptions to automatically label GitHub issues. Ticket Tagger automatically predicts the labels to assign to issues, with the aim of stimulating the use of labeling mechanisms in software projects and thereby facilitating the issue management and prioritization processes. Along with the presentation of the tool's architecture and usage, we also evaluate its effectiveness in performing the issue labeling/classification process, which is critical to helping maintainers keep control of their workloads by focusing on the most critical issue tickets.

RE2: Reverse-engineering and reuse re-engineering

Journal of Software Maintenance, Mar 1, 1994

Initial research in reuse was in the design and implementation of reusable software. This research, although fruitful, did not address the area of extracting reusable components from existing software. In this paper the term reuse is used to mean the 'reuse of existing source code'. A process called 'reuse re-engineering' is defined and this, together with techniques from reverse-engineering, forms a new method for achieving reuse. A reference paradigm is established to implement the reuse re-engineering process. This ...

Estimating the number of remaining links in traceability recovery

Empirical Software Engineering, Oct 20, 2016

The ASE conference series is the premier research forum for automated software engineering. Each year it brings together researchers and practitioners from academia and industry to discuss foundations, techniques, and tools for automated analysis, design, implementation, testing, and maintenance of software systems.

Continuous Integration and Delivery Practices for Cyber-Physical Systems: An Interview-Based Study

ACM Transactions on Software Engineering and Methodology

Continuous Integration and Delivery (CI/CD) practices have shown several benefits for software development and operations, such as faster release cycles and early discovery of defects. For Cyber-Physical System (CPS) development, CI/CD can help achieve required goals, such as high dependability, yet it may be challenging to apply. This article empirically investigates the challenges, barriers, and mitigations that arise when applying CI/CD practices to develop CPSs in 10 organizations working in eight different domains. The study was conducted through semi-structured interviews, by applying an open card sorting procedure together with a member-checking survey within the same organizations, and by validating the results through a further survey involving 55 professional developers. The study reveals several peculiarities in the application of CI/CD to CPSs. These include the need for (i) combining continuous and periodic builds while balancing the use of Hardware-in-the-Loop a...

Android apps and user feedback: a dataset for software evolution and quality improvement

Proceedings of the 2nd ACM SIGSOFT International Workshop on App Market Analytics, 2017

Nowadays, Android is the most popular mobile platform, with a market share of around 80%. Previous research showed that data contained in user reviews and the code change history of mobile apps represent a rich source of information for reducing software maintenance and development effort and increasing customer satisfaction. Stemming from this observation, we present in this paper a large dataset of Android applications belonging to 23 different app categories, which provides an overview of the types of feedback users report on the apps and documents the evolution of the related code metrics. The dataset contains about 395 applications from the F-Droid repository, including around 600 versions, 280,000 user reviews and more than 450,000 user feedback items (extracted with specific text mining approaches). Furthermore, for each app version in our dataset, we employed the Paprika tool and developed several Python scripts to detect 8 different code smells and compute 22 code quality indicators. The paper discusses the potential usefulness of the dataset for future research in the field.

Silent and Continuous Authentication in Mobile Environment

Proceedings of the 13th International Joint Conference on e-Business and Telecommunications, 2016

Due to the increasing pervasiveness of mobile technologies, sensitive user information is often stored on mobile devices. Nowadays, mobile devices do not continuously verify the identity of the user while sensitive activities are performed. This gives attackers full access to sensitive data and applications on the device if they obtain the password or grab the device after login. In order to mitigate this risk, we propose a continuous and silent monitoring process based on a set of features: orientation, touch and cell tower. The underlying assumption is that these features are representative of the smartphone owner's behaviour, which is why they can be useful to discriminate the owner from an impostor. Results show that our system, modeling the user behavior of 21 volunteer participants, obtains encouraging results: we measured a precision in distinguishing an impostor from the owner between 99% and 100%.

Development Emails Content Analyzer: Intention Mining in Developer Discussions (T)

2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015

How Developers' Collaborations Identified from Different Sources Tell Us about Code Changes

2014 IEEE International Conference on Software Maintenance and Evolution, 2014

Recommending refactorings based on team co-maintenance patterns

Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, 2014

An Empirical Investigation on Documentation Usage Patterns in Maintenance Tasks

2013 IEEE International Conference on Software Maintenance, 2013

The Evolution of Project Inter-dependencies in a Software Ecosystem: The Case of Apache

2013 IEEE International Conference on Software Maintenance, 2013

How the evolution of emerging collaborations relates to code changes: an empirical study

Proceedings of the 22nd International Conference on Program Comprehension, 2014

CODES: mining source code descriptions from developers discussions

Proceedings of the 22nd International Conference on Program Comprehension, 2014

Defect prediction as a multiobjective optimization problem

Software Testing, Verification and Reliability, 2015

In this paper, we formalize the defect-prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, named multiobjective defect predictor (MODEP), based on multiobjective forms of machine learning techniques (specifically, logistic regression and decision trees) trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors achieving a specific compromise between the number of likely defect-prone classes or the number of defects that the analysis would likely discover (effectiveness), and lines of code to be analysed/tested (which can be considered as a proxy of the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP with respect to single-objective predictors, and with respect to a trivial baseline that ranks classes by size in ascending or descending order. Also, MODEP outperforms an alternative a...

Multi-objective Cross-Project Defect Prediction

2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 2013
