Automating Code Review
Abstract—Code reviews are popular in both industrial and open source projects. The benefits of code reviews are widely recognized and include better code quality and a lower likelihood of introducing bugs. However, code review comes at the cost of spending developers' time on reviewing their teammates' code. The goal of this research is to investigate the possibility of using Deep Learning (DL) to automate specific code review tasks. We started by training vanilla Transformer models to learn code changes performed by developers during real code review activities. This gives the models the ability to automatically (i) revise the code submitted for review without any input from the reviewer; and (ii) implement the changes required to address a specific reviewer's comment. While the preliminary results were encouraging, in this first work we tested the DL models in rather simple code review scenarios, substantially simplifying the targeted problem. This was also due to the choices we made when designing both the technique and the experiments. Thus, in a subsequent work, we exploited a pre-trained Text-To-Text-Transfer-Transformer (T5) to overcome some of these limitations and to experiment with DL models for code review automation in more realistic and challenging scenarios. The achieved results show the improvements brought by T5 both in terms of applicability (i.e., the scenarios in which it can be applied) and performance. Despite this, we are still far from performance levels that would make these techniques deployable in practice, thus calling for additional research in this area, as we discuss in our future work agenda.

Index Terms—Code Review, Deep Learning

I. INTRODUCTION

Code review is the process of analyzing code written by a teammate to judge whether it is of sufficient quality. Recent studies provided evidence that reviewed code has lower chances of being buggy [1]–[3] and exhibits higher internal quality [3]. Given these benefits, code reviews are widely adopted in both industrial and open source projects.

The benefits brought by code reviews do not come for free. Indeed, code reviews add expenses on top of the standard development costs, due to the allocation of one or more reviewers who have the responsibility of verifying the correctness, quality, and soundness of the newly developed code. Bosu and Carver report that developers spend, on average, more than six hours per week reviewing code [4].

For this reason, researchers started proposing techniques aimed at automating specific code review tasks. Several works targeted the recommendation of proper reviewers for a given change (e.g., [5]–[11]), while others focused on classifying the contributions to review into different categories [12], [13] (again to ease the identification of proper reviewers).

When I started my PhD in February 2020, little effort had been devoted to the automation of the most complex code review tasks, namely those dealing with the review of the code itself (i.e., identifying problems in the submitted code and implementing the changes needed to address them). Indeed, only a few works had started investigating the possibility of learning code change patterns from software repositories [14], [15], which might be used to improve the quality of the code submitted for review (e.g., by learning code changes often applied by developers to fix a specific quality issue).

The goal of my PhD is the automation of the above-mentioned non-trivial tasks. In particular, we target three specific tasks, focusing on both the contributor and the reviewer sides of the review process. First, we defined two tasks to learn code changes performed by developers during real code review activities. The first one, code-to-code (Tc2c), on the contributor side, aims at providing contributors with a revised version of their code implementing the code transformations usually recommended during code review, before the code is even submitted for review. The second task, code&comment-to-code (Tc&nl2c), on the reviewer side, provides the reviewer commenting on a submitted code with the revised code implementing their comment expressed in natural language. Successively, we defined a third task, code-to-comment (Tc2nl), still on the contributor side, which takes as input the code submitted for review and requests code changes to the contributor as a reviewer would do, by commenting on the code in natural language.

The overall idea is not to replace developers during code review, but to design techniques that can work in tandem with them by spotting and/or fixing code quality issues that are typical targets of a code review. A complete automation, besides not being realistic, would also dismiss one of the benefits of code review: knowledge sharing among developers [16].

We started our investigation by training two Transformer models to automate Tc2c and Tc&nl2c, respectively. The results of this study have been published in the following paper [17]:

Towards Automating Code Review Activities. Rosalia Tufano, Luca Pascarella, Michele Tufano, Denys Poshyvanyk, Gabriele Bavota. In Proceedings of the 43rd International Conference on Software Engineering (ICSE 2021), pp. 163-174.

While the results achieved in this work were promising, our approach also had substantial limitations.
III. USING PRE-TRAINED MODELS TO BOOST CODE REVIEW AUTOMATION

To partially address the limitations of our preliminary work, we experimented with DL models for code review automation in more realistic and challenging scenarios. We started by training a Text-To-Text-Transfer-Transformer (T5) model [20] on a bigger version of the dataset used in the previous work (we mined new open source projects on GitHub, increasing the size of the initial dataset). To avoid the abstraction process adopted in the first stage of this research, we adopted the SentencePiece subword-based tokenizer [21], which allows working with raw source code while keeping the size of the vocabulary under control. Also, we increased the maximum length of the considered code components from 100 "abstracted" tokens to 512 "SentencePiece" tokens. The absence of an abstraction mechanism and the increased upper bound for the input/output length allowed us to build a substantially larger dataset as compared to our previous work [17] (140k instances vs 17k) and, more importantly, to feature in such a dataset a wider variety of code transformations implemented in the code review process, including quite challenging instances such as those requiring the introduction of new identifiers and literals.
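As an illustration, the following sketch shows how a SentencePiece model could be trained on raw Java methods and used to check the 512-token budget; the corpus file name, vocabulary size, and model type are illustrative assumptions, not our exact configuration.

import sentencepiece as spm

# Train a subword model directly on raw (non-abstracted) Java methods,
# one method per line; file names and vocabulary size are assumptions.
spm.SentencePieceTrainer.train(
    input="java_methods.txt",
    model_prefix="code_review_sp",
    vocab_size=32000,
    model_type="bpe",
)

sp = spm.SentencePieceProcessor(model_file="code_review_sp.model")
method = "public ConfigBuilder readFrom(View<?> view) { return this; }"
tokens = sp.encode(method, out_type=str)

# Instances exceeding the maximum input/output length cannot be used.
MAX_LEN = 512
print(len(tokens), len(tokens) <= MAX_LEN)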
The possibility of considering such a variety of code changes is also due to the learning abilities of the T5 model [20]. T5 is subjected to a first training (pre-training) whose purpose is to provide it with general knowledge useful to solve a set of related tasks. Suppose, for example, that we want to train a model able to translate English to German. Instead of starting by training the model for this task, T5 can be pre-trained in an unsupervised manner by using the denoising objective (or masked language modeling): the model is fed with sentences having 15% of their tokens (e.g., words in English sentences) randomly masked, and it is asked to predict them. By learning how to predict the masked tokens, the model acquires knowledge about the language of interest. In our example, we could pre-train the model on English and German sentences. Once pre-trained, T5 is fine-tuned on the downstream tasks in a supervised fashion. Each task is formulated in a "text-to-text" format (i.e., both the input and the output of the model are represented as text). For example, for the translation task, a dataset composed of pairs of English and German sentences can be used to fine-tune the model.
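To make the objective more concrete, the following toy example shows the denoising format adopted by T5, in which masked spans are replaced by sentinel tokens; the sentence and the masked spans are invented for illustration.

# The model receives the corrupted input and must generate the target,
# i.e., the content hidden behind each sentinel token.
original = "private int counter = 0 ; // counts processed review comments"

corrupted_input = "private int <extra_id_0> = 0 ; // counts processed <extra_id_1> comments"

target = "<extra_id_0> counter <extra_id_1> review <extra_id_2>"

# Roughly 15% of the tokens are masked; no manual labeling is needed,
# so any corpus of code and technical English can be used for pre-training.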
In our research, we pre-train T5 on Java source code and "technical English" (e.g., the natural language text used to document source code). In particular, we built a pre-training dataset consisting of nearly 1.5M instances, starting from two public datasets featuring instances that include both source code and technical English: the official Stack Overflow dump [22] and CodeSearchNet [23]. Then, we fine-tune T5 on the three tasks defined in Section I: Tc2c, Tc&nl2c and Tc2nl.
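As a sketch, the three tasks can be serialized as plain input/target text pairs along the following lines; the task prefixes, the separator token, and the toy instance are illustrative assumptions and do not reproduce our exact encoding.

code_submitted = "public void save(User u) { db.store(u); }"
code_revised   = "public void save(final User u) { db.store(u); }"
reviewer_note  = "make the parameter final"

examples = [
    # code-to-code: submitted code -> revised code
    {"input": "code2code: " + code_submitted, "target": code_revised},
    # code&comment-to-code: submitted code + reviewer comment -> revised code
    {"input": "code&comment2code: " + code_submitted + " <sep> " + reviewer_note,
     "target": code_revised},
    # code-to-comment: submitted code -> reviewer-like natural language comment
    {"input": "code2comment: " + code_submitted, "target": reviewer_note},
]

for ex in examples:
    print(ex["input"], "->", ex["target"])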
We started by evaluating T5 on the same (simpler) dataset used in our ICSE'21 paper, to compare it with the encoder-decoder model presented in Section II. The results showed the superiority of T5: for example, in Tc&nl2c the encoder-decoder model achieves 10% of correct predictions, while T5 reaches 30%.

Moving to the new (more complex) dataset, T5 achieves the following results. In the case of Tc2c, when a single prediction is proposed by T5, it achieves 5% of correct predictions. Such a result should be considered in the context of what we obtained with the encoder-decoder model which, on a much simpler test dataset, achieved 3% of correct predictions for the same task. Similar observations can be made for Tc&nl2c, where T5 generates 14% of correct predictions; for this task, the encoder-decoder model on the simpler dataset achieves 12% of correct predictions. Moving to Tc2nl, T5 struggles in formulating natural language comments identical to the ones written by reviewers, with a 2% success rate.

It is worth noting that the reported results represent a lower bound for the performance of our approach. Indeed, we consider a prediction as "correct" only if it is identical to the reference one. For example, in the case of Tc2nl, the natural language comment generated by T5 is classified as correct only if it is equal to the reference one, including punctuation. However, it is possible that a natural language comment generated by T5 is different from, but semantically equivalent to, the one written by the developer (e.g., "variable v should be private" vs "change v visibility to private"). Similar observations hold for the two code-generation tasks (e.g., a reviewer's comment could be addressed in different but semantically equivalent ways). To have an idea of the number of valuable predictions among those classified as "wrong" (i.e., the non-correct predictions), we manually analyzed a sample of 100 "wrong" predictions for each task. Overall, our analysis showed that the correct predictions really represent a lower bound for the performance of T5, especially for the two tasks in which natural language comments are involved. For example, for Tc2nl we found out that 36% of the "wrong" predictions are actually semantically equivalent natural language comments produced by the model. For further details we point the interested reader to our paper [18].
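For clarity, the "correct prediction" criterion mentioned above boils down to exact matching, along the lines of the following sketch (the two toy predictions are invented for illustration).

def perfect_prediction_rate(predictions, references):
    # A prediction counts as "correct" only if it is identical to the reference.
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["change v visibility to private", "rename ann to rules"]
refs  = ["variable v should be private",   "rename ann to rules"]

# 0.5: the first prediction, although semantically equivalent to the
# reference, is counted as wrong under exact matching.
print(perfect_prediction_rate(preds, refs))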
To provide a better idea of the capabilities of the model, the top part of Fig. 1 shows one example of a correct prediction generated by the model for each task. For Tc2c (code-to-code), the first code represents the input of the model, while the second is its output. We highlighted in bold the parts of code changed by the model and replaced irrelevant parts of the methods with [...] to save space. For Tc&nl2c (code&comment-to-code), the input provided to the model includes the comment written by the reviewer, requiring a specific change to the part of code highlighted in orange. Finally, for Tc2nl (code-to-comment), we report the code provided as input to the model (first line) with the comment it generated as output (second line). The bottom of Fig. 1 (black background) shows instead an example of a "wrong" but valid prediction for Tc2nl, with T5 asking the developer to implement the same change suggested by the reviewer (the longer comment is the one generated by T5).

While this work represents a significant step forward in automating code review, the achieved performance is still quite far from levels which could be considered valuable by developers. We discuss in Section IV our plans to further boost the automation of code review.
Fig. 1. Examples of predictions generated by T5.

Correct predictions:

code-to-code
Input: public ConfigBuilder readFrom(View<?> view) { if (view instanceof Dataset && view instanceof FileSystemDataset) { FileSystemDataset dataset = (FileSystemDataset) view; [...] }
Output: public ConfigBuilder readFrom(View<?> view) { if (view instanceof FileSystemDataset) { FileSystemDataset dataset = (FileSystemDataset) view; [...] }

code-to-comment
Input: public List<[...]> getExecuteBefore() { Rules ann = [Link]().getAnnotation([Link]); if(ann != null) [...] }
Generated comment: "Rename 'ann' to 'rules', 'rulesAnnotation' or something more descriptive."

"Wrong" but valid prediction (code-to-comment), comment generated by T5: "Extract the building of the ResponseMessage to it's own variable (in eclipse, select the text, right-click > refactor > extract local variable / select code + shift+alt+L). This will make the code a bit more readable, especially when you'll be passing in other things besides the ResponseMessage."
IV. FUTURE DIRECTIONS

Investigating the usage of customized pre-training objectives. In [18] (Section III) we adopted a pre-trained model for the automation of code review. The positive role played by pre-training on the achieved performance is clear in our experiments. However, we did not investigate the possible impact of using different pre-training objective(s), possibly specialized for code review automation. Indeed, the one we used (the denoising objective) is just one of the possible pre-training objectives, and recent work from the natural language processing field [24] suggests that pre-training objectives tailored for the specific downstream task of interest may boost the model's performance. We plan to propose and compare different pre-training objectives to identify the one(s) best suited for the automation of code review tasks.

Investigating the role of context on the model performance. In our studies we limited the focus of the model to a single method at a time. In other words, the submitted code given as input to the model is a single method, possibly accompanied by a reviewer's comment (depending on the task). This means that, for example, the model has no further information about other code submitted for review, the class in which the method is implemented, the past review rounds, etc. Intuitively, more context provided as input could improve the performance of the model. We plan to experiment with the model when using a wider context, for example by looking at the entire class the method belongs to, providing more information about other submitted code changes, or feeding multiple reviewers' comments at a time. Clearly, adding more context could increase the model's performance, but also the complexity of the training and of the learning. Thus, a reasonable trade-off must be targeted.

V. RELATED WORKS

Several works targeted the optimization of the reviewers' assignment [5]–[11], [13], [25]–[31]. These works exploit different features and algorithms to recommend the most suited reviewer for a given change. Chouchen et al. [13] used a binary classifier to assess the quality of the code submitted for review, leveraging quality metrics as features. Similarly, Shi et al. [32] presented a DL model taking as input the code submitted for review and the revised code implementing the changes recommended by reviewers, and providing as output the acceptance (or not) of the changes. These techniques are complementary to ours.

Our research has been inspired by works aimed at learning general change patterns from developers' activities [14], [15]. For example, Neural Machine Translation models have been used to learn how to automatically modify a given Java method as developers would do during a pull request [14].

Several recent works built on top of the research we presented in [17], [18]. Li et al. [33] and Hong et al. [34] presented techniques to improve the results we achieved on the automated generation of reviewers' comments (Tc2nl), by using pre-trained DL models [33] or by exploiting information retrieval to recommend reviewers' comments posted in the past for code snippets similar to the one to review [34]. Li et al. [35] targeted our two tasks related to the automatic implementation of a reviewer's comment (Tc&nl2c) and to the generation of reviewers' comments for a given code (Tc2nl). They also estimate the quality of the submitted code to decide whether it needs a review or not. Their approach exploits a pre-trained model and can work with 9 programming languages.

VI. CONCLUSION

We presented our efforts in the automation of code review and our future plans in this area. As discussed in Section V, several researchers are targeting similar problems, thus increasing our hopes in a more and more successful automation which may then be subject to technological transfer to practitioners. I would like to conclude by summarizing my steps in the PhD: I started in February 2020 and defended my thesis proposal in December 2021. I just started the fourth (and last) year of my PhD and plan to defend my thesis in January 2024.

ACKNOWLEDGMENT

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant No. 851720).
REFERENCES

[1] S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan, "The impact of code review coverage and code review participation on software quality: A case study of the Qt, VTK, and ITK projects," in 11th IEEE/ACM Working Conference on Mining Software Repositories, MSR, pp. 192–201, 2014.
[2] R. Morales, S. McIntosh, and F. Khomh, "Do code review practices impact design quality? A case study of the Qt, VTK, and ITK projects," in 22nd IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER, pp. 171–180, 2015.
[3] G. Bavota and B. Russo, "Four eyes are better than two: On the impact of code reviews on software quality," in IEEE International Conference on Software Maintenance and Evolution, ICSME, pp. 81–90, 2015.
[4] A. Bosu and J. C. Carver, "Impact of peer code review on peer impression formation: A survey," in 7th IEEE/ACM International Symposium on Empirical Software Engineering and Measurement, ESEM, pp. 133–142, 2013.
[5] J. Jiang, D. Lo, J. Zheng, X. Xia, Y. Yang, and L. Zhang, "Who should make decision on this pull request? Analyzing time-decaying relationships and file similarities for integrator prediction," J. Syst. Softw., vol. 154, pp. 196–210, Aug. 2019.
[6] P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, N. Yoshida, H. Iida, and K.-i. Matsumoto, "Who should review my code? A file location-based code-reviewer recommendation approach for modern code review," in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 141–150, 2015.
[7] A. Ouni, R. G. Kula, and K. Inoue, "Search-based peer reviewers recommendation in modern code review," in 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 367–377, 2016.
[8] X. Xia, D. Lo, X. Wang, and X. Yang, "Who should review this change? Putting text and file location analyses together for more accurate recommendations," in 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 261–270, 2015.
[9] S. Asthana, R. Kumar, R. Bhagwan, C. Bird, C. Bansal, C. Maddila, S. Mehta, and B. Ashok, "WhoDo: Automating reviewer suggestions at scale," in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019, (New York, NY, USA), pp. 937–945, Association for Computing Machinery, 2019.
[10] E. Mirsaeedi and P. C. Rigby, "Mitigating turnover with code review recommendation: Balancing expertise, workload, and knowledge distribution," in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ICSE '20, (New York, NY, USA), pp. 1183–1195, Association for Computing Machinery, 2020.
[11] W. H. A. Al-Zubaidi, P. Thongtanunam, H. K. Dam, C. Tantithamthavorn, and A. Ghose, "Workload-aware reviewer recommendation using a multi-objective search-based approach," in Proceedings of the 16th ACM International Conference on Predictive Models and Data Analytics in Software Engineering, pp. 21–30, 2020.
[12] X. Ge, S. Sarkar, J. Witschey, and E. Murphy-Hill, "Refactoring-aware code review," in 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 71–79, 2017.
[13] M. Chouchen, A. Ouni, M. W. Mkaouer, R. G. Kula, and K. Inoue, "WhoReview: A multi-objective search-based approach for code reviewers recommendation in modern code review," Applied Soft Computing, vol. 100, p. 106908, 2021.
[14] M. Tufano, J. Pantiuchina, C. Watson, G. Bavota, and D. Poshyvanyk, "On learning meaningful code changes via neural machine translation," in 41st IEEE/ACM International Conference on Software Engineering, ICSE, pp. 25–36, 2019.
[15] S. Shi, M. Li, D. Lo, F. Thung, and X. Huo, "Automatic code review by learning the revision of source code," in The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pp. 4910–4917, 2019.
[16] A. Bacchelli and C. Bird, "Expectations, outcomes, and challenges of modern code review," in 35th IEEE/ACM International Conference on Software Engineering, ICSE, pp. 712–721, 2013.
[17] R. Tufano, L. Pascarella, M. Tufano, D. Poshyvanyk, and G. Bavota, "Towards automating code review activities," in 43rd IEEE/ACM International Conference on Software Engineering, ICSE, pp. 163–174, 2021.
[18] R. Tufano, S. Masiero, A. Mastropaolo, L. Pascarella, D. Poshyvanyk, and G. Bavota, "Using pre-trained models to boost code review automation," in 44th IEEE/ACM International Conference on Software Engineering, ICSE, pp. 2291–2302, 2022.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in 30th Advances in Neural Information Processing Systems, NIPS, pp. 5998–6008, 2017.
[20] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," J. Mach. Learn. Res., vol. 21, pp. 140:1–140:67, 2020.
[21] T. Kudo and J. Richardson, "SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing," CoRR, 2018.
[22] "Stack exchange dumps." [Link]
[23] H. Husain, H. Wu, T. Gazit, M. Allamanis, and M. Brockschmidt, "CodeSearchNet challenge: Evaluating the state of semantic code search," CoRR, vol. abs/1909.09436, 2019.
[24] J. Zhang, Y. Zhao, M. Saleh, and P. J. Liu, "PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization," in Proceedings of the 37th International Conference on Machine Learning, ICML'20, [Link], 2020.
[25] J. Jiang, Y. Yang, J. He, X. Blanc, and L. Zhang, "Who should comment on this pull request? Analyzing attributes for more accurate commenter recommendation in pull-based development," Inf. Softw. Technol., vol. 84, pp. 48–62, Apr. 2017.
[26] A. Strand, M. Gunnarson, R. Britto, and M. Usman, "Using a context-aware approach to recommend code reviewers: Findings from an industrial case study," in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP '20, (New York, NY, USA), pp. 1–10, Association for Computing Machinery, 2020.
[27] M. B. Zanjani, H. Kagdi, and C. Bird, "Automatically recommending peer reviewers in modern code review," IEEE Transactions on Software Engineering, vol. 42, no. 6, pp. 530–543, 2016.
[28] M. M. Rahman, C. K. Roy, and J. A. Collins, "CORRECT: Code reviewer recommendation in GitHub based on cross-project and technology experience," in 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C), pp. 222–231, 2016.
[29] H. Ying, L. Chen, T. Liang, and J. Wu, "EARec: Leveraging expertise and authority for pull-request reviewer recommendation in GitHub," in 2016 IEEE/ACM 3rd International Workshop on CrowdSourcing in Software Engineering (CSI-SE), pp. 29–35, 2016.
[30] Z. Xia, H. Sun, J. Jiang, X. Wang, and X. Liu, "A hybrid approach to code reviewer recommendation with collaborative filtering," in 2017 6th International Workshop on Software Mining (SoftwareMining), pp. 24–31, 2017.
[31] Y. Yu, H. Wang, G. Yin, and T. Wang, "Reviewer recommendation for pull-requests in GitHub," Inf. Softw. Technol., vol. 74, pp. 204–218, Jun. 2016.
[32] S.-T. Shi, M. Li, D. Lo, F. Thung, and X. Huo, "Automatic code review by learning the revision of source code," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4910–4917, 2019.
[33] L. Li, L. Yang, H. Jiang, J. Yan, T. Luo, Z. Hua, G. Liang, and C. Zuo, "AUGER: Automatically generating review comments with pre-training models," in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, (New York, NY, USA), pp. 1009–1021, Association for Computing Machinery, 2022.
[34] Y. Hong, C. Tantithamthavorn, P. Thongtanunam, and A. Aleti, "CommentFinder: A simpler, faster, more accurate code review comments recommendation," in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, (New York, NY, USA), pp. 507–519, Association for Computing Machinery, 2022.
[35] Z. Li, S. Lu, D. Guo, N. Duan, S. Jannu, G. Jenks, D. Majumder, J. Green, A. Svyatkovskiy, S. Fu, and N. Sundaresan, "Automating code review activities by large-scale pre-training," in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, (New York, NY, USA), pp. 1035–1047, Association for Computing Machinery, 2022.