Towards Semantic Clone Detection
via Probabilistic Software Modeling
Hannes Thaller, Lukas Linsbauer, Alexander Egyed
Institute for Software Systems Engineering
Johannes Kepler University Linz, Austria
{hannes.thaller, lukas.linsbauer, alexander.egyed}@jku.at
Abstract—Semantic clones are program components with similar behavior, but different textual representation. Semantic similarity is hard to detect, and semantic clone detection is still an open issue. We present semantic clone detection via Probabilistic Software Modeling (PSM) as a robust method for detecting semantically equivalent methods. PSM inspects the structure and runtime behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM in the network represents a method in the program and is capable of generating and evaluating runtime events. We leverage these capabilities to accurately find semantic clones. Results show that the approach can detect semantic clones in the complete absence of syntactic similarity with high precision and low error rates.

Index Terms—clone detection, semantic clone detection, probabilistic modeling, multivariate testing, software modeling, static code analysis, dynamic code analysis, runtime monitoring, inference, simulation, deep learning

I. INTRODUCTION

Copying and pasting source code fragments leads to code clones. Code clones are considered an anti-pattern as they increase maintenance costs, promote bad software design, and propagate bugs [1], [2], [3], [4], [5], [6], [7], [8]. Code clones are traditionally split into four categories. Type 1-3 code clones [9], [10], [11] are textual copies of a program fragment with possible changes. Type 4 code clones are behavioral copies of a program fragment that do not have any syntactic similarity but implement the same functionality (semantic equivalence). For example, the iterative and recursive implementations of the Fibonacci algorithm have no syntactic similarity while implementing the same functionality.

Juergens et al. [12] have shown that existing tools have only limited capabilities for detecting Type 4 clones. This limitation can also be seen in various clone detection tool comparisons [13], [14], [9], [10], [15] through the absence or explicit exclusion of Type 4 clones. Nevertheless, Type 4 clones exist, and tools for detecting them are needed [12], [16].

We present Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM). SCD-PSM detects semantic clones with no textual or structural similarity. First, a network of Probabilistic Models (PMs) is built via Probabilistic Software Modeling (PSM) [17]. Each PM models an executable (e.g., a method in Java) in the program under analysis. SCD-PSM leverages these PMs and their inferential capabilities to detect semantically equivalent executables. Probabilistic inference enables a similarity measure based on probabilities. These probabilities are used to conduct statistical tests (Generalized Likelihood Ratio Test) that produce the final clone decision.

II. BACKGROUND

A basic understanding of clone detection and probabilistic software modeling is needed to understand the approach. We will use a monospace font to refer to program elements (e.g., factorial_a) and italics to refer to the corresponding model elements (e.g., factorial_a).

A. Clone Detection

Clone detection is the process of finding pairs of similar program fragments, as illustrated in Figure 1. Figure 1 shows three different implementations of the factorial computation. Figure 1a uses a for loop, while Figure 1b uses a while loop. Finally, Figure 1c uses recursion to compute the factorial of n. The clone detection process includes the representation (e.g., text fragments), the pairing (e.g., of text fragments of similar size), the similarity evaluation (e.g., counting the differences in the text fragments), and the clone decision (e.g., fewer than 10 differences).

Representations can be, for example, text (e.g., source code), graphs (e.g., ASTs), or probabilistic models (as in this work). Pairing is the process of selecting two code fragments that are potentially a clone. Each pair is called a candidate clone pair (or candidate pair). The similarity evaluation measures the similarity between the fragments of a candidate pair. The clone decision labels the candidate pair as a clone given that the similarity fulfills some criteria.

The properties of the similarity metric split clones into two groups [9]. Type 1-3 clones capture textual similarity, while Type 4 clones capture semantic similarity [10], [14], [18], [9], [11], [19]. These types are increasingly challenging to detect, with Type 4 being the most complex one. Figure 1a and Figure 1b are an instance of a Type 3 clone, while Figure 1a (or Figure 1b) and Figure 1c are an instance of a Type 4 clone. Note that the definition of a semantic clone is often relaxed such that up to 50% syntactic similarity of the code fragments is allowed [13], [20]. However, we consider these clones as complex Type 3 clones (additions, deletions, reorderings) and not as semantic clones. This means that semantic clones in the context of this work are clones with no syntactic similarity except for per-chance similarities (e.g., equal parameter names).
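To make the four stages concrete, the following minimal Python sketch wires them together. The names (detect_clones, line_similarity) and the naive line-based similarity are illustrative stand-ins, not part of any particular detector:

from itertools import combinations

def line_similarity(a: str, b: str) -> float:
    # Naive text similarity: fraction of shared lines (Jaccard on line sets).
    la, lb = set(a.splitlines()), set(b.splitlines())
    return len(la & lb) / max(len(la | lb), 1)

def detect_clones(fragments, similarity=line_similarity, threshold=0.9):
    # The fragments are the representation; combinations() performs the
    # pairing; similarity() is the similarity evaluation; the threshold
    # comparison is the clone decision.
    clones = []
    for a, b in combinations(fragments, 2):  # candidate clone pairs
        if similarity(a, b) >= threshold:    # clone decision
            clones.append((a, b))
    return clones

Swapping the representation and similarity function, e.g., for probabilistic models as in this work, leaves the overall skeleton unchanged.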
factorial_a(n){
    product = 1
    for(i = 1; i <= n; i++){
        product *= i
    }
    return product
}

(a) A for implementation of factorial.

factorial_b(n){
    product = 1
    i = 1
    while(i <= n){
        product *= i
        i++
    }
    return product
}

(b) A while implementation of factorial.

factorial_c(n){
    if(n <= 1){
        return 1
    }
    return factorial_c(n - 1) * n
}

(c) A recursive implementation of factorial.

Figure 1: The for and while implementations are complex Type 3 clones in which new lines were added and some changed. The recursive implementation is a Type 4 clone of the for and while implementations without any syntactic resemblance.
B. Probabilistic Software Modeling

Probabilistic Software Modeling (PSM) [17] is a data-driven modeling paradigm that transforms a program into a network of Probabilistic Models (PMs). PSM extracts a program's structure given by types, properties, and executables (e.g., classes, fields, and methods, respectively, in Java). This structure includes the call dependencies between the different code elements, which define the topology of the PM network. Each PM is optimized towards a program execution. The program execution can either be synthetic (e.g., random testing), from tests (e.g., developer tests), or from the program in its production environment. In the context of clone detection, synthetic program executions suffice as the results are based on differential comparisons of two elements.

Each PM represents an executable (e.g., a method in Java) in the program. Inputs are parameters, property reads, and invocation return values. Outputs are the method return value, property writes, and invocation parameters. The distinction between inputs and outputs is only a logical view from a software engineering perspective. The actual PMs are multivariate density estimators without such a distinction (a joint model of all variables). PMs can generate observations that are similar to the initial training data. More importantly, each model can evaluate the likelihood of data. The likelihood is used to detect behavioral equivalence between models, which is then generalized to the semantic equivalence between executables in the program.

The PMs in the network are real Non-Volume Preserving transformations (NVPs) [21], a generative likelihood-based latent-variable model for density estimation. NVPs learn a function that maps data to a known latent space, e.g., input parameter values n and return values product of factorial_a, to a bivariate normal distribution. More formally, each NVP is a neural network that learns a bijective function f : X → Z (with g = f⁻¹) between the original data x ∈ X and predefined latent variables z ∈ Z. The latent variables are selected such that sampling, conditioning, and likelihood evaluation are efficient and straightforward, e.g., via an isotropic unit norm Gaussian N(0, 1).

Sampling generates data x ∈ X by drawing observations from the latent variables z ∼ Z and inverting them via the NVP to the original data space x = g(z) ∼ X.

Conditioning finds a latent-space configuration (i.e., a latent code) ẑ such that the associated data g(ẑ) = x̂ satisfies a given condition. First, a proposal code ẑ is drawn from the latent space, which is then inverted to its data form x̂ = g(ẑ). Then the error is measured on the conditioned dimensions via, e.g., the Mean Squared Error (MSE). The error is used to update the latent code ẑ, and the procedure is repeated until convergence. For example, one can condition the return value of factorial_a onto the return value of the fibonacci method. First, samples are drawn from the factorial_a model, retaining only the dimension associated with the return value. Then, samples are drawn from the fibonacci model, and the error between the return value dimensions is computed. This error is back-propagated to the latent code, which is updated according to the errors. After convergence of the optimization, the fibonacci sample contains the same return values as imposed by the factorial_a sample. Furthermore, the remaining dimension n is resampled (imputed) in such a way that it adheres to the joint relationship of all the variables in fibonacci. Finally, fibonacci can be used to evaluate the likelihood of the conditioned sample.
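A minimal sketch of this conditioning loop is shown below. A toy invertible affine map stands in for a trained NVP; the names (g, condition) and the analytic gradient step are illustrative assumptions, not the PSM implementation:

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained NVP: an invertible affine map g(z) = A @ z + b
# over two dimensions (e.g., parameter n and return value product).
A = np.array([[1.5, 0.2], [0.3, 2.0]])
b = np.array([0.1, -0.4])

def g(z):
    return A @ z + b

def condition(target, cond_dims, steps=500, lr=0.05):
    # Find a latent code z_hat such that g(z_hat) matches `target` on the
    # conditioned dimensions; the remaining dimensions are imputed via g.
    z_hat = rng.standard_normal(2)           # proposal latent code
    for _ in range(steps):
        err = g(z_hat)[cond_dims] - target   # error on conditioned dims
        grad = A[cond_dims].T @ err          # MSE gradient w.r.t. z for affine g
        z_hat -= lr * grad                   # update the latent code
    return g(z_hat)

x = g(rng.standard_normal(2))                      # sampling: z ~ N(0, 1), x = g(z)
x_hat = condition(np.array([3.0]), cond_dims=[1])  # fix dim 1 to the value 3.0

In a real NVP, the gradient is obtained by back-propagation through the network, and the Gaussian latent prior keeps the imputed dimensions consistent with the learned joint distribution.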
III. APPROACH

SCD-PSM uses the models built by PSM and compares them for behavioral equivalence. The behavioral equivalence is then generalized to semantic equivalence of the executables (i.e., methods).

A. Similarity Evaluation

The similarity evaluation computes the cross-wise likelihood of the models by sampling and conditioning. The input is a pair of candidate PMs, each representing an executable. The similarity evaluation starts by selecting a reference model (null-model) M^null and an alternative model (alt-model) M^alt. Then, the null-dimensions M^null_k and alt-dimensions M^alt_k are selected from the models, e.g., parameter n of factorial_a is compared to parameter n of factorial_b. Then, a reference sample D^null_k is generated by M^null, as illustrated in Figure 2 (1), representing the behavior of M^null. This reference sample is used to generate a conditioned alternative sample D^alt|null (2), representing the behavior of M^alt given that the dimensions k are fixed to the behavior of M^null_k (3). Finally, the likelihood of D^null is evaluated under M^null, resulting in the base likelihood of the reference sample under the null-model, LL^null, and D^alt|null is evaluated under M^alt, resulting in the likelihood of the conditioned alternative sample under the alt-model, LL^alt|null. Then, the null and alt roles are swapped, and the procedure is repeated (see Figure 2, Link B).

[Figure 2 diagram: Link A draws a reference sample from the null-model by sampling (1) and conditions the alt-model on it (2), fixing the shared dimensions (3); Link B repeats the procedure with swapped roles. The resulting likelihood ratios of the conditional and marginal samples (4) are pooled (5) into the final clone decision (6).]
Figure 2: SCD-PSM evaluates the similarity of a pair of models via their data likelihood. The likelihoods are combined into the final clone decision.

The swapping of roles is necessary because of sub-model relationships. For instance, one model returns data distributed according to N(0, 1) and the other according to N(0, 5). One link will lead to a high likelihood (the sub-model is the null-model), while the other link will result in a low likelihood (the super-model is the null-model).

In conclusion, the similarity evaluation tests the likelihood of the models in the context of each other. The final clone decision is based on these likelihood values.
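The following sketch illustrates one evaluation link with univariate stand-in models. Scipy normal distributions replace the learned NVPs, and with a single shared dimension the conditioning step reduces to reusing the reference sample directly; all names are illustrative:

from scipy import stats

def link(m_null, m_alt, particles=100, seed=0):
    # One link: sample the reference from the null-model (1), then evaluate
    # the (trivially conditioned) sample under both models.
    d_null = m_null.rvs(size=particles, random_state=seed)  # reference sample
    ll_null = m_null.logpdf(d_null).mean()  # base likelihood LL^null
    ll_alt = m_alt.logpdf(d_null).mean()    # likelihood LL^alt|null
    return ll_null, ll_alt

m_a = stats.norm(0, 1)    # stand-in for the factorial_a model
m_b = stats.norm(0, 1.1)  # stand-in for the factorial_b model
link_a = link(m_a, m_b)   # Link A: m_a acts as the null-model
link_b = link(m_b, m_a)   # Link B: roles swapped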
B. Clone Decision

The final step is to combine the likelihood values from the similarity evaluation into a final decision, as shown in Figure 2. The two likelihood ratios (4) are combined by a pooling operator (5) and compared against a critical value, yielding the final clone decision (6).

More formally, the procedure makes use of the Generalized Likelihood Ratio Test (GLRT) [22]. The log-GLRT measures whether the log-likelihoods are significantly different from 0 with

λ = LL^alt − LL^null,    (1)

where LL is the log-likelihood. The null hypothesis is that the models are equal. It is rejected for small ratios λ ≤ c, where c is set to an appropriate Type 1 error, i.e., false-positive rate. For example, λ < log(0.01) allows 1 out of 100 candidates to be a false positive, i.e., to wrongly reject semantic equivalence.

The clone decision (6) is computed by pooling (5) the link results. Hard pooling accepts the candidate pair as a clone if the null hypothesis could not be rejected for both links. Soft pooling accepts the candidate pair as a clone if the average log-likelihood ratio of both links cannot be rejected. Hard pooling does not allow any sub-model relationship, while soft pooling relaxes this condition slightly.

The final requirement is that a candidate pair is only accepted as a clone if the selected dimensions k of both M^null and M^alt contain at least one input and one output dimension. That is, methods are semantically equivalent if at least parts of their input and output relationship are equivalent.

In conclusion, the clone decision combines the link results and controls the results for a predefined false-positive rate.
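A sketch of this decision step, assuming the two link results are pairs of mean log-likelihoods as in the previous sketch:

import numpy as np

def clone_decision(link_a, link_b, alpha=0.01, pooling="hard"):
    # Each link is a pair (ll_null, ll_alt) of mean log-likelihoods;
    # lambda = ll_alt - ll_null (Eq. 1) is compared against c = log(alpha).
    c = np.log(alpha)
    lams = [ll_alt - ll_null for ll_null, ll_alt in (link_a, link_b)]
    if pooling == "hard":
        return all(lam > c for lam in lams)  # both links must pass
    return float(np.mean(lams)) > c          # soft: pooled average must pass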
IV. STUDY

We implemented a prototype for SCD on top of PSM and applied the similarity evaluation given in Section III.

1) The study uses 8 well-known algorithms listed in Table I, distributed over 3 clone classes. Each clone class is a well-understood example of semantic clones with 0% syntactic similarity. Each subject was triggered with positive, uniformly distributed random values.
2) The Probabilistic Model Network was computed via Gradient, a PSM prototype [17]. The same hyper-parameters were selected as in our previously reported experiments.
3) The candidate clone pairs were all combinations of dimensions of the PMs. The candidate pairs were formed from all 8 subject systems.
4) Each valid candidate pair was tested for behavioral equality by the cross-wise likelihood evaluation described in Section III-A.
5) The clone decision was computed via the GLRT, and the results were pooled as described in Section III-B.

Table I: The 8 subject examples used in the evaluation.

Subject      Style      Clone Class   Parameter   Executable
Factorial    iterative  A             1           1
Factorial    recursive  A             1           1
Fibonacci    iterative  B             1           1
Fibonacci    recursive  B             1           1
BubbleSort   iterative  C             1           1
BubbleSort   recursive  C             3           2
MergeSort    iterative  C             6           2
MergeSort    recursive  C             8           3
Total                                 22          12

A. Controlled Variables

The study controls for the pooling, the Type 1 error, and the number of particles used in the similarity evaluation (Section III-A).

Pooling describes how likelihoods are combined into the final clone decision {soft, hard} (see Section III-B).

Type 1 error, or the false-positive rate, defines the critical value c at which clones are considered significantly different {0.001, 0.01} (Section III-B). The critical value is the total Type 1 error for both links.

Number of particles is the number of samples drawn during the similarity evaluation for the reference sample D^null and the alternative sample D^alt. A low number of particles is faster to compute but yields a higher variance in the results.
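For illustration, the three controlled variables span a full factorial design; the following sketch (hypothetical configuration code, not the prototype's) enumerates it:

from itertools import product

POOLING = ["hard", "soft"]
TYPE1 = [0.001, 0.01]      # Type 1 error levels
PARTICLES = [10, 50, 100]  # particle counts

# 2 x 2 x 3 = 12 configurations, one per row of Table II.
experiments = [
    {"pooling": p, "type1": a, "particles": n}
    for p, a, n in product(POOLING, TYPE1, PARTICLES)
]
assert len(experiments) == 12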
B. Response Variables

The performance of the clone detection is measured via precision, recall, and balanced accuracy. These metrics are computed from the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) proportions of the detected clone instances, e.g., correctly identifying a clone pair counts towards TP.

Precision measures the performance of detecting only relevant instances and is given by

Precision = TP / (TP + FP).    (2)

Recall measures the performance of detecting all relevant instances and is given by

Recall = TP / (TP + FN).    (3)

Balanced accuracy measures the performance of detecting relevant and irrelevant instances but accounts for a possible imbalance between the number of relevant and irrelevant instances. It is given by

Balanced Accuracy = (TP / (TP + FN) + TN / (TN + FP)) / 2.    (4)
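Equations (2) to (4) transcribe directly into code; the helper below is illustrative, not part of the prototype:

def metrics(tp, fp, tn, fn):
    # Precision, recall, and balanced accuracy per Equations (2) to (4).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    balanced_accuracy = (recall + tn / (tn + fp)) / 2
    return precision, recall, balanced_accuracy

# Experiment 1 in Table II (TP=22, FP=0, TN=14, FN=0) yields perfect scores:
assert metrics(22, 0, 14, 0) == (1.0, 1.0, 1.0)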
C. Experiment Results

The study results are given in Table II. The average precision was 0.991, the average recall was 0.797, and the average balanced accuracy was 0.870. The precision across experiments was excellent, indicating that models can reliably detect behavioral equality. This is reflected in the low number of FPs. However, the FNs indicate that some positive examples are missed. Reducing the Type 1 error, i.e., falsely rejecting semantic equality, improves on the FNs. The recall was good for most evaluations. However, hard pooling caused a slight drop in the recall. The balanced accuracy is good to excellent for most experiment configurations. Perfect scores are given for experiments 1 and 12. No effect of the pooling, the Type 1 error, or the number of particles on the accuracy can be seen.

V. DISCUSSION

The results from Section IV-C are encouraging. The general performance was good to excellent. No significant difference between the different levels of pooling, Type 1 error, and particles in Table II can be seen. However, a larger sample size is needed to precisely attribute effects on the performance. In the 10-particle setting, a higher variance of performance can be seen, caused by per-chance errors. The number of FPs is low in all experiments, which is expected given that the Type 1 error was set to 0.001 and 0.01. In contrast, the number of FNs is acceptable. This is reflected in the recall, which ranges from 0.64 to 1. The balanced accuracy shows high detection rates of the approach in most experiment settings.

VI. LIMITATIONS

SCD-PSM inherits the limitations of PSM. PSM models data. Object references are handles to containers (objects) that store data. Thereby, SCD-PSM cannot detect semantic clones of executables that solely manage object references, e.g., a collection library. However, this limitation only holds if the program never accesses the underlying data. Furthermore, PSM explodes lists into singular values since distributions do not contain any order information. This means that executables that change the order of sequences are matched based on the values, not their order. As a consequence, invoking a wrongly implemented, e.g., sorting algorithm would result in a false positive. Extending PSM to model distributions of sequences will alleviate this issue.

A limitation of the detection process is that it is built on runtime observations. This means that the approach can only be applied to runnable source code.

The final limitation is that the approach cannot detect Type 2-3 clones. Slight changes, e.g., flipping a plus sign to a minus, have large implications on the resulting runtime behavior. These changes will impact the semantic detection process such that the candidate clone pair will not be accepted. For example, common clone detectors will report Listing 1 and Listing 2 as clones since they differ only by one character (ignoring names and reducing the minimum size). However, this does not hold for Type 4 detectors because the input and output relationship is different. In contrast, many clone detectors will not detect Listing 1 and Listing 3 as clones because of the many additions. Type 4 detectors will report this pair as clones since the behavior of adding one to the input is identical. This hints that Type 2-3 and Type 4 clones represent detached concepts that share less common ground than expected. More importantly, this raises the question whether existing detectors that report Type 3-4 detection capabilities generalize as expected.

inc(a: Int): Int {
    return a + 1
}

Listing 1: Increment method

dec(a: Int): Int {
    return a - 1
}

Listing 2: Decrement method

inc(a: Int): Int {
    b = 1 * 3.12
    c = b / 2
    d = c + -0.5
    return (Int) a + d
}

Listing 3: Complicated increment method
Table II: Results of the clone detection experiments.

     Controlled Variables           Response Variables
     Pooling  Type I  Particles     TP  FP  TN  FN   Precision  Recall  Balanced Accuracy
1    hard     0.001   10            22   0  14   0   1.00       1.00    1.00
2    hard     0.001   50            18   0  10   8   1.00       0.69    0.78
3    hard     0.001   100           20   0  12   4   1.00       0.83    0.89
4    soft     0.001   10            14   0  18   4   1.00       0.78    0.89
5    soft     0.001   50            22   0  10   4   1.00       0.85    0.89
6    soft     0.001   100           22   0  10   4   1.00       0.85    0.89
7    hard     0.010   10             8   0  26   2   1.00       0.80    0.94
8    hard     0.010   50            14   0  14   8   1.00       0.64    0.78
9    hard     0.010   100           14   0  14   8   1.00       0.64    0.78
10   soft     0.010   10            16   2  10   8   0.89       0.67    0.72
11   soft     0.010   50            20   0  12   4   1.00       0.83    0.89
12   soft     0.010   100           22   0  14   0   1.00       1.00    1.00
VII. RELATED WORK

Many studies have evaluated textual clones. However, there are only a few studies reporting reliable results on semantic clones without relaxing the definition of Type 4.

Rattan et al. [11] provided a review of clone detection studies. The review also investigated approaches that tackle Type 4 clones. They conclude that some approaches solve approximations (i.e., complex Type 3 clones) of Type 4 clones.

Horwitz [23] detected textual and semantic differences in programs via a Program Representation Graph, which is similar to a Program Dependence Graph (PDG). PDG-based approaches [18], [24], [25] use (static) data and control dependencies to find similar sub-graphs between the candidates. They can detect complex Type 3 clones, e.g., Figure 1a and Figure 1b. However, the compared PDG sub-graphs are a representation of the source code; thereby, the approaches still rely on syntactic similarity [26].

Another category of semantic clone detectors are test-based methods. Test-based methods randomly trigger the execution of two candidates and measure whether equal inputs cause similar outputs. Jiang and Su [27] were able to detect semantic clones without syntactic similarity. A similar approach was presented by Deissenboeck et al. [28]. One issue with test-based clone detection is that candidates need a similar signature. Differences in data types or the number of parameters cannot be effectively handled by the test-case generators or the similarity measurement. SCD-PSM works similarly to test-based methods in that it observes the runtime and compares the resulting behavior. However, SCD-PSM builds generative models from the observed behavior that are capable of generating and evaluating data. Missing dimensions are imputed by conditioning and sampling. This allows SCD-PSM to overcome the issue of signature mismatches. Furthermore, PSM abstracts the data types into texts, integers, and floats, mitigating data type mismatches.

Finally, the clone detector Oreo [20] has also reported Type 3 to Type 4 detection capabilities. Oreo uses a combination of representations and detection stages to find clones. Most important is the semantic similarity comparison based on the actions a method takes, e.g., accessing an array, writing a property, or invoking a method. These actions correspond, to some extent, to the dimensions of PSM models, i.e., they represent entry points of information (e.g., field accesses, invocations, etc.). Oreo counts these entry points and compares them between the fragments in a candidate pair. No analysis of the runtime assignments is conducted, nor is the relationship between the actions analyzed, as SCD-PSM does. Oreo reports many complex Type 3 and Type 4 clones of up to 50% syntactic similarity based on this semantic similarity (and the additional pipeline steps). However, more research is needed to identify the weaknesses and strengths of both approaches. This highlights the need for a hard but well-understood baseline dataset of Type 4 clones, similar to the examples in our study but extended with a larger variety of semantic clones.

VIII. CONCLUSION AND FUTURE WORK

In this work, we presented a viable approach for semantic clone detection: Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM). SCD-PSM leverages the PMs of PSM to detect method-level semantic clones with 0% syntactic similarity.

We have discussed the similarity evaluation and the clone decision that represent the central aspects of a clone detector. We evaluated the concepts on a set of well-known semantic clones that provide a hard baseline for Type 4 detectors.

Our future work is to evaluate the scalability of the approach with large programs. Furthermore, we want to compare SCD-PSM with existing Type 3 clone detectors.

In conclusion, SCD-PSM is capable of detecting semantic clones with 0% syntactic similarity.

ACKNOWLEDGMENTS

The research reported in this paper has been supported by the Austrian ministries BMVIT and BMDW, and the Province of Upper Austria in terms of the COMET - Competence Centers for Excellent Technologies Programme managed by FFG.
REFERENCES

[1] Mayrand, Leblanc, and Merlo, "Experiment on the automatic detection of function clones in a software system using metrics," in Proceedings of International Conference on Software Maintenance ICSM-96. Monterey, CA, USA: IEEE, 1996, pp. 244–253.
[2] A. Monden, D. Nakae, T. Kamiya, S. Sato, and K. Matsumoto, "Software quality analysis by code clones in industrial legacy software," in Proceedings Eighth IEEE Symposium on Software Metrics. Ottawa, Ont., Canada: IEEE Comput. Soc, 2002, pp. 87–94.
[3] R. C. Martin, Ed., Clean Code: A Handbook of Agile Software Craftsmanship. Upper Saddle River, NJ: Prentice Hall, 2009.
[4] M. Fowler and K. Beck, Refactoring: Improving the Design of Existing Code, ser. The Addison-Wesley Object Technology Series. Reading, MA: Addison-Wesley, 1999.
[5] A. Hunt and D. Thomas, The Pragmatic Programmer: From Journeyman to Master. Reading, Mass: Addison-Wesley, 2000.
[6] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An empirical study of operating systems errors," ACM SIGOPS Operating Systems Review, vol. 35, no. 5, p. 73, Dec. 2001.
[7] Z. Li, S. Lu, S. Myagmar, and Y. Zhou, "CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code," IEEE Transactions on Software Engineering, vol. 32, no. 3, pp. 176–192, 2006.
[8] R. Geiger, B. Fluri, H. C. Gall, and M. Pinzger, "Relation of Code Clones and Change Couplings," in Fundamental Approaches to Software Engineering, L. Baresi and R. Heckel, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, vol. 3922, pp. 411–425.
[9] C. K. Roy and J. R. Cordy, "A Survey on Software Clone Detection Research," Queen's School of Computing TR, vol. 115, p. 115, 2007.
[10] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo, "Comparison and Evaluation of Clone Detection Tools," IEEE Transactions on Software Engineering, vol. 33, no. 9, pp. 577–591, 2007.
[11] D. Rattan, R. Bhatia, and M. Singh, "Software clone detection: A systematic review," Information and Software Technology, vol. 55, no. 7, pp. 1165–1199, Jul. 2013.
[12] E. Juergens, F. Deissenboeck, and B. Hummel, "Code Similarities Beyond Copy & Paste," in 2010 14th European Conference on Software Maintenance and Reengineering. Madrid: IEEE, Mar. 2010, pp. 78–87.
[13] J. Svajlenko and C. K. Roy, "Evaluating clone detection tools with BigCloneBench," in 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME). Bremen, Germany: IEEE, Sep. 2015, pp. 131–140.
[14] R. Koschke, "Survey of research on software clones," in Duplication, Redundancy, and Similarity in Software, ser. Dagstuhl Seminar Proceedings, R. Koschke, E. Merlo, and A. Walenstein, Eds., no. 06301. Dagstuhl, Germany: Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2007.
[15] F. Farmahinifarahani, V. Saini, D. Yang, H. Sajnani, and C. V. Lopes, "On Precision of Code Clone Detection Tools," in 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), Feb. 2019, pp. 84–94.
[16] V. Kafer, S. Wagner, and R. Koschke, "Are there functionally similar code clones in practice?" in 2018 IEEE 12th International Workshop on Software Clones (IWSC). Campobasso: IEEE, Mar. 2018, pp. 2–8.
[17] H. Thaller, L. Linsbauer, R. Ramler, and A. Egyed, "Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis," arXiv:1912.07936 [cs], Dec. 2019.
[18] J. Krinke, "Identifying Similar Code with Program Dependence Graphs," in Proceedings Eighth Working Conference on Reverse Engineering, 2001, pp. 301–309.
[19] H. Thaller, R. Ramler, J. Pichler, and A. Egyed, "Exploring code clones in programmable logic controller software," in 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). Limassol: IEEE, Sep. 2017, pp. 1–8.
[20] V. Saini, F. Farmahinifarahani, Y. Lu, P. Baldi, and C. V. Lopes, "Oreo: Detection of clones in the twilight zone," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering - ESEC/FSE 2018. Lake Buena Vista, FL, USA: ACM Press, 2018, pp. 354–365.
[21] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using Real NVP," arXiv:1605.08803 [cs, stat], May 2016.
[22] J. Fan, C. Zhang, and J. Zhang, "Generalized Likelihood Ratio Statistics and Wilks Phenomenon," The Annals of Statistics, vol. 29, no. 1, pp. 153–193, 2001.
[23] S. Horwitz, "Identifying the Semantic and Textual Differences Between Two Versions of a Program," in PLDI, 1990.
[24] M. Gabel, L. Jiang, and Z. Su, "Scalable Detection of Semantic Clones," in Proceedings of the 13th International Conference on Software Engineering - ICSE '08. ACM Press, 2008, p. 321.
[25] R. Komondoor and S. Horwitz, "Using Slicing to Identify Duplication in Source Code," in Proceedings of the 8th International Symposium on Static Analysis, ser. SAS '01. London, UK: Springer-Verlag, 2001, pp. 40–56.
[26] S. Wagner, A. Abdulkhaleq, I. Bogicevic, J.-P. Ostberg, and J. Ramadani, "How are functionally similar code clones syntactically different? An empirical study and a benchmark," PeerJ Computer Science, vol. 2, p. e49, Mar. 2016.
[27] L. Jiang and Z. Su, "Automatic Mining of Functionally Equivalent Code Fragments via Random Testing," in Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ser. ISSTA '09. New York, NY, USA: ACM, 2009, pp. 81–92.
[28] F. Deissenboeck, L. Heinemann, B. Hummel, and S. Wagner, "Challenges of the Dynamic Detection of Functionally Similar Code Fragments," in 2012 16th European Conference on Software Maintenance and Reengineering, Mar. 2012, pp. 299–308.