in Smart Contracts
Wanqing Jie a, Jiaqi Wang a, Arthur Sandor Voundi Koe a,∗, Jin Li a,∗, Qi Chen a, Pengfei Huang a, Yaqi Wu a, Yin Wang a
a Institute of Artificial Intelligence and Blockchain, Guangzhou University, 510006, Guangzhou, China
Abstract
Current automatic data-driven vulnerability detection in smart contracts selects and processes features of interest under
black box settings without empirical justification. In this paper, we propose a smart contract testing methodology that
provides developers with flexible, practical and customizable strategies for vulnerability detection. It enforces
strong whitebox knowledge through a series of supervised multimodal tasks under static analysis. Each task encapsulates
a vulnerability detection branch test and pipelines feature selection, dimension unification, feature fusion, model
training and decision making. Moreover, we exploit multiple features made up of code and graph embeddings at the
single modality level (intramodal settings) and across individual modalities (intermodal settings). We also assign each
task to either intramodal or intermodal settings, and show how to train self-attentive bi-LSTM, textCNN, and random
forest (RF) models to extract a joint multimodal feature representation per task. We evaluate our framework over
101,082 functions extracted from the SmartEmbed dataset, and rank each multimodal vulnerability mining strategy
in terms of detection performance. Extensive experiments show that our work outperforms existing schemes, with the
highest performance reaching 99.71%.
Keywords: Smart Contract, Vulnerability detection, Multimodal, AI approach, White box
1. Introduction
The increasing popularity and adoption of blockchain technology have resulted in an abundance of blockchain
solutions. According to [1], the investment rate in global blockchain deployments will reach 19 billion USD by 2024.
What stands out in blockchain technology is the use of smart contracts [2] to allow untrustworthy parties to securely
adhere to a set of promises over their assets. In the literature, the term smart contract tends to be used to refer to an
immutable self-contained block of rules written in a contract oriented language such as Solidity [3]. A well-known
fact is that the first proof of concept related to smart contracts was provided over the Ethereum blockchain [4], and
an estimated one million smart contracts, which control several billion dollars in digital currency, have been deployed
on Ethereum. There is evidence that such wealth attracts attackers and raises concerns over the safety of
smart contracts [5]. There are two notable directions towards securing smart contracts: contract codification, which
addresses the need to write an optimized and correct contract with fewer bugs [6], and contract vulnerability detection,
which identifies weaknesses in the contract code such as the re-entrancy vulnerability in decentralized autonomous
organization (DAO) contract [7]. In this paper, we investigate the detection of vulnerabilities in solidity written smart
contracts for the Ethereum blockchain.
Historically, mining vulnerabilities has often been referred to as "searching for a needle in a haystack" [8]. Research
on the subject mostly focuses on the use of static analysis for vulnerability uncovering. For example, based on the
taxonomy of smart contracts proposed by Atzei et al. [9], Argañaraz et al. [10] leverage static analysis of the source
∗ Corresponding author
Email addresses: [email protected] (Arthur Sandor Voundi Koe), [email protected] (Jin Li)
code and formulate expert-based verification rules. Another example is Slither [11], a static analyzer for solidity
written smart contracts that relies on a set of manually coded vulnerability detectors. However, one criticism of much
of the literature on static analysis is that it suffers from a heavy reliance on expert-based rules, leading to a high false
positive rate, as well as from a laborious effort to address new vulnerabilities [12]. Despite the aforesaid, Gao et al.
[13], in their impressive investigation, advocate the use of static analysis, which is faster compared to other more reliable
but expensive methods such as dynamic analysis [14][15] and dynamic symbolic execution [7][16][17].
Different methods based on Machine learning (ML) and deep learning (DL) have been proposed to support au-
tomation in software vulnerability mining. Nevertheless, traditional machine learning techniques still rely on fixed
expert-based predictors which are susceptible to bias and insufficient generalization [18]. To uphold automatic data-
driven vulnerability detection, deep networks were successfully applied to supervised feature learning for single and
multiple modalities.
The topic of multimodal learning applied to vulnerability mining in smart contracts can best be treated when
considering three multimodal data sources. First, the contract source code also known as source code layer (SC) made
of features acquired from processing the contract source code. Second, the built-based data or built-based layer (BB)
comprising features extracted from the contract compilation. Third, the contract’s Ethereum virtual machine (EVM)
bytecode also known as EVM bytecode layer (EVMB) which encompasses features obtained from processing the
contract EVM bytecode. Under multimodal learning, each multimodal data source, also known as modality, has one
or multiple sub-modalities expressed as features.
We hold on to the three modalities mentioned previously and classify the relationship among features under two
main categories. First, the intramodal settings describing the analysis of features belonging to an individual modality,
namely SC, BB, and EVMB. Second, the intermodal settings relating to the combination of features across individual
modalities. We further distinguish two subgroups in the intermodal settings: two-by-two intermodal settings (SC+BB,
SC+EVMB, BB+EVMB) and three-by-three intermodal settings (SC+BB+EVMB).
The most serious disadvantage in multimodal learning that applies to smart contract static vulnerability analysis
is the lack of a common testing methodology. This deficiency causes developers to struggle when designing solutions
to ensure the reliability of their smart contracts. Such a methodology avenue should guarantee strong whitebox
knowledge testing and discourage active learning.
Designing such a methodology raises four challenges under whitebox settings. First, understanding the nature of raw features to extract as well as their significance towards vulnerability
detection performance. Second, choosing the best suitable AI models to perform vulnerability detection inference.
Third, finding the appropriate feature fusion technique yielding higher detection outcome. Fourth, maintaining a high
level of whitebox knowledge throughout the vulnerability detection pipeline.
1.3. Motivations
Most studies in the field of smart contract vulnerability mining have only focused on intramodal and two-by-two
intermodal settings. Moreover, they operate under black box settings and embrace fixed rules regarding feature se-
lection, feature fusion, and AI models to leverage. Hence, it is not possible to investigate the significant relationships
between feature selection approaches, feature fusion techniques, AI models’ choice and the effectiveness of vulner-
ability detection in smart contracts. Such a situation denotes the lack of a common and clear methodology to guide
developers in vulnerability uncovering under intramodal settings, two-by-two and three-by-three intermodal settings.
In this paper, we aim to design an information-rich framework to enhance the research on smart contract vulnera-
bility mining under multimodal AI.
Our approach expresses features as code embeddings and graph embeddings. We use a word2vec model and a bidirectional encoder representation from transformers
(BERT) model to generate code embeddings, while a graph convolutional network (GCN) outputs graph embeddings.
We define eighty-four flexible, practical and customizable strategies to achieve strong whitebox knowledge and
guide developers and researchers towards practical and effective smart contract vulnerability uncovering under multi-
modal AI settings.
We model each strategy as a supervised vulnerability detection task that pipelines feature selection, single feature
dimension unification technique, single feature fusion approach, model training, and a single unit for decision making.
We exploit features from intramodal settings: SC, BB and EVMB separately, from two-by-two intermodal settings:
SC+BB, SC+EVMB and BB+EVMB separately, and from three-by-three intermodal settings: SC+BB+EVMB. Fig-
ure 1 depicts the three multimodal data sources together with their associated features. Owing to multimodal learning,
each task relies on max-pooling (MP), spatial pyramid pooling (SPP), and dense layers (Dense) for feature dimension
uniformization. Each task further implements either horizontal or vertical feature concatenation for feature fusion.
Each task includes AI training and AI inference for state-of-the-art text convolutional neural network (textCNN), bi-
directional long short term memory (bi-LSTM) with self-attention, and random forest (RF) machine learning models.
What can be clearly seen is that the set of all tasks in our framework form the smart contract vulnerability branch
coverage under multimodal learning. We compare our work with the existing literature and assess the increase in
performance under intramodal and intermodal settings respectively.
1.5. Our Contributions
This paper discusses an innovative multimodal learning approach for detecting smart contract vulnerabilities. The
main contributions of this work are summarized as follows.
1) Features mixing. Our framework expresses multiple features respectively under intramodal and intermodal
settings. We characterise such features as code and graph embeddings, to leverage the power of natural language
processing (NLP) algorithms.
2) Vulnerability detection strategies. We develop a series of supervised tasks for automatic vulnerability mining
in Ethereum smart contracts under multimodal learning [19]. Each task represents a vulnerability detection branch test
and pipelines feature selection, feature dimension unification, feature fusion, model training and model testing. More-
over, we assign each task to serve as a vulnerability detection strategy in either intramodal, two-by-two intermodal or
three-by-three intermodal settings.
3) Experimental evaluation of strategies.
We evaluate every task by leveraging textCNN, bi-LSTM, and RF for training and decision making; MP, SPP
and dense layers for dimension unification; as well as horizontal feature concatenation (stack) and vertical feature
concatenation (concat) for feature fusion. Extensive empirical analysis over the SmartEmbed dataset [13] reveals that
under intramodal settings, artifacts from (BB) perform best, while under two-by-two intermodal settings, (SC+BB)
has a significant advantage. Finally, the best detection strategy is achieved by the shared representation learning across
the three modalities (SC+BB+EVMB), and based on evidence, two-by-two intermodal settings outperform intramodal
settings.
2. Background
It is necessary here to clarify exactly what is meant by feature fusion and how our work exploits such a concept.
The term feature fusion refers to combining features of different layers, different modalities or different branches
[20]. The concept of feature fusion embodies a multitude of techniques grouped under four categories.
First, feature vector addition, which performs element-wise addition. For example, let A and B be two vectors of the same
size. The fusion of A and B produces a single vector C where C = A + B.
Second, feature vector concatenation. This paper distinguishes two main concatenation types. In horizontal
concatenation, known as concat, let I be a row vector of dimension 1 × n and J be a row vector of dimension 1 × k.
The horizontal concatenation of I and J yields a row vector of dimension 1 × (n + k). In vertical concatenation, denoted stack,
let U be a matrix of dimension d × n and V be a matrix of dimension e × n, such that U and V have the same
number of columns n with same or different values. We define the stacking of U and V as the
matrix T of dimension (d + e) × n.
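As an illustration only, the following sketch shows the two concatenation operators on toy NumPy arrays; the array shapes are our assumptions, not values used in the paper.

```python
# Minimal sketch of the two concatenation-based fusion operators described above,
# assuming NumPy arrays as feature containers (illustrative, not the paper's exact code).
import numpy as np

I = np.random.rand(1, 128)                  # row vector of dimension 1 x n
J = np.random.rand(1, 64)                   # row vector of dimension 1 x k
concat = np.concatenate([I, J], axis=1)     # horizontal concatenation: 1 x (n + k)

U = np.random.rand(3, 128)                  # matrix of dimension d x n
V = np.random.rand(5, 128)                  # matrix of dimension e x n (same n)
stack = np.concatenate([U, V], axis=0)      # vertical concatenation: (d + e) x n

print(concat.shape, stack.shape)            # (1, 192) (8, 128)
```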
The third feature fusion technique is gated feature vector fusion, leveraged by [21], which proposes a gated fusion
unit that takes concatenated feature vectors as input, then combines them with an average pooling layer, a dense layer and
a sigmoid layer for caricature recognition.
The fourth feature fusion approach is based on attention mechanism which measures the contributions of each
individual feature towards the segmentation accuracy and can remove redundant features [22].
In this work, we favour horizontal and vertical concatenation of features over other feature fusion techniques.
3. Methodology
This section highlights the two essential parts of our framework. First, how to extract features and fuse them to
get a joint multimodal representation under intramodal and intermodal settings. Second, how to input intramodal and
intermodal features to AI models to design vulnerability detection strategies that provide full white-box knowledge.
3.1.1. Feature extraction under intramodal settings
We evaluate key aspects to acquire features at every separate layer of the intramodal settings. Figure 1
illustrates the three intramodal layers, and a more detailed account of each layer is given below.
a) Source code layer (SC): This layer manages features acquired from processing the contract source code. In
this work, we choose the function as granularity level, and parse the contract source code to extract the set of all
functions. We rely on existing tools [11] to define the ground truth binary labels for the different functions. We
leverage improvements from natural language processing (NLP) and convert function definitions into embedding
vectors. Specifically, we apply two types of embedding vector models: the Word2vec model [23], and the BERT
model [24]. The word2vec embeddings (SC-W2V) and the BERT embeddings (SC-Bert) form the two features of
interest at the SC layer.
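As a hedged sketch of this step, the snippet below tokenizes two toy Solidity functions and derives one SC-W2V vector per function with gensim; the tokenizer, averaging read-out and hyperparameters are our assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of the SC-layer embedding step: tokenized Solidity function bodies are
# fed to a word2vec model and each function is summarised by averaging its token vectors.
import numpy as np
from gensim.models import Word2Vec

tokenized_functions = [
    ["function", "A1", "(", ")", "{", "total", "=", "total", "+", "msg.value", ";", "}"],
    ["function", "A2", "(", "address", "_add", ")", "{", "uint", "balance", "=",
     "balances", "[", "_add", "]", ";", "}"],
]

w2v = Word2Vec(sentences=tokenized_functions, vector_size=128, window=5, min_count=1, sg=1)

# One SC-W2V vector per function (mean over its token embeddings).
sc_w2v = np.stack([np.mean([w2v.wv[t] for t in func], axis=0)
                   for func in tokenized_functions])
print(sc_w2v.shape)   # (2, 128)
```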
b) Built-based layer (BB): This layer manages features extracted during the contract source code compilation.
We exploit the static analyzer tool Slither [11], and extract the control flow graph (CFG) and the static single assignment
expression (SSA) of every function. We apply word2vec and BERT embeddings to SSA encodings, and generate graph
embeddings over CFGs thanks to an untrained graph convolutional network (GCN). As a result, word2vec embeddings
(SSA-W2V), BERT embeddings (SSA-Bert), and graph embeddings (BB-CFG) are the three main features at the BB
layer. We set the binary label for every function at the BB layer through a simple matching with the corresponding
function from the SC layer.
c) EVM Bytecode Layer (EVMB): The EVMB layer is responsible for features acquired from processing the
contract bytecode expression. We disassemble every contract EVM bytecode using the Eth2vec [25] tool, and design
a CFG generator to output CFGs from each disassembled contract. We perform label matching from SC functions and
set the corresponding label to each CFG. Regarding the features of interest at the EVMB layer, we apply an untrained
GCN over CFGs to generate graph embeddings (EVMB-CFG), as well as word2vec embeddings (EVMB-ASM) to
every function in disassembled contracts.
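Since both the BB and EVMB layers rely on an untrained GCN over CFGs, the following minimal sketch illustrates the idea on a toy graph; the adjacency matrix, feature sizes and mean read-out are illustrative assumptions and do not reproduce the exact GCN1/GCN2 configurations discussed later.

```python
# Minimal sketch of turning a function's CFG into a graph embedding with a single
# untrained GCN layer (random weights, no training).
import numpy as np

A = np.array([[0, 1, 0, 0],                  # toy CFG over four basic blocks
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
A_hat = A + A.T + np.eye(4)                  # symmetrise and add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # normalised adjacency

X = np.random.rand(4, 16)                    # node features (e.g. opcode statistics)
W = np.random.rand(16, 64)                   # untrained (random) GCN weights

H = np.maximum(A_norm @ X @ W, 0)            # one GCN layer with ReLU
cfg_embedding = H.mean(axis=0)               # mean read-out over nodes
print(cfg_embedding.shape)                   # (64,)
```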
3.1.2. Feature extraction under intermodal settings
The two-by-two intermodal settings include the combinations of features from (SC+BB), (BB+EVMB) and (SC+EVMB), and the three-by-three intermodal settings deal with features from (SC+BB+EVMB).
a) SC + BB Combination: This two-by-two intermodal combination of features from SC and BB layers in-
vestigates the associated performance in vulnerability detection. It brings together five features: SC-W2V, SC-Bert,
SSA-W2V, SSA-Bert and BB-CFG.
b) SC + EVMB Combination: Such two-by-two intermodal settings combine features from SC and EVMB
layers. It assesses the vulnerability detection performance linked to such combination, and brings together four
features of interest: SC-W2V, SC-Bert, EVMB-ASM and EVMB-CFG.
[Figure 1]
Figure 1: Overview of three modalities and associated features. (a) Source code layer (SC) manages features acquired from processing the contract source code. (b) Built-based layer (BB) manages features extracted during the contract source code compilation. (c) The EVM Bytecode Layer (EVMB) is responsible for features acquired from processing the contract bytecode expression.
c) BB + EVMB Combination: The current combination evaluates the detection performance in mixing features
from BB and EVMB layers. It combines five main features: SSA-W2V, SSA-Bert, BB-CFG, EVMB-ASM and
EVMB-CFG.
Each vulnerability detection strategy pipelines feature selection, feature dimension unification, feature fusion, model training and decision making. We aim to assess and classify the
performance of every vulnerability detection strategy.
[Figure 2]
Figure 2: Intermodal settings for smart contract vulnerability detection. The two-by-two intermodal settings include the combinations of features from (SC+BB), (BB+EVMB) and (SC+EVMB), and the three-by-three intermodal settings deal with features from (SC+BB+EVMB).
Appendix A details all the specific tasks leveraged to build our framework. The following is a brief description
of what happens during feature dimension unification, feature fusion, as well as during model training and testing.
a) Feature dimension unification: Recent research has revealed that at least one dimension should be equal when
combining different features of interest. To achieve such objective, dimension unification relies on three techniques.
First, we apply a max-pooling layer (MP) to input features. Second, we implement a fully connected layer also
known as dense layer (Dense) over feature candidates for fusion. Third, we experiment with spatial pyramid pooling
layer (SPP) [26, 27] to uniformize feature dimension. Recent research has revealed that pooling layers select meaningful
information but cause a loss of detailed information [28]. Dense layers learn local and global feature information between
layers [29], but learning too many features may slow down training and lead to overfitting.
Among the three methods mentioned above, MP, Dense and SPP layers are all likely to have a positive impact on the
dimension unification stage: we evaluate the impact of each of these dimension unification methods on vulnerability
detection effectiveness.
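For illustration, a minimal PyTorch sketch of the three unification options is given below; the tensor shapes, the target length and the pyramid levels are our assumptions, and adaptive pooling is used as one possible realization of MP and SPP.

```python
# Sketch of the three dimension-unification options on (batch, channels, length) tensors.
import torch
import torch.nn as nn

x = torch.randn(8, 1, 768)                    # e.g. a BERT-style embedding
y = torch.randn(8, 1, 128)                    # e.g. a word2vec-style embedding

# (1) Max pooling: reduce the longer feature down to the shorter length.
mp = nn.AdaptiveMaxPool1d(128)
x_mp = mp(x)                                  # (8, 1, 128)

# (2) Dense layer: learn a projection to the common length.
dense = nn.Linear(768, 128)
x_dense = dense(x)                            # (8, 1, 128)

# (3) Spatial pyramid pooling: pool at several scales and concatenate,
#     yielding a fixed-length vector regardless of the input length.
def spp_1d(t, levels=(1, 2, 4)):
    return torch.cat([nn.AdaptiveMaxPool1d(l)(t) for l in levels], dim=-1)

x_spp = spp_1d(x)                             # (8, 1, 7)
print(x_mp.shape, x_dense.shape, x_spp.shape)
```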
b) Proper feature fusion: The proper feature fusion follows the dimension unification stage. It relies on hori-
zontal concatenation (concat) and vertical concatenation (stack) for feature fusion under intramodal and intermodal
settings.
c) Model training: After the feature fusion stage, we adopt a bi-LSTM model with self-attention at the fusion model training stage. The self-attentive bi-LSTM is the fixed model in this stage.
In addition, we replaced the bi-LSTM model with VanillaRNN [30] and a gated recurrent unit (GRU) [31], respectively, for performance comparison (Table 2) in the intramodal fusion of the SC layer at the fusion model training stage. The performance of the bi-LSTM model for smart contract vulnerability detection is significantly better.
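A hedged sketch of this fusion-stage model is shown below; the hidden sizes and the simple additive attention pooling are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a bidirectional LSTM followed by self-attention pooling over time steps.
import torch
import torch.nn as nn

class SelfAttentiveBiLSTM(nn.Module):
    def __init__(self, in_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)   # scores each time step

    def forward(self, x):                     # x: (batch, seq_len, in_dim)
        h, _ = self.lstm(x)                   # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.att(h), dim=1) # attention weights over time steps
        return (w * h).sum(dim=1)             # (batch, 2*hidden) joint representation

fused = torch.randn(8, 5, 128)                # e.g. five stacked unified features
rep = SelfAttentiveBiLSTM()(fused)
print(rep.shape)                              # torch.Size([8, 128])
```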
d) Decision making: The last stage is decision making. To train and evaluate vulnerability detection, we proceed as follows. First, we combine the self-attentive bi-LSTM with a random forest (RF) model. Second, we combine the self-attentive bi-LSTM with a textCNN model. The RF and textCNN models trained in the decision making stage serve as the strategy selection models in our multimodal feature fusion network.
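The following sketch illustrates the first option, handing the joint representation to an RF classifier with scikit-learn; the data are synthetic placeholders for the representations produced by the fusion model.

```python
# Sketch of the decision-making stage: joint multimodal representations -> random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 128)                 # placeholder joint representations
y = np.random.randint(0, 2, 1000)             # 0 = non-vulnerable, 1 = vulnerable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
print("accuracy:", rf.score(X_te, y_te))
```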
Vulnerability detection strategies: Through the above four stages of model selection, we investigate twelve strategies of interest at each intramodal or intermodal setting after feature selection, and we denote | as the pipeline symbol.
[Figure 3]
Figure 3: Multimodal feature fusion architecture for smart contract vulnerability detection. According to the multimodal feature fusion settings (intramodal and intermodal), the processed features are selected and input into the multimodal feature fusion network. The network includes four stages: (a) uniform vector dimensions, (b) feature fusion, (c) model training and (d) decision making.
• strategy 1: SPP | concat | (bi-LSTM+self-attention) | textCNN.
• strategy 2: SPP | concat | (bi-LSTM+self-attention) | RF.
• strategy 3: SPP | stack | (bi-LSTM+self-attention) | RF.
pe
• strategy 4: SPP | stack | (bi-LSTM+self-attention) | textCNN.
• strategy 5: MP | concat | (bi-LSTM+self-attention) | textCNN.
• strategy 6: MP | concat | (bi-LSTM+self-attention) | RF.
• strategy 7: MP | stack | (bi-LSTM + self-attention) | RF.
• strategy 8: MP | stack | (bi-LSTM+self-attention) | textCNN.
• strategy 9: Dense | stack | (bi-LSTM+self-attention) | RF.
• strategy 10: Dense | stack | (bi-LSTM+self-attention) | textCNN.
• strategy 11: Dense | concat | (bi-LSTM+self-attention) | textCNN.
• strategy 12: Dense | concat | (bi-LSTM+self-attention) | RF.
We apply these twelve strategies to intramodal and intermodal settings, and the optimal performance model under each setting is used as the final framework.
The performance of each strategy and outperforming strategies in our smart contract vulnerability detection frame-
work will be described in detail in the next experiment section.
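The sketch below illustrates how the twelve strategies per setting arise as the Cartesian product of three dimension unification methods, two fusion operators and two decision-making models around the fixed self-attentive bi-LSTM; the enumeration order is illustrative and does not necessarily match the numbering used above.

```python
# Every strategy is one combination of (unification, fusion, decision) around the
# fixed self-attentive bi-LSTM: 3 x 2 x 2 = 12 strategies per setting.
from itertools import product

unification = ["SPP", "MP", "Dense"]
fusion = ["concat", "stack"]
decision = ["textCNN", "RF"]

strategies = [
    f"{u} | {f} | (bi-LSTM+self-attention) | {d}"
    for u, f, d in product(unification, fusion, decision)
]
for i, s in enumerate(strategies, 1):
    print(f"strategy {i}: {s}")               # order here is illustrative only
```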
4. Experiments
We run our experiments on a physical machine with the following characteristics: Intel (R) Xeon (R) Gold 6240R CPU running at 2.40 GHz, 32
GB of RAM and a hard disk drive of 6.5 TB. We conduct experiments with Python 3.6.2 under Ubuntu 20.04.
4.2. Dataset Construction
We exploit contracts from the SmartEmbed dataset [13]. This dataset is made of 5,000 verified smart contracts
published with source code on the Ethereum Mainnet.
Using our parser, we process the contract source code and extract the set of function definitions. We obtain a
complete dataset of 101,082 functions. We label our functions using existing tools [11] and the result is as follows:
87,641 functions are classified as non-vulnerable under the class 0, and 13,441 functions are classified as vulnerable
under the class 1. Such statistics translate into class imbalance between class 0 and class 1.
We solve the class imbalance issue at the SC layer and propagate the effects over the other layers. To achieve our
objective, we evaluate several class imbalance resolution approaches summarized in Table 1, in which W represents
the weight associated with a class. First, we simply upsample and downsample examples from both classes with no
particular technique in mind. Second, we follow the inverse number of samples (INS) approach. Third, we implement
the effective sample number weighting (ENS) technique. Fourth, we adopt the inverse square root of the number of
samples (ISNS) method. Fifth, we experiment with the synthetic minority oversampling technique (SMOTE).
We leverage a random forest AI model as our baseline construction to solve the imbalance issue. For every
approach, the testing set takes 20% of the dataset. We achieve the best results under SMOTE. We undersample class
0 with a factor of 28.5251% and we upsample class 1 to 25,000 samples. We propagate the SMOTE technique to
address class imbalance issues at the BB and the EVMB layers respectively.
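A hedged sketch of this resampling step with the imbalanced-learn library is shown below; the feature matrix is a synthetic placeholder, while the class counts and resampling targets mirror the figures reported above.

```python
# Sketch of the class-imbalance treatment: undersample class 0, then SMOTE class 1.
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

X = np.random.rand(101_082, 32)                       # placeholder feature matrix
y = np.array([0] * 87_641 + [1] * 13_441)             # class 0 / class 1 labels

# Undersample class 0 (the 28.5251% factor leaves roughly 25,000 samples) ...
under = RandomUnderSampler(sampling_strategy={0: 25_000}, random_state=42)
X_u, y_u = under.fit_resample(X, y)

# ... then upsample class 1 to 25,000 samples with SMOTE.
over = SMOTE(sampling_strategy={1: 25_000}, random_state=42)
X_bal, y_bal = over.fit_resample(X_u, y_u)
print(np.bincount(y_bal))                             # [25000 25000]
```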
In the following, we propose research questions committed to providing white-box knowledge, and give corresponding answers
based on an analysis of the experimental results. Among them, RQ1 and RQ2 address intramodal settings, and RQ3 to
RQ6 address intermodal settings. Besides, we embolden each column-wise tuple that holds the highest performance
result in the tables below. Such a move aims to ease the interpretability of the data.
a) RQ1: What strategy yields the highest detection results in SC, BB and EVMB separately? Does our intramodal
framework outperform state-of-the-art methods?
Regarding the SC layer, we leverage several state-of-the-art AI models: LSTM, VanillaRNN [30], RF, textCNN
and gated recurrent unit (GRU) [31] for training and testing over the balanced dataset attached to the layer. We com-
pare such state-of-the-art models with vulnerability detection tasks from task 1 to task 12, and Table 2 portrays the com-
parison results. We observe that for selecting (SC-W2V+SC-Bert) features, strategy 7 (MP | stack | (bi-LSTM+self-
attention) | RF) outperforms existing state-of-the-art models and aligns as the best strategy for vulnerability detection
at the SC layer.
As with the BB layer, we hook on the AMEVulDetector model [34], which promotes feature fusion through a
cross-attention layer. We further implement two types of untrained GCNs: GCN1 which produces embeddings BB-
CFG1 under the Keras library, and GCN2 realized under the Stellargraph framework, which outputs embeddings BB-
CFG2. The key difference between GCN1 and GCN2 stems from the observation that embeddings from GCN1 reveal
Table 2: SC layer performance comparison
Methods Acc(%) Recall(%) Precision(%) F1(%) Methods Acc(%) Recall(%) Precision(%) F1(%)
MP|stack|bi-LSTM|RF 98.04 97.89 98.20 98.04 MP|stack|bi-LSTM|textCNN 97.62 97.30 97.96 97.62
MP|concat|bi-LSTM|RF 97.40 96.76 98.08 97.40 MP|concat|bi-LSTM|textCNN 97.36 96.91 97.84 97.36
MP|stack|RNN[30]|RF 95.88 94.31 97.80 95.88 MP|stack|RNN[30]|textCNN 95.44 94.09 97.13 95.44
MP|stack|GRU[31]|RF 95.82 94.44 97.52 95.82 MP|stack|GRU[31]|textCNN 95.10 94.36 96.10 95.10
Dense|stack|bi-LSTM|RF 97.62 96.74 98.56 97.62 Dense|stack|bi-LSTM|textCNN 97.66 96.74 98.64 97.66
Dense|concat|bi-LSTM|RF 97.48 97.52 97.44 97.48 Dense|concat|bi-LSTM|textCNN 97.43 97.06 97.84 97.44
SPP|stack|bi-LSTM|RF 96.94 96.51 97.40 96.94 SPP|stack|bi-LSTM|textCNN 96.67 96.82 96.48 96.67
SPP|concat|bi-LSTM|RF 97.08 96.52 97.67 97.08 SPP|concat|bi-LSTM|textCNN 95.92 96.67 95.12 95.92
a very large sparse matrix, while embeddings from GCN2 reveal a less sparse matrix than GCN1. We hypothesize
such a sparse matrix may result from a consequent loss of information during GCN1 generation. We prioritize BB-CFG2
to increase the upshot and exhibit experimental results in Table 3. We conclude that for selecting (SSA-W2V+SSA-
Bert+BB-CFG) features, strategy 7 (MP | stack | (bi-LSTM+self-attention) | RF) yields the highest performance
at the BB layer over the AMEVulDetector model [34] and all other strategies at the BB layer.
Table 3: BB layer performance comparison
BB-CFG1 BB-CFG2
Methods
Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%)
MP|stack|bi-LSTM|RF 97.80 96.72 98.96 97.80 98.27 97.56 99.01 98.27
MP|stack|bi-LSTM|textCNN 97.33 96.20 98.56 97.33 98.08 97.40 98.80 98.08
MP|concat|bi-LSTM|RF 97.44 96.47 98.48 97.44 98.24 97.33 99.20 98.24
MP|concat|bi-LSTM|textCNN 97.60 96.56 98.72 97.60 98.08 97.77 98.40 98.08
Dense|stack|bi-LSTM|RF 97.59 96.48 98.77 97.59 98.13 97.35 98.96 98.13
Dense|stack|bi-LSTM|textCNN 97.24 95.95 98.64 97.24 98.13 97.33 98.99 98.13
Dense|concat|bi-LSTM|RF 97.76 97.01 98.56 97.76 97.88 96.72 99.12 97.88
Dense|concat|bi-LSTM|textCNN 97.08 96.30 97.92 97.08 97.36 96.69 98.08 97.36
As regards the EVMB layer, we cling to the deep graph convolutional neural network (DGCNN) model [35]
delivering embeddings EVMB-DGCNN. We reintroduce the above-mentioned GCN1, which produces embeddings
EVMB-CFG1, and GCN2, which outputs embeddings EVMB-CFG2. We analyse the results in Table 4 and make
the following conclusion based on EVMB-CFG2, which offers higher detection effectiveness: strategy 9 (Dense | stack |
(bi-LSTM+self-attention) | RF) performs better than DGCNN [35] and all other strategies (1 to 12) at the
EVMB layer.
We depict in Fig. 4 the receiver operator characteristic (ROC) curves to support our findings regarding optimal
strategies in SC, BB and EVMB layers.
b) RQ2: What modality leads to better detection performance among SC, BB and EVMB?
Based on empirical evidence from Table 2, Table 3, and Table 4, we conclude the following under intramodal
settings. Among SC, BB and EVMB, BB leads smart contract vulnerability assessment, followed by SC performance-
wise, while EVMB offers the least performance upshot.
a) RQ3: What strategy upholds highest detection results in (SC + BB), (SC + EVMB), (BB + EVMB), and
(SC + BB + EVMB) combinations separately?
Table 4: EVMB layer performance comparison
EVMB-DGCNN[35] EVMB-CFG1 EVMB-CFG2
Methods
Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%)
MP|stack|bi-LSTM|RF 93.48 91.84 95.44 93.48 93.92 93.03 94.96 93.92 95.16 93.93 96.56 95.16
MP|stack|bi-LSTM|textCNN 93.16 91.53 95.12 93.16 93.38 92.61 94.28 93.38 94.60 93.93 95.36 94.60
MP|concat|bi-LSTM|RF 93.84 92.41 95.52 93.84 93.68 92.72 94.80 93.68 95.08 93.85 96.48 95.08
MP|concat|bi-LSTM|textCNN 94.10 92.73 95.76 94.20 93.88 92.88 95.04 93.88 95.24 94.84 95.68 95.24
Dense|stack|bi-LSTM|RF 94.10 92.42 96.08 94.10 94.44 93.57 95.44 94.44 95.64 94.96 96.40 95.64
Dense|stack|bi-LSTM|textCNN 93.32 92.54 94.24 93.32 94.40 92.96 96.08 94.40 95.34 93.92 96.96 95.34
Dense|concat|bi-LSTM|RF 94.08 92.29 96.20 94.08 93.96 91.98 96.32 93.96 95.20 94.35 96.16 95.20
Dense|concat|bi-LSTM|textCNN 93.72 93.07 94.48 93.72 93.87 92.68 95.28 93.88 94.72 92.74 97.04 94.72
SPP|stack|bi-LSTM|RF 91.24 90.69 91.92 91.24 92.56 91.92 93.32 92.56 93.76 93.13 94.48 93.76
SPP|stack|bi-LSTM|textCNN 90.50 91.62 89.16 90.50 91.32 90.13 92.80 91.32 93.56 93.11 94.08 93.56
SPP|concat|bi-LSTM|RF 91.24 90.24 92.48 91.24 91.48 90.99 92.08 91.48 94.36 93.42 95.44 94.36
SPP|concat|bi-LSTM|textCNN 91.56 89.20 94.56 91.57 91.72 90.65 93.04 91.72 93.36 94.53 92.56 93.60
[Figure 4: ROC curves for the vulnerability detection strategies at the SC, BB and EVMB layers; panel (c) EVMB-ROC.]
Regarding SC+BB combination, we exploit GCN1 and GCN2 from the BB layer as detailed in Answer to RQ1.
We weight the combination SC+BB-CFG2, which outputs higher results. We conclude based on Table 5 and Table
6 that for (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) feature selection, strategy 9 (Dense | stack | (bi-
LSTM+self-attention) | RF) realizes the best vulnerability detection among the 12 strategies.
As for the SC+EVMB combination, we call upon GCN1 and GCN2 from the EVMB layer as illustrated in Answer
to RQ1. We spotlight the SC+EVMB-CFG2 combination, which delivers increased outcome. We conclude based
on Table 5 and Table 6 that for (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) feature selection, strategy 10 (
Dense | stack | (bi-LSTM+self-attention) | textCNN) achieves the best vulnerability detection among all strategies.
In terms of the BB+EVMB combination, we emphasize the BB+EVMB-CFG2 combination, which renders a superior
outcome, and conclude the following. For selecting (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM)
features, strategy 8 (MP | stack | (bi-LSTM+self-attention) | textCNN) yields the highest performance among all
strategies, based on Table 5 and Table 6.
Table 5: Two-by-two intermodal fusion performance with CFG1 embeddings
SC+BB SC+EVMB BB+EVMB
Methods
Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%)
MP|stack|bi-LSTM|RF 99.11 98.92 99.31 99.11 98.38 98.09 98.68 98.38 98.39 97.79 99.02 98.39
MP|stack|bi-LSTM|textCNN 99.04 99.04 99.04 99.04 98.24 97.33 99.20 98.24 98.62 97.99 99.28 98.62
MP|concat|bi-LSTM|RF 99.16 98.81 99.52 99.16 98.24 97.86 98.64 98.24 98.60 98.33 98.88 98.60
MP|concat|bi-LSTM|textCNN 99.04 98.96 99.12 99.04 98.08 98.16 98.00 98.08 98.80 98.41 99.20 98.80
Dense|stack|bi-LSTM|RF 99.15 98.65 99.66 99.15 98.55 98.25 98.72 98.48 98.54 97.96 99.14 98.54
Dense|stack|bi-LSTM|textCNN 98.98 98.62 99.36 98.98 98.22 98.28 98.16 98.22 98.35 97.79 98.94 98.35
Dense|concat|bi-LSTM|RF 98.64 98.33 98.96 98.64 98.44 98.09 98.80 98.44 97.92 97.92 97.92 97.92
Dense|concat|bi-LSTM|textCNN 98.76 98.11 99.44 98.76 98.28 97.93 98.64 98.28 98.12 97.85 98.40 98.12
SPP|stack|bi-LSTM|RF 99.00 98.50 99.52 99.00 98.18 98.16 98.20 98.18 98.27 98.00 98.56 98.27
SPP|stack|bi-LSTM|textCNN 98.93 98.97 98.90 98.93 98.36 98.01 98.72 98.36 98.49 97.66 99.36 98.49
SPP|concat|bi-LSTM|RF 99.00 98.80 99.20 99.00 97.88 97.38 98.40 97.88 98.20 98.01 98.40 98.20
SPP|concat|bi-LSTM|textCNN 99.04 98.65 99.44 99.04 97.92 98.54 97.28 97.92 98.12 97.55 98.72 98.12
Table 6: Two-by-two intermodal fusion performance with CFG2 embeddings
SC+BB SC+EVMB BB+EVMB
Methods
Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%)
MP|stack|bi-LSTM|RF 99.34 98.86 99.84 99.34 98.65 98.02 98.80 98.65 98.87 98.52 99.23 98.87
MP|stack|bi-LSTM|textCNN 99.07 98.59 99.57 99.07 98.53 98.16 98.32 98.53 99.23 98.97 99.50 99.23
MP|concat|bi-LSTM|RF 99.38 99.28 99.76 99.52 98.52 98.24 98.80 98.52 98.88 98.57 99.20 98.88
MP|concat|bi-LSTM|textCNN 99.20 98.89 99.52 99.20 98.52 98.41 98.96 98.52 98.84 98.49 99.20 98.84
Dense|stack|bi-LSTM|RF 99.39 98.89 99.90 99.39 98.74 98.28 98.26 98.74 99.01 98.67 99.36 99.01
Dense|stack|bi-LSTM|textCNN 99.37 99.12 99.62 99.37 98.81 98.29 99.34 98.81 98.92 98.49 99.36 98.92
Dense|concat|bi-LSTM|RF 99.00 98.73 99.28 99.00 98.32 98.32 98.32 98.32 98.60 98.41 98.80 98.60
Dense|concat|bi-LSTM|textCNN 99.04 98.73 99.36 99.04 98.20 97.78 98.64 98.20 98.80 98.72 98.88 98.80
SPP|stack|bi-LSTM|RF 99.28 99.04 99.52 99.28 98.31 98.20 98.42 98.31 98.95 98.79 99.12 98.95
SPP|stack|bi-LSTM|textCNN 99.23 99.18 99.28 99.23 98.44 97.86 99.04 98.44 99.18 99.00 99.36 99.18
SPP|concat|bi-LSTM|RF 99.24 98.73 99.76 99.24 98.52 98.40 98.64 98.52 99.08 98.96 99.20 99.08
SPP|concat|bi-LSTM|textCNN 98.84 98.26 99.44 98.84 98.12 98.15 98.08 98.12 99.00 98.57 99.44 99.00
In the case of SC+BB+EVMB three-by-three intermodal combination, we leverage GCN1 and GCN2. We com-
bine (SC+BB) with EVMB-CFG2 instead of EVMB-CFG1, to enhance results. Based on Table 7, we conclude that strategy 10 (
Dense | stack | (bi-LSTM+self-attention) | textCNN) reaches the highest performance within the range from strategy
1 to strategy 12.
Table 7: Three-by-three intermodal fusion performance
SC+BB+EVMB-CFG1 SC+BB+EVMB-CFG2
Methods
Acc(%) Recall(%) Precision(%) F1(%) Acc(%) Recall(%) Precision(%) F1(%)
MP|stack|bi-LSTM|RF 99.37 99.15 99.60 99.37 99.49 99.08 99.90 99.49
MP|stack|bi-LSTM|textCNN 99.37 98.96 99.78 99.37 99.57 99.23 99.92 99.57
MP|concat|bi-LSTM|RF 99.40 99.20 99.60 99.40 99.40 99.12 99.68 99.40
MP|concat|bi-LSTM|textCNN 99.32 99.20 99.44 99.32 99.52 99.20 99.84 99.52
Dense|stack|bi-LSTM|RF 99.59 99.26 99.92 99.59 99.70 99.39 99.98 99.68
Dense|stack|bi-LSTM|textCNN 99.56 99.28 99.84 99.56 99.71 99.36 99.92 99.64
Dense|concat|bi-LSTM|RF 99.16 99.04 99.28 99.16 99.28 98.89 99.68 99.28
Dense|concat|bi-LSTM|textCNN 99.12 98.58 99.68 99.12 99.28 99.20 99.36 99.28
SPP|stack|bi-LSTM|RF 99.05 98.64 99.49 99.05 99.50 99.36 99.64 99.50
SPP|stack|bi-LSTM|textCNN 99.44 99.22 99.68 99.44 99.28 99.04 99.52 99.28
SPP|concat|bi-LSTM|RF 99.20 98.66 99.76 99.20 99.44 99.28 99.60 99.44
SPP|concat|bi-LSTM|textCNN 99.24 99.04 99.44 99.24 99.28 99.20 99.36 99.28
b) RQ4: Which two-by-two intermodal combination yields the best detection performance towards vulner-
abilities in smart contracts?
Based on empirical evidence from Table 5 and Table 6, we conclude the following. (SC+BB) leads to higher
performances, followed by (BB+EVMB), while (SC+EVMB) yields the least detection performance in two-by-two
intermodal settings.
c) RQ5: What approach appeals more between two-by-two and three-by-three intermodal settings?
We exploit the results from Table 5, Table 6 and Table 7. We observe that the three-by-three intermodal settings
achieve better vulnerability detection performance than the two-by-two settings.
d) RQ6: What settings deliver better results between intramodal and intermodal settings?
From the lines above, we learnt that three-by-three settings deliver better results than two-by-two intermodal
settings. We infer from Table 2, Table 3, Table 4, Table 5 and Table 6 that two-by-two intermodal settings outperform
the intramodal configuration. We conclude that intermodal settings outperform intramodal settings in vulnerability
detection performance.
Fig. 5 illustrates the receiver operator characteristic (ROC) curves to support our conclusions regarding optimal
strategies for vulnerability detection in two-by-two and three-by-three intermodal settings.
Our framework provides strong white-box knowledge for intramodal, two-by-two and three-by-three intermodal
feature selection, and achieves higher vulnerability detection performance compared to existing methods.
5. Discussion
[Figure 5: ROC curves for the vulnerability detection strategies under two-by-two and three-by-three intermodal settings.]
5.2. Flexibility in Feature Extraction
We notice that many published contracts lack their source code. Penetration testers in most cases have to deal
solely with contract bytecode, which increases the complexity of the analysis. We found that concentrating on
the EVMB layer alone leads to poor detection performance, and some other type of information from the SC or BB layers is
necessary to improve results. Our framework can process contracts published with or without their source
code.
Our architecture is limited by implementation rather than by design choices. First, we think adopting the
majority of AI techniques is impractical, so we investigate only a few AI models to design our methodology. Second, our
work relies on the word2vec NLP technique and lacks support for out-of-vocabulary words. We propose to replace
word2vec with the fastText NLP model [36] to solve such a limitation.
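As a hedged illustration of why fastText addresses the out-of-vocabulary issue, the sketch below trains a tiny gensim FastText model and queries a token never seen during training; the corpus and hyperparameters are assumptions.

```python
# fastText builds subword n-gram vectors, so tokens absent from the training
# vocabulary can still be embedded (unlike word2vec).
from gensim.models import FastText

sentences = [["uint", "balance", "=", "balances", "[", "_add", "]"],
             ["total", "=", "total", "+", "msg.value"]]
ft = FastText(sentences=sentences, vector_size=64, window=3, min_count=1)

print(ft.wv["balanceOf"].shape)   # out-of-vocabulary token, still embedded: (64,)
```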
6. Related Works
In this section, we revisit the issue of vulnerability detection in smart contracts. Our content is twofold. First,
we investigate the literature on smart contract vulnerability detection under rule-based static analysis. Second, we
re-explore existing data-driven approaches for smart contract vulnerability mining.
6.1. Rule-based approach for vulnerability detection under static analysis
To formalize the research on vulnerabilities in smart contracts, the work of Atzei et al. [9] proposed a taxonomy
of smart contract vulnerabilities. Following such an initiative, Argañaraz et al. [10] opted for the use of static
analysis over contract source codes to extract both functional and security vulnerabilities. In their work, the authors
formulate some expert-based rules towards the programming language of interest. Gao et al. [13] designed the
SmartEmbed vulnerability analysis tool to detect clones, bugs, and to validate the vulnerability-freeness of a contract.
Moreover, Gao et al. [13] outlined the major disadvantage of relying on expert-based rules: it is cumbersome to keep
up with the attack surface and the sophistication of attacks. Furthermore, Gao et al. [13] argued for the use of static
analysis over more reliable but expensive techniques such as dynamic symbolic execution [7][16][17] and dynamic
analysis [14][15]. The authors in [11] designed Slither, a fantastic and incremental static analysis based vulnerability
detection framework for Solidity-written smart contracts. Slither can accommodate new vulnerability detectors.
6.2. Data-driven approaches for vulnerability mining
To keep pace with the ever-evolving smart contract attack surface, the authors in [37] advocate the usage of
data-driven techniques, such as machine learning, for discovering vulnerabilities in smart contracts. The interesting
work of Eth2vec [25] applies unsupervised machine learning techniques to built-based features and EVM bytecode
extracted features. It aims to automatically cluster buggy contracts as well as cloned contracts. Although Eth2vec is
an innovative work, it only supports contracts present in the training dataset and lacks the use of supervised learning.
The authors in [38] further advocate the use of supervised classification, which leads to higher detection results
than unsupervised classification in natural language processing. Further, to support AI-based vulnerability detection,
Teng et al. [39] introduce a static time-slicing, source-code based protocol built on a long short term memory (LSTM)
model, to fetch contracts which behave differently from the initial one reported on the Ethereum dapps website. The
work of [39] requires manually extracting features from the data in order to characterize the dataset. The compelling
work of Qian et al. [17] leverages a bidirectional-LSTM with an attention mechanism over various embedding vector
dimensions. It aims to uncover the embedding vector dimension that yields the best vulnerability detection results.
Moreover, compared to state-of-the-art AI models such as Vanilla-recurrent neural network (RNN), LSTM, and bi-
LSTM without self-attention, the work of Qian et al. [17] leads to higher performances.
The work of Liu et al. [34] designs a smart contract vulnerability detection framework that fuses features from
SC and BB layers in black box settings. Such a framework exploits an attentive multi-encoder network comprising
self-attention and cross-attention layers to detect vulnerabilities and to provide feature importance through weight in-
terpretability. Although it is a promising piece of research, the work of Liu et al. [34] exhibits two drawbacks.
First, it still builds upon expert-based rules and inherits the weaknesses associated with rule-based systems. Second, it
only supports smart contracts published with source code. It therefore lacks support for EVM bytecode processing,
which is an important limitation.
7. Conclusion
In this paper, we design a novel framework that detects vulnerabilities in Ethereum-based solidity written smart
contracts. To that end, we leverage three modalities of interest. First, features extracted from the contract source
code (SC layer). Second, features acquired during the contract compilation (BB layer). Third, features obtained
from the contract bytecode processing (EVMB layer). Our work is different from existing schemes that leverage
intramodal or two-by-two intermodal settings under a black-box approach. We propose the following innovations.
First, we dismiss expert-based patterns, manual feature fusion and leverage AI automation. Second, we define a
theoretical methodology based on multiple supervised detection tasks in multimodal learning. We further evaluate
such tasks on real datasets with several state-of-the-art AI models. Third, we provide developers and researchers
with succinct white-box knowledge to achieve high performing vulnerability detection in intramodal and intermodal
settings. Our framework tolerates the absence of one or two modalities and supports contracts published without
source code. We empirically found that under intramodal settings, BB provides the best performances followed by
SC, with EVMB last. Under two-by-two intermodal settings, SC+BB performs better, followed by BB+EVMB
and finally SC+EVMB. Furthermore, we conclude that three-by-three intermodal settings outperform two-by-two
intermodal settings and two-by-two intermodal settings outdo intramodal settings. Although our scheme limits the
number of AI models used, it is impractical to consider all AI models for vulnerability training and inference. We believe
our framework can advance the field of research towards smart contract vulnerability detection. As future work, we
aim to solve the out-of-vocabulary issue in our framework, and dive into feature importance.
Appendix A. Vulnerability detection strategies
The lines below describe the multiple supervised tasks of interest that combine the hierarchy of fused features
together with state-of-the-art AI models under intramodal and intermodal settings.
a) SC layer
At the SC layer, we select SC-W2V and SC-Bert as input features. We proceed with dimension unification with SPP,
MP and Dense separately. Then we fuse features on the one hand with concat and on the other hand with stack. To
train and evaluate the vulnerability detection, we proceed as follows. First, we adopt a bi-LSTM model with self
attention combined with a random forest (RF) model. Second, we combine a self-attentive bi-LSTM with textCNN
model.
We investigate twelve tasks of interest at the SC layer, and we denote | as the pipeline symbol.
• task 9: (SC-W2V+SC-Bert) | Dense | stack | (bi-LSTM+self-attention) | RF.
• task 10: (SC-W2V+SC-Bert) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 11: (SC-W2V+SC-Bert) | Dense | concat | (bi-LSTM+self-attention) | textCNN.
• task 12: (SC-W2V+SC-Bert) | Dense | concat | (bi-LSTM+self-attention) | RF.
b) BB layer
Regarding the BB layer, we select SSA-W2V, SSA-Bert and BB-CFG as input features. We implement dimension
unification using SPP, MP and Dense separately. Then we fuse features on the one hand with concat and on the other
hand with stack. For model training and inference, first, we adopt a bi-LSTM model with self attention combined
with a random forest (RF) model. Second, we combine a self-attentive bi-LSTM with textCNN model. We inspect
twelve tasks of interest at the BB layer and we signify | as the pipeline symbol.
• task 15: (SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 16: (SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | RF.
• task 17: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | concat | (bi-LSTM+self-attention) | textCNN.
• task 18: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | concat | (bi-LSTM+self-attention) | RF.
• task 19: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 20: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | stack | (bi-LSTM+self-attention) | RF.
• task 21: (SSA-W2V+SSA-Bert+BB-CFG) | SPP | concat | (bi-LSTM+self-attention) | RF.
• task 22: (SSA-W2V+SSA-Bert+BB-CFG) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 23: (SSA-W2V+SSA-Bert+BB-CFG) | SPP | stack | (bi-LSTM+self-attention) | RF.
c) EVMB layer
Concerning the EVMB layer, we adopt EVMB-CFG and EVMB-ASM as input features. We leverage SPP, MP and
Dense separately to unify dimension. We fuse features on the one hand with concat and on the other hand with stack.
For model training and testing, first, we implement a bi-LSTM model with self attention combined with a random
forest (RF) model. Second, we incorporate a self-attentive bi-LSTM with textCNN model.
We evaluate twelve tasks of interest at the EVMB layer. The pipeline symbol | represents the process flow from one stage to the next.
• task 32: (EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | RF.
• task 33: (EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 34: (EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | RF.
Appendix A.2. Strategies under intermodal settings
a) SC+BB Combination
Regarding SC+BB combination, we set SC-W2V, SC-Bert, SSA-W2V, SSA-Bert and BB-CFG as input features. We
unify feature dimension with MP, SPP and Dense separately. We fuse features on the one hand with concat and on
the other hand with stack. To train and evaluate AI models for vulnerability detection, we proceed as follows. First,
we deploy a bi-LSTM model with self attention combined with a random forest (RF) model. Second, we link a
self-attentive bi-LSTM with textCNN model.
We examine twelve tasks of interest under SC+BB combination.
• task 37: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | concat | (bi-LSTM+self-attention) |
textCNN.
• task 38: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | concat | (bi-LSTM+self-attention) | RF.
• task 39: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 40: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | RF.
b) SC+EVMB Combination
Regarding SC+EVMB combination, we fix SC-W2V, SC-Bert, EVMB-CFG, and EVMB-ASM as input features. We
apply dimension unification using MP, SPP and Dense separately. We fuse features with concat and stack respectively.
For AI models training and inference, first, we choose a bi-LSTM model with self attention combined with a random
forest (RF) model. Second, we adopt a self-attentive bi-LSTM with textCNN model.
We evaluate twelve tasks of interest under SC + EVMB combination.
• task 50: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | RF.
• task 51: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 52: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | RF.
• task 55: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 56: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | RF.
• task 57: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 58: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | RF.
• task 59: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention) | textCNN.
• task 60: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention) | RF.
c) BB+EVMB Combination
As of BB+EVMB combination, we define SSA-W2V, SSA-Bert, BB-CFG, EVMB-CFG, and EVMB-ASM as input
features. We enforce dimension unification with SPP, MP and Dense separately. We fuse features with concat and
stack respectively. To support vulnerability detection under multimodal learning, first, we integrate a bi-LSTM model
with self attention combined with a random forest (RF) model. Second, we adopt a self-attentive bi-LSTM with
textCNN model.
We explore twelve tasks of interest under the BB + EVMB combination.
• task 61: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention)
| textCNN.
• task 65: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-
attention) | textCNN.
• task 66: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-
attention) | RF.
• task 67: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention)
| textCNN.
• task 71: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention)
| textCNN.
• task 72: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention)
| RF.
d) SC+BB+EVMB Combination
Concerning SC+BB+EVMB full combination of layers, we set SC-W2V, SC-Bert, SSA-W2V, SSA-Bert, BB-CFG,
EVMB-CFG, and EVMB-ASM as input features. We implement dimension unification with SPP, MP and Dense
separately, and fuse features with concat and stack respectively. For vulnerability detection under multimodal learning,
first, we deploy a bi-LSTM model with self attention combined with a random forest (RF) model. Second, we endorse
a self-attentive bi-LSTM with textCNN model.
We explore twelve tasks of interest under SC+BB+EVMB combination.
• task 73: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | concat |
(bi-LSTM+self-attention) | textCNN.
• task 74: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | RF.
• task 75: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 76: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-
LSTM+self-attention) | RF.
• task 77: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat
| (bi-LSTM+self-attention) | textCNN.
• task 78: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-attention) | RF.
• task 79: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 80: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | stack |
(bi-LSTM+self-attention) | RF.
References
[1] S. Liu, Global spending on blockchain solutions 2024 — statista, 2020. https://www.statista.com/statistics/800426/
worldwide-blockchain-solutions-spending.
[2] N. Szabo, Smart contracts: Building blocks for digital markets, 2018.
[5] M. Bartoletti, L. Pompianu, An empirical analysis of smart contracts: platforms, applications, and design patterns, in: International con-
ference on financial cryptography and data security, Springer, 2017, pp. 494–509. https://link.springer.com/chapter/10.1007/
978-3-319-70278-0_31.
[6] K. Delmolino, M. Arnett, A. Kosba, A. Miller, E. Shi, Step by step towards creating a safe smart contract: Lessons and insights from a
cryptocurrency lab, in: International conference on financial cryptography and data security, Springer, 2016, pp. 79–94. https://eprint.
iacr.org/2015/460.pdf.
[7] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, A. Hobor, Making smart contracts smarter, in: Proceedings of the 2016 ACM SIGSAC conference
on computer and communications security, 2016, pp. 254–269. https://dl.acm.org/doi/pdf/10.1145/2976749.2978309.
[8] T. Zimmermann, N. Nagappan, L. Williams, Searching for a needle in a haystack: Predicting security vulnerabilities for windows vista,
in: 2010 Third international conference on software testing, verification and validation, IEEE, 2010, pp. 421–428. https://ieeexplore.
ieee.org/abstract/document/5477059/.
[9] N. Atzei, M. Bartoletti, T. Cimoli, A survey of attacks on ethereum smart contracts (sok), in: International conference on principles of
security and trust, Springer, 2017, pp. 164–186. https://link.springer.com/chapter/10.1007/978-3-662-54455-6_8.
[10] M. Argañaraz, M. Berón, M. J. Pereira, P. Henriques, Detection of vulnerabilities in smart contracts specifications in ethereum platforms,
in: 9th Symposium on Languages, Applications and Technologies (SLATE 2020), volume 83, Schloss Dagstuhl–Leibniz-Zentrum fuer
Informatik, 2020, pp. 1–16. https://bibliotecadigital.ipb.pt/bitstream/10198/22794/1/OASIcs-SLATE-2020-2.pdf.
[11] J. Feist, G. Grieco, A. Groce, Slither: a static analysis framework for smart contracts, in: 2019 IEEE/ACM 2nd International Workshop on
Emerging Trends in Software Engineering for Blockchain (WETSEB), IEEE, 2019, pp. 8–15. https://arxiv.org/pdf/1908.09878.
pdf.
[12] S. Kalra, S. Goel, M. Dhawan, S. Sharma, Zeus: analyzing safety of smart contracts., in: Ndss, 2018, pp. 1–12. http://pages.cpsc.
ucalgary.ca/~joel.reardon/blockchain/readings/ndss2018_09-1_Kalra_paper.pdf.
[13] Z. Gao, L. Jiang, X. Xia, D. Lo, J. Grundy, Checking smart contracts with structural code embedding, IEEE Transactions on Software
Engineering (2020). https://arxiv.org/pdf/2001.07125.pdf.
[14] Mythril: Security analysis tool for evm bytecode, 2018. https://github.com/ConsenSys/mythril.
[15] J. Krupp, C. Rossow, teEther: Gnawing at ethereum to automatically exploit smart contracts, in: 27th USENIX Security Symposium
(USENIX Security 18), USENIX Association, 2018, pp. 1317–1333. https://www.usenix.org/conference/usenixsecurity18/
presentation/krupp.
[16] I. Nikolić, A. Kolluri, I. Sergey, P. Saxena, A. Hobor, Finding the greedy, prodigal, and suicidal contracts at scale, in: Proceedings of the 34th
annual computer security applications conference, 2018, pp. 653–663. https://dl.acm.org/doi/pdf/10.1145/3274694.3274743.
[17] P. Qian, Z. Liu, Q. He, R. Zimmermann, X. Wang, Towards automated reentrancy detection for smart contracts based on sequential models,
IEEE Access 8 (2020) 19685–19695. https://ieeexplore.ieee.org/abstract/document/8970384/.
[18] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, B. Murphy, Cross-project defect prediction: a large scale experiment on data vs. domain
vs. process, in: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium
on The foundations of software engineering, 2009, pp. 91–100. https://dl.acm.org/doi/pdf/10.1145/1595696.1595713.
[19] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, A. Y. Ng, Multimodal deep learning, in: ICML, 2011. https://icml.cc/2011/papers/
399_icmlpaper.pdf.
[20] Y. Dai, F. Gieseke, S. Oehmcke, Y. Wu, K. Barnard, Attentional feature fusion, in: Proceedings of the IEEE/CVF Winter Conference
on Applications of Computer Vision, 2021, pp. 3560–3569. https://openaccess.thecvf.com/content/WACV2021/papers/Dai_
Attentional_Feature_Fusion_WACV_2021_paper.pdf.
[21] L. Dai, F. Gao, R. Li, J. Yu, X. Shen, H. Xiong, W. Wu, Gated fusion of discriminant features for caricature recognition, in: International
Conference on Intelligent Science and Big Data Engineering, Springer, 2019, pp. 563–573. https://link.springer.com/chapter/10.
1007/978-3-030-36189-1_47.
[22] H. Zhou, Z. Fang, Y. Gao, B. Huang, C. Zhong, R. Shang, Feature fusion network based on attention mechanism for 3d semantic segmen-
tation of point clouds, Pattern Recognition Letters 133 (2020) 327–333. https://www.sciencedirect.com/science/article/pii/
S0167865520300994.
[23] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: International conference on machine learning, PMLR, 2014,
pp. 1188–1196. http://proceedings.mlr.press/v32/le14.pdf.
[24] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding,
in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 4171–4186. https:
//aclanthology.org/N19-1423.
[25] N. Ashizawa, N. Yanai, J. P. Cruz, S. Okamura, Eth2vec: learning contract-wide code representations for vulnerability detection on ethereum
smart contracts, in: Proceedings of the 3rd ACM International Symposium on Blockchain and Secure Critical Infrastructure, 2021, pp. 47–59.
https://dl.acm.org/doi/pdf/10.1145/3457337.3457841.
[26] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE trans-
actions on pattern analysis and machine intelligence 37 (2015) 1904–1916. https://link.springer.com/content/pdf/10.1007/
978-3-319-10578-9_23.pdf.
[27] X. Ouyang, K. Gu, P. Zhou, Spatial pyramid pooling mechanism in 3d convolutional network for sentence-level classification,
IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 (2018) 2167–2179. https://ieeexplore.ieee.org/
abstract/document/8413124/.
[28] N. Dong, Q. Feng, M. Zhai, J. Chang, X. Mai, A novel feature fusion based deep learning framework for white blood cell classifi-
cation, Journal of Ambient Intelligence and Humanized Computing (2022) 1–13. https://link.springer.com/article/10.1007/
s12652-021-03642-7.
[29] Z. Zhang, Z. Tang, Y. Wang, Z. Zhang, C. Zhan, Z. Zha, M. Wang, Dense residual network: Enhancing global dense feature flow for character
recognition, Neural Networks 139 (2021) 77–85. https://www.sciencedirect.com/science/article/pii/S0893608021000472.
[30] C. Olah, S. Carter, Attention and augmented recurrent neural networks, Distill 1 (2016) e1. https://distill.pub/2016/
augmented-rnns/?spm=a2c4e.11153940.blogcont640631.83.666325f4P1sc03.
[31] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint
arXiv:1412.3555 (2014). https://arxiv.org/abs/1412.3555.
[32] T. Parr, The definitive antlr 4 reference, The Definitive ANTLR 4 Reference (2013) 1–326. https://www.torrossa.com/en/resources/
an/5241753.
[33] F. Bond, Solidity grammar for antlr4, 2019. https://github.com/solidityj/solidity-antlr4.
[34] Z. Liu, P. Qian, X. Wang, L. Zhu, Q. He, S. Ji, Smart contract vulnerability detection: From pure neural network to interpretable graph feature
and expert pattern fusion, in: IJCAI, 2021, pp. 2751–2759. https://www.ijcai.org/proceedings/2021/0379.pdf.
[35] M. Zhang, Z. Cui, M. Neumann, Y. Chen, An end-to-end deep learning architecture for graph classification, in: Proceedings of the AAAI
conference on artificial intelligence, volume 32, 2018. https://ojs.aaai.org/index.php/AAAI/article/view/11782.
[36] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification, EACL 2017 (2017) 427. https://aclanthology.org/
E17-2.pdf#page=459.
[37] J. A. Harer, L. Y. Kim, R. L. Russell, O. Ozdemir, L. R. Kosta, A. Rangamani, L. H. Hamilton, G. I. Centeno, J. R. Key, P. M. Ellingwood,
et al., Automated software vulnerability detection with machine learning, arXiv preprint arXiv:1803.04497 (2018). https://arxiv.org/
pdf/1803.04497.pdf.
[38] F. Hill, K. Cho, A. Korhonen, Learning distributed representations of sentences from unlabelled data, in: Proceedings of the 2016 Conference
of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1367–1377.
https://aclanthology.org/N16-1162.pdf.
[39] T. Hu, X. Liu, T. Chen, X. Zhang, X. Huang, W. Niu, J. Lu, K. Zhou, Y. Liu, Transaction-based classification and detection approach
for ethereum smart contract, Information Processing & Management 58 (2021) 102462. https://www.sciencedirect.com/science/
article/pii/S0306457320309547.