A Novel Extended Multimodal AI Framework towards Vulnerability Detection in Smart Contracts

Wanqing Jie, Jiaqi Wang, Arthur Sandor Voundi Koe*, Jin Li*, Qi Chen, Pengfei Huang, Yaqi Wu, Yin Wang
Institute of Artificial Intelligence and Blockchain, Guangzhou University, 510006, Guangzhou, China
*Corresponding authors. Email addresses: [email protected] (Arthur Sandor Voundi Koe), [email protected] (Jin Li)

Abstract

Current automatic data-driven vulnerability detection in smart contracts selects and processes features of interest under black-box settings without empirical justification. In this paper, we propose a smart contract testing methodology that provides developers with flexible, practical and customizable strategies for vulnerability detection. It enforces strong white-box knowledge through a series of supervised multimodal tasks under static analysis. Each task encapsulates a vulnerability detection branch test and pipelines feature selection, dimension unification, feature fusion, model training and decision making. Moreover, we exploit multiple features made up of code and graph embeddings at the single-modality level (intramodal settings) and across individual modalities (intermodal settings). We assign each task to either intramodal or intermodal settings, and show how to train self-attentive bi-LSTM, textCNN, and random forest (RF) models to extract a joint multimodal feature representation per task. We evaluate our framework over 101,082 functions extracted from the SmartEmbed dataset, and rank each multimodal vulnerability mining strategy in terms of detection performance. Extensive experiments show that our work outperforms existing schemes, with the highest detection performance reaching 99.71%.
Keywords: Smart Contract, Vulnerability detection, Multimodal, AI approach, White box

1. Introduction

The increasing popularity and adoption of blockchain technology have resulted in an abundance of blockchain solutions. According to [1], investment in global blockchain deployments will reach 19 billion USD by 2024. What stands out in blockchain technology is the use of smart contracts [2] to allow mutually distrusting parties to securely adhere to a set of promises over their assets. In the literature, the term smart contract tends to refer to an immutable, self-contained block of rules written in a contract-oriented language such as Solidity [3]. The first proof of concept related to smart contracts was provided over the Ethereum blockchain [4], and an estimated one million smart contracts, which control several billion dollars in digital currency, have been deployed on Ethereum. There is evidence that such concentrated wealth attracts attackers and raises concerns over the safety of smart contracts [5]. There are two notable directions towards securing smart contracts: contract codification, which addresses the need to write an optimized and correct contract with fewer bugs [6], and contract vulnerability detection, which identifies weaknesses in the contract code such as the re-entrancy vulnerability in the decentralized autonomous organization (DAO) contract [7]. In this paper, we investigate the detection of vulnerabilities in Solidity-written smart contracts for the Ethereum blockchain.

1.1. Vulnerability Detection Landscape



Historically, mining vulnerabilities has often been referred to as "searching for a needle in a haystack" [8]. Research on the subject mostly focuses on the use of static analysis for vulnerability uncovering. For example, based on the taxonomy of smart contracts proposed by Atzei et al. [9], Argañaraz et al. [10] leverage static analysis of the source
code and formulate expert-based verification rules. Another example is Slither [11], a static analyzer for Solidity-written smart contracts that relies on a set of manually coded vulnerability detectors. However, one criticism of much of the literature on static analysis is that it suffers from a heavy reliance on expert-based rules, leading to a high false positive rate, as well as from a laborious effort to address new vulnerabilities [12]. Despite the aforesaid, Gao et al. [13], in their impressive investigation, advocate the use of static analysis, which is faster, compared to other more reliable but expensive methods such as dynamic analysis [14][15] and dynamic symbolic execution [7][16][17].
Different methods based on machine learning (ML) and deep learning (DL) have been proposed to support automation in software vulnerability mining. Nevertheless, traditional machine learning techniques still rely on fixed expert-based predictors, which are susceptible to bias and insufficient generalization [18]. To uphold automatic data-driven vulnerability detection, deep networks have been successfully applied to supervised feature learning for single and multiple modalities.

The topic of multimodal learning applied to vulnerability mining in smart contracts can best be treated by considering three multimodal data sources. First, the contract source code, also known as the source code layer (SC), made of features acquired from processing the contract source code. Second, the built-based data, or built-based layer (BB), comprising features extracted from the contract compilation. Third, the contract's Ethereum virtual machine (EVM) bytecode, also known as the EVM bytecode layer (EVMB), which encompasses features obtained from processing the contract EVM bytecode. Under multimodal learning, each multimodal data source, also known as a modality, has one or multiple sub-modalities expressed as features.
We hold on to the three modalities mentioned previously and classify the relationships among features under two main categories. First, the intramodal settings describe the analysis of features belonging to an individual modality, namely SC, BB, and EVMB. Second, the intermodal settings relate to the combination of features across individual modalities. We further distinguish two subgroups in the intermodal settings: two-by-two intermodal settings (SC+BB, SC+EVMB, BB+EVMB) and three-by-three intermodal settings (SC+BB+EVMB).
The most serious disadvantage of multimodal learning applied to smart contract static vulnerability analysis is the lack of a common testing methodology. This deficiency causes developers to struggle when designing solutions to ensure the reliability of their smart contracts. Such a methodology should guarantee strong white-box knowledge testing and discourage active learning.

1.2. Technical Challenges


There are four main technical challenges in addressing smart contract vulnerability mining under multimodal AI settings. First, understanding the nature of the raw features to extract, as well as their significance towards vulnerability detection performance. Second, choosing the most suitable AI models to perform vulnerability detection inference. Third, finding the appropriate feature fusion technique that yields a higher detection outcome. Fourth, maintaining a high level of white-box knowledge throughout the vulnerability detection pipeline.

1.3. Motivations
Most studies in the field of smart contract vulnerability mining have only focused on intramodal and two-by-two intermodal settings. Moreover, they operate under black-box settings and embrace fixed rules regarding feature selection, feature fusion, and the AI models to leverage. Hence, it is not possible to investigate the significant relationships between feature selection approaches, feature fusion techniques, the choice of AI models, and the effectiveness of vulnerability detection in smart contracts. Such a situation denotes the lack of a common and clear methodology to guide developers in vulnerability uncovering under intramodal settings, two-by-two and three-by-three intermodal settings.
In this paper, we aim to design an information-rich framework to enhance research on smart contract vulnerability mining under multimodal AI.

1.4. Our Approach


On the question of smart contract vulnerability detection, this paper describes the design and implementation of a novel and transparent methodology towards vulnerability mining; our framework supports contracts published with or without their source code.
We apply static analysis at the function granularity level and characterise all extracted features using code embeddings and graph embeddings. We use a word2vec model and a bidirectional encoder representations from transformers (BERT) model to generate code embeddings, while a graph convolutional network (GCN) outputs graph embeddings.
We define eighty-four flexible, practical and customizable strategies to achieve strong white-box knowledge and guide developers and researchers towards practical and effective smart contract vulnerability uncovering under multimodal AI settings.
We model each strategy as a supervised vulnerability detection task that pipelines feature selection, a single feature dimension unification technique, a single feature fusion approach, model training, and a single unit for decision making. We exploit features from intramodal settings (SC, BB and EVMB separately), from two-by-two intermodal settings (SC+BB, SC+EVMB and BB+EVMB separately), and from three-by-three intermodal settings (SC+BB+EVMB). Figure 1 depicts the three multimodal data sources together with their associated features. Owing to multimodal learning, each task relies on max-pooling (MP), spatial pyramid pooling (SPP), or dense layers (Dense) for feature dimension uniformization. Each task further implements either horizontal or vertical feature concatenation for feature fusion. Each task includes AI training and AI inference using state-of-the-art text convolutional neural network (textCNN), bidirectional long short-term memory (bi-LSTM) with self-attention, and random forest (RF) machine learning models.
The set of all tasks in our framework forms the smart contract vulnerability branch coverage under multimodal learning. We compare our work with the existing literature and assess the increase in performance under intramodal and intermodal settings respectively.

1.5. Our Contributions
This paper discusses an innovative multimodal learning approach for detecting smart contract vulnerabilities. The
main contributions of this work are summarized as follows.
1) Features mixing. Our framework expresses multiple features respectively under intramodal and intermodal settings. We characterise such features as code and graph embeddings, to leverage the power of natural language processing (NLP) algorithms.
2) Vulnerability detection strategies. We develop a series of supervised tasks for automatic vulnerability mining in Ethereum smart contracts under multimodal learning [19]. Each task represents a vulnerability detection branch test and pipelines feature selection, feature dimension unification, feature fusion, model training and model testing. Moreover, we assign each task to serve as a vulnerability detection strategy in either intramodal, two-by-two intermodal or three-by-three intermodal settings.
3) Experimental evaluation of strategies. We evaluate every task by leveraging textCNN, bi-LSTM, and RF for training and decision making; MP, SPP and Dense layers for dimension unification; as well as horizontal feature concatenation (concat) and vertical feature concatenation (stack) for feature fusion. Extensive empirical analysis over the SmartEmbed dataset [13] reveals that under intramodal settings, artifacts from BB perform best, while under two-by-two intermodal settings, SC+BB has a significant advantage. Finally, the best detection strategy is achieved by shared representation learning across the three modalities (SC+BB+EVMB), and based on the evidence, two-by-two intermodal settings outperform intramodal settings.

2. Background

It is necessary here to clarify exactly what is meant by feature fusion and how our work exploits this concept. The term feature fusion refers to combining features of different layers, different modalities or different branches [20]. The concept of feature fusion embodies a multitude of techniques grouped under four categories.
First, feature vector addition, performing element-wise addition. For example, let $A$ and $B$ be two vectors of the same size. The fusion of $A$ and $B$ produces a single vector $C$ where $A + B = C$.
Second, feature vector concatenation. This paper distinguishes two main concatenation types. In horizontal concatenation, known as concat, let $I$ be a row vector of dimension $M_{1 \times n}$ and $J$ be a row vector of dimension $M_{1 \times k}$. The horizontal concatenation of vectors $I$ and $J$ has dimension $M_{1 \times (n+k)}$. In vertical concatenation, denoted as stack, let $U$ be a matrix of dimension $M_{d \times n}$ and $V$ be a matrix of dimension $M_{e \times n}$, such that $U$ and $V$ have the same number of column-wise elements $n$ with the same or different values. We define the stacking of $U$ and $V$ as the matrix $T$ of dimension $M_{(d+e) \times n}$.
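To make the two concatenation types concrete, the following minimal NumPy sketch (with illustrative shapes only, not the dimensions used in our experiments) fuses two feature matrices by vertical stacking and two row vectors by horizontal concatenation.

```python
import numpy as np

# Two feature matrices for the same function, e.g. two embedding types
# with the same number of columns n (shapes are illustrative only).
U = np.random.rand(4, 8)   # d x n
V = np.random.rand(6, 8)   # e x n

# Vertical concatenation ("stack"): rows are appended, giving (d+e) x n.
T_stack = np.concatenate([U, V], axis=0)   # shape (10, 8)

# Horizontal concatenation ("concat"): two row vectors of sizes 1 x n and 1 x k
# are joined column-wise, giving 1 x (n+k).
I = np.random.rand(1, 8)
J = np.random.rand(1, 5)
C_concat = np.concatenate([I, J], axis=1)  # shape (1, 13)

print(T_stack.shape, C_concat.shape)
```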
The third feature fusion technique is gated feature vector fusion, leveraged by [21], which proposes a gated fusion unit that concatenates feature vectors as input, then combines them with an average pooling layer, a dense layer and a sigmoid layer for caricature recognition.
The fourth feature fusion approach is based on an attention mechanism, which measures the contribution of each individual feature towards the segmentation accuracy and can remove redundant features [22].
In this work, we favour horizontal and vertical concatenation of features over the other feature fusion techniques.

3. Methodology
This section highlights the two essential parts of our framework. First, how to extract features and fuse them to obtain a joint multimodal representation under intramodal and intermodal settings. Second, how to feed intramodal and intermodal features to AI models to design vulnerability detection strategies that provide full white-box knowledge.

3.1. Hierarchical Feature Extraction and Fusion


In this subsection, we show how to acquire features from the SC layer, the BB layer, and the EVMB layer. We
further detail how to combine those features for smart contract vulnerability detection.

3.1.1. Feature extraction under intramodal settings
We evaluate key aspects to acquire features at every separate layer of the intramodal settings. Figure 1 illustrates the three intramodal layers, and a more detailed account of each is given below.
a) Source code layer (SC): This layer manages features acquired from processing the contract source code. In this work, we choose the function as the granularity level, and parse the contract source code to extract the set of all functions. We rely on existing tools [11] to define the ground-truth binary labels for the different functions. We leverage improvements from natural language processing (NLP) and convert function definitions into embedding vectors. Specifically, we apply two types of embedding vector models: the word2vec model [23] and the BERT model [24]. The word2vec embeddings (SC-W2V) and the BERT embeddings (SC-Bert) form the two features of interest at the SC layer.
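For illustration, the following minimal sketch shows how function-level word2vec embeddings such as SC-W2V could be produced with the Gensim library; the naive tokenizer, the vector size and the token-averaging scheme are illustrative choices only, not our exact implementation.

```python
import re
import numpy as np
from gensim.models import Word2Vec

def tokenize(source):
    # Naive Solidity tokenizer: identifiers, numbers and single punctuation symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", source)

functions = [
    "function A1() { total = total + msg.value; uint amount = msg.value; }",
    "function A2(address _add) { uint balance = balances[_add]; }",
]
corpus = [tokenize(f) for f in functions]

# Train word2vec over the token corpus of all extracted functions.
w2v = Word2Vec(sentences=corpus, vector_size=64, window=5, min_count=1, epochs=20)

def embed_function(tokens):
    # One fixed-size vector per function: the mean of its token vectors.
    vectors = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(w2v.vector_size)

sc_w2v = np.stack([embed_function(tokens) for tokens in corpus])
print(sc_w2v.shape)  # (number_of_functions, 64)
```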
b) Built-based layer (BB): This layer manages features extracted during the contract source code compilation. We exploit the static analyzer Slither [11], and extract the control flow graph (CFG) and the static single assignment (SSA) expression of every function. We apply word2vec and BERT embeddings to the SSA encodings, and generate graph embeddings over the CFGs using an untrained graph convolutional network (GCN). As a result, the word2vec embeddings (SSA-W2V), the BERT embeddings (SSA-Bert), and the graph embeddings (BB-CFG) are the three main features at the BB layer. We set the binary label for every function at the BB layer through a simple matching with the corresponding function from the SC layer.
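As a hedged illustration of this step, the snippet below walks every function of a compiled contract with Slither's Python API and prints its CFG nodes and SSA-form SlithIR operations; the contract file name is a placeholder, a working solc installation is assumed, and this is not our exact extraction code.

```python
from slither import Slither

# "Example.sol" is a placeholder; Slither compiles it via solc/crytic-compile.
sl = Slither("Example.sol")

for contract in sl.contracts:
    for function in contract.functions:
        print(contract.name, function.name)
        for node in function.nodes:      # basic blocks of the function's CFG
            for ir in node.irs_ssa:      # SlithIR operations in SSA form
                print("   ", ir)
```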
c) EVM bytecode layer (EVMB): The EVMB layer is responsible for features acquired from processing the contract bytecode expression. We disassemble every contract's EVM bytecode using the Eth2Vec [25] tool, and design a CFG generator to output CFGs from each disassembled contract. We perform label matching from the SC functions and set the corresponding label for each CFG. Regarding the features of interest at the EVMB layer, we apply an untrained GCN over the CFGs to generate graph embeddings (EVMB-CFG), and apply word2vec embeddings (EVMB-ASM) to every function in the disassembled contracts.
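The graph embeddings BB-CFG and EVMB-CFG come from an untrained GCN. The sketch below is a simplified NumPy stand-in, not our Keras or Stellargraph implementation: random, untrained GCN weights propagate node features over the normalized CFG adjacency matrix, and mean pooling over nodes yields one embedding per graph.

```python
import numpy as np

def untrained_gcn_embedding(adj, features, sizes=(32, 16), seed=0):
    """Untrained GCN: H' = ReLU(A_hat @ H @ W) with random W, then mean pooling."""
    rng = np.random.default_rng(seed)
    a = adj + np.eye(adj.shape[0])                      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_hat = d_inv_sqrt @ a @ d_inv_sqrt                 # symmetric normalization
    h = features
    for size in sizes:
        w = rng.normal(scale=0.1, size=(h.shape[1], size))  # random, untrained weights
        h = np.maximum(a_hat @ h @ w, 0.0)              # GCN layer with ReLU
    return h.mean(axis=0)                               # one vector per graph

# Toy CFG with 4 basic blocks and 8-dimensional node features.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
adj = adj + adj.T                                       # propagate along undirected edges
emb = untrained_gcn_embedding(adj, np.random.rand(4, 8))
print(emb.shape)  # (16,)
```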

3.1.2. Feature extraction under intermodal settings


There are two approaches to define the features of interest for vulnerability detection under intermodal settings, as shown in Figure 2: the two-by-two intermodal settings manage the combinations of features from (SC+BB), (BB+EVMB) and (SC+EVMB), and the three-by-three intermodal settings deal with features from (SC+BB+EVMB).
a) SC + BB Combination: This two-by-two intermodal combination of features from the SC and BB layers investigates the associated performance in vulnerability detection. It brings together five features: SC-W2V, SC-Bert, SSA-W2V, SSA-Bert and BB-CFG.
b) SC + EVMB Combination: These two-by-two intermodal settings combine features from the SC and EVMB layers. They assess the vulnerability detection performance linked to such a combination, and bring together four features of interest: SC-W2V, SC-Bert, EVMB-ASM and EVMB-CFG.
Figure 1: Overview of three modalities and associated features. (a) The source code layer (SC) manages features acquired from processing the contract source code. (b) The built-based layer (BB) manages features extracted during the contract source code compilation. (c) The EVM bytecode layer (EVMB) is responsible for features acquired from processing the contract bytecode expression.
c) BB + EVMB Combination: This combination evaluates the detection performance of mixing features from the BB and EVMB layers. It combines five main features: SSA-W2V, SSA-Bert, BB-CFG, EVMB-ASM and EVMB-CFG.
d) SC + BB + EVMB Combination: These three-by-three intermodal settings simultaneously leverage features from the SC, BB and EVMB layers. They investigate the efficacy of such an intermodal vulnerability detection approach, and combine seven features of interest: SC-W2V, SC-Bert, SSA-W2V, SSA-Bert, BB-CFG, EVMB-ASM and EVMB-CFG.

3.2. Vulnerability detection strategies


This subsection defines the multiple supervised tasks that associate the hierarchy of fused features with state-of-the-art AI models under intramodal and intermodal settings. Figure 3 provides an overview of such tasks. In our framework, each task represents a vulnerability detection branch test and pipelines input feature selection, feature dimension unification, feature fusion, model training and decision making. We aim to assess and classify the performance of every vulnerability detection strategy.
Figure 2: Intermodal settings for smart contract vulnerability detection. The two-by-two intermodal settings include the combinations of features from (SC+BB), (BB+EVMB) and (SC+EVMB), and the three-by-three intermodal settings deal with features from (SC+BB+EVMB).
Appendix A details all the specific tasks leveraged to build our framework. The following is a brief description of what happens during feature dimension unification, feature fusion, model training and testing.
a) Feature dimension unification: Recent research has revealed that at least one dimension should be equal when combining different features of interest. To achieve this objective, dimension unification relies on three techniques. First, we apply a max-pooling layer (MP) to the input features. Second, we implement a fully connected layer, also known as a dense layer (Dense), over the feature candidates for fusion. Third, we experiment with a spatial pyramid pooling (SPP) layer [26, 27] to uniformize the feature dimensions. Recent research has revealed that pooling layers select meaningful information but cause a loss of detailed information [28]. Dense layers learn local and global feature information between layers [29], but learning too many features may slow down training and lead to overfitting. MP, Dense and SPP layers are all likely to have a positive impact on the dimension unification stage, so we evaluate the impact of each of these dimension unification methods on vulnerability detection effectiveness.
b) Proper feature fusion: The proper feature fusion follows the dimension unification stage. It relies on horizontal concatenation (concat) and vertical concatenation (stack) for feature fusion under intramodal and intermodal settings.
c) Model training: After the feature fusion stage, we adopt a bi-LSTM model with self-attention as the fixed model at the fusion model training stage. In addition, we replaced the bi-LSTM model with VanillaRNN [30] and a gated recurrent unit (GRU) [31], respectively, for performance comparison (Table 2) in the intramodal fusion of the SC layer at the fusion model training stage. The performance of the bi-LSTM model for smart contract vulnerability detection is significantly better.
d) Decision making: The last stage is decision making. To train and evaluate the vulnerability detection, we proceed as follows. First, we adopt a bi-LSTM model with self-attention combined with a random forest (RF) model. Second, we combine a self-attentive bi-LSTM with a textCNN model. We train the RF model and the textCNN model in the decision making stage as the strategy selection models in our multimodal feature fusion network.
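As an illustration of one such pipeline, the sketch below wires the four stages together in Keras and scikit-learn for a single strategy (MP | stack | (bi-LSTM+self-attention) | RF); the feature shapes, layer sizes and training settings are placeholder values, not our exact configuration.

```python
import numpy as np
from tensorflow.keras import layers, Model
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for two per-function feature matrices (e.g. SC-W2V and SC-Bert).
n_samples, seq_len, dim = 256, 32, 64
feat_a = np.random.rand(n_samples, seq_len, dim).astype("float32")
feat_b = np.random.rand(n_samples, seq_len, dim).astype("float32")
labels = np.random.randint(0, 2, size=n_samples)

# (a) dimension unification with max pooling, (b) fusion by vertical stacking,
# (c) a self-attentive bi-LSTM that learns a joint representation.
inp_a = layers.Input(shape=(seq_len, dim))
inp_b = layers.Input(shape=(seq_len, dim))
pooled_a = layers.MaxPooling1D(pool_size=2)(inp_a)
pooled_b = layers.MaxPooling1D(pool_size=2)(inp_b)
stacked = layers.Concatenate(axis=1)([pooled_a, pooled_b])          # "stack": rows appended
hidden = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(stacked)
attended = layers.Attention()([hidden, hidden])                      # self-attention over bi-LSTM states
representation = layers.GlobalAveragePooling1D()(attended)
out = layers.Dense(1, activation="sigmoid")(representation)

model = Model([inp_a, inp_b], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit([feat_a, feat_b], labels, epochs=1, batch_size=32, verbose=0)

# (d) decision making: feed the learned joint representation to a random forest.
encoder = Model([inp_a, inp_b], representation)
rf = RandomForestClassifier(n_estimators=100).fit(encoder.predict([feat_a, feat_b]), labels)
print(rf.score(encoder.predict([feat_a, feat_b]), labels))
```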
Vulnerability detection strategies: Through the above four stages of model selection, we investigate twelve strategies of interest at each intramodal or intermodal setting after feature selection, and we denote | as the pipeline symbol.
Figure 3: Multimodal feature fusion architecture for smart contract vulnerability detection. According to the multimodal feature fusion settings (intramodal and intermodal), the processed features are selected and input into the multimodal feature fusion network. The network includes four stages: (a) uniform vector dimensions, (b) feature fusion, (c) model training and (d) decision making.
• strategy 1: SPP | concat | (bi-LSTM+self-attention) | textCNN.
• strategy 2: SPP | concat | (bi-LSTM+self-attention) | RF.
• strategy 3: SPP | stack | (bi-LSTM+self-attention) | RF.
• strategy 4: SPP | stack | (bi-LSTM+self-attention) | textCNN.
• strategy 5: MP | concat | (bi-LSTM+self-attention) | textCNN.
• strategy 6: MP | concat | (bi-LSTM+self-attention) | RF.
• strategy 7: MP | stack | (bi-LSTM+self-attention) | RF.
• strategy 8: MP | stack | (bi-LSTM+self-attention) | textCNN.
• strategy 9: Dense | stack | (bi-LSTM+self-attention) | RF.
• strategy 10: Dense | stack | (bi-LSTM+self-attention) | textCNN.
• strategy 11: Dense | concat | (bi-LSTM+self-attention) | textCNN.
• strategy 12: Dense | concat | (bi-LSTM+self-attention) | RF.
The above 12 vulnerability detection strategies are carried out separately under the 7 multimodal feature fusion settings, and the best-performing model under each setting is used in the final framework. The performance of each strategy and the outperforming strategies in our smart contract vulnerability detection framework are described in detail in the next section.

4. Experiments

4.1. Experimental Settings


To undertake the empirical analysis, we develop a top-down parser with leftmost derivation, using the ANTLR tool version 4 [32] over the Solidity grammar [33]. We code a CFG generator that leverages contracts disassembled with the Eth2Vec tool [25]. We implement the AI models using the TensorFlow, Keras, Gensim and Stellargraph libraries. We adopt a physical machine with the following characteristics: an Intel(R) Xeon(R) Gold 6240R CPU running at 2.40 GHz, 32 GB of RAM and a 6.5 TB hard disk drive. We conduct experiments with Python 3.6.2 under Ubuntu 20.04.
4.2. Dataset Construction

We exploit contracts from the SmartEmbed dataset [13]. This dataset is made of 5,000 verified smart contracts published with source code on the Ethereum Mainnet.
Using our parser, we process the contract source code and extract the set of function definitions. We obtain a complete dataset of 101,082 functions. We label our functions using existing tools [11] and the result is as follows: 87,641 functions are classified as non-vulnerable under class 0, and 13,441 functions are classified as vulnerable under class 1. Such statistics translate into a class imbalance between class 0 and class 1.
We solve the class imbalance issue at the SC layer and propagate the effects to the other layers. To achieve our objective, we evaluate several class imbalance resolution approaches, summarized in Table 1, in which W represents the weight associated with a class. First, we simply upsample and downsample examples from both classes with no particular technique in mind. Second, we follow the inverse number of samples (INS) approach. Third, we implement the effective sample number weighting (ENS) technique. Fourth, we adopt the inverse square root of the number of samples (ISNS) method. Fifth, we experiment with the synthetic minority oversampling technique (SMOTE).
We leverage a random forest AI model as our baseline to resolve the imbalance issue. For every approach, the testing set takes 20% of the dataset. We achieve the best results under SMOTE: we undersample class 0 by a factor of 28.5251% and upsample class 1 to 25,000 samples. We propagate the SMOTE technique to address class imbalance at the BB and EVMB layers respectively.
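A minimal sketch of this baseline, assuming the scikit-learn and imbalanced-learn libraries and placeholder feature arrays, looks as follows; the sampling ratios shown only approximate the undersample/oversample recipe described above.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Placeholder embeddings; the real dataset holds 101,082 function vectors.
X = np.random.rand(2000, 64)
y = np.array([0] * 1750 + [1] * 250)          # imbalanced labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Undersample the majority class, then oversample the minority class with SMOTE,
# before fitting the random forest baseline.
clf = Pipeline([
    ("under", RandomUnderSampler(sampling_strategy=0.5, random_state=0)),
    ("smote", SMOTE(sampling_strategy=1.0, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
clf.fit(X_tr, y_tr)
print("F1 on held-out 20%:", f1_score(y_te, clf.predict(X_te)))
```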

Table 1: Summary of class imbalance resolution approaches

Strategy | Embeddings | W_class0 | W_class1 | Accuracy | F1 | Precision | Recall | AUC-ROC
None | SC-Bert | 1 | 1 | 0.9611 | 0.8386 | 0.9174 | 0.7722 | 0.8831
None | SC-W2V | 1 | 1 | 0.9605 | 0.8370 | 0.9239 | 0.7651 | 0.8740
INS | SC-Bert | 0.1659425 | 1.7340575 | 0.9584 | 0.8249 | 0.9172 | 0.7495 | 0.8679
INS | SC-W2V | 0.1659425 | 1.7340575 | 0.9559 | 0.8093 | 0.9307 | 0.7159 | 0.8539
ENS | SC-Bert | 0.85015092 | 1.14984908 | 0.9612 | 0.8374 | 0.9267 | 0.7639 | 0.8774
ENS | SC-W2V | 0.85015092 | 1.14984908 | 0.9601 | 0.8338 | 0.9310 | 0.7550 | 0.8727
ISNS | SC-Bert | 0.56282351 | 1.43717649 | 0.9588 | 0.8274 | 0.9164 | 0.7541 | 0.8718
ISNS | SC-W2V | 0.56282351 | 1.43717649 | 0.9583 | 0.8260 | 0.9263 | 0.7453 | 0.8681
SMOTE | SC-W2V | 1.285251 | 1 | 0.9430 | 0.9452 | 0.9168 | 0.9754 | 0.9427
SMOTE | SC-Bert | 1.285251 | 1 | 0.9416 | 0.9438 | 0.9160 | 0.9734 | 0.9413

4.3. Performance Analysis


In the following, we propose questions committed to providing white-box knowledge, and give corresponding answers based on an analysis of the experimental results. Among them, RQ1 and RQ2 address intramodal settings, and RQ3 to RQ6 address intermodal settings. Besides, we embolden each column-wise tuple that holds the highest performance result in the tables below, in order to ease the interpretability of the data.

4.3.1. Intramodal settings (RQ1 to RQ2)


a) RQ1: Which strategy yields the highest detection results in SC, BB and EVMB separately? Does our intramodal framework outperform state-of-the-art methods?
Regarding the SC layer, we leverage several state-of-the-art AI models: LSTM, VanillaRNN [30], RF, textCNN and the gated recurrent unit (GRU) [31] for training and testing over the balanced dataset attached to the layer. We compare such state-of-the-art models with the vulnerability detection tasks from task 1 to task 12, and Table 2 portrays the comparison results. We observe that when selecting (SC-W2V+SC-Bert) features, strategy 7 (MP | stack | (bi-LSTM+self-attention) | RF) outperforms existing state-of-the-art models and stands as the best strategy for vulnerability detection at the SC layer.
As for the BB layer, we compare against the AMEVulDetector model [34], which promotes feature fusion through a cross-attention layer. We further implement two types of untrained GCNs: GCN1, which produces the embeddings BB-CFG1 under the Keras library, and GCN2, realized under the Stellargraph framework, which outputs the embeddings BB-CFG2. The key difference between GCN1 and GCN2 stems from the observation that the embeddings from GCN1 form a very large sparse matrix, while the embeddings from GCN2 form a less sparse matrix than GCN1.
Table 2: SC layer performance comparison

Methods | Acc / Recall / Precision / F1 (%)
MP|stack|bi-LSTM|RF | 98.04 / 97.89 / 98.20 / 98.04
MP|stack|bi-LSTM|textCNN | 97.62 / 97.30 / 97.96 / 97.62
MP|concat|bi-LSTM|RF | 97.40 / 96.76 / 98.08 / 97.40
MP|concat|bi-LSTM|textCNN | 97.36 / 96.91 / 97.84 / 97.36
MP|stack|RNN[30]|RF | 95.88 / 94.31 / 97.80 / 95.88
MP|stack|RNN[30]|textCNN | 95.44 / 94.09 / 97.13 / 95.44
MP|stack|GRU[31]|RF | 95.82 / 94.44 / 97.52 / 95.82
MP|stack|GRU[31]|textCNN | 95.10 / 94.36 / 96.10 / 95.10
Dense|stack|bi-LSTM|RF | 97.62 / 96.74 / 98.56 / 97.62
Dense|stack|bi-LSTM|textCNN | 97.66 / 96.74 / 98.64 / 97.66
Dense|concat|bi-LSTM|RF | 97.48 / 97.52 / 97.44 / 97.48
Dense|concat|bi-LSTM|textCNN | 97.43 / 97.06 / 97.84 / 97.44
SPP|stack|bi-LSTM|RF | 96.94 / 96.51 / 97.40 / 96.94
SPP|stack|bi-LSTM|textCNN | 96.67 / 96.82 / 96.48 / 96.67
SPP|concat|bi-LSTM|RF | 97.08 / 96.52 / 97.67 / 97.08
SPP|concat|bi-LSTM|textCNN | 95.92 / 96.67 / 95.12 / 95.92

We hypothesize that such a sparse matrix may result from a consequent loss of information during GCN1 generation. We prioritize BB-CFG2 to increase the upshot and exhibit the experimental results in Table 3. We conclude that when selecting (SSA-W2V+SSA-Bert+BB-CFG) features, strategy 7 (MP | stack | (bi-LSTM+self-attention) | RF) yields the highest performance at the BB layer, over both the AMEVulDetector model [34] and the full range of BB-layer strategies.

Table 3: BB layer performance comparison

Methods | BB-CFG1: Acc / Recall / Precision / F1 (%) | BB-CFG2: Acc / Recall / Precision / F1 (%)
MP|stack|bi-LSTM|RF | 97.80 / 96.72 / 98.96 / 97.80 | 98.27 / 97.56 / 99.01 / 98.27
MP|stack|bi-LSTM|textCNN | 97.33 / 96.20 / 98.56 / 97.33 | 98.08 / 97.40 / 98.80 / 98.08
MP|concat|bi-LSTM|RF | 97.44 / 96.47 / 98.48 / 97.44 | 98.24 / 97.33 / 99.20 / 98.24
MP|concat|bi-LSTM|textCNN | 97.60 / 96.56 / 98.72 / 97.60 | 98.08 / 97.77 / 98.40 / 98.08
Dense|stack|bi-LSTM|RF | 97.59 / 96.48 / 98.77 / 97.59 | 98.13 / 97.35 / 98.96 / 98.13
Dense|stack|bi-LSTM|textCNN | 97.24 / 95.95 / 98.64 / 97.24 | 98.13 / 97.33 / 98.99 / 98.13
Dense|concat|bi-LSTM|RF | 97.76 / 97.01 / 98.56 / 97.76 | 97.88 / 96.72 / 99.12 / 97.88
Dense|concat|bi-LSTM|textCNN | 97.08 / 96.30 / 97.92 / 97.08 | 97.36 / 96.69 / 98.08 / 97.36
SPP|stack|bi-LSTM|RF | 97.05 / 96.20 / 97.97 / 97.05 | 98.16 / 97.60 / 98.75 / 98.16
SPP|stack|bi-LSTM|textCNN | 97.44 / 96.57 / 98.37 / 97.44 | 97.90 / 97.00 / 98.89 / 97.91
SPP|concat|bi-LSTM|RF | 97.24 / 96.60 / 97.92 / 97.24 | 98.16 / 97.78 / 98.56 / 98.16
SPP|concat|bi-LSTM|textCNN | 97.32 / 96.40 / 98.24 / 97.32 | 98.12 / 96.95 / 99.36 / 98.12
cross-attention[34] | 96.54 / 97.05 / 96.07 / 96.56 | 97.18 / 97.80 / 96.61 / 97.20

As regards the EVMB layer, we compare against the deep graph convolutional neural network (DGCNN) model [35], which delivers the embeddings EVMB-DGCNN. We reintroduce the above-mentioned GCN1, which produces the embeddings EVMB-CFG1, and GCN2, which outputs the embeddings EVMB-CFG2. We analyse the results in Table 4 and draw the following conclusion based on EVMB-CFG2, which offers higher detection effectiveness: strategy 9 (Dense | stack | (bi-LSTM+self-attention) | RF) performs better than DGCNN [35] and than the range from strategy 1 to strategy 12 at the EVMB layer.
We depict in Fig. 4 the receiver operating characteristic (ROC) curves to support our findings regarding the optimal strategies at the SC, BB and EVMB layers.
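For reference, the ROC curves and AUC values reported here can be obtained from the predicted vulnerability scores with scikit-learn, as in the minimal sketch below (toy labels and scores only).

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# y_true: ground-truth labels; y_score: the classifier's vulnerability probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC-ROC:", auc(fpr, tpr))
```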
b) RQ2: Which modality leads to better detection performance among SC, BB and EVMB?
Based on empirical evidence from Table 2, Table 3, and Table 4, we conclude the following under intramodal settings. Among SC, BB and EVMB, BB leads smart contract vulnerability assessment, followed by SC performance-wise, while EVMB offers the least performance.

4.3.2. Intermodal settings (RQ3 to RQ6)


a) RQ3: Which strategy upholds the highest detection results in the (SC + BB), (SC + EVMB), (BB + EVMB), and (SC + BB + EVMB) combinations separately?
Table 4: EVMB layer performance comparison

Methods | EVMB-DGCNN[35]: Acc / Recall / Precision / F1 (%) | EVMB-CFG1: Acc / Recall / Precision / F1 (%) | EVMB-CFG2: Acc / Recall / Precision / F1 (%)
MP|stack|bi-LSTM|RF | 93.48 / 91.84 / 95.44 / 93.48 | 93.92 / 93.03 / 94.96 / 93.92 | 95.16 / 93.93 / 96.56 / 95.16
MP|stack|bi-LSTM|textCNN | 93.16 / 91.53 / 95.12 / 93.16 | 93.38 / 92.61 / 94.28 / 93.38 | 94.60 / 93.93 / 95.36 / 94.60
MP|concat|bi-LSTM|RF | 93.84 / 92.41 / 95.52 / 93.84 | 93.68 / 92.72 / 94.80 / 93.68 | 95.08 / 93.85 / 96.48 / 95.08
MP|concat|bi-LSTM|textCNN | 94.10 / 92.73 / 95.76 / 94.20 | 93.88 / 92.88 / 95.04 / 93.88 | 95.24 / 94.84 / 95.68 / 95.24
Dense|stack|bi-LSTM|RF | 94.10 / 92.42 / 96.08 / 94.10 | 94.44 / 93.57 / 95.44 / 94.44 | 95.64 / 94.96 / 96.40 / 95.64
Dense|stack|bi-LSTM|textCNN | 93.32 / 92.54 / 94.24 / 93.32 | 94.40 / 92.96 / 96.08 / 94.40 | 95.34 / 93.92 / 96.96 / 95.34
Dense|concat|bi-LSTM|RF | 94.08 / 92.29 / 96.20 / 94.08 | 93.96 / 91.98 / 96.32 / 93.96 | 95.20 / 94.35 / 96.16 / 95.20
Dense|concat|bi-LSTM|textCNN | 93.72 / 93.07 / 94.48 / 93.72 | 93.87 / 92.68 / 95.28 / 93.88 | 94.72 / 92.74 / 97.04 / 94.72
SPP|stack|bi-LSTM|RF | 91.24 / 90.69 / 91.92 / 91.24 | 92.56 / 91.92 / 93.32 / 92.56 | 93.76 / 93.13 / 94.48 / 93.76
SPP|stack|bi-LSTM|textCNN | 90.50 / 91.62 / 89.16 / 90.50 | 91.32 / 90.13 / 92.80 / 91.32 | 93.56 / 93.11 / 94.08 / 93.56
SPP|concat|bi-LSTM|RF | 91.24 / 90.24 / 92.48 / 91.24 | 91.48 / 90.99 / 92.08 / 91.48 | 94.36 / 93.42 / 95.44 / 94.36
SPP|concat|bi-LSTM|textCNN | 91.56 / 89.20 / 94.56 / 91.57 | 91.72 / 90.65 / 93.04 / 91.72 | 93.36 / 94.53 / 92.56 / 93.60

Figure 4: ROC curves for intramodal settings. (a) SC-ROC, (b) BB-ROC, (c) EVMB-ROC; each panel plots true positive rate against false positive rate for the twelve strategies.
Regarding the SC+BB combination, we exploit GCN1 and GCN2 from the BB layer as detailed in the answer to RQ1. We favour the SC+BB-CFG2 combination, which outputs higher results. We conclude, based on Table 5 and Table 6, that when selecting (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) features, strategy 9 (Dense | stack | (bi-LSTM+self-attention) | RF) realizes the best vulnerability detection among the 12 strategies.
As for the SC+EVMB combination, we call upon GCN1 and GCN2 from the EVMB layer as illustrated in the answer to RQ1. We spotlight the SC+EVMB-CFG2 combination, which delivers an increased outcome. We conclude, based on Table 5 and Table 6, that when selecting (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) features, strategy 10 (Dense | stack | (bi-LSTM+self-attention) | textCNN) achieves the best vulnerability detection among all strategies.
In terms of the BB+EVMB combination, we emphasize the BB+EVMB-CFG2 combination, which renders a superior outcome, and conclude the following. When selecting (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) features, strategy 8 (MP | stack | (bi-LSTM+self-attention) | textCNN) yields the highest performance among all strategies, based on Table 5 and Table 6.

Table 5: Two-by-two intermodal fusion performance comparison (GCN1-CFG1)

Methods | SC+BB-CFG1: Acc / Recall / Precision / F1 (%) | SC+EVMB-CFG1: Acc / Recall / Precision / F1 (%) | BB+EVMB-CFG1: Acc / Recall / Precision / F1 (%)
MP|stack|bi-LSTM|RF | 99.11 / 98.92 / 99.31 / 99.11 | 98.38 / 98.09 / 98.68 / 98.38 | 98.39 / 97.79 / 99.02 / 98.39
MP|stack|bi-LSTM|textCNN | 99.04 / 99.04 / 99.04 / 99.04 | 98.24 / 97.33 / 99.20 / 98.24 | 98.62 / 97.99 / 99.28 / 98.62
MP|concat|bi-LSTM|RF | 99.16 / 98.81 / 99.52 / 99.16 | 98.24 / 97.86 / 98.64 / 98.24 | 98.60 / 98.33 / 98.88 / 98.60
MP|concat|bi-LSTM|textCNN | 99.04 / 98.96 / 99.12 / 99.04 | 98.08 / 98.16 / 98.00 / 98.08 | 98.80 / 98.41 / 99.20 / 98.80
Dense|stack|bi-LSTM|RF | 99.15 / 98.65 / 99.66 / 99.15 | 98.55 / 98.25 / 98.72 / 98.48 | 98.54 / 97.96 / 99.14 / 98.54
Dense|stack|bi-LSTM|textCNN | 98.98 / 98.62 / 99.36 / 98.98 | 98.22 / 98.28 / 98.16 / 98.22 | 98.35 / 97.79 / 98.94 / 98.35
Dense|concat|bi-LSTM|RF | 98.64 / 98.33 / 98.96 / 98.64 | 98.44 / 98.09 / 98.80 / 98.44 | 97.92 / 97.92 / 97.92 / 97.92
Dense|concat|bi-LSTM|textCNN | 98.76 / 98.11 / 99.44 / 98.76 | 98.28 / 97.93 / 98.64 / 98.28 | 98.12 / 97.85 / 98.40 / 98.12
SPP|stack|bi-LSTM|RF | 99.00 / 98.50 / 99.52 / 99.00 | 98.18 / 98.16 / 98.20 / 98.18 | 98.27 / 98.00 / 98.56 / 98.27
SPP|stack|bi-LSTM|textCNN | 98.93 / 98.97 / 98.90 / 98.93 | 98.36 / 98.01 / 98.72 / 98.36 | 98.49 / 97.66 / 99.36 / 98.49
SPP|concat|bi-LSTM|RF | 99.00 / 98.80 / 99.20 / 99.00 | 97.88 / 97.38 / 98.40 / 97.88 | 98.20 / 98.01 / 98.40 / 98.20
SPP|concat|bi-LSTM|textCNN | 99.04 / 98.65 / 99.44 / 99.04 | 97.92 / 98.54 / 97.28 / 97.92 | 98.12 / 97.55 / 98.72 / 98.12
Table 6: Two-by-two intermodal fusion performance comparison (GCN2-CFG2)

Methods | SC+BB-CFG2: Acc / Recall / Precision / F1 (%) | SC+EVMB-CFG2: Acc / Recall / Precision / F1 (%) | BB+EVMB-CFG2: Acc / Recall / Precision / F1 (%)
MP|stack|bi-LSTM|RF | 99.34 / 98.86 / 99.84 / 99.34 | 98.65 / 98.02 / 98.80 / 98.65 | 98.87 / 98.52 / 99.23 / 98.87
MP|stack|bi-LSTM|textCNN | 99.07 / 98.59 / 99.57 / 99.07 | 98.53 / 98.16 / 98.32 / 98.53 | 99.23 / 98.97 / 99.50 / 99.23
MP|concat|bi-LSTM|RF | 99.38 / 99.28 / 99.76 / 99.52 | 98.52 / 98.24 / 98.80 / 98.52 | 98.88 / 98.57 / 99.20 / 98.88
MP|concat|bi-LSTM|textCNN | 99.20 / 98.89 / 99.52 / 99.20 | 98.52 / 98.41 / 98.96 / 98.52 | 98.84 / 98.49 / 99.20 / 98.84
Dense|stack|bi-LSTM|RF | 99.39 / 98.89 / 99.90 / 99.39 | 98.74 / 98.28 / 98.26 / 98.74 | 99.01 / 98.67 / 99.36 / 99.01
Dense|stack|bi-LSTM|textCNN | 99.37 / 99.12 / 99.62 / 99.37 | 98.81 / 98.29 / 99.34 / 98.81 | 98.92 / 98.49 / 99.36 / 98.92
Dense|concat|bi-LSTM|RF | 99.00 / 98.73 / 99.28 / 99.00 | 98.32 / 98.32 / 98.32 / 98.32 | 98.60 / 98.41 / 98.80 / 98.60
Dense|concat|bi-LSTM|textCNN | 99.04 / 98.73 / 99.36 / 99.04 | 98.20 / 97.78 / 98.64 / 98.20 | 98.80 / 98.72 / 98.88 / 98.80
SPP|stack|bi-LSTM|RF | 99.28 / 99.04 / 99.52 / 99.28 | 98.31 / 98.20 / 98.42 / 98.31 | 98.95 / 98.79 / 99.12 / 98.95
SPP|stack|bi-LSTM|textCNN | 99.23 / 99.18 / 99.28 / 99.23 | 98.44 / 97.86 / 99.04 / 98.44 | 99.18 / 99.00 / 99.36 / 99.18
SPP|concat|bi-LSTM|RF | 99.24 / 98.73 / 99.76 / 99.24 | 98.52 / 98.40 / 98.64 / 98.52 | 99.08 / 98.96 / 99.20 / 99.08
SPP|concat|bi-LSTM|textCNN | 98.84 / 98.26 / 99.44 / 98.84 | 98.12 / 98.15 / 98.08 / 98.12 | 99.00 / 98.57 / 99.44 / 99.00

In the case of the SC+BB+EVMB three-by-three intermodal combination, we leverage GCN1 and GCN2. We combine (SC+BB) with EVMB-CFG2 instead of EVMB-CFG1 to enhance the results. Based on Table 7, we conclude that when selecting (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) features, strategy 10 (Dense | stack | (bi-LSTM+self-attention) | textCNN) reaches the highest performance within the range from strategy 1 to strategy 12.

Table 7: Three-by-three intermodal fusion performance

Methods | SC+BB+EVMB-CFG1: Acc / Recall / Precision / F1 (%) | SC+BB+EVMB-CFG2: Acc / Recall / Precision / F1 (%)
MP|stack|bi-LSTM|RF | 99.37 / 99.15 / 99.60 / 99.37 | 99.49 / 99.08 / 99.90 / 99.49
MP|stack|bi-LSTM|textCNN | 99.37 / 98.96 / 99.78 / 99.37 | 99.57 / 99.23 / 99.92 / 99.57
MP|concat|bi-LSTM|RF | 99.40 / 99.20 / 99.60 / 99.40 | 99.40 / 99.12 / 99.68 / 99.40
MP|concat|bi-LSTM|textCNN | 99.32 / 99.20 / 99.44 / 99.32 | 99.52 / 99.20 / 99.84 / 99.52
Dense|stack|bi-LSTM|RF | 99.59 / 99.26 / 99.92 / 99.59 | 99.70 / 99.39 / 99.98 / 99.68
Dense|stack|bi-LSTM|textCNN | 99.56 / 99.28 / 99.84 / 99.56 | 99.71 / 99.36 / 99.92 / 99.64
Dense|concat|bi-LSTM|RF | 99.16 / 99.04 / 99.28 / 99.16 | 99.28 / 98.89 / 99.68 / 99.28
Dense|concat|bi-LSTM|textCNN | 99.12 / 98.58 / 99.68 / 99.12 | 99.28 / 99.20 / 99.36 / 99.28
SPP|stack|bi-LSTM|RF | 99.05 / 98.64 / 99.49 / 99.05 | 99.50 / 99.36 / 99.64 / 99.50
SPP|stack|bi-LSTM|textCNN | 99.44 / 99.22 / 99.68 / 99.44 | 99.28 / 99.04 / 99.52 / 99.28
SPP|concat|bi-LSTM|RF | 99.20 / 98.66 / 99.76 / 99.20 | 99.44 / 99.28 / 99.60 / 99.44
SPP|concat|bi-LSTM|textCNN | 99.24 / 99.04 / 99.44 / 99.24 | 99.28 / 99.20 / 99.36 / 99.28
b) RQ4: Which two-by-two intermodal combination yields the best detection performance towards vulnerabilities in smart contracts?
Based on empirical evidence from Table 5 and Table 6, we conclude the following. (SC+BB) leads to higher performance, followed by (BB+EVMB), while (SC+EVMB) yields the least detection performance in two-by-two intermodal settings.
c) RQ5: Which approach appeals more between two-by-two and three-by-three intermodal settings?
We exploit the results from Table 5, Table 6 and Table 7. We observe that the three-by-three intermodal settings achieve better vulnerability detection performance than the two-by-two settings.
d) RQ6: Which settings deliver better results between intramodal and intermodal settings?
From the lines above, we learnt that three-by-three settings deliver better results than two-by-two intermodal settings. We infer from Table 2, Table 3, Table 4, Table 5 and Table 6 that two-by-two intermodal settings outperform the intramodal configuration. We conclude that intermodal settings outperform intramodal settings in vulnerability detection performance.
Fig. 5 illustrates the receiver operating characteristic (ROC) curves to support our conclusions towards optimal strategies for vulnerability detection in two-by-two and three-by-three intermodal settings.

4.3.3. Outperforming Strategies In Our Framework


After the above performance analysis, we summarize the outperforming strategies of our smart contract vulnerability detection framework in Table 8. As can be seen in Table 8, the outperforming strategy of each intramodal and intermodal feature fusion setting and its corresponding performance are displayed.
Our framework provides strong white-box knowledge for intramodal, two-by-two and three-by-three intermodal feature selection, and achieves higher vulnerability detection performance compared to existing methods.

5. Discussion

5.1. Significance of Our Methodology


In smart contract vulnerability detection, feature extraction and feature fusion bear a certain complexity, and developers lack a clear methodology to achieve these objectives. Also, the majority of existing works leverage intramodal and intermodal information without disclosing the fundamentals of their methodology. In most cases, researchers choose and process features according to fixed rules under black-box settings. Our paper investigates a white-box methodology towards smart contract vulnerability detection.

Figure 5: Two-by-two and three-by-three intermodal fusion ROC curves. (a) SC+BB-ROC, (b) SC+EVMB-ROC, (c) BB+EVMB-ROC, (d) SC+BB+EVMB-ROC; each panel plots true positive rate against false positive rate for the twelve strategies.
Table 8: Outperforming strategies in our framework

Settings | Methods | Acc / Recall / Precision / F1 (%)
SC | MP|stack|bi-LSTM|RF | 98.04 / 97.89 / 98.20 / 98.04
BB | MP|stack|bi-LSTM|RF | 98.27 / 97.56 / 99.01 / 98.27
EVMB | Dense|stack|bi-LSTM|RF | 95.64 / 94.96 / 96.40 / 95.64
SC+BB | Dense|stack|bi-LSTM|RF | 99.39 / 98.89 / 99.90 / 99.39
SC+EVMB | Dense|stack|bi-LSTM|textCNN | 98.81 / 98.29 / 99.34 / 98.81
BB+EVMB | MP|stack|bi-LSTM|textCNN | 99.23 / 98.97 / 99.50 / 99.23
SC+BB+EVMB | Dense|stack|bi-LSTM|textCNN | 99.71 / 99.36 / 99.92 / 99.64
5.2. Flexibility in Feature Extraction

We notice that many published contracts lack their source code. Penetration testers in most cases have to deal solely with the contract bytecode, which increases the complexity of the analysis. We found that concentrating on the EVMB layer alone leads to poor detection performance, and some other type of information from the SC or BB layers is necessary to improve the outcome. Our framework can process contracts published with or without their source code.

5.3. Limitations Of Our Framework

Our architecture is limited by its implementation rather than by design choices. First, we think adopting the majority of AI techniques is impractical, so we investigate only a few AI models to design our methodology. Second, our work relies on the word2vec NLP technique and lacks support for out-of-vocabulary words. We propose to replace word2vec with the fastText NLP model [36] to address this limitation.
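As a minimal sketch of the proposed fix, Gensim's fastText implementation builds vectors from character n-grams, so an identifier never seen during training still receives an embedding; the corpus and hyperparameters below are illustrative only.

```python
from gensim.models import FastText

# Tiny corpus of tokenized Solidity snippets (illustrative only).
corpus = [
    ["function", "transfer", "(", "address", "to", ",", "uint", "amount", ")"],
    ["require", "(", "balances", "[", "msg.sender", "]", ">=", "amount", ")"],
]

# fastText composes word vectors from character n-grams, so an
# out-of-vocabulary identifier still gets an embedding.
model = FastText(sentences=corpus, vector_size=32, window=3, min_count=1, epochs=20)
print(model.wv["transferFrom"].shape)  # unseen token, still embedded
```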

6. Related Works

In this section, we revisit the issue of vulnerability detection in smart contracts. Our content is twofold. First,
we investigate the literature on smart contract vulnerability detection under rule-based static analysis. Second, we
re-explore existing data-driven approaches for smart contract vulnerability mining.
6.1. Rule-based approach for vulnerability detection under static analysis
To formalize the research on vulnerabilities in smart contracts, the work of Atzei et al. [9] proposed a taxonomy of smart contract vulnerabilities. Following such an initiative, Argañaraz et al. [10] opted for the use of static analysis over contract source code to extract both functional and security vulnerabilities. In their work, the authors formulate expert-based rules towards the programming language of interest. Gao et al. [13] designed the SmartEmbed vulnerability analysis tool to detect clones and bugs, and to validate the vulnerability-freeness of a contract. Moreover, Gao et al. [13] outlined the major disadvantage of relying on expert-based rules: it is cumbersome to keep up with the attack surface and the sophistication of attacks. Furthermore, Gao et al. [13] argued for the use of static analysis over more reliable but expensive techniques such as dynamic symbolic execution [7][16][17] and dynamic analysis [14][15]. The authors in [11] designed Slither, an incremental static-analysis-based vulnerability detection framework for Solidity-written smart contracts. Slither can accommodate new vulnerability detectors aimed at uncovering novel vulnerabilities in the wild.

at uncovering novel vulnerabilities in the wild.

6.2. Data-driven approach for static analysis-based vulnerability detection



To keep pace with the ever-evolving smart contract attack surface, the authors in [37] advocate the usage of data-driven techniques, such as machine learning, for discovering vulnerabilities in smart contracts. The interesting work of Eth2Vec [25] applies unsupervised machine learning techniques to built-based features and features extracted from EVM bytecode. It aims to automatically cluster buggy contracts as well as cloned contracts. Although Eth2Vec is an innovative work, it only supports contracts present in the training dataset and lacks the use of supervised learning. The authors in [38] further advocate the use of supervised classification, which leads to higher detection results than unsupervised classification in natural language processing. Further supporting AI-based vulnerability detection, Teng et al. [39] introduce a static time-slicing, source-code-based protocol built on a long short-term memory (LSTM) model, to fetch contracts which behave differently from the initial one reported on the Ethereum dapps website. The work of [39] requires manually extracting features from the data in order to characterize the dataset. The compelling work of Qian et al. [17] leverages a bidirectional LSTM with an attention mechanism over various embedding vector dimensions. It aims to uncover the embedding vector dimension that yields the best vulnerability detection results. Moreover, compared to state-of-the-art AI models such as the vanilla recurrent neural network (RNN), LSTM, and bi-LSTM without self-attention, the work of Qian et al. [17] leads to higher performance.
The work of Liu et al. [34] designs a smart contract vulnerability detection framework that fuses features from the SC and BB layers in black-box settings. Such a framework exploits an attentive multi-encoder network comprising self-attention and cross-attention layers to detect vulnerabilities and to provide feature importance through weight interpretability. Although it is a promising piece of research, the work of Liu et al. [34] exhibits two drawbacks. First, it still builds upon expert-based rules and inherits the weaknesses associated with rule-based systems. Second, it only supports smart contracts published with source code. It therefore lacks support for EVM bytecode processing, which is an important limitation.

7. Conclusion

In this paper, we design a novel framework that detects vulnerabilities in Ethereum-based, Solidity-written smart
contracts. To that end, we leverage three modalities of interest: first, features extracted from the contract source
code (SC layer); second, features acquired during contract compilation (BB layer); third, features obtained from
processing the contract bytecode (EVMB layer). Our work differs from existing schemes that leverage intramodal
or two-by-two intermodal settings under a black-box approach. We propose the following innovations. First, we
dismiss expert-based patterns and manual feature fusion, and leverage AI automation. Second, we define a theoretical
methodology based on multiple supervised detection tasks in multimodal learning. We further evaluate such tasks on
real datasets with several state-of-the-art AI models. Third, we provide developers and researchers with succinct
white-box knowledge to achieve high-performing vulnerability detection in intramodal and intermodal settings. Our
framework tolerates the absence of one or two modalities and supports contracts published without source code. We
empirically found that under intramodal settings, BB provides the best performance, followed by SC and, last, EVMB.
Under two-by-two intermodal settings, SC+BB performs best, followed by BB+EVMB and finally SC+EVMB.
Furthermore, we conclude that three-by-three intermodal settings outperform two-by-two intermodal settings, and
two-by-two intermodal settings outdo intramodal settings. Although our scheme limits the number of AI models used,
it is impractical to consider all AI models for vulnerability training and inference. We believe our framework can
advance research on smart contract vulnerability detection. As future work, we aim to solve the out-of-vocabulary
issue in our framework and dive into feature importance.

Appendix A. Vulnerability detection strategies

The lines below describe the multiple supervised tasks of interest that combine the hierarchy of fused features
together with state-of-the-art AI models under intramodal and intermodal settings.

Appendix A.1. Strategies under intramodal settings



a) SC layer
At the SC layer, we select SC-W2V and SC-Bert as input features. We proceed with dimension unification using SPP,
MP and Dense separately. Then we fuse features on the one hand with concat and on the other hand with stack. To
train and evaluate vulnerability detection, we proceed as follows. First, we adopt a bi-LSTM model with self-attention
combined with a random forest (RF) model. Second, we combine a self-attentive bi-LSTM with a textCNN model. A
sketch of the dimension-unification step follows the task list below.
We investigate twelve tasks of interest at the SC layer, and we denote | as the pipeline symbol.

• task 1: (SC-W2V+SC-Bert) | SPP | concat | (bi-LSTM+self-attention) | textCNN.



• task 2: (SC-W2V+SC-Bert) | SPP | concat | (bi-LSTM+self-attention) | RF.


• task 3: (SC-W2V+SC-Bert) | SPP | stack | (bi-LSTM+self-attention) | RF.
• task 4: (SC-W2V+SC-Bert) | SPP | stack | (bi-LSTM+self-attention) | textCNN.
ep

• task 5: (SC-W2V+SC-Bert) | MP | concat | (bi-LSTM+self-attention) | textCNN.


• task 6: (SC-W2V+SC-Bert) | MP | concat | (bi-LSTM+self-attention) | RF.
• task 7: (SC-W2V+SC-Bert) | MP | stack | (bi-LSTM+self-attention) | RF.

• task 8: (SC-W2V+SC-Bert) | MP | stack | (bi-LSTM+self-attention) | textCNN.

• task 9: (SC-W2V+SC-Bert) | Dense | stack | (bi-LSTM+self-attention) | RF.

• task 10: (SC-W2V+SC-Bert) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 11: (SC-W2V+SC-Bert) | Dense | concat | (bi-LSTM+self-attention) | textCNN.
• task 12: (SC-W2V+SC-Bert) | Dense | concat | (bi-LSTM+self-attention) | RF.
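
To make the dimension-unification step concrete, the sketch below illustrates, on randomly generated stand-in embeddings, how a variable-length token-embedding matrix can be reduced to a fixed-size vector with each of the three options used above: spatial pyramid pooling (SPP), max pooling (MP), and a Dense projection. All sizes, pyramid levels, and the pad-then-project reading of the Dense option are illustrative assumptions, not the exact configuration used in our experiments.

```python
import numpy as np

def spp_1d(x: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Spatial pyramid pooling over the sequence axis of x (seq_len, dim):
    each level splits the sequence into `level` bins, max-pools each bin,
    and all pooled bins are concatenated into one fixed-size vector."""
    seq_len, dim = x.shape
    pooled = []
    for level in levels:
        edges = np.linspace(0, seq_len, level + 1).astype(int)
        for i in range(level):
            lo, hi = edges[i], max(edges[i + 1], edges[i] + 1)  # keep bins non-empty
            pooled.append(x[lo:hi].max(axis=0))
    return np.concatenate(pooled)                # length: dim * sum(levels)

def max_pool(x: np.ndarray) -> np.ndarray:
    """Global max pooling over the sequence axis: (seq_len, dim) -> (dim,)."""
    return x.max(axis=0)

def dense_unify(x: np.ndarray, w: np.ndarray, max_len: int = 64) -> np.ndarray:
    """Pad or truncate to max_len tokens, flatten, then apply a linear projection w."""
    seq_len, dim = x.shape
    padded = np.zeros((max_len, dim), dtype=x.dtype)
    padded[: min(seq_len, max_len)] = x[:max_len]
    return padded.reshape(-1) @ w                # length: w.shape[1]

# Toy usage: two SC-layer features of different lengths and widths.
rng = np.random.default_rng(0)
sc_w2v = rng.normal(size=(53, 128))              # e.g. Word2Vec tokens of one function
sc_bert = rng.normal(size=(47, 768))             # e.g. BERT tokens of the same function
w = rng.normal(size=(64 * 128, 256))             # projection matrix for the Dense option
print(spp_1d(sc_w2v).shape, max_pool(sc_bert).shape, dense_unify(sc_w2v, w).shape)
```
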
b) BB layer

Regarding the BB layer, we select SSA-W2V, SSA-Bert and BB-CFG as input features. We implement dimension
unification using SPP, MP and Dense separately. Then we fuse features on the one hand with concat and on the other
hand with stack. For model training and inference, first, we adopt a bi-LSTM model with self-attention combined
with a random forest (RF) model. Second, we combine a self-attentive bi-LSTM with a textCNN model. A sketch of
the two fusion operators follows the task list below.
We inspect twelve tasks of interest at the BB layer, and we signify | as the pipeline symbol.

• task 13: (SSA-W2V+SSA-Bert+BB-CFG) | MP | concat | (bi-LSTM+self-attention) | textCNN.


• task 14: (SSA-W2V+SSA-Bert+BB-CFG) | MP | concat | (bi-LSTM+self-attention) | RF.

• task 15: (SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 16: (SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | RF.
• task 17: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | concat | (bi-LSTM+self-attention) | textCNN.
• task 18: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | concat | (bi-LSTM+self-attention) | RF.
• task 19: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 20: (SSA-W2V+SSA-Bert+BB-CFG) | Dense | stack | (bi-LSTM+self-attention) | RF.
• task 21: (SSA-W2V+SSA-Bert+BB-CFG) | SPP | concat | (bi-LSTM+self-attention) | RF.
• task 22: (SSA-W2V+SSA-Bert+BB-CFG) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 23: (SSA-W2V+SSA-Bert+BB-CFG) | SPP | stack | (bi-LSTM+self-attention) | RF.

• task 24: (SSA-W2V+SSA-Bert+BB-CFG) | SPP | stack | (bi-LSTM+self-attention) | textCNN.
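
The two fusion operators used throughout admit a simple reading, sketched below on already-unified BB-layer vectors: concat joins the per-feature vectors into one long vector, whereas stack arranges them as rows of a matrix so that each feature becomes one timestep of the sequence fed to the self-attentive bi-LSTM. The widths are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assume each BB-layer feature has already been unified to a fixed width
# (256 here, purely illustrative) by SPP, MP or Dense.
ssa_w2v, ssa_bert, bb_cfg = (rng.normal(size=(256,)) for _ in range(3))

# concat: one long vector, i.e. a single timestep for the downstream encoder.
fused_concat = np.concatenate([ssa_w2v, ssa_bert, bb_cfg])     # shape (768,)

# stack: one row per feature, i.e. a 3-step sequence of width 256, which is
# the natural input shape for the self-attentive bi-LSTM.
fused_stack = np.stack([ssa_w2v, ssa_bert, bb_cfg], axis=0)    # shape (3, 256)

print(fused_concat.shape, fused_stack.shape)
```
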

c) EVMB layer

Concerning the EVMB layer, we adopt EVMB-CFG and EVMB-ASM as input features. We leverage SPP, MP and
Dense separately to unify dimensions. We fuse features on the one hand with concat and on the other hand with stack.
For model training and testing, first, we implement a bi-LSTM model with self-attention combined with a random
forest (RF) model. Second, we incorporate a self-attentive bi-LSTM with a textCNN model. A sketch of the
self-attentive bi-LSTM encoder follows the task list below.
We evaluate twelve tasks of interest at the EVMB layer. The pipeline symbol | represents the process flow from
feature selection to decision making.

• task 25: (EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | textCNN.


• task 26: (EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | RF.
• task 27: (EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | textCNN.

• task 28: (EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | RF.


• task 29: (EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-attention) | textCNN.
• task 30: (EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-attention) | RF.

• task 31: (EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | textCNN.


• task 32: (EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | RF.

• task 33: (EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 34: (EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | RF.

• task 35: (EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention) | textCNN.


• task 36: (EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention) | RF.
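
For readers who prefer code to prose, the following minimal PyTorch sketch shows one plausible shape of the self-attentive bi-LSTM encoder shared by all tasks: the fused feature sequence is encoded by a bidirectional LSTM, an attention score is computed per timestep, and the attention-weighted sum of the hidden states is returned as the joint representation. Layer sizes are illustrative and do not reflect our tuned hyperparameters.

```python
import torch
import torch.nn as nn

class SelfAttentiveBiLSTM(nn.Module):
    """Minimal self-attentive bi-LSTM encoder for a fused feature sequence."""

    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)       # one score per timestep

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, timesteps, input_dim), e.g. stacked EVMB-CFG/EVMB-ASM vectors
        h, _ = self.lstm(x)                            # (batch, timesteps, 2*hidden)
        scores = self.attn(h).squeeze(-1)              # (batch, timesteps)
        weights = torch.softmax(scores, dim=1)         # attention over timesteps
        return (weights.unsqueeze(-1) * h).sum(dim=1)  # (batch, 2*hidden)

# Toy usage: a batch of 4 fused EVMB sequences, each 2 timesteps of width 256.
encoder = SelfAttentiveBiLSTM(input_dim=256)
joint = encoder(torch.randn(4, 2, 256))
print(joint.shape)    # torch.Size([4, 256])
```
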

Appendix A.2. Strategies under intermodal settings
a) SC+BB Combination
Regarding the SC+BB combination, we set SC-W2V, SC-Bert, SSA-W2V, SSA-Bert and BB-CFG as input features.
We unify feature dimensions with MP, SPP and Dense separately. We fuse features on the one hand with concat and
on the other hand with stack. To train and evaluate AI models for vulnerability detection, we proceed as follows.
First, we deploy a bi-LSTM model with self-attention combined with a random forest (RF) model. Second, we link a
self-attentive bi-LSTM with a textCNN model. A sketch of the random-forest decision head follows the task list below.
We examine twelve tasks of interest under the SC+BB combination.

• task 37: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | concat | (bi-LSTM+self-attention) |
textCNN. er
• task 38: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | concat | (bi-LSTM+self-attention) | RF.
• task 39: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 40: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | MP | stack | (bi-LSTM+self-attention) | RF.

• task 41: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | Dense | concat | (bi-LSTM+self-attention) | textCNN.
• task 42: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | Dense | concat | (bi-LSTM+self-attention) |
RF.

• task 43: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 44: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | Dense | stack | (bi-LSTM+self-attention) | RF.

• task 45: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 46: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | SPP | concat | (bi-LSTM+self-attention) | RF.

• task 47: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | SPP | stack | (bi-LSTM+self-attention) | textCNN.

• task 48: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG) | SPP | stack | (bi-LSTM+self-attention) | RF.
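
The decision-making step pairs the joint representation with either classifier. The sketch below, which reuses the SelfAttentiveBiLSTM class from the EVMB-layer sketch, shows one plausible wiring of the (bi-LSTM+self-attention) | RF tasks: the encoder output is treated as a fixed feature vector and a scikit-learn random forest is trained on it. The data, shapes and hyperparameters are illustrative stand-ins; the textCNN variant would replace the random forest with a convolutional text classifier over the same representation.

```python
import numpy as np
import torch
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative stand-ins: 1,000 functions, each a 5-step fused SC+BB sequence
# of width 256, with hypothetical binary labels (1 = vulnerable).
fused = torch.randn(1000, 5, 256)
labels = np.random.default_rng(0).integers(0, 2, size=1000)

encoder = SelfAttentiveBiLSTM(input_dim=256)   # class from the EVMB-layer sketch
with torch.no_grad():                          # encoder used as a feature extractor
    joint = encoder(fused).numpy()             # (1000, 256) joint representation

X_train, X_test, y_train, y_test = train_test_split(
    joint, labels, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print("held-out accuracy:", rf.score(X_test, y_test))
```
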


b) SC+EVMB Combination

Regarding the SC+EVMB combination, we fix SC-W2V, SC-Bert, EVMB-CFG, and EVMB-ASM as input features.
We apply dimension unification using MP, SPP and Dense separately. We fuse features with concat and stack
respectively. For AI model training and inference, first, we choose a bi-LSTM model with self-attention combined
with a random forest (RF) model. Second, we adopt a self-attentive bi-LSTM with a textCNN model.
We evaluate twelve tasks of interest under the SC+EVMB combination.

• task 49: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | textCNN.

• task 50: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | RF.

• task 51: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 52: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | RF.

• task 53: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-attention) | textCNN.


• task 54: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-attention) | RF.

• task 55: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 56: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | RF.

• task 57: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 58: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | RF.
• task 59: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention) | textCNN.

• task 60: (SC-W2V+SC-Bert+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention) | RF.

c) BB+EVMB Combination
As for the BB+EVMB combination, we define SSA-W2V, SSA-Bert, BB-CFG, EVMB-CFG, and EVMB-ASM as
input features. We enforce dimension unification with SPP, MP and Dense separately. We fuse features with concat
and stack respectively. To support vulnerability detection under multimodal learning, first, we integrate a bi-LSTM
model with self-attention combined with a random forest (RF) model. Second, we adopt a self-attentive bi-LSTM
with a textCNN model.
We explore twelve tasks of interest under the BB+EVMB combination.
• task 61: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention)
| textCNN.

• task 62: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | RF.

• task 63: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 64: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | RF.
• task 65: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-
attention) | textCNN.
• task 66: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-attention) | RF.
• task 67: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention)
| textCNN.

• task 68: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | RF.

• task 69: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | textCNN.

• task 70: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | RF.
• task 71: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention)
| textCNN.

• task 72: (SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention)
| RF.

d) SC+BB+EVMB Combination
Concerning the full SC+BB+EVMB combination of layers, we set SC-W2V, SC-Bert, SSA-W2V, SSA-Bert, BB-CFG,
EVMB-CFG, and EVMB-ASM as input features. We implement dimension unification with SPP, MP and Dense
separately, and fuse features with concat and stack respectively. For vulnerability detection under multimodal learning,
first, we deploy a bi-LSTM model with self-attention combined with a random forest (RF) model. Second, we endorse
a self-attentive bi-LSTM with a textCNN model. A snippet enumerating the full grid of strategies follows the task
list below.
We explore twelve tasks of interest under the SC+BB+EVMB combination.
• task 73: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | concat |
(bi-LSTM+self-attention) | textCNN.
• task 74: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | concat | (bi-LSTM+self-attention) | RF.
• task 75: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-LSTM+self-attention) | textCNN.
• task 76: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | MP | stack | (bi-
LSTM+self-attention) | RF.
• task 77: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat
| (bi-LSTM+self-attention) | textCNN.
• task 78: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | concat | (bi-LSTM+self-attention) | RF.

• task 79: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | stack | (bi-LSTM+self-attention) | textCNN.
• task 80: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | Dense | stack |
(bi-LSTM+self-attention) | RF.

• task 81: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | textCNN.
• task 82: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | concat | (bi-LSTM+self-attention) | RF.

• task 83: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | stack | (bi-LSTM+self-attention) | textCNN.
• task 84: (SC-W2V+SC-Bert+SSA-W2V+SSA-Bert+BB-CFG+EVMB-CFG+EVMB-ASM) | SPP | stack |
(bi-LSTM+self-attention) | RF.
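
Tasks 1-84 follow a regular grid over layer combinations, unification methods, fusion operators and decision heads (7 × 3 × 2 × 2 = 84). As a bookkeeping aid only, the snippet below enumerates the same grid with itertools.product; its iteration order differs from the task numbering above, and the feature and strategy names are simply the labels used in this appendix.

```python
from itertools import product

FEATURES = {
    "SC":   ["SC-W2V", "SC-Bert"],
    "BB":   ["SSA-W2V", "SSA-Bert", "BB-CFG"],
    "EVMB": ["EVMB-CFG", "EVMB-ASM"],
}
COMBINATIONS = [("SC",), ("BB",), ("EVMB",),                   # intramodal settings
                ("SC", "BB"), ("SC", "EVMB"), ("BB", "EVMB"),  # two-by-two intermodal
                ("SC", "BB", "EVMB")]                          # three-by-three intermodal
UNIFICATION = ["SPP", "MP", "Dense"]
FUSION = ["concat", "stack"]
HEAD = ["textCNN", "RF"]

strategies = []
for layers, unify, fuse, head in product(COMBINATIONS, UNIFICATION, FUSION, HEAD):
    inputs = "+".join(f for layer in layers for f in FEATURES[layer])
    strategies.append(f"({inputs}) | {unify} | {fuse} | (bi-LSTM+self-attention) | {head}")

print(len(strategies))   # 84 strategies, matching tasks 1-84 above
print(strategies[0])     # (SC-W2V+SC-Bert) | SPP | concat | (bi-LSTM+self-attention) | textCNN
```
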

References
[1] S. Liu, Global spending on blockchain solutions 2024 - Statista, 2020. https://www.statista.com/statistics/800426/worldwide-blockchain-solutions-spending.
[2] N. Szabo, Smart contracts: Building blocks for digital markets, 2018.
[3] Solidity programming language, 2018. https://github.com/ethereum/solidity.
[4] V. Buterin, Ethereum whitepaper, 2013. https://github.com/ethereum/wiki/wiki/White-Paper.
[5] M. Bartoletti, L. Pompianu, An empirical analysis of smart contracts: platforms, applications, and design patterns, in: International conference on financial cryptography and data security, Springer, 2017, pp. 494–509. https://link.springer.com/chapter/10.1007/978-3-319-70278-0_31.
[6] K. Delmolino, M. Arnett, A. Kosba, A. Miller, E. Shi, Step by step towards creating a safe smart contract: Lessons and insights from a cryptocurrency lab, in: International conference on financial cryptography and data security, Springer, 2016, pp. 79–94. https://eprint.iacr.org/2015/460.pdf.
[7] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, A. Hobor, Making smart contracts smarter, in: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 254–269. https://dl.acm.org/doi/pdf/10.1145/2976749.2978309.
[8] T. Zimmermann, N. Nagappan, L. Williams, Searching for a needle in a haystack: Predicting security vulnerabilities for Windows Vista, in: 2010 Third international conference on software testing, verification and validation, IEEE, 2010, pp. 421–428. https://ieeexplore.ieee.org/abstract/document/5477059/.
[9] N. Atzei, M. Bartoletti, T. Cimoli, A survey of attacks on ethereum smart contracts (sok), in: International conference on principles of security and trust, Springer, 2017, pp. 164–186. https://link.springer.com/chapter/10.1007/978-3-662-54455-6_8.
[10] M. Argañaraz, M. Berón, M. J. Pereira, P. Henriques, Detection of vulnerabilities in smart contracts specifications in ethereum platforms, in: 9th Symposium on Languages, Applications and Technologies (SLATE 2020), volume 83, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2020, pp. 1–16. https://bibliotecadigital.ipb.pt/bitstream/10198/22794/1/OASIcs-SLATE-2020-2.pdf.
[11] J. Feist, G. Grieco, A. Groce, Slither: a static analysis framework for smart contracts, in: 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), IEEE, 2019, pp. 8–15. https://arxiv.org/pdf/1908.09878.pdf.
[12] S. Kalra, S. Goel, M. Dhawan, S. Sharma, Zeus: analyzing safety of smart contracts, in: NDSS, 2018, pp. 1–12. http://pages.cpsc.ucalgary.ca/~joel.reardon/blockchain/readings/ndss2018_09-1_Kalra_paper.pdf.
[13] Z. Gao, L. Jiang, X. Xia, D. Lo, J. Grundy, Checking smart contracts with structural code embedding, IEEE Transactions on Software Engineering (2020). https://arxiv.org/pdf/2001.07125.pdf.
[14] Mythril: Security analysis tool for evm bytecode, 2018. https://github.com/ConsenSys/mythril.
[15] J. Krupp, C. Rossow, teEther: Gnawing at ethereum to automatically exploit smart contracts, in: 27th USENIX Security Symposium (USENIX Security 18), USENIX Association, 2018, pp. 1317–1333. https://www.usenix.org/conference/usenixsecurity18/presentation/krupp.
[16] I. Nikolić, A. Kolluri, I. Sergey, P. Saxena, A. Hobor, Finding the greedy, prodigal, and suicidal contracts at scale, in: Proceedings of the 34th annual computer security applications conference, 2018, pp. 653–663. https://dl.acm.org/doi/pdf/10.1145/3274694.3274743.
[17] P. Qian, Z. Liu, Q. He, R. Zimmermann, X. Wang, Towards automated reentrancy detection for smart contracts based on sequential models, IEEE Access 8 (2020) 19685–19695. https://ieeexplore.ieee.org/abstract/document/8970384/.
[18] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, B. Murphy, Cross-project defect prediction: a large scale experiment on data vs. domain vs. process, in: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, 2009, pp. 91–100. https://dl.acm.org/doi/pdf/10.1145/1595696.1595713.
[19] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, A. Y. Ng, Multimodal deep learning, in: ICML, 2011. https://icml.cc/2011/papers/399_icmlpaper.pdf.
[20] Y. Dai, F. Gieseke, S. Oehmcke, Y. Wu, K. Barnard, Attentional feature fusion, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 3560–3569. https://openaccess.thecvf.com/content/WACV2021/papers/Dai_Attentional_Feature_Fusion_WACV_2021_paper.pdf.
[21] L. Dai, F. Gao, R. Li, J. Yu, X. Shen, H. Xiong, W. Wu, Gated fusion of discriminant features for caricature recognition, in: International Conference on Intelligent Science and Big Data Engineering, Springer, 2019, pp. 563–573. https://link.springer.com/chapter/10.1007/978-3-030-36189-1_47.
[22] H. Zhou, Z. Fang, Y. Gao, B. Huang, C. Zhong, R. Shang, Feature fusion network based on attention mechanism for 3d semantic segmentation of point clouds, Pattern Recognition Letters 133 (2020) 327–333. https://www.sciencedirect.com/science/article/pii/S0167865520300994.
[23] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: International conference on machine learning, PMLR, 2014, pp. 1188–1196. http://proceedings.mlr.press/v32/le14.pdf.
[24] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 4171–4186. https://aclanthology.org/N19-1423.
[25] N. Ashizawa, N. Yanai, J. P. Cruz, S. Okamura, Eth2vec: learning contract-wide code representations for vulnerability detection on ethereum smart contracts, in: Proceedings of the 3rd ACM International Symposium on Blockchain and Secure Critical Infrastructure, 2021, pp. 47–59. https://dl.acm.org/doi/pdf/10.1145/3457337.3457841.
[26] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE transactions on pattern analysis and machine intelligence 37 (2015) 1904–1916. https://link.springer.com/content/pdf/10.1007/978-3-319-10578-9_23.pdf.
[27] X. Ouyang, K. Gu, P. Zhou, Spatial pyramid pooling mechanism in 3d convolutional network for sentence-level classification, IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 (2018) 2167–2179. https://ieeexplore.ieee.org/abstract/document/8413124/.
[28] N. Dong, Q. Feng, M. Zhai, J. Chang, X. Mai, A novel feature fusion based deep learning framework for white blood cell classification, Journal of Ambient Intelligence and Humanized Computing (2022) 1–13. https://link.springer.com/article/10.1007/s12652-021-03642-7.
[29] Z. Zhang, Z. Tang, Y. Wang, Z. Zhang, C. Zhan, Z. Zha, M. Wang, Dense residual network: Enhancing global dense feature flow for character recognition, Neural Networks 139 (2021) 77–85. https://www.sciencedirect.com/science/article/pii/S0893608021000472.
[30] C. Olah, S. Carter, Attention and augmented recurrent neural networks, Distill 1 (2016) e1. https://distill.pub/2016/augmented-rnns/.
[31] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint arXiv:1412.3555 (2014). https://arxiv.org/abs/1412.3555.
[32] T. Parr, The definitive ANTLR 4 reference, The Definitive ANTLR 4 Reference (2013) 1–326. https://www.torrossa.com/en/resources/an/5241753.
[33] F. Bond, Solidity grammar for antlr4, 2019. https://github.com/solidityj/solidity-antlr4.
[34] Z. Liu, P. Qian, X. Wang, L. Zhu, Q. He, S. Ji, Smart contract vulnerability detection: From pure neural network to interpretable graph feature and expert pattern fusion, in: IJCAI, 2021, pp. 2751–2759. https://www.ijcai.org/proceedings/2021/0379.pdf.
[35] M. Zhang, Z. Cui, M. Neumann, Y. Chen, An end-to-end deep learning architecture for graph classification, in: Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018. https://ojs.aaai.org/index.php/AAAI/article/view/11782.
[36] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification, EACL 2017 (2017) 427. https://aclanthology.org/E17-2.pdf#page=459.
[37] J. A. Harer, L. Y. Kim, R. L. Russell, O. Ozdemir, L. R. Kosta, A. Rangamani, L. H. Hamilton, G. I. Centeno, J. R. Key, P. M. Ellingwood, et al., Automated software vulnerability detection with machine learning, arXiv preprint arXiv:1803.04497 (2018). https://arxiv.org/pdf/1803.04497.pdf.
[38] F. Hill, K. Cho, A. Korhonen, Learning distributed representations of sentences from unlabelled data, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1367–1377. https://aclanthology.org/N16-1162.pdf.
[39] T. Hu, X. Liu, T. Chen, X. Zhang, X. Huang, W. Niu, J. Lu, K. Zhou, Y. Liu, Transaction-based classification and detection approach for ethereum smart contract, Information Processing & Management 58 (2021) 102462. https://www.sciencedirect.com/science/article/pii/S0306457320309547.