
Information Sciences 561 (2021) 20–30


Index selection for NoSQL database with deep reinforcement learning

Yu Yan, Shun Yao, Hongzhi Wang *, Meng Gao
School of Computer Science, Harbin Institute of Technology, Harbin, China

Article history: Received 6 July 2020; Received in revised form 31 December 2020; Accepted 3 January 2021; Available online 26 January 2021

Keywords: NoSQL; Index selection; Reinforcement learning

Abstract: With the development of big data technology, the data management of complex applications has become more and more resource intensive. In this paper, we propose an automatic approach (DRLISA) to achieve NoSQL database index selection. For different workloads, we automatically select the corresponding indexes and parameters, which together improve the database performance. DRLISA establishes an optimal index by building a deep reinforcement learning model that is able to adapt to dynamic changes of workloads. We conducted our experiments on five aspects (the impact of data manipulation, the impact of operation count, comparison with random selection, comparison with an existing method, and the robustness of DRLISA) using the open source benchmark YCSB. The experimental results show that DRLISA provides highly efficient index recommendations under dynamic workloads.

© 2021 Elsevier Inc. All rights reserved.

1. Introduction

The development of the Internet and cloud computing requires databases to store and process big data effectively. With the demand for high performance in reading and writing, NoSQL databases are more and more widely used. So far, more than 225 different NoSQL stores have been reported and the list is still growing [1]. Index recommendation in NoSQL has therefore become more and more important.
In NoSQL databases, it is essential to provide scalable and efficient index services for real-time data analysis. Index selection plays an important role in improving NoSQL database performance. Only if we can correctly select the index that fits the workload can the performance of the database be effectively guaranteed.
There are two key challenges when we establish indexes in NoSQL. First, database knobs have become more and more complex in the pursuit of more efficient data management, and finding the best index configuration from these knobs is difficult for normal users. Second, adapting to dynamic workloads consumes a large amount of resources, because DBAs select indexes by consulting and computing statistical characteristics, and frequently obtaining these statistics over big data is hard to afford.
Index selection needs to be compatible with most workloads and with as many indexes as possible. Most of the time, the workload is not immutable but changes while the database is in use. In addition, the optimal index selections under different hardware environments differ widely.
To meet these challenges, we use a reinforcement learning (RL) model to achieve index selection. There are two reasons why we choose an RL model rather than another ML model. The first is that RL models grow over time and can evolve from scratch.
* Corresponding author.

https://doi.org/10.1016/j.ins.2021.01.003

In the domain of database index selection, there are not enough training sets for a normal machine learning model, so we need a convenient method to conduct index selection. The other reason is that RL is very flexible and efficient: actions and states can be added or removed at will, so we can establish the proper configurations based on the corresponding database knobs.
Different from existing methods, we focus on a general index selection method which fits various workloads under dif-
ferent hardware environments. In this paper, we explore selecting the optimal index structure and dynamically tuning its
parameters at the same time by using reinforcement learning. There are still some potential problems in the process of using
a deep reinforcement learning model for the intelligent selection decision of the index structure.

• First, defining the environment required for deep reinforcement learning in our situation is non-trivial, because there is no intuitive relationship between index selection and the states and actions of reinforcement learning.
• Second, how to set the actions of reinforcement learning and the corresponding reward function is also well worth exploring, since a bad reward function makes the reinforcement learning model difficult to converge.
• Third, we should obtain the optimal index structure and its parameters from the Q values learned by the deep reinforcement learning model. It is also worth noting that the locally optimal solution of selecting the maximum Q value action at every step may not be the globally optimal solution.

Our solution to the NoSQL index selection problem and our developed Deep Reinforcement Learning Index Selection
Approach (DRLISA) are distinct from the related work in several ways. The uniqueness of our approach relies on the combi-
nation of:

• Recommendation of index parameters for the proposed indexes while recommending the index structure.
• Using a reinforcement learning model as a heuristic method to deal with index selection under dynamic workloads.
• Selecting the index based on the Q values from the reinforcement learning network.

Existing modern databases provide means for self-tuning specialized indexes [2,3]. Nevertheless, none of the present index selection approaches offers a general method for recommending index parameters; they only recommend the parameters of one specialized index. In this paper, we first introduce a general reinforcement learning method to select the index and its parameters, and then deal with the situation of a dynamic workload. Having such features in a database management assistant not only reduces the overhead for the system administrator of choosing proper indexes, but also provides a good skeleton for developing an automatic index selection system.
In the following sections, we discuss the specific implementation of our index selection method. Section 2 reviews related work. Section 3 provides the necessary background for our index selection architecture. Section 4 presents the design of the index selection architecture in detail. Section 5 evaluates the selected indexes, showing how they outperform single immutable indexes.

2. Related work

Machine learning has made great strides in recent years. Yu, Zhang, Hong and colleagues proposed several new ML models [4–12] for processing images, such as SPRNet [4], SPE-VLAD [5], MRCL, etc. In addition to image processing, machine learning is also widely used in other fields where it promotes high efficiency, such as query prediction [13], machine translation [14], sentiment analysis [15], anomaly detection [16], cloud computing [17], pattern detection [18], time series forecasting [19], active learning [20], etc. These new research studies provide new opportunities for index selection.
The index selection problem has been studied for many years. In the first stage, researchers studied how to automatically select indexes with iterative methods. In these iterative methods [21–24], some candidate indexes were selected first, and the best indexes were then found by iterating over these candidates. With the advent of big data, the cost of such iteration can no longer be afforded. More intelligent models [25,26] were then proposed, such as NoDBA [25], AI meets AI [26], etc. These methods are more efficient than the iterative methods. However, they are not flexible enough for the variety of existing index structures [27–29,2,30,3,31] and dynamic workloads. Additionally, current research on index selection usually uses a single immutable index configuration, which can hardly deal with workloads in many different situations.

3. Background

3.1. Reinforcement learning

Reinforcement learning models map situations to appropriate actions with the goal of maximizing a cumulative reward [32]. At each timestep t, the agent observes a state of the environment, s_t, and selects an action, a_t. The selected action depends on the policy, π, which can realize several types of behaviors: for example, it can act greedily or balance exploration and exploitation through an ε-greedy (or better) approach. The policy is driven by the expected rewards of each state, which the model must learn. Given the selected action, the model arrives at a new state, s_{t+1}. The environment then sends the agent a reward, r_{t+1}, signaling the "goodness" of the selected action. The agent's goal is to maximize the total reward [32]. One approach is a value-based iteration technique, where the model records state-action values, Q(s, a). These values specify the long-term desirability of a state by taking into account the rewards of the states that are likely to follow [32].

3.2. Deep Q networks

A deep Q network (DQN) is a multi-layered neural network that, for a given state s, outputs a vector of action values Q(s, a; θ), where θ are the parameters of the network [33]. Two important ingredients of the DQN algorithm as proposed by Mnih et al. (2015) are the use of a target network and the use of experience replay [34]. Both the target network and experience replay dramatically improve the performance of the algorithm.
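For illustration, a minimal experience-replay pool could be sketched in Python as follows; the class name and interface are ours (not from the paper), and the capacity of 50,000 mirrors the memory size reported in Section 5.3.

import random
from collections import deque

class ReplayBuffer:
    """Minimal experience pool in the spirit of Mnih et al. [34]."""

    def __init__(self, capacity: int = 50_000):
        self.pool = deque(maxlen=capacity)  # old transitions are dropped automatically

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size: int):
        # Uniformly sample a mini-batch of stored transitions for a network update.
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))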

3.3. Dueling network

The difference between the dueling network and DQN is that the dueling network splits one estimator into two separate estimators: one for the state value function and the other for the state-dependent action advantage function [35]. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. The architecture leads to better policy evaluation in the presence of many similar-valued actions.
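Concretely, the commonly used aggregation of the two streams from [35] is

Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \Big( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \Big),

where θ denotes the shared parameters and α, β the parameters of the advantage and value streams, respectively; subtracting the mean advantage keeps the decomposition identifiable.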

3.4. Problem definition

The index selection problem [36,37] has been proposed for many years. We formulate this problem as follows.

• Input: the workload N and the dataset, which is stored in the NoSQL database with a certain storage structure.
• Output: the recommended indexes (denoted C) for the corresponding input, with maximizing throughput as the optimization goal.

In our paper, we use a modified RL model to solve the index selection problem. The workload N is related to the state of the RL model (S), and the reward of the RL model is computed from the dataset with its storage structure. The output C is also a part of S and becomes better and better through iteration. When the current S has accumulated enough reward, the final C after a series of actions is the best index configuration, which achieves the optimal goal for the current workload and storage.
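To make the mapping concrete, the following Python sketch shows one possible encoding of the state s from the workload N and the index configuration C; the operation names, candidate index set, and normalization are illustrative assumptions rather than the paper's exact representation.

import numpy as np

# Hypothetical state encoding s = (N, C).
OPERATIONS = ["read", "update", "scan", "insert", "rmw"]   # YCSB operation mix (N)
INDEX_STRUCTURES = ["btree", "hash", "lsm_tree"]           # candidate index structures (C)

def encode_state(workload: dict, index: str, index_param: float) -> np.ndarray:
    """Concatenate the workload proportions, a one-hot index id and a normalized
    index parameter into one fixed-length state vector."""
    mix = np.array([workload.get(op, 0.0) for op in OPERATIONS], dtype=np.float32)
    one_hot = np.zeros(len(INDEX_STRUCTURES), dtype=np.float32)
    one_hot[INDEX_STRUCTURES.index(index)] = 1.0
    param = np.array([index_param], dtype=np.float32)
    return np.concatenate([mix, one_hot, param])

# Example: a 50% read / 50% scan workload currently served by a B-Tree.
state = encode_state({"read": 0.5, "scan": 0.5}, "btree", index_param=0.5)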

4. NoSQL deep reinforcement learning index selection approach (DRLISA)

In this paper, we use the term configuration to mean an index and its parameters. The goal of the DRLISA is to select an
index configuration that is as close to the optimal index configuration as possible for a given database and workload, where
the workload consists of a set of NoSQL data manipulation statements. Section 4.1 overviews the DRLISA. The details of the
Reward Evaluation module are shown in Section 4.2. Section 4.3 introduces the reinforcement learning model. In Section 4.4,
we discuss how to select the optimal index after the training model. In addition, Section 4.5 is the discussion about the
DRLISA.

4.1. Overview

An overview of the architecture of the DRLISA is shown in Fig. 1. The dotted line in the figure denotes the process bound-
ary between our modules and the NoSQL Database. The index selection tool takes as input a workload on a specified NoSQL
database.
This approach has a basic Reinforcement Learning Model module. Our method employs a classical RL algorithm, DDQN
[38], which is a model-free method. In our Reinforcement Learning Model module, we take the workload and index configuration representations as input. In each step, the RL model recommends an action from the action set for a certain state. The reward and our learning strategy will be introduced in Sections 4.2 and 4.3, respectively.
Revolving around the Reinforcement Learning Model module, other modules of the DRLISA are as follows.
The Index Changing Action module is responsible for taking actions from the Reinforcement Learning Model module and
executing them in the NoSQL Database.
We use the Benchmark module to get the index performance from the NoSQL Database. The Reward Evaluation module
calculates the reward value of an index changing action by using the index performance we got from the Benchmark module.
The Index Configuration module gets the index configuration representation of the current index structure and related parameters from the NoSQL database. The Index Recommendation module selects the final optimal index configuration according to the reward from the Reward Evaluation module and the index configuration representation from the Index Configuration module.

Fig. 1. The architecture of the DRLISA.
The details of the Reward Evaluation module, the Reinforcement Learning Model module, and the Index Recommendation module will be introduced in Sections 4.2, 4.3 and 4.4, respectively.

4.2. Reward evaluation

The reward function is the most important part of each reinforcement learning model. The reasonableness of the reward
function generally determines whether or not the reinforcement learning model converges. The reward function should give
the value of actions based on the information fed back by the environment. Thus, in this section, we focus on the reward
evaluation.
The reward function r_t of an index-changing action at time t should consider the time reward r_time(t) and the switching cost c_switch(t) of a new index. The performance, which we obtain from the Benchmark module by testing data under the given workload, helps us calculate these functions. If the action keeps the index configuration unchanged, the reward is set to zero. Let s_t be the state representation at time t, including the index configuration and workload. The reward function of an action at time t is defined as
 
r_t = \begin{cases} r_{\mathrm{time}}(t) - c_{\mathrm{switch}}(t), & s_t \neq s_{t-1} \\ 0, & s_t = s_{t-1} \end{cases} \qquad (1)

The time reward function r_time(t) captures the difference between the cost at time t−1 and at time t. Compared with calculating it directly from the time cost, a calculation based on throughput reduces the error and is not affected by the number of operations. Let p_t be the throughput at time t, i.e., the number of NoSQL database operations per second, which we obtain from the Benchmark module, and let k be a constant. The time reward function r_time(t) is defined as
r_{\mathrm{time}}(t) = \begin{cases} \ln\!\left(\dfrac{p_t - p_{t-1}}{k} + 1\right), & p_t > p_{t-1} \\ 0, & p_t \leq p_{t-1} \end{cases} \qquad (2)

The index switching cost c_switch(t) should be small compared to the time reward function r_time(t). By subtracting a small switching cost, we suppress index changes whose gain in the time reward is too small to justify them. Besides, we can also add a space cost term to the reward function when the space cost needs to be considered.
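A minimal Python sketch of Eqs. (1) and (2) is given below; the constant k and the switching-cost value are assumed defaults, since their exact settings are not reported in the text.

import math

def time_reward(p_t: float, p_prev: float, k: float = 1.0) -> float:
    """Eq. (2): log-scaled throughput gain, zero when throughput does not improve.
    k is the constant from the paper; 1.0 is an assumed placeholder value."""
    if p_t > p_prev:
        return math.log((p_t - p_prev) / k + 1.0)
    return 0.0

def reward(p_t: float, p_prev: float, state_changed: bool, switch_cost: float = 0.05) -> float:
    """Eq. (1): time reward minus a small switching cost when the state changes,
    zero otherwise. switch_cost = 0.05 is an assumed small constant."""
    if not state_changed:
        return 0.0
    return time_reward(p_t, p_prev) - switch_cost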

4.3. Reinforcement learning model

We present a sample structure of our deep Q network in Fig. 2. The Reinforcement Learning Model module takes as input the state representation s = (N, C), where N and C represent the workload representation and index configuration representation, respectively. The output of the module is the Q value of each action in the action set. Obviously, we can adjust the parameters of the deep Q network to fit all scenarios. Fig. 2 shows a network structure used with YCSB [39], an open source benchmark. We efficiently predict the Q values using only three simple fully connected layers. The action set includes several different actions, such as keeping the current configuration, changing the index structure, and changing the index parameters.

Fig. 2. The architecture of our deep Q network.

For index selection, the value of the state is more important than the action advantage. In our index selection tool, we therefore use the dueling network architecture as the neural network architecture of the reinforcement learning model, which trains faster and achieves better performance. The reason is that, distinct from DQN, the dueling architecture has an independent fully connected layer that outputs a scalar state value, which fits our problem well. With such independent layers, we can obtain more accurate state values.
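As an illustration, a dueling Q network of roughly this shape can be built with TensorFlow/Keras as follows; the function name is ours and the layer sizes are one plausible reading of the 16/8/8 hidden layers reported in Section 5.3, so this should be read as a sketch rather than the exact network.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_dueling_q_network(state_dim: int, num_actions: int) -> Model:
    # Shared fully connected trunk over the state representation s = (N, C).
    inputs = layers.Input(shape=(state_dim,))
    x = layers.Dense(16, activation="relu")(inputs)
    x = layers.Dense(8, activation="relu")(x)
    # Dueling heads: a scalar state value V(s) and per-action advantages A(s, a).
    value = layers.Dense(1)(layers.Dense(8, activation="relu")(x))
    advantage = layers.Dense(num_actions)(layers.Dense(8, activation="relu")(x))
    # Mean-subtracted aggregation from the dueling architecture [35].
    q_values = layers.Lambda(
        lambda t: t[0] + (t[1] - tf.reduce_mean(t[1], axis=1, keepdims=True))
    )([value, advantage])
    return Model(inputs=inputs, outputs=q_values)

# Example: a 9-dimensional state and an action set of size 5.
q_network = build_dueling_q_network(state_dim=9, num_actions=5)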
Because the reward function we use is non-negative, the learning of the reinforcement learning model might exhibit cyclical scoring: the actions become cyclical and the discounted return R_t keeps increasing. Obviously, we do not want this phenomenon.
In each episode, we therefore maintain an appeared-state list L that records all states seen so far, which avoids cyclical actions: when any configuration representation appears twice, we break the episode. This works because the derivative of the reward function r_t is larger near zero than away from zero, while the difference between throughputs p_t is linear, so the optimal strategy for maximizing the discounted return R_t is to make the index gradually better. We can then obtain the optimal index configuration by finding the maximum discounted return R_t.

Algorithm 1: Training the Deep Reinforcement Learning Model

Require: Workload N
1: Initialize the environment Env
2: Build the deep reinforcement learning neural network architecture RL
3: for episode = 0 ... num − 1 do
4:   Reset the environment Env with workload N and a random initial index configuration C
5:   Get the current state s from the environment
6:   Clear the appeared state list L
7:   repeat
8:     Add the current state s to the appeared state list L
9:     Choose the action a by running RL with the current state s
10:    Get the next state s′ and the reward r from Env
11:    Store the transition (s, a, r, s′) into the experience pool
12:    Sample transitions from the experience pool to update the network
13:    s = s′
14:  until the current state s is already in the appeared state list L
15: end for

We propose Algorithm 1 to train the deep reinforcement learning model. As shown in Algorithm 1, we first initialize the environment of the data storage. In every episode, we then reset the environment with the workload and a new random index configuration, extract the initial state and clear the list of appeared states. In every step, we choose an action, observe its reward and the next state, and store the transition into the experience pool; our RL model is trained on these transitions. Next, we introduce our detailed learning strategy.

At the beginning, Lines 1–2 initialize the environment required for reinforcement learning and the deep reinforcement learning neural network. Lines 3–15 then perform many episodes. For every episode, Line 4 resets the environment with the workload and a new random index configuration. We then extract the current state from the environment and clear the list of appeared states in Lines 5–6. Lines 7–14 perform many steps in each episode.
At the beginning of each step, Line 8 adds the current state to the list of states that appeared. After passing the current
state to the neural network, we get recommended actions in Line 9. By performing actions in the environment, we can get
the next state and the reward value of this action in Line 10.
Since we need to train the neural network model, Line 11 puts the current state, the next state, this action and the reward
value of this action as a transition into the experience pool. After several steps, Line 12 extracts some transitions from the
experience pool for neural network learning. Line 13 assigns the next state to the current state to complete one step. Line 14
repeats each step in Lines 8–13 until the current state appears in the list of appeared states.
After executing all the episodes, the reinforcement learning neural network training corresponding to the workload is
completed.
The time complexity of the algorithm is O(P · (T_a + T_nn)), where P is the total number of training steps, T_a is the time complexity required to perform actions in the environment, and T_nn is determined by the neural network structure, including the time required for forward and backward propagation.
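The following Python sketch mirrors Algorithm 1 at a high level. The objects env (wrapping the NoSQL database and the Benchmark module), agent (the dueling DQN) and buffer (the experience pool) and their method names are assumed interfaces for illustration, not the paper's actual code.

def train_drlisa(env, agent, buffer, num_episodes: int = 1000, batch_size: int = 32):
    for episode in range(num_episodes):
        state = env.reset()                       # workload N + random initial configuration C
        seen_states = [tuple(state)]              # the appeared-state list L
        while True:
            action = agent.choose_action(state)               # epsilon-greedy over Q values
            next_state, reward = env.step(action)             # apply the index change, benchmark it
            buffer.store(state, action, reward, next_state)   # transition (s, a, r, s')
            agent.learn(buffer.sample(batch_size))            # update the Q network
            state = next_state
            if tuple(state) in seen_states:                   # stop when a state repeats
                break
            seen_states.append(tuple(state))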

4.4. Index recommendation

After the training of the reinforcement learning model is completed, we only have the Q value corresponding to each action, whereas what we need is the optimal index configuration. The Index Recommendation module selects the optimal index configuration as follows.
The algorithm for selecting the optimal index configuration differs slightly from the training algorithm (Algorithm 1). In each episode, we discount the reward of each action and record the current state together with its corresponding discounted return. The discounted return helps us find the final optimal index configuration from the reinforcement learning model. Starting from a random configuration, we repeatedly choose actions using the reinforcement learning model until the episode ends. By repeating this process, we obtain a set of configurations and their corresponding discounted returns, and we select the configuration with the maximum discounted return.
Algorithm 2 selects the final index configuration based on the deep reinforcement learning model. The input to the algorithm is the workload. The initialization in Lines 1–3 is similar to the initialization of the training algorithm; the difference is that we additionally initialize the optimal index configuration and its corresponding maximum discounted return in Line 3. Lines 4–20 then perform many episodes.

Algorithm 2: Selecting the Optimal Index Configuration

Require: Workload N
1: Initialize the environment Env
2: Load the deep reinforcement learning neural network RL
3: Initialize the optimal index configuration C_optimal and the maximum discounted return R_max
4: for episode = 0 ... num − 1 do
5:   Reset the environment Env with workload N and a random initial index configuration C
6:   Get the current state s from the environment Env
7:   Clear the appeared state list L
8:   Reset the current discounted return R
9:   repeat
10:    if R_max < R then
11:      R_max = R
12:      C_optimal = C
13:    end if
14:    Add the current state s to the appeared state list L
15:    Choose the action a by running RL with the current state s
16:    Get the next state s′ and the reward r from Env
17:    R = R + r
18:    s = s′
19:  until the current state s is already in the appeared state list L
20: end for
21: return C_optimal

The initialization of each episode is also the same as in the training algorithm (Lines 5–7); we additionally reset the current discounted return to zero in Line 8. Lines 9–19 perform many steps in each episode.

At the beginning of each step, Lines 10–13 update the maximum discounted return and the corresponding optimal index configuration using the current discounted return. Line 14 then adds the current state to the list of appeared states. After passing the current state to the neural network, we obtain the recommended action in Line 15. By performing the action in the environment, we get the next state and the reward value of this action in Line 16. Line 17 adds the reward value of this action to the current discounted return. Line 18 assigns the next state to the current state to complete one step. Line 19 repeats these steps until the current state appears in the list of appeared states.
After executing all the episodes, the final optimal index configuration is returned as the algorithm output in Line 21.
The time complexity of the algorithm is O(P′ · (T_a + T_nn)), where P′ is the total number of trying steps, T_a is the time complexity required to perform actions in the environment, and T_nn is determined by the neural network structure, including the time required for forward and backward propagation.
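A corresponding Python sketch of Algorithm 2 is shown below, with the same assumed env/agent interface as in the training sketch; env.current_configuration() is a hypothetical accessor for the current configuration C, and the return below accumulates raw rewards, so a discount factor would be applied where required.

def select_optimal_configuration(env, agent, num_episodes: int = 100):
    best_config, best_return = None, float("-inf")
    for _ in range(num_episodes):
        state = env.reset()                       # random starting configuration
        seen_states = [tuple(state)]
        episode_return = 0.0
        while True:
            if episode_return > best_return:      # keep the best configuration seen so far
                best_return = episode_return
                best_config = env.current_configuration()
            action = agent.choose_action(state, greedy=True)   # exploit the learned Q values
            next_state, reward = env.step(action)
            episode_return += reward
            state = next_state
            if tuple(state) in seen_states:
                break
            seen_states.append(tuple(state))
    return best_config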

4.5. Discussion

DRLISA selects index configurations more reasonably than manual index selection, and it can deal with the intelligent index selection problem for NoSQL databases. Different from a single index structure, DRLISA supports selecting the index structure and its parameters under dynamic workloads. Besides, DRLISA shows strong scalability: we can add other indexes to the index set, add actions to the action set, or modify the reward evaluation function for different situations as extensions. This strong scalability is consistent with the philosophy of NoSQL databases. We believe that the Deep Reinforcement Learning Index Selection Approach (DRLISA) can bring new inspiration to the NoSQL index selection problem.

5. Experiment

We now show the practical performance of DRLISA. Section 5.1 presents the details of the dataset and environment. Section 5.2 gives a brief introduction to our metrics. Section 5.3 describes the experimental setup, and Section 5.4 presents the detailed experimental results and discussion.

5.1. Dataset and environment

We use an open source benchmark to evaluate our method. The benchmark is YCSB (Yahoo! Cloud Serving Benchmark)
[39], whose workload contains common NoSQL data manipulation statements, i.e., Insert, Update, Scan, Read, and RMW
(Read-Modify-Write). YCSB can control the proportions of the manipulation statements, and we generate the various kinds of workloads described in the corresponding experimental parts. Unless stated otherwise, we only change the type of workload and keep the default configuration of YCSB.
In our experiments, we use Python TensorFlow to implement our approach [40]. The database used is PostgreSQL with a Foreign Data Wrapper for RocksDB. We test commonly used index structures including B-Tree, Hash and LSM-Tree.
We ran the proposed model on a system with 8 processors and 8 GB of RAM, running Ubuntu 18.04 LTS.

5.2. Metric

We evaluated the efficiency of DRLISA using the average number of operations per second (Throughput) and the average time of query execution (ATQ), which are defined as

\text{Throughput} = \frac{\text{number of queries}}{\text{total time}} \qquad (3)

\text{ATQ} = \frac{\text{total time}}{\text{number of queries}} \qquad (4)
Throughput and ATQ are widely used to measure the efficiency of an index selection model [21,23,25,28,29]. In particular, throughput is a very representative metric for index evaluation.
Additionally, we evaluated the average cost time of the index recommendation, which is defined as

\text{ACT} = \frac{\text{total cost time}}{\text{number of recommendations}} \qquad (5)
By analyzing the ACT, we can demonstrate the time efficiency of our model.

5.3. Experimental setup

In our experiments, the learning rate is initially set to 0.001, the reward decay to 0.7, epsilon-greedy to 0.7 and the memory size to 50,000. The numbers of neurons in the three hidden layers are 16, 8 and 8. We also tested the robustness of these parameters. We used YCSB to demonstrate the efficiency of our model. First, we train on five workloads, each containing only one data manipulation, with 1000 episodes each. Second, we select 150 random workloads with 300 episodes to make the model more universal. We then use the trained model to predict the best index configuration for dynamic workloads.
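For convenience, the hyperparameters above can be collected into a single configuration dictionary (the key names are ours, for illustration only):

DRLISA_CONFIG = {
    "learning_rate": 0.001,
    "reward_decay": 0.7,          # discount factor
    "epsilon_greedy": 0.7,
    "memory_size": 50_000,        # experience-pool capacity
    "hidden_layers": (16, 8, 8),  # neurons in the three hidden layers
    "single_manipulation_episodes": 1000,   # per single-manipulation workload (5 workloads)
    "random_workloads": 150,
    "random_workload_episodes": 300,
}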

5.4. Experimental results and discussion

We tested our model from several aspects: the impact of data manipulation, the impact of operation count, the average time of index selection, comparison with random selection, comparison with an existing method (NoDBA) and the robustness of our model.

5.4.1. The impact of data manipulation


We used the trained model to test the performance of DRLISA on five extreme and different workloads (with 100% read, update, scan, insert or RMW), and we compared the results with three single traditional index structures. As shown in Table 1, the index selected by our approach performs better than any single index among B-Tree, Hash and LSM-Tree. Specifically, DRLISA improves throughput by about 10% on the 100% scan workload, and even in the worst case our method still achieves about 1% higher throughput than the single-index cases. This is because single index structures unavoidably have shortcomings for certain data manipulations, whereas DRLISA can merge their strengths and additionally make decisions about the index parameters.

5.4.2. The impact of operation count


Fig. 3 shows the impact of the operation count on the throughput of a workload whose insertion proportion is 100%. The relatively stable throughput indicates the stability of the test environment; that is, the performance of the indexes is stable with respect to the number of operations.

5.4.3. Average time of index selection


In Fig. 4, we test the average time of index selection under different operation counts. The black line is the measured value and the red line is a fitted linear reference, which shows that the cost of DRLISA grows linearly with the operation count. The reason our model achieves high time efficiency is that DRLISA converges quickly once it has received some useful training.

5.4.4. Comparison with random selection


For common NoSQL workloads, Fig. 5 compares the performance of the indexes selected by DRLISA with randomly selected indexes under the six default workloads provided by YCSB. The results show that our model is able to efficiently recommend indexes and performs better than random selection.

Table 1
The impact of data manipulation.

              Selected               B-Tree                 Hash                   LSM-Tree
Workload      ACT (ms)  Throughput   ACT (ms)  Throughput   ACT (ms)  Throughput   ACT (ms)  Throughput
100% Read     0.0004    2403.8400    0.0004    2325.5800    0.0005    2212.3800    0.0036    278.5000
100% Update   0.0255    39.2500      0.0266    37.6000      0.0258    38.8000      0.0550    18.1800
100% Scan     0.0002    4524.9000    0.0002    4048.6000    0.0002    4065.0000    0.0332    30.1000
100% Insert   0.0004    2415.5000    0.0256    39.0000      0.0254    39.3000      0.0004    2352.9000
100% RMW      0.0258    38.7000      0.0275    36.3000      0.0261    38.3000      0.0990    10.1000

Fig. 3. The impact of operation count.


Fig. 4. Average time of index selection.

Fig. 5. Comparison with random selection.

5.4.5. Comparison with NoDBA [25]


The approach also works for workloads it has never been trained on, and the performance obtained is better than the index structures chosen by NoDBA. We tested 100 workloads, and our method improves efficiency by about 5% compared to NoDBA. Table 2 shows partial results of our experiments, listing the workload types together with the throughput and ACT of the two approaches. Our method, which is based on reinforcement learning, is applicable to all cases with only limited training.

5.4.6. The robustness of DRLISA


In our model, there are many parameters. We tested four main parameters to clarify the robustness of our model, using a variable-controlling approach that changes only one parameter per test. As shown in Fig. 6, the throughput of the indexes recommended by DRLISA is stable under changes in the learning rate, the reward decay, epsilon-greedy and the memory size. The four results demonstrate the robustness of our method.

5.4.7. Summary
Based on the experiments, we confirmed that the NoSQL deep reinforcement learning index selection method (DRLISA) performs well under different workloads. After sufficient training, under changing workloads, the performance of DRLISA is improved to different degrees compared with traditional single index structures. For common NoSQL database workloads, DRLISA also shows good performance. Therefore, index selection combined with deep reinforcement learning can effectively select the index corresponding to the workload.

6. Conclusion

In this study, we described a model that uses deep reinforcement learning for index selection in NoSQL databases. Based on our index selection architecture, we use a deep Q network with a dueling network architecture to incrementally learn the appropriate actions for the workloads.
As future work, we propose to use the reinforcement learning index selection architecture in conjunction with traditional index structures and other new index structures, such as learned indexes, to learn the optimal index for SQL or NoSQL databases.

Table 2
Comparison with NoDBA.

Workload                               Selected                  NoDBA
Read   Update  Scan   Insert  RMW      Throughput   ACT (ms)     Throughput   ACT (ms)
50%    0%      50%    0%      0%       2352.94      0.0004       2141.32      0.0005
40%    60%     0%     0%      0%       323.62       0.0031       316.75       0.0032
0%     70%     0%     0%      30%      205.8        0.0049       202.67       0.0049
20%    20%     40%    0%      0%       445.03       0.0022       440.14       0.0023
0%     0%      80%    0%      20%      730.46       0.0014       700.77       0.0014
60%    20%     0%     0%      20%      447.02       0.0022       418.76       0.0024
0%     20%     0%     40%     40%      226.03       0.0044       207.38       0.0048
40%    0%      0%     0%      60%      313.18       0.0032       308.26       0.0032
20%    20%     60%    0%      0%       712.25       0.0014       676.58       0.0015
20%    0%      0%     20%     60%      242.65       0.0041       238.94       0.0042
0%     0%      40%    20%     40%      316.65       0.0032       291.12       0.0034

Fig. 6. The robustness of DRLISA.

CRediT authorship contribution statement

Yu Yan: Conceptualization, Methodology, Software, Validation, Data curation, Writing - original draft, Writing - review &
editing. Shun Yao: Conceptualization, Methodology, Software, Validation, Data curation, Writing - original draft. Hongzhi
Wang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Meng Gao: Software, Validation,
Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.

Acknowledgements

This paper was supported by NSFC grant U1866602.



References

[1] NoSQL database list, [EB/OL], https://hostingdata.co.uk/nosql-database/.


[2] D. Ohene-Kwofie, E.J. Otoo, G. Nimako, O2-tree: a fast memory resident index for nosql data-store, in: 2012 IEEE 15th International Conference on
Computational Science and Engineering, 2012, pp. 50–57.
[3] C. Feng, X. Yang, F. Liang, X.H. Sun, Z. Xu, Lcindex: a local and clustering index on distributed ordered tables for flexible multi-dimensional range
queries, International Conference on Parallel Processing (2015).
[4] J. Yu, J. Yao, J. Zhang, Z. Yu, D. Tao, Single pixel reconstruction for one-stage instance segmentation, arXiv preprint arXiv:1904.07426..
[5] J. Yu, C. Zhu, J. Zhang, Q. Huang, D. Tao, Spatial pyramid-enhanced netvlad with weighted triplet loss for place recognition, IEEE Trans. Neural Networks
Learn. Syst. 31 (2) (2020) 661–674.
[6] J. Zhang, J. Yu, D. Tao, Local deep-feature alignment for unsupervised dimension reduction, IEEE Trans. Image Process..
[7] J. Yu, M. Tan, H. Zhang, D. Tao, Y. Rui, Hierarchical deep click feature prediction for fine-grained image recognition, IEEE Trans. Pattern Anal. Mach.
Intell. PP (99) (2019) 1–1..
[8] C. Hong, J. Yu, J. Zhang, X. Jin, K.-H. Lee, Multimodal face-pose estimation with multitask manifold deep learning, IEEE Trans. Industr. Inf. 15 (7) (2018)
3952–3961.
[9] J. Yu, D. Tao, M. Wang, Y. Rui, Learning to rank using user clicks and visual features for image retrieval, IEEE Trans. Cybern. 45 (4) (2015) 767–779.
[10] J. Yu, M. Tan, H. Zhang, D. Tao, Y. Rui, Hierarchical deep click feature prediction for fine-grained image recognition, IEEE Trans. Pattern Anal. Mach.
Intell. PP (99) (2019) 1–1..
[11] C. Hong, J. Yu, J. Wan, D. Tao, M. Wang, Multimodal deep autoencoder for human pose recovery, IEEE Trans. Image Process. A Publ. IEEE Signal Process.
Soc. 24 (12) (2015) 5659–5670.
[12] C. Hong, J. Yu, D. Tao, M. Wang, Image-based 3d human pose recovery by multi-view locality sensitive sparse retrieval, IEEE Trans. Ind. Electron. 99..
[13] Z. Chu, J. Yu, A. Hamdulla, A novel deep learning method for query task execution time prediction in graph database, Fut. Gen. Comput. Syst. 112 (2020)
534–548, http://www.sciencedirect.com/science/article/pii/S0167739X19321156.
[14] M. Araújo, A. Pereira, F. Benevenuto, A comparative study of machine translation for multilingual sentence-level sentiment analysis, Inf. Sci. 512 (2020)
1078–1102, http://www.sciencedirect.com/science/article/pii/S0020025519309879.
[15] B. Peng, J. Wang, X. Zhang, Adversarial learning of sentiment word representations for sentiment analysis, Inf. Sci. 541 (2020) 426–441, http://
www.sciencedirect.com/science/article/pii/S0020025520306320.
[16] P. Tang, W. Qiu, Z. Huang, S. Chen, M. Yan, H. Lian, Z. Li, Anomaly detection in electronic invoice systems based on machine learning, Inf. Sci. 535 (2020)
172–186, http://www.sciencedirect.com/science/article/pii/S0020025520302693.
[17] D. Mrozek, A. Koczur, B. Maysiak-Mrozek, Fall detection in older adults with mobile iot devices and machine learning in the cloud and on the edge, Inf.
Sci. 537 (2020) 132–147, http://www.sciencedirect.com/science/article/pii/S0020025520304886.
[18] I. González-Carrasco, J.L. Jiménez-Márquez, J.L. López-Cuadrado, B. Ruiz-Mezcua, Automatic detection of relationships between banking operations
using machine learning, Inf. Sci. 485 (2019) 319–346, http://www.sciencedirect.com/science/article/pii/S0020025519301409.
[19] A.R.S. Parmezan, V.M. Souza, G.E. Batista, Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-
art and the best conditions for the use of each model, Inf. Sci. 484 (2019) 302–337, http://www.sciencedirect.com/science/article/pii/
S0020025519300945.
[20] D. Wu, C.-T. Lin, J. Huang, Active learning for regression using greedy sampling, Inf. Sci. 474 (2019) 90–105, http://www.sciencedirect.com/science/
article/pii/S0020025518307680.
[21] M. Hammer, A. Chan, Index selection in a self-adaptive data base management system, in: The 1976 ACM SIGMOD International Conference, 1976..
[22] K.U. Sattler, I. Geist, E. Schallehn, Quiet: continuous query-driven index tuning, Proceedings 2003 VLDB Conference (2003) 1129–1132.
[23] M. Schkolnick, The optimal selection of secondary indices for files, Inf. Syst. 1 (4) (1975) 141–146.
[24] M. Stonebraker, The choice of partial inversions and combined indices, Int. J. Comput. Inf. Sci. 3 (2) (1974) 167–188.
[25] A. Sharma, F.M. Schuhknecht, J. Dittrich, The case for automatic database administration using deep reinforcement learning, arXiv preprint
arXiv:1801.05643..
[26] B. Ding, S. Das, R. Marcus, W. Wu, V.R. Narasayya, Ai meets ai: leveraging query executions to improve index recommendations, in: The 2019
International Conference, 2019.
[27] J. Wang, Y. Zhang, Y. Gao, C. Xing, plsm: a highly efficient lsm-tree index supporting real-time big data analysis, IEEE Computer Software and
Applications Conference (2013).
[28] Y. Li, G.B. Kim, L.R. Wen, H.Y. Bae, Mhb-tree: a distributed spatial index method for document based nosql database system, Lec. Notes Electr. Eng. 214
(2013) 489–497.
[29] X. Guan, C. Bo, Z. Li, Y. Yu, St-hash: an efficient spatiotemporal index for massive trajectory data in a nosql database, International Conference on
Geoinformatics (2017).
[30] S. Lakshman, S. Melkote, J. Liang, R. Mayuram, Nitro: a fast, scalable in-memory storage engine for nosql global secondary index, Proc. VLDB Endowment 9 (13) (2016) 1413–1424.
[31] A. Tai, M. Wei, M.J. Freedman, I. Abraham, D. Malkhi, Replex: A scalable, highly available multi-index data store, USENIX Annual Technical Conference
(USENIX ATC 16), Denver, CO 2016 (2016) 337–350.
[32] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[33] H. Van Hasselt, A. Guez, D. Silver, Deep reinforcement learning with double q-learning, Comput. Sci..
[34] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G.A. Ostrovski, Human-level control
through deep reinforcement learning, Nature..
[35] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, N. Freitas, Dueling network architectures for deep reinforcement learning, Vol. 48 of Proceedings
of Machine Learning Research, New York, New York, USA, 2016, pp. 1995–2003..
[36] D.P. Brown, J. Chaware, M. Koppuravuri, Index selection in a database system..
[37] E. Barcucci, A. Chiuderi, R. Pinzani, M.C. Verri, Index Selection in Relational Databases, Springer Berlin Heidelberg, 1989.
[38] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, N. Freitas, Dueling network architectures for deep reinforcement learning, Vol. 48 of Proceedings
of Machine Learning Research, New York, New York, USA, 2016, pp. 1995–2003..
[39] B.F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, R. Sears, Benchmarking cloud serving systems with ycsb, in: Proceedings of the 1st ACM
Symposium on Cloud Computing, SoCC 2010, Indianapolis, Indiana, USA, June 10–11, 2010, 2010..
[40] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D.G.
Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, Tensorflow: a system for large-scale machine learning, in:
Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’16, USENIX Association, USA, 2016, pp. 265–283.
