DQNAS: Neural Architecture Search Using Reinforcement Learning

The paper presents DQNAS, an automated Neural Architecture Search framework that utilizes Reinforcement Learning and One-shot Training to design Convolutional Neural Networks (CNNs) efficiently. It addresses the challenges of manual architecture design, which is time-consuming and requires expert knowledge, by automating the selection of hyperparameters and configurations. The research highlights the limitations of existing methodologies and proposes a framework that aims to improve performance while minimizing computational resource consumption.


DQNAS: Neural Architecture Search using Reinforcement Learning
Anshumaan Chauhan
Department of Computer Science and Engineering
Florida Institute of Technology
Melbourne, United States
[email protected]

Siddhartha Bhattacharyya
Department of Computer Science and Engineering
Florida Institute of Technology
Melbourne, United States
[email protected]

S. Vadivel
Department of Computer Science
BITS Pilani, Dubai Campus
Dubai, United Arab Emirates
[email protected]

Abstract: Convolutional Neural Networks have been used in a variety of image-related applications since their rise in popularity due to the ImageNet competition. They have shown remarkable results in applications including face recognition, moving target detection and tracking, classification of food based on calorie content, and many more. Designing Convolutional Neural Networks requires experts with cross-domain knowledge, and it is laborious, because a lot of time goes into testing different values for different hyperparameters and considering different configurations of existing architectures. Neural Architecture Search is an automated way of generating Neural Network architectures that saves researchers from this brute-force testing, but with the drawback of consuming a lot of computational resources for a prolonged period. In this paper, we propose an automated Neural Architecture Search framework, DQNAS, guided by the principles of Reinforcement Learning along with One-shot Training, which aims to generate neural network architectures that show superior performance and have minimal scalability problems.

Keywords: Neural Architecture Search; Convolutional Neural Networks; Reinforcement Learning; Recurrent Neural Networks; One-shot Training

I. INTRODUCTION

The origin of Convolutional Neural Networks (CNNs) dates to the 1950s, when the Perceptron algorithm was invented [1]. However, Deep Learning neural networks, especially CNNs, became extremely popular after the University of Toronto's winning entry in the 2012 ImageNet competition. CNNs have shown promising results in image classification and image segmentation in a variety of applications, ranging from student attendance systems using face recognition to medical image processing.

A CNN typically consists of the following layers: Convolutional Layers, Pooling Layers and Fully Connected Layers. Designing a Neural Network involves manually selecting several parameters, including the number of hidden layers, the number of neurons in each hidden layer, the objective function to be minimized, the learning rate, the dropout rates, the activation function to be used [2] and, in the case of CNNs, hyperparameters such as stride, padding and filter size [3]. As the range of values that can be used for these parameters is huge, the number of possible combinations is effectively unbounded, and this makes manual selection quite difficult. Therefore, state-of-the-art architectures are handcrafted by experts in the field [1] who have cross-domain knowledge of Deep Learning, Computer Science and Optimization. However, a CNN that performs well on one dataset may not perform well on another. Therefore, there is high demand for an automated framework that takes the data as input and gives a well-performing architecture as output [4].

Designing a Neural Network architecture can be categorized as a model selection problem [5]. A direct solution is to apply hyperparameter optimization, that is, to find optimized values for parameters such as the number of layers, the activation function, etc. Hyperparameter optimization is also known as a black-box optimization problem because there is no explanation of the mapping between the architecture that has been created, the performance achieved and the learning
task. Three meta-modelling approaches used for the automatic generation of CNN architectures are Hyperparameter Optimization, Evolutionary Algorithms and Reinforcement Learning.

In the past few years, many fully automated algorithms and frameworks have been developed for this task, such as:
1) MetaQNN [3]
2) BlockQNN [21]
3) GeNet [28]
4) AutoKeras [47]
and many more. Most of the research is focused on using Evolutionary Methods and Reinforcement Learning.

The paper is organized as follows: Section II contains a theoretical explanation of the different algorithms used for this task. Section III illustrates the drawbacks faced by existing methodologies. Section IV briefly summarizes the working and performance of the different frameworks integrated for this research, and finally, Section V presents the conclusion.

II. LITERATURE SURVEY

In this section, we discuss Neural Architecture Search (NAS) and the different meta-modelling techniques that are used for the automatic generation of CNN architectures, along with their respective drawbacks.

For classical machine learning algorithms, the problem of finding optimal hyperparameter values was addressed using techniques such as Grid Search, Random Search, Meta Learning and Bayesian Optimization. But these methods are difficult to apply to the optimization of deep learning architecture parameters [6]. The drawback of these algorithms is that they take too much time to find the optimal values for the hyperparameters.

A. Concept of Neural Architecture Search

Whenever a deep learning architecture is developed for a particular application, data engineers are expected to know what type of architecture might perform well on the given data. But the number of architectures that could show good results is unbounded, and hence the need for automatic architecture selection arose. The main aim behind NAS was to automate the process of finding an architecture that shows good results for a given dataset.

Zoph et al. [7] were the first to use Reinforcement Learning to generate the values for the different layers of a CNN. Taken together, the possible values of these hyperparameters formed the search space. They used a Controller to generate architectures, which were then trained on the CIFAR-10 dataset for a number of epochs. The reward function took the validation accuracies of the last 5 epochs of the generated architecture and calculated a discounted reward, which was given to the Controller to train it. Finally, the algorithm stopped either when the number of layers exceeded the maximum allowed or when the Controller reached the maximum number of iterations.

The current research area of NAS can be divided into three types, as shown in Figure 1.

Fig. 1. Research Areas in NAS

Search Space: This research area focuses on what should be present in the search space. Methods such as Layer-by-Layer, Cell-wise and Block structures can be used to fill up the search space. Each method has its own drawbacks: Layer-by-Layer makes the search space too large, so the search algorithm takes a lot of time to find a good architecture, whereas block-based and cell-based search spaces offer limited options. For example, if the entire search space is built from ResNet and DenseNet block structures, then endless possibilities are not even considered in this search space. It can be thought of as a trade-off between the size of the search space and generalizability.

Search Strategy: This phase decides which optimization algorithm should be used to explore the search space so that the best possible architecture is found faster while maintaining good accuracy. Most of the research can be broadly categorized into three parts, as shown in Figure 2. More details regarding these algorithms are provided in the following subsections.

Fig. 2. Search Algorithms used to explore the Search Space of NAS
There are also two other approaches, Network Morphing and Game Theory, but not much research has been done in those areas.

Evaluation Method: The main objective of this research area is to train and evaluate models in the minimum possible time. Many techniques have been proposed, such as extrapolating the accuracy curve to predict the final accuracy, training the models for a smaller number of epochs, training the model on a small dataset [8], or limiting the size of the Neural Network to a particular number of hidden layers. But results show that applying these strategies affects the accuracy of the model and is also misleading when the algorithms are ranked by validation accuracy. This shows that there is a trade-off between performance and latency. The need for an approach that speeds up the search process without affecting the accuracy has also been mentioned in [9].

The bottleneck of Neural Architecture Search is the high computational time and the hardware resources required by the algorithms to find an efficient architecture. Most of the models that have been shown to generate high-accuracy architectures were susceptible to high resource consumption due to the substantial number of parameters associated with the Neural Network.

Jaafra et al. [1] briefly describe different state-of-the-art CNN architectures and the layers that compose them. They give a summary and review of the meta-modelling approaches that have been applied to automating Convolutional Neural Network architecture design using Reinforcement Learning. The paper also mentions approaches that can be used to accelerate the search process, such as early stopping, distributed asynchronous frameworks and network transformation.

B. Hyperparameter Optimization

Grid Search, Random Search and Bayesian optimization-based methods were some of the algorithms used for Hyperparameter Optimization. As discussed earlier, Grid Search and Random Search have a huge time complexity, so researchers started working on Bayesian Optimization methods for Hyperparameter Optimization, which showed impressive results in the earlier phase of Neural Architecture Search. Bayesian Optimization works by optimizing an acquisition function while maintaining a surrogate model that learns from the previous evaluations. Surrogate models that have shown promising results are:
1) Gaussian Processes
2) Random Forests
3) Tree Structured Parzen Estimator (TSPE)

Pseudocode of the Bayesian Optimization model is shown in Figure 3.

Fig. 3. Bayesian Optimization

Kandasamy et al. [10] proposed an approach that used Gaussian processes to generate architectures for simple multi-layer perceptrons. To find the best value in the search space, they used a distance-based optimization function, which was optimized using a transport-based algorithm. Although TSPE has shown better results than Grid Search and Random Forest, a Deep Neural Network was used by the authors for architecting a surrogate model.

Motta et al. [22] used Grid Search and Random Search to optimize the hyperparameters of their proposed CNN and so increase its accuracy for the morphological classification of mosquitoes. Their proposed methodology showed 93.5% balanced accuracy in detecting the target mosquito among all the other insects.

Gülcü et al. [23] applied a variant of Simulated Annealing called the Microcanonical Optimization Algorithm to optimize hyperparameter values and search for CNN architectures. The main advantage of this approach was the smaller number of parameters in the generated CNN architecture compared to CNNs generated by other methodologies. Like [3], they limited the values of the hyperparameters used for architecture generation. They claim that if random values are used for the hyperparameters, the generated architectures are quite inefficient.

Yang et al. [24] applied the Particle Swarm Optimization (PSO) algorithm to fine-tune the number and size of the filters used in Convolutional and Pooling layers. They used Reinforcement Learning to generate the initial CNN architecture, which was later fine-tuned using PSO. To obtain better-performing architectures, they included skip connections and multi-branch connections as primitive operations. Additionally, a replay memory was used to break the correlation between experiences and to enhance exploration of the search space.

Li et al. [25] proposed ASHA, a system that optimizes hyperparameter values in a massively parallel fashion. This approach uses early stopping and parallel computing as its base methodologies for tuning a large number of hyperparameters. Results showed that ASHA performed better than all state-of-the-art optimization algorithms and was suitable for a great extent of parallelism.
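
The early-stopping idea that ASHA builds on can be illustrated with plain, synchronous successive halving. The sketch below is only an illustration under stated assumptions: it is not the asynchronous, parallel algorithm of [25], and the names `successive_halving` and `toy_eval`, the reduction factor `eta` and the toy scoring function are all introduced here for the example.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the top 1/eta configurations per rung; survivors get eta times more budget.

    configs  : list of hyperparameter settings (any objects)
    evaluate : callable(config, budget) -> score, higher is better (hypothetical interface)
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving configuration with the current (small) budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Early-stop the worst ones: only the top 1/eta move on to the next rung.
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]


def toy_eval(config, budget):
    """Toy stand-in for 'train for `budget` epochs and return validation accuracy'."""
    # Quality peaks at lr = 0.01; a larger budget brings the score closer to its ceiling.
    ceiling = 1.0 - abs(config["lr"] - 0.01) * 10
    return ceiling * (1.0 - 0.5 ** budget)


# Hypothetical candidate learning rates; a real search would sample many hyperparameters.
learning_rates = [0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
candidates = [{"lr": lr} for lr in learning_rates]
best = successive_halving(candidates, toy_eval)
print(best)
```

With nine candidates and `eta=3`, nine configurations are scored at budget 1 and three at budget 3, so most of the compute goes to the configurations that looked promising early.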
C. Evolutionary Algorithms

Traditionally, evolutionary algorithms were used for designing Neural Network architectures. Evolutionary algorithms are also known as Neuro-Evolution strategies. In [11] it was mentioned that gradient-based methods can easily outperform Evolutionary Algorithms for weight optimization of Neural Networks. It was concluded that Evolutionary Algorithms should be considered only for the optimization of the architecture. Many algorithms are classified as Evolutionary Algorithms [12]:
1) Genetic Algorithms
2) Evolution Strategies
3) Differential Evolution
4) Estimation of Distribution Algorithms

However, only Genetic Algorithms have been used for this task. The encoding schemes used can be broadly classified into two types: direct and indirect encoding. In direct encoding, information such as the number of neurons, the connectivity between layers and the activation functions is stored in the genotype, whereas in indirect encoding the genotype stores the generation rules of the Neural Network architecture. The main advantage that led to the use of Evolutionary Algorithms was their ability to handle both continuous and discrete data. In NAS, discrete data can be the number of layers, the number of neurons, the type of layer, etc., and continuous data includes hyperparameters such as the learning rate and dropout rates.

Sun et al. [4][26] designed a methodology using a Genetic Algorithm that uses ResNet and DenseNet blocks as the initial population, and proposed a variable-length encoding scheme to speed up the architecture design process for these blocks. The drawback of this process was that it uses only two block structures, which limits the capability of the algorithm to explore other architectural elements. Additionally, it requires a lot of computational resources to complete.

Chen et al. [27] also proposed an Evolutionary Algorithm (EA) based meta-modelling approach, but they overcame two of the limitations of EA by focusing on the generation of lightweight CNN architectures and by using Ensemble Learning on the best-performing architectures. The use of a modified squeeze fire module (inspired by the SqueezeNet architecture) reduced the number of parameters in the generated architecture, and high accuracy was achieved on the validation dataset. Unlike the fixed-length encoding scheme used in GeNet [28], a variable-length multi-level chromosome was used in this paper to encode the CNN architecture and connectivity. In GeNet, the CNN architecture was generated using a graph evolution methodology in which different convolutional nodes were connected by links and the pooling layers acted as intermediate nodes. The limitation of the search space due to the fixed-length encoding scheme is one of the major drawbacks of that approach. Another genetic algorithm-based approach, EDEN, which used a fixed-length encoding scheme, was proposed by Dufourq et al. [29]. The chromosome was composed of two values: one was the learning rate and the other was the CNN architecture. In addition to the drawback of a fixed-size encoding scheme, pooling layers and skip connections were not incorporated in the architecture composition.

H. Cai et al. [30] designed a hardware-efficient framework named AutoML, which was responsible not only for the automatic design of CNNs but also for model compression using techniques such as Quantization and Pruning, which give the framework a flexible bandwidth and reduce the memory footprint. Though this methodology is now widely used under the name AutoKeras, it required a lot of hardware resources for computation, and AutoKeras in general was used for making architecture and hyperparameter value changes to existing state-of-the-art models.

Polonskaia et al. [31] introduced an evolutionary algorithm-based NAS approach called FEDOT-NAS, which uses the FEDOT framework of AutoML. It focuses on reducing the time complexity of the pipeline stage where the generated network is trained and evaluated. They introduced methodologies such as testing on a small dataset and training for a smaller number of epochs. They also compared their results with other frameworks such as AutoKeras, Auto-PyTorch, AutoCNN and CNN-GA.

D. Reinforcement Learning

In the past few years, Reinforcement Learning algorithms have been the most used methodologies for finding the best architecture in the available search space in an efficient manner. Q-learning [13] and Proximal Policy Optimization [14] are used for the task of exploring the search space. A simple implementation of Reinforcement Learning for Neural Architecture Search is represented in Figure 4.

Fig. 4. NAS implementation using Reinforcement Learning

One of the main decisions one must make is which reward function to use to calculate a reward that guides the generator in an optimal way. For example, a reward function that guides the generator to use minimum energy while maintaining good accuracy can be:

$$R = (\alpha \cdot \mathit{Accuracy}) - ((1 - \alpha) \cdot \mathit{Energy})$$
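
As a rough illustration of the weighted reward R = α · Accuracy − (1 − α) · Energy given in this section, the sketch below scores two made-up candidates for two values of α. It assumes accuracy and energy have both been normalized to [0, 1], which the text does not specify, and the function name `mixed_reward` and all sample numbers are hypothetical.

```python
def mixed_reward(accuracy, energy, alpha=0.5):
    """R = alpha * accuracy - (1 - alpha) * energy, with 0 <= alpha <= 1.

    Both inputs are assumed to be scaled to [0, 1]; that normalization is an
    assumption of this sketch, not something stated in the text.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * accuracy - (1.0 - alpha) * energy


# Hypothetical candidates: (name, validation accuracy, normalized energy use).
candidates = [("small", 0.90, 0.20), ("large", 0.95, 0.50)]

for alpha in (0.5, 0.9):
    best = max(candidates, key=lambda c: mixed_reward(c[1], c[2], alpha))
    print(f"alpha={alpha}: preferred architecture = {best[0]}")
```

A small α weights energy heavily and favors the cheaper network, while an α close to 1 lets the more accurate but more expensive network win.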
Zoph et al. [7] were the first to apply Reinforcement Learning to the task of Neural Architecture Search (NAS). The proposed methodology, NASNet, uses a gradient-based approach to train the Recurrent Neural Network that acts as a controller for designing better architectures. They also used skip connections in the search space; however, despite the high accuracy achieved with multi-branch and skip connections, these are computationally expensive and time-consuming, and they require many rule-based systems for creating valid architectures (as the generated CNNs are susceptible to compilation failure).

Baker et al. [3] proposed a novel model, MetaQNN, which uses Q-learning, a reinforcement learning algorithm, to navigate the search space. For efficient search they incorporated experience replay and an ε-greedy strategy, which also tackles the issue of exploration versus exploitation. Validation accuracy is used to calculate the reward and update the values in the Q-table. Some constraints are placed on the values that can be used for designing the architecture in order to reduce the search space; nevertheless, the enormous number of possible combinations leads to high time complexity and huge computational resource requirements for running the algorithm even on a small dataset. Combined with Ensemble Learning, the architectures designed using the proposed methodology outperform the state-of-the-art architectures on image classification datasets.

Zhong et al. [21] focused on building networks from network blocks rather than designing them in a layer-by-layer fashion, and proposed a methodology called BlockQNN. They introduced a new encoding scheme for representing CNNs called Network Structure Code (NSC), which holds information about the five parameters they considered for network generation. Using Q-learning and an ε-greedy strategy, they were able to create blocks, which were stacked sequentially to make a CNN that was efficient and effective. The drawback of this approach was that it was too data-specific: it cannot be transferred to a dataset that has a different input image size.

Y. Chen et al. [32] focused on creating a faster algorithm due to their limited hardware resources (1 Graphics Processing Unit available). They generated a computational cell that consisted of different layers, including Convolutional Layers (1x1, 1x3 or 1x5), Pooling Layers, multi-branch connections, skip connections and Fully Connected Layers. This cell was later stacked sequentially to form a bigger CNN architecture. Gradient-based methods were used for faster optimization. Instead of a dropout layer, they used a cutout [33] layer for better regularization and easier implementation.

Y. Chen et al. [34] deployed a learning agent that used an ε-greedy strategy and experience replay for efficient navigation of the search space and generation of high-performing CNN architectures. The difference from MetaQNN was the use of compact modules that were stacked sequentially, rather than searching in a layer-by-layer fashion. To promote maximum flow of information between layers and to strengthen feature propagation, dense connectivity between layers was incorporated. They observed that the accuracy of the CNNs increased as the value of epsilon was decreased.

Unlike most of the research, which uses Q-learning as the reinforcement algorithm, Mortazi et al. [35] proposed a policy gradient reinforcement learning approach that uses the Dice index as the reward. The Dice indices of the last five validation evaluations of the model are averaged and given as input to the reinforcement learning algorithm. A new activation function named "Swish" was used instead of the ReLU function. The results showed that the models maintained high accuracy at a low computational cost.

Tan et al. [36] used the actual latency observed while running the model on mobile devices in the optimization function, rather than using FLOPs as a measure of time. A factorized hierarchical search space was used in the proposed approach, MnasNet, to enhance layer diversity. The Proximal Policy Optimization algorithm was used to update the weights of the Recurrent Neural Network (Controller).

Hsu et al. [37] proposed MONAS, a multi-objective Neural Architecture Search that uses Reinforcement Learning to generate CNN architectures. In the paper, they proposed a mixed reward function that addresses the scalability problem by reducing the consumption of hardware resources while maintaining a good accuracy score.

Gou et al. [38] used Inverse Reinforcement Learning, a paradigm that relies on a Markov Decision Process, for layer selection in the CNN topology. They proposed a mirror stimuli function, which acts as a heuristic during the design of the CNN architecture. Inverse Reinforcement Learning is used to train the mirror stimuli function, which enhances the capability of navigating the search space efficiently without being too restricted.

Cai et al. [26] came up with a unique approach inspired by Net2Net [39] called Efficient Architecture Search (EAS), which applies Network Transformations (addition, deletion or editing) to the layers of models that have already been generated by a Bi-directional Long Short-Term Memory (Bi-LSTM). They used simple Convolutional Networks as their base networks and performed operations on these networks to generate an efficient network. They applied path-level network transformation in this approach, in which addition, modification and deletion of layers was performed using network transformations [40]. They also made changes to the meta-controller that was trained using Reinforcement Learning. A new tree-structured LSTM was used as a meta-controller, and new operations such as Depth-separable
Convolution (DSC) and replication were introduced in the search space.

Progressive Neural Architecture Search (PNAS), an approach given by Liu et al. [41], uses a more complex modular structure called a cell, which is composed of a few blocks. A block is in turn composed of two of the 8 valid operations. Like [21], the cells are then stacked sequentially to create a CNN architecture. PNAS was more than five times faster than NASNet while achieving the same level of accuracy. This was achieved using a strategy called Sequential Model-Based Optimization (SMBO), which, instead of searching the search space directly, performs a block-by-block search: first a 1-block cell is created, which is then expanded by adding another block to that cell, and this cycle continues until the maximum number of blocks a cell can contain is reached.

The concept of sharing weights between architectures created from a large computational graph was introduced in ENAS [42]. Architectures were created in a similar way to PNAS: each architecture was a subgraph of the larger graph and hence shared the weights of previously trained networks. To train the Recurrent Neural Network, they used a policy gradient algorithm that focuses on maximizing the reward on the validation dataset. Due to sharing the weights and training only the best architectures on the whole validation dataset (the generated architectures were initially trained only on a mini-batch taken from the validation dataset), this approach proved to be a lot faster and less expensive than NASNet [7].

III. ISSUES INVOLVED IN NEURAL ARCHITECTURE SEARCH

Many challenges relate to the structure of a Neural Network when the architecture is generated in an automated fashion. One is setting up constraints for the different hyperparameters, such as the maximum length of the architecture, which layer cannot be the first layer, etc. Beyond these structural issues, the dataset used for training the intermediate models also plays an important role.

Apart from these, there is the problem of deciding which search strategy to use, as mentioned in Section II. The issues that existing approaches face are discussed in this section. First, when Hyperparameter Optimization is used, the limitation of Bayesian Optimization was the difficulty of optimizing the acquisition function due to the large number of parameters involved in Deep Neural Networks [1]. Forming an empirical function is preferable, as it can be optimized faster than the actual optimization function [15]. Another limitation of this approach is that it can only search for models in a fixed-length space. The difficulty of incorporating the connectivity of different layers in the network was resolved when methods capable of searching over non-fixed-length architectures were proposed in [16]. But these were too hard-coded, and so the problem of generalizability arose.

When Evolutionary Algorithms are used as the search strategy, the issue is that Genetic Algorithms are susceptible to high complexity once concepts such as curiosity, multi-objective optimization, etc. are considered. Some approaches have used fixed-length chromosomes for the architecture representation, which prevents the Evolutionary Algorithm from finding the optimal architecture, as the length of the architecture cannot be known well in advance. Solving this problem with a variable-length chromosome raises the problem of designing updated mutation and crossover operations. A concept named Pareto Optimality [17] is relevant to this search strategy. It says that one objective cannot be improved without an adverse effect on another. Here the trade-off is between the scalability and the accuracy of the solution found [9]: either we find a solution quickly and accept low accuracy, or we accept the scalability problem and get an optimal model with high accuracy. Further, being search-based models, Evolutionary Algorithms are quite slow if they are not provided with heuristics for faster convergence. Recent research has started using lightweight Convolutional Neural Network blocks such as ShuffleNet [18] and EffNet [19], together with Ensemble Learning, to reduce the computational complexity.

The main drawback of using Reinforcement Learning is the high time complexity. The approach proposed in [7] took 800 GPUs for 28 days to run the whole algorithm.

Another drawback is that expertise is still required even with automated CNN architecture design algorithms. For example, in [20] the base networks used are already designed by experts and perform well on the given task. In [21], the block architectures are based purely on Convolutional Layers and Dense Layers; excluding Pooling Layers decreased the search space and led to faster generation of CNN architectures, but also had a negative impact on accuracy. It also suffers from the high computational cost problem described above. Reinforcement Learning algorithms also suffer from the problem of exploitation versus exploration. Exploitation is when the policy algorithm (Q-learning, Proximal Policy Optimization, etc.) repeatedly chooses the action that will give it the best reward. Exploration is when the policy algorithm chooses a random action to explore the large search space. When building a CNN using Reinforcement Learning, we must also decide on a rule-based constraint system that will be used to derive a valid architecture from the sequence output by the Controller. Constraints involve checking whether the input image size matches the layer, whether the filter size is valid for the input size, etc. A summary of the drawbacks is given in Figure 5.
Figure 5: Drawbacks of Different Meta-Modelling techniques
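
Two points from Section III, the exploration versus exploitation trade-off and the rule-based validity checks, can be combined in one small sketch. It filters candidate convolution layers with the standard output-size formula, floor((n + 2p − k) / s) + 1, and then chooses among the valid ones with an ε-greedy rule. The layer tuples, the Q-values and the function names are hypothetical and are not taken from any of the surveyed frameworks.

```python
import random


def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution on an n x n input, or None if it does not fit."""
    if kernel > n + 2 * padding:
        return None  # the filter is larger than the (padded) input
    return (n + 2 * padding - kernel) // stride + 1


def valid_actions(input_size, actions):
    """Keep only the (kernel, stride) choices that yield a valid output size."""
    return [a for a in actions if conv_output_size(input_size, a[0], a[1]) is not None]


def epsilon_greedy(q_values, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the highest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)                           # exploration
    return max(actions, key=lambda a: q_values.get(a, 0.0))  # exploitation


# Hypothetical candidate layers (kernel, stride) and Q-values for a 4x4 feature map.
actions = [(1, 1), (3, 1), (5, 1), (7, 2)]
q_values = {(1, 1): 0.2, (3, 1): 0.7, (5, 1): 0.9, (7, 2): 0.4}

allowed = valid_actions(4, actions)  # the 5x5 and 7x7 filters do not fit a 4x4 input
choice = epsilon_greedy(q_values, allowed, epsilon=0.0)
print(allowed, choice)
```

Masking invalid actions before the ε-greedy step means that even a purely random exploratory choice still yields an architecture that can be built; the highest-valued action overall, the 5x5 filter, is never selected here because it fails the size check.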

IV. DATA COLLECTION

Keras [43] is one of the most widely used libraries for creating Neural Networks. Apart from the built-in functions provided for Neural Network creation, it also provides some of the datasets commonly used in the Machine Learning community, such as the MNIST digits classification dataset, the IMDB movie review dataset and many more. We use the MNIST, CIFAR10 and CIFAR100 datasets, which are used for training and testing Neural Networks for image classification.

MNIST is a digit classification dataset consisting of 60000 images for training and 10000 images for testing. It has 10 classes in total, one for each integer in the range 0-9. Each image is of size 28x28 and is grayscale, and therefore has only one channel. CIFAR10 and CIFAR100 are object classification datasets created by the University of Toronto, consisting of 10 and 100 classes, respectively. Each of these datasets has 50000 images for training and 10000 images for testing. Images in the CIFAR datasets are of size 32x32x3: they are RGB images, so one channel is required for each color.

Information for the datasets is summarized in Table 1.

TABLE I. DATASETS

Dataset     Training Images   Testing Images   Image Size   Type of Image
MNIST       60000             10000            28x28x1      Grayscale
CIFAR10     50000             10000            32x32x3      RGB
CIFAR100    50000             10000            32x32x3      RGB

V. THEORETICAL BACKGROUND ON MODEL DEVELOPMENT

A. Double Deep Q Networks (Double DQN)

The Double DQN [44] approach was proposed by a group of scientists working at Google DeepMind to overcome the tendency of the Q-learning algorithm to overestimate the Q-values of some actions under specific conditions. Q-learning overestimates because the algorithm is trained after each step to update the Q-value based on the reward it receives for the action chosen in the current state. Additionally, Q-learning is very resource intensive when the number of states is large, as updating the entries of the Q-table using the Bellman Equation takes a lot of memory and time. To overcome these drawbacks, the Double DQN methodology was proposed, which uses a combination of two neural networks, called the Main and Target models. These models are trained in such a way that they are not susceptible to overestimating the Q-values.

In Double DQN, the Main Q network is trained after each prediction, and the updated weights are calculated using the Bellman Equation. The Target Q network, however, is updated only once every few predictions (Figure 6). Rather than being trained on data, the Target Q network sets its weights to those of the Main Q network at that instant. This solves the problem of overestimating the Q-values.

Figure 6: Double Deep Q Networks
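The interplay between the two networks can be illustrated with a minimal sketch. It is not the implementation used in this work: the two neural networks are replaced by small Q-tables, and the class name DoubleQ, the parameters lr and sync_every, and all numeric defaults are assumptions made purely for illustration. The Main table is moved towards the Bellman target after every step, while the Target table only copies the Main table once every sync_every updates.

```python
import copy


class DoubleQ:
    """Tabular stand-in for the Main/Target network pair (illustrative only)."""

    def __init__(self, n_states, n_actions, gamma=0.9, lr=0.5, sync_every=3):
        self.main = [[0.0] * n_actions for _ in range(n_states)]
        self.target = copy.deepcopy(self.main)
        self.gamma = gamma            # discount factor
        self.lr = lr                  # step size of the Main update (assumed)
        self.sync_every = sync_every  # assumed synchronisation period
        self.updates = 0

    def update(self, s, a, r, s_next):
        # Bellman target: reward plus the discounted best Q-value of the
        # next state, read from the Target table rather than the Main table.
        bellman = r + self.gamma * max(self.target[s_next])
        self.main[s][a] += self.lr * (bellman - self.main[s][a])
        self.updates += 1
        # The Target table is never trained; it periodically copies Main.
        if self.updates % self.sync_every == 0:
            self.target = copy.deepcopy(self.main)
```

Between two synchronisations the Target table stays frozen, so the target values used to update the Main table do not shift after every single step.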

B. Bellman Equation

The Bellman Equation is the optimality equation used for updating the Q-value of a particular state-action pair. In Double DQN, the Bellman Equation calculates the updated Q-value for the Main Q network from the reward for the action A taken by the Main Q network in state S and a discounted Q-value of the best action for the new state S'.

$$Q_M(S, A) = R + \gamma \, Q_T(S', A')$$

where S is the current state, A is the action taken by the Main Q network (Q_M), R is the reward received for action A, γ is the discount factor, S' is the new state and A' is the best action associated with the state S' according to the Target Q network (Q_T).

C. Epsilon Greedy Algorithm

As discussed earlier, another problem faced by Reinforcement Learning algorithms is that of Exploration vs Exploitation. To address it, the Epsilon Greedy algorithm is used to strike a balance between randomly exploring the search space and exploiting the best known action. A random probability is generated; if its value is greater than ε, the action with the maximum Q-value is chosen by the Main Q network, otherwise a random action is taken. The working of the Epsilon Greedy Algorithm is shown in Figure 7.

D. Prioritized Experience Replay

Experience replay [45] is a methodology used for training reinforcement learning algorithms. In experience replay, a memory buffer is created that stores all the experiences of the reinforcement learning algorithm. These experiences are sampled uniformly, so the sample may contain negative experiences as well. In Prioritized Experience Replay [46], the most important transitions are sampled more frequently, for efficient training of the Reinforcement Learning algorithm. In our proposed methodology, CNN models that showed high accuracies are drawn more frequently from the replay memory and used for training the Main Q Network.

E. One-Shot Training

One-Shot Training was originally introduced as an algorithm for knowledge transfer between networks for the task of object recognition. The main idea of One-Shot Training is to train a CNN starting from some initial knowledge rather than from scratch. Throughout the process, we store the weights of the CNNs that are trained. Later, if a CNN has a subpart that matches that of a previously trained model, the weights for that part are transferred from the previously trained model to the newly created model. After the training and evaluation of this new model, the stored weights of the sub-graph are updated to the sub-graph weights of the new model. This approach helps the model evaluate quickly and reach its minimal error state in a shorter time.
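The weight reuse described for One-Shot Training can be sketched as a cache of trained layer weights. This is a simplified illustration under stated assumptions, not the code used in this work: the class WeightStore is hypothetical, each layer is identified only by its configuration tuple (a real implementation would also have to check that the weight shapes match, for example the number of input channels), and plain strings stand in for weight arrays.

```python
class WeightStore:
    """Cache of trained layer weights keyed by layer configuration (hypothetical)."""

    def __init__(self):
        self._store = {}

    def transfer(self, architecture):
        """Return stored weights for each layer seen before, None otherwise."""
        return [self._store.get(layer) for layer in architecture]

    def update(self, architecture, trained_weights):
        """After training, overwrite the cache with the new model's weights."""
        for layer, weights in zip(architecture, trained_weights):
            self._store[layer] = weights


# Usage: the second architecture shares its first layer with the first one.
store = WeightStore()
first = (("conv2d", 64, 3), ("dense", 10))
store.update(first, ["w_conv", "w_dense"])
second = (("conv2d", 64, 3), ("conv2d", 32, 5))
initial = store.transfer(second)  # shared layer reuses weights, new layer gets None
```

A layer that receives None would be initialised as usual, so only the shared part of the new model starts from previously learned weights.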
Figure 7: Working of Epsilon Greedy Algorithm

Due to the large number of possible layer combinations in the search space, Neural Architecture Search faces the issue of large space complexity. Most existing methodologies have either reduced the search space for the algorithm or used GPUs with high computational power; the former prevents the algorithm from being used to its full capacity, and the latter is not readily available to all researchers. Our contribution addresses this gap by reducing the time complexity without advanced hardware resources and without reducing the size of the search space. In our approach, we propose the use of Double Deep Q-Networks along with prioritized experience replay and the epsilon greedy algorithm, which handle the issues of learning from bad examples and exploration versus exploitation, respectively. Our focus has been on the search space: we include various layers that perform similar operations in slightly different ways, as they have a huge effect on the model accuracy. Once the layer is decided, the parameters associated with it are decided, each chosen from a set of available values. Once the model is created and validated, we use the One-Shot Training algorithm to transfer weights from a previously trained model if they have layers in common. By doing this, we save the time of training the model from scratch and impart some knowledge to the model beforehand.
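The prioritized replay of Section V.D, in which high-accuracy CNN models are drawn more often from the memory buffer, could be sketched as follows. The weighting rule is an assumption chosen for illustration (accuracy shifted so that every priority is positive); the function name sample_prioritized and the buffer layout of (architecture, accuracy) pairs are likewise hypothetical.

```python
import random


def sample_prioritized(buffer, k, rng=random):
    """Draw k (architecture, accuracy) pairs, favouring high accuracy.

    The priority is the accuracy shifted to be strictly positive, so that
    failed models stored with a negative accuracy are almost never drawn.
    This weighting is an illustrative assumption.
    """
    lowest = min(acc for _, acc in buffer)
    weights = [acc - lowest + 1e-3 for _, acc in buffer]
    return rng.choices(buffer, weights=weights, k=k)
```

With a buffer holding one failed model (accuracy -10.0) and one model at 0.8, nearly every draw returns the second entry, which is the behaviour the text describes as avoiding learning from bad examples.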
VI. PROPOSED METHODOLOGY

In this section we discuss the different components of the Proposed Methodology, including the algorithm that we have executed. This section is divided into 4 subparts: (A) Algorithm for DQNAS, (B) Search Space: description of the different layers and parameters used to create the search space, (C) Search Strategy and Action Space: the model used to navigate through the search space and take actions (selection of layers) and (D) Evaluation Metrics: the metrics used for the evaluation of the generated CNN.

A. Algorithm for DQNAS

The steps performed by the algorithm to generate an efficient, high-accuracy CNN for the provided dataset are given below:
Step 1: Create the Search Space for the Reinforcement Learning algorithm, Double DQN (a combination of Main and Target Q networks). Double DQN will use this Search Space to generate CNN model sequences.
Step 2: The Main Q Network starts generating sequences of CNN models. For each model, it generates a random probability value p, which is then compared with the value of epsilon. If p < ε, a random action is taken; otherwise the action with the maximum Q-value is taken.
Step 3: Check whether the chosen action satisfies all the specified constraints (Table 3). If it does, add the layer to the sequence; otherwise make a new prediction.
Step 4: Repeat Steps 2-3 until the maximum length of the architecture is reached.
Step 5: Repeat Steps 2-4 N times.
Step 6: Create CNN architectures from the generated sequences. If a model raises a negative-dimensions error during creation, skip the current architecture and save the model in the memory buffer with a validation accuracy of -10.0.
Step 7: Once the model is created, compile it with each combination of learning rate and optimizer in Table 4.
Step 8: Before training the model, use One-Shot Training to transfer the weights of the layers that it shares, with the same parameter values, with previously trained models.
Step 9: After training all the versions of the model on the dataset, store the model architecture and the maximum validation accuracy achieved (amongst all the versions) in the memory buffer.
Step 10: Repeat Steps 6-9 for all the sequences generated by the Main Q Network.
Step 11: Filter the top-performing model architectures from the memory buffer and train the Main Q network using the Bellman Equation.
Step 12: After every M training epochs of the Main Q network, transfer its weights to the Target Q network.
Step 13: Repeat Steps 2-12 until N' reaches the maximum number of Controller epochs.

Figure 8 represents the various steps of the proposed algorithm.

Figure 8: Proposed Methodology
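Steps 2 and 3 can be illustrated with a short sketch that combines the epsilon greedy choice with constraints 1, 2, 4 and 5 of Table 3. It makes two assumptions that are not taken from this work: invalid layers are masked out before the choice is made (rather than predicting again after a rejected choice), and the layer names and the helper functions is_valid and choose_layer are hypothetical.

```python
import random

# Illustrative layer names; the grouping follows the categories of the search space.
CONV = {"conv2d", "conv2dtranspose", "separableconv2d", "depthwiseconv2d"}
POOL = {"maxpool2d", "avgpool2d", "globalmaxpool2d", "globalavgpool2d"}


def is_valid(sequence, layer):
    """Check a candidate layer against constraints 1, 2, 4 and 5 of Table 3."""
    if not sequence:
        return layer in CONV              # 1: first layer is convolutional
    flattened = "flatten" in sequence
    if layer in CONV | POOL:
        return not flattened              # 2: no conv/pool after Flatten
    if layer == "dropout":
        return sequence[-1] in POOL       # 4: Dropout only after pooling
    if layer == "dense":
        return flattened                  # 5: no Dense before Flatten
    return True


def choose_layer(q_values, layers, sequence, epsilon, rng=random):
    """Epsilon greedy choice restricted to layers that pass is_valid."""
    allowed = [i for i, name in enumerate(layers) if is_valid(sequence, name)]
    if rng.random() < epsilon:
        return layers[rng.choice(allowed)]                   # explore
    return layers[max(allowed, key=lambda i: q_values[i])]   # exploit
```

Masking keeps the sketch free of retry loops; with ε = 0 the function always returns the valid layer with the highest Q-value, and with ε = 1 it always returns a random valid layer.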
B. Search Space

Many of the previous approaches [3,51-54] have narrowed down the size of the search space to decrease the time complexity of the algorithm. But a reduced search space also directly affects the ability of the Reinforcement Learning algorithm to sample efficient architectures. This reduction in Search Space took the form of either limiting the types of layers used to create the CNN or reducing the parameters of a layer and their respective values.

In our proposed framework, we have focused on reducing the time complexity of the algorithm without any reduction in the possible search space. A Convolutional Neural Network usually consists of many different layers, which can be divided into the following 5 categories:
1) Convolutional Layer
2) Pooling Layer
3) Regularization Layer
4) Flatten Layer
5) Dense Layer
each of which can in turn contain many unique layers.

In later sections, we discuss the limit that we have put on the number of layers that constitute the Q networks; it has no effect on the Search Space that we have defined for the CNN models. This limit on the number of layers in the Q networks was imposed to reduce the training time, as it takes more time to train a network with many layers. Our Q network is a simple RNN consisting of only a few LSTM layers. Table 2 summarizes the layers that constitute the Search Space along with their possible parameter values.

C. Search Strategy and Action Space

In the proposed methodology, two Recurrent Neural Networks (RNNs) are created for the task of generating a CNN sequence that will later be validated into a CNN model. In a CNN, the layer to be added next depends on the layers that have already been added; therefore, LSTM layers are used in the RNN model. Figure 9 represents the architecture used for the Deep Q networks. Only a few RNN layers are used so that computation is faster, which reduces the time complexity of the algorithm.

Figure 9: Deep Q network architecture

While designing a Convolutional Neural Network, we apply some constraints and restrictions that lead us to a valid CNN architecture. First, there is a high chance that the CNN architecture generated by the algorithm leads to negative dimensions of the input image, which raises an error. To prevent the execution from stopping, we store the model architecture with its validation accuracy set to a negative value.

We saw earlier the different parameters and their respective valid values that can be used while generating a CNN model. It is possible to encounter an error saying that the image has negative dimensions. This can happen for many reasons; for example, after some processing in the CNN the image dimensions are reduced to 5x5, and at this point the image must be processed by a layer with a kernel size of 7. This is impossible, as a 7x7 kernel cannot be applied to a 5x5 image. To prevent our execution from stopping due to this error, we skip that model's training and testing cycle and save its architecture with a negative validation value, so that our reinforcement learning algorithm learns to avoid creating such models.

Next, we introduced a few constraints [48] on the occurrence of several types of layers; for example, there should not be a Dense Layer before a Flatten Layer. All the constraints are tabulated in Table 3.
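The negative-dimension case can be checked before a model is built by tracking the spatial size layer by layer. The sketch below uses the standard output-size arithmetic for convolution and pooling with 'valid' and 'same' padding; it does not cover transposed convolutions, which enlarge the feature map. The function names and the (kernel, stride, padding) layer encoding are assumptions made for illustration, while the penalty of -10.0 is the value given in Step 6.

```python
import math

FAILED_ACCURACY = -10.0  # validation accuracy stored for models that cannot be built


def output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer on an n x n input."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1


def check_architecture(input_size, layers):
    """Return the final feature-map size, or None if any layer yields a size below 1."""
    n = input_size
    for kernel, stride, padding in layers:
        n = output_size(n, kernel, stride, padding)
        if n < 1:
            return None
    return n


def buffer_entry_if_invalid(input_size, layers):
    """Memory-buffer entry for an unbuildable model, else None (train it normally)."""
    if check_architecture(input_size, layers) is None:
        return (tuple(layers), FAILED_ACCURACY)
    return None
```

For the example in the text, output_size(5, 7) evaluates to -1, so a 7x7 kernel on a 5x5 feature map is rejected and the architecture is stored with the penalty value instead of being trained.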

TABLE II. SEARCH SPACE

Layer / Parameter                          Values of the parameter

Convolution Layers: Conv2D, Conv2DTranspose, SeparableConv2D, DepthwiseConv2D
  Filters (Not for DepthwiseConv2D)        ∈ {16, 32, 64, 96, 128, 160, 192, 224, 256}
  Kernel Size                              ∈ {3, 5, 7, 9, 11}
  Strides                                  ∈ {2, 3} (By default value is set to 1)
  Padding                                  ∈ {‘same’, ‘valid’}
  Kernel Initializer                       ∈ {‘HeNormal’, ‘HeUniform’, ‘RandomNormal’, ‘RandomUniform’}
  Bias Initializer                         ∈ {‘HeNormal’, ‘HeUniform’, ‘RandomNormal’, ‘RandomUniform’}
  Kernel Regularizer                       ∈ {‘L1’, ‘L2’, ‘L1_L2’}

Pooling Layers: MaxPooling2D, AveragePooling2D, GlobalMaxPooling2D, GlobalAveragePooling2D
  Pool size                                ∈ {2, 3, 4, 5}
  Strides                                  ∈ {2, 3, 4, 5}
  Padding (Not for GlobalAveragePooling2D and GlobalMaxPooling2D)   ∈ {‘same’, ‘valid’}

Regularization Layers: Dropout, BatchNormalization
  Dropout: Dropout rate                    ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
  BatchNormalization                       (Default values for the parameters were used)

Flatten Layer: Flatten
  (No parameter values to be changed)

Dense Layer: Dense
  Number of neurons                        ∈ {8, 16, 32, 64, 128, 256, 512}
  Activation function                      ∈ {‘sigmoid’, ‘tanh’, ‘relu’, ‘elu’, ‘selu’, ‘swish’}

Once a CNN model has been generated following the above-mentioned constraints, it must be trained and tested on the dataset to determine how well it performs. The metric used to rate the model is validation accuracy, which is stored and later used in the Prioritized Experience Replay to filter the better-performing models from the memory buffer and train the Main Q Network on this data.

TABLE III. CONSTRAINTS FOR A VALID CNN ARCHITECTURE

Sr. No.   Restriction
1         First layer of the CNN architecture should be a Convolutional layer
2         There should not be a Convolutional or Pooling layer after a Flatten Layer
3         Final Layer should always be a Dense layer having either a Softmax or a Sigmoid activation function depending on number of target classes in the dataset
4         Dropout Layer must be inserted only after a Pooling Layer (more efficient in this manner)
5         No Dense Layer can be there before a Flatten Layer

D. Evaluation Metrics
The training and testing time taken by the Convolutional Neural Networks accounts for a major part of the time complexity of the whole algorithm. Training each CNN on the whole dataset for a suitable number of epochs is very time consuming. Therefore, we reduced the number of training epochs for each CNN model while still training it on the whole dataset. Each generated model is trained with several combinations of Learning rates and Optimizers (Table 4). The Learning rate values are higher than usually recommended because the generated architectures are trained only to determine which of them performs best. Later, the best architecture is trained and tested using a Learning rate of 0.01, Adam as the optimizer and 40 epochs. This final test accuracy is used to compare the performance of our model with other existing methodologies.

TABLE IV. LEARNING RATE AND OPTIMIZERS

Learning rate   ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}
Optimizers      ∈ {Adam, RMSProp, SGD}

VII. RESULTS AND DISCUSSION

In the initial phase of the algorithm, the epsilon value is set to 1, and the decay constant has a value of 0.05. Every time the model takes a random action, the epsilon value is decreased by this constant. For each of the 3 datasets, the validation dataset was 10% of the training dataset, that is, 6000 images for MNIST and 5000 images for each of the CIFAR10 and CIFAR100 datasets. During the model generation phase, the batch size was set to 4, and different combinations of optimizers and learning rates were used to train and test the model (Table 4). Each generated model was trained for 10 epochs under every possible combination, and its validation accuracy was set to the maximum over all the combinations. Our experiments were performed on the CPU of an Alienware Aurora R11 (Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz 2.81 GHz); generating 200 models for each dataset took around 3-4 days.

After each model is trained and validated, its architecture, layer weights and accuracy are stored. When a new model is generated that shares some combination of layers with previously trained models, one-shot training is used to transfer the corresponding weights to the new model. This imparts some knowledge to the newly generated model, so that it does not start from a random initialization of weights. Table 5 contains the architecture and accuracy of the top 5 models generated for each dataset.

Out of all these models, the top model is then trained with a learning rate of 0.01 for 40 epochs. The top models vary hugely in the number of learnable parameters, but they all perform with high accuracy on their respective datasets. We have compared the accuracy obtained by DQNAS with other existing technologies and models and have tabulated the results in Table 6.

As seen in Table 6, our proposed algorithm creates models whose accuracies are comparable to those of existing architectures that were designed by scientists with cross-domain knowledge of CNNs, software engineering and the data used. Those models contain complex layers and connections and take a huge amount of time and large computational resources to train, whereas models generated using the proposed methodology are simple, take 1 hour to train and have an accuracy difference of a mere 5% compared to complex deep learning CNN architectures.
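The evaluation protocol described above, in which each architecture is trained under every pairing from Table 4 and only the best validation accuracy is kept, can be sketched as follows. The callback train_and_validate is a hypothetical placeholder for the actual 10-epoch training run, and the function name is an assumption; the value lists are those of Table 4, giving 6 x 3 = 18 runs per architecture.

```python
from itertools import product

LEARNING_RATES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # Table 4
OPTIMIZERS = ["Adam", "RMSProp", "SGD"]           # Table 4


def best_validation_accuracy(train_and_validate):
    """Train one architecture under every (learning rate, optimizer) pair.

    train_and_validate(lr, optimizer) is a hypothetical callback that trains
    the model and returns its validation accuracy. Returns the best pair,
    its accuracy, and the number of runs performed.
    """
    results = {
        (lr, opt): train_and_validate(lr, opt)
        for lr, opt in product(LEARNING_RATES, OPTIMIZERS)
    }
    best = max(results, key=results.get)
    return best, results[best], len(results)
```

Only the maximum is written to the memory buffer, so the stored accuracy reflects the architecture under its most favourable training setting rather than an average over settings.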

TABLE V. TOP 5 ARCHITECTURES CREATED FOR EACH DATASET

MNIST dataset
1 [(‘conv2dtranspose’, 160, 7, 2, ‘valid’, ‘HeNormal’, ‘RandomUniform’, ‘l1_l2’), (‘conv2dtranspose’, 128, 9, 84.76%
2, ‘valid’, ‘RandomNormal’, ‘RandomNormal’, ‘l1’), (‘separableconv2d’, 96, 5, 3, ‘same’, ‘HeNormal’,
‘RandomNormal’, ‘l1’), (‘conv2dtranspose’, 192, 7, 3, ‘same’, ‘HeUniform’, ‘HeUniform’, ‘l1’), (‘conv2d’,
96, 5, 3, ‘valid’, ‘HeUniform’, ‘RandomUniform’, ‘l2’), (‘separableconv2d’, 128, 9, 2, ‘same’,
‘RandomUniform’, ‘RandomUniform’, ‘l1’), ‘Flatten’, (10, ‘softmax’)]
2 [(‘conv2dtranspose’, 16, 3, 3, ‘valid’, ‘RandomNormal’, ‘HeNormal’, ‘l2’), (‘conv2d’, 224, 9, 2, ‘same’, 82.65%
‘RandomNormal’, ‘RandomUniform’, ‘l2’), (‘conv2dtranspose’, 256, 5, 2, ‘same’, ‘HeNormal’,
‘RandomNormal’, ‘l1’), (‘separableconv2d’, 192, 5, 2, ‘same’, ‘HeNormal’, ‘HeUniform’, ‘l1_l2’),
(‘conv2d’, 128, 5, 3, ‘valid’, ‘RandomUniform’, ‘HeNormal’, ‘l1’), (‘conv2dtranspose’, 128, 5, 3, ‘valid’,
‘HeUniform’, ‘HeUniform’, ‘l1_l2’), ‘Flatten’, (10, ‘softmax’)]
3 [(‘conv2d’, 224, 3, 3, ‘valid’, ‘HeUniform’, ‘HeUniform’, ‘l1_l2’), (‘conv2dtranspose’, 128, 9, 3, ‘valid’, 81.97%
‘RandomNormal’, ‘RandomNormal’, ‘l1_l2’), (‘conv2dtranspose’, 160, 7, 3, ‘valid’, ‘HeNormal’,
‘RandomUniform’, ‘l1’), (‘conv2d’, 32, 7, 2, ‘same’, ‘RandomNormal’, ‘RandomUniform’, ‘l2’),
(‘depthwiseconv2d’, 7, 3, ‘valid’, ‘HeUniform’, ‘HeUniform’, ‘l1_l2’), (‘conv2dtranspose’, 32, 3, 3, ‘same’,
‘HeNormal’, ‘HeNormal’, ‘l1_l2’), ‘Flatten’, (10, ‘softmax’)]
4 [(‘conv2dtranspose’, 192, 3, 3, ‘same’, ‘HeNormal’, ‘RandomUniform’, ‘l1_l2’), (‘depthwiseconv2d’, 3, 2, 81.82%
‘same’, ‘RandomNormal’, ‘HeNormal’, ‘l1’), (‘conv2d’, 192, 5, 2, ‘same’, ‘HeUniform’, ‘HeUniform’,
‘l1_l2’), (‘conv2d’, 192, 7, 2, ‘same’, ‘HeUniform’, ‘HeNormal’, ‘l1_l2’), (‘depthwiseconv2d’, 9, 2, ‘same’,
‘RandomNormal’, ‘RandomNormal’, ‘l2’), (‘conv2d’, 96, 3, 3, ‘valid’, ‘RandomUniform’, ‘HeUniform’,
‘l1’), ‘Flatten’, (10, ‘softmax’)]
5 [(‘conv2dtranspose’, 128, 7, 3, ‘same’, ‘RandomUniform’, ‘HeUniform’, ‘l1’), (‘separableconv2d’, 224, 9, 3, 81.04%
‘same’, ‘HeUniform’, ‘HeUniform’, ‘l2’), (‘conv2dtranspose’, 96, 5, 3, ‘same’, ‘HeUniform’, ‘HeNormal’,
‘l1’), (‘conv2d’, 192, 9, 3, ‘valid’, ‘HeUniform’, ‘RandomNormal’, ‘l1’), (‘conv2d’, 128, 5, 3, ‘same’,
‘HeNormal’, ‘HeUniform’, ‘l1_l2’), (‘conv2dtranspose’, 160, 7, 3, ‘same’, ‘HeUniform’, ‘HeNormal’, ‘l1’),
‘Flatten’, (10, ‘softmax’)]
CIFAR 10 dataset
1 [(‘conv2dtranspose’, 160, 5, 2, ‘valid’, ‘RandomUniform’, ‘RandomUniform’, ‘l2’), (‘depthwiseconv2d’, 9, 63.05
2, ‘same’, ‘RandomNormal’, ‘HeUniform’, ‘l1’), (‘maxpool2d’, 5, 2, ’valid’) , (‘dropout’, 0.3), (‘conv2d’,
192, 7, 2, ‘same’, ‘RandomUniform’, ‘HeUniform’, ‘l1’), (‘globalavgpool2d’, ‘valid’), ‘Flatten’, (10,
‘softmax’)]
2 [(‘conv2dtranspose’, 160, 9, 3, ‘same’, ‘RandomNormal’, ‘RandomNormal’, ‘l2’), (‘conv2dtranspose’, 192, 61.21
3, 3, ‘valid’, ‘RandomNormal’, ‘HeNormal’, ‘l2’), (‘avgpool2d’, 7, 2, ‘same’), (‘conv2dtranspose’, 224, 3, 3,
‘same’, ‘RandomNormal’, ‘HeUniform’, ‘l2’), (‘depthwiseconv2d’, 9, 3, ‘same’, ‘RandomUniform’,
‘RandomUniform’, ‘l1_l2’), , ‘Flatten’, (128, ‘elu’), (10, ‘softmax’)]

3 [(‘conv2d’, 128, 7, 2, ‘same’, ‘RandomUniform’, ‘HeNormal’, ‘l2’), (‘depthwiseconv2d’, 9, 2, ‘valid’, 60.79


‘RandomNormal’, ‘HeNormal’, ‘l2’), (‘conv2d’, 256, 3, 2, ‘same’, ‘HeNormal’, ‘HeNormal’, ‘l1’),
(‘maxpool2d’, 9, 2, ‘valid’), (‘conv2dtranspose’, 192, 7, 3, ‘valid’, ‘RandomUniform’, ‘HeNormal’, ‘l1_l2’),
(‘avgpool2d’, 9, 2, ‘valid’), ‘Flatten’, (10, ‘softmax’)]

4 [(‘conv2d’, 192, 7, 3, ‘same’, ‘RandomNormal’, ‘HeUniform’, ‘l1_l2’), (‘maxpool2d’, 7, 2, ‘valid’), 60.41


(‘maxpool2d’, 9, 2, ‘same’), (‘conv2d’, 16, 9, 3, ‘same’, ‘RandomUniform’, ‘HeUniform’, ‘l1’), (‘conv2d’,
192, 5, 2, ‘same’, ‘HeUniform’, ‘RandomNormal’, ‘l1_l2’), ‘Flatten’, (256, ‘tanh’), (10, ‘softmax’)]
5 [(‘conv2d’, 16, 3, 2, ‘valid’, ‘HeNormal’, ‘RandomNormal’, ‘l2’), (‘depthwiseconv2d’, 5, 2, ‘same’, 60.02
‘RandomNormal’, ‘HeNormal’, ‘l1_l2’), (‘conv2dtranspose’, 224, 7, 2, ‘same’, ‘HeUniform’, ‘HeUniform’,
‘l1’), (‘globalavgpool2d’, ‘same’), (‘dropout’, 0.5), (‘conv2dtranspose’, 160, 3, 3, ‘valid’, ‘RandomNormal’,
‘RandomNormal’, ‘l1_l2’), ‘Flatten’, (10, ‘softmax’)]
CIFAR 100 dataset
1 [(‘conv2d’, 64, 9, 2, ‘same’, ‘RandomNormal’, ‘HeNormal’, ‘l2’), (‘avgpool2d’, 3, 3, ‘valid’), 38.98
(‘conv2dtranspose’, 256, 3, 3, ‘valid’, ‘RandomNormal’, ‘RandomUniform’, ‘l1_l2’), (‘maxpool2d’, 3, 3,
‘valid’), ‘BatchNormalization’, ‘Flatten’, (256, ‘swish’), (100, ‘softmax’)]
2 [(‘conv2d’, 96, 3, 3, ‘valid’, ‘HeNormal’, ‘RandomNormal’, ‘l2’), (‘conv2d’, 192, 7, 2, ‘same’, 29.04
‘RandomUniform’, ‘HeUniform’, ‘l1’), (‘maxpool2d’, 3, 3, ‘valid’), (‘conv2dtranspose’, 256, 9, 3, ‘same’,
‘HeUniform’, ‘RandomNormal’, ‘l1’), (‘conv2d’, 16, 9, 3, ‘same’, ‘HeNormal’, ‘HeUniform’, ‘l1’),
(‘maxpool2d’, 7, 3, ‘same’), ‘Flatten’, (100, ‘softmax’)]
3 [(‘conv2d’, 160, 9, 2, ‘valid’, ‘RandomNormal’, ‘HeUniform’, ‘l2’), (‘globalavgpool2d’, ‘same’), 26.93
(‘depthwiseconv2d’, 9, 2, ‘valid’, ‘HeNormal’, ‘RandomNormal’, ‘l2’), (‘depthwiseconv2d’, 5, 3, ‘valid’,
‘HeUniform’, ‘RandomNormal’, ‘l1_l2’), (‘conv2dtranspose’, 224, 7, 2, ‘same’, ‘RandomUniform’,
‘RandomUniform’, ‘l2’), (‘maxpool2d’, 3, 3, ‘valid’), ‘Flatten’, (100, ‘softmax’)]
4 [(‘conv2dtranspose’, 64, 3, 2, ‘valid’, ‘RandomNormal’, ‘HeUniform’, ‘l1_l2’), (‘depthwiseconv2d’, 3, 3, 25.56
‘same’, ‘HeUniform’, ‘RandomUniform’, ‘l1’), (‘separableconv2d’, 192, 7, 2, ‘same’, ‘RandomUniform’,
‘HeNormal’, ‘l1_l2’), (‘maxpool2d’, 5, 3, ‘same’), (‘dropout’, 0.2), ‘BatchNormalization’, ‘Flatten’, (100,
‘softmax’)]
5 [(‘conv2d’, 224, 7, 3, ‘same’, ‘RandomUniform’, ‘RandomUniform’, ‘l1_l2’), (‘depthwiseconv2d’, 3, 2, 25.52
‘valid’, ‘HeUniform’, ‘HeNormal’, ‘l1’), (‘conv2dtranspose’, 16, 5, 3, ‘valid’, ‘HeNormal’,
‘RandomNormal’, ‘l2’), (‘conv2dtranspose’, 32, 9, 3, ‘valid’, ‘HeUniform’, ‘RandomNormal’, ‘l1_l2’),
(‘depthwiseconv2d’, 7, 3, ‘valid’, ‘RandomNormal’, ‘RandomNormal’, ‘l2’), ‘Flatten’, (128, ‘relu’), (100,
‘softmax’)]

TABLE VI. ACCURACY PERCENTAGE COMPARISON WITH EXISTING STATE-OF-THE-ART ARCHITECTURES AND METHODOLOGIES

Models/Methodologies              MNIST    CIFAR10   CIFAR100
MetaQNN (12 layers)               99.56    92.68     72.86
VGGNet (16 layers)                -        92.75     -
FitNet (19 layers)                99.49    91.61     64.96
Proposed Methodology (8 layers)   97.91    88.07     69.15

VIII. CONCLUSION
The use of Neural Networks has increased exponentially in the past few years, but this growth also brings the problem of creating domain-specific, high-performance neural network architectures. There are many meta-modelling approaches for the automatic generation of neural network architectures, but they either require training the algorithm on the dataset using many GPUs over several days, or they restrict the search space and force the algorithm to follow a pattern. Our DQNAS approach addresses the problem of time complexity as well as the resource requirements. Moreover, the top architectures created can also be tested and used on other related datasets. While DQNAS is a simple algorithm that uses reinforcement learning to generate CNN architectures, users can easily modify the specified constraints to create a sequence that imitates other well-performing architectures.

In future work, we would like to minimize the time complexity by incorporating the concept of predicted accuracies, which would eliminate all those architectures that the model predicts will perform poorly on the given dataset. We will incorporate other techniques for faster training of the CNN architecture, such as using the REINFORCE gradient for updating weights and examining the effect of reducing the training dataset on model training. There might be instances where the total number of learnable parameters in the CNN architecture is huge, and the system may fail to create such a model. To handle such cases, we will set a threshold, and if the number of parameters exceeds it, the model will be skipped.

REFERENCES
[1] Jaafra, Yesmina, et al. "A review of meta-reinforcement learning for deep neural networks architecture search." arXiv preprint arXiv:1812.07995 (2018).
[2] Chauhan, Anshumaan, et al. "LPRNet: A Novel Approach for Novelty Detection in Networking Packets." International Journal of Advanced Computer Science and Applications 13.2 (2022).
[3] Baker, Bowen, et al. "Designing neural network architectures using reinforcement learning." arXiv preprint arXiv:1611.02167 (2016).
[4] Sun, Yanan, et al. "Automatically evolving cnn architectures based on blocks." arXiv preprint arXiv:1810.11875 (2018).
[5] Suganuma, Masanori, Shinichi Shirakawa, and Tomoharu Nagao. "A genetic programming approach to designing convolutional neural network architectures." Proceedings of the genetic and evolutionary computation conference. 2017.
[6] Liashchynskyi, Petro, and Pavlo Liashchynskyi. "Grid search, random search, genetic algorithm: a big comparison for NAS." arXiv preprint arXiv:1912.06059 (2019).
[7] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
[8] Real, Esteban, et al. "Regularized evolution for image classifier architecture search." Proceedings of the aaai conference on artificial intelligence. Vol. 33. No. 01. 2019.
[9] Elsken, Thomas, Jan Hendrik Metzen, and Frank Hutter. "Neural architecture search: A survey." The Journal of Machine Learning Research 20.1 (2019): 1997-2017.
[10] Kandasamy, Kirthevasan, et al. "Neural architecture search with bayesian optimisation and optimal transport." Advances in neural information processing systems 31 (2018).
[11] Floreano, Dario, Peter Dürr, and Claudio Mattiussi. "Neuroevolution: from architectures to learning." Evolutionary intelligence 1.1 (2008): 47-62.
[12] Corne, David, and Michael A. Lones. "Evolutionary algorithms." Handbook of Heuristics. Springer, Cham, 2018. 409-430.
[13] Singh, Satinder P., and Richard C. Yee. "An upper bound on the loss from approximate optimal-value functions." Machine Learning 16.3 (1994): 227-233.
[14] Byun, Ju-Seung, Byungmoon Kim, and Huamin Wang. "Proximal Policy Gradient: PPO with Policy Gradient." arXiv preprint arXiv:2010.09933 (2020).
[15] Klein, Aaron, et al. "Fast bayesian optimization of machine learning hyperparameters on large datasets." Artificial intelligence and statistics. PMLR, 2017.
[16] Bergstra, James, Daniel Yamins, and David Cox. "Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures." International conference on machine learning. PMLR, 2013.
[17] Luc, Dinh The. "Pareto optimality." Pareto optimality, game theory and equilibria (2008): 481-515.
[18] Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European conference on computer vision (ECCV). 2018.
[19] Freeman, Ido, Lutz Roese-Koerner, and Anton Kummert. "Effnet: An efficient structure for convolutional neural networks." 2018 25th ieee international conference on image processing (icip). IEEE, 2018.
[20] Cai, Han, et al. "Efficient architecture search by network transformation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018.
[21] Zhong, Zhao, et al. "Practical block-wise neural network architecture generation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[22] Motta, Daniel, et al. "Optimization of convolutional neural network hyperparameters for automatic classification of adult mosquitoes." Plos one 15.7 (2020): e0234959.
[23] Gülcü, Ayla, and Zeki Kuş. "Hyper-parameter selection in convolutional neural networks using microcanonical optimization algorithm." IEEE Access 8 (2020): 52528-52540.
[24] Yang, Yuxuan, et al. "A CNN identified by reinforcement learning-based optimization framework for EEG-based state evaluation." Journal of Neural Engineering 18.4 (2021): 046059.
[25] Li, Liam, et al. "Massively parallel hyperparameter tuning." arXiv preprint arXiv:1810.05934 5 (2018).
[26] Sun, Yanan, et al. "Completely automated CNN architecture design based on blocks." IEEE transactions on neural networks and learning systems 31.4 (2019): 1242-1254.
[27] Chen, Yushi, et al. "Automatic design of convolutional neural network for hyperspectral image classification." IEEE Transactions on Geoscience and Remote Sensing 57.9 (2019): 7048-7066.
[28] Xie, Lingxi, and Alan Yuille. "Genetic cnn." Proceedings of the IEEE international conference on computer vision. 2017.
[29] Dufourq, Emmanuel, and Bruce A. Bassett. "Eden: Evolutionary deep networks for efficient machine learning." 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech). IEEE, 2017.
[30] Cai, Han, et al. "Automl for architecting efficient and specialized neural networks." IEEE Micro 40.1 (2019): 75-82.
[31] Polonskaia, Iana S., Ilya R. Aliev, and Nikolay O. Nikitin. "Automated evolutionary design of CNN classifiers for object recognition on satellite images." Procedia Computer Science 193 (2021): 210-219.
[32] Chen, Yushi, et al. "Automatic design of convolutional neural network for hyperspectral image classification." IEEE Transactions on Geoscience and Remote Sensing 57.9 (2019): 7048-7066.
[33] DeVries, Terrance, and Graham W. Taylor. "Improved regularization of convolutional neural networks with cutout." arXiv preprint arXiv:1708.04552 (2017).
[34] Chen, Yifang, et al. "Automated design of neural network architectures with reinforcement learning for detection of global manipulations." IEEE Journal of Selected Topics in Signal Processing 14.5 (2020): 997-1011.
[35] Mortazi, Aliasghar, and Ulas Bagci. "Automatically designing CNN architectures for medical image segmentation." International Workshop on Machine Learning in Medical Imaging. Springer, Cham, 2018.
[36] Tan, Mingxing. "MnasNet: Towards Automating the Design of Mobile Machine Learning Models." (2018).
[37] Hsu, Chi-Hung, et al. "Monas: Multi-objective neural architecture search using reinforcement learning." arXiv preprint arXiv:1806.10332 (2018).
[38] Guo, Minghao, et al. "Irlas: Inverse reinforcement learning for architecture search." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[39] Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. "Net2net: Accelerating learning via knowledge transfer." arXiv preprint arXiv:1511.05641 (2015).
[40] Cai, Han, et al. "Path-level network transformation for efficient architecture search." International Conference on Machine Learning. PMLR, 2018.
[41] Liu, Chenxi, et al. "Progressive neural architecture search." Proceedings of the European conference on computer vision (ECCV). 2018.
[42] Pham, Hieu, et al. "Efficient neural architecture search via parameters sharing." International conference on machine learning. PMLR, 2018.
[43] Chollet, François. "keras." (2015).
[44] Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International conference on machine learning. PMLR, 2016.
[45] Zhang, Shangtong, and Richard S. Sutton. "A deeper look at experience replay." arXiv preprint arXiv:1712.01275 (2017).
[46] Schaul, Tom, et al. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).
[47] Jin, Haifeng, Qingquan Song, and Xia Hu. "Auto-keras: An efficient neural architecture search system." Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2019.
[48] Alzubaidi, Laith, et al. "Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions." Journal of big Data 8.1 (2021): 1-74.
