DNN-Opt: An RL Inspired Optimization For Analog Circuit Sizing Using Deep Neural Networks
Abstract—Analog circuit sizing takes a significant amount of manual effort in a typical design cycle. With rapidly developing technology and tight schedules, bringing automated solutions for sizing has attracted great attention. This paper presents DNN-Opt, a Reinforcement Learning (RL) inspired Deep Neural Network (DNN) based black-box optimization framework for analog circuit sizing. The key contributions of this paper are a novel sample-efficient two-stage deep learning optimization framework leveraging RL actor-critic algorithms, and a recipe to extend it to large industrial circuits using critical device identification. Our method shows 5–30x sample efficiency compared to other black-box optimization methods, with better performance metrics, on both small building blocks and large industrial circuits. To the best of our knowledge, this is the first application of DNN-based circuit sizing to industrial-scale circuits.

Index Terms—Analog Circuit Sizing Automation, Black-box Optimization, Reinforcement Learning, Deep Neural Network

I. INTRODUCTION

Analog Integrated Circuit (IC) design is a complex process involving multiple steps. Billions of nanoscale transistor devices are fabricated on a silicon die and connected via intricate metal layers during those steps. The final product is an IC, which powers much of our life today. An essential aspect of IC design is analog design, which continues to suffer from long design cycles and high design complexity due to the lack of automation in analog Electronic Design Automation (EDA) tools compared to digital flows. In particular, "circuit sizing" tends to consume a significant portion of analog designers' time. To tackle this labor-intensive task and reduce time-to-market, analog circuit sizing automation has attracted high interest in recent years.

Prior work on analog circuit sizing automation can be divided into two categories: knowledge-based and optimization-based methods. In the knowledge-based approach, design experts transcribe their domain knowledge into algorithms and equations [1], [2]. However, such methods create dependency on expert human designers, circuit topology, and technology nodes. Thus, these methods are highly time-consuming and not scalable.

Optimization-based methods are further categorized into two classes: equation-based and simulation-based methods. Equation-based methods try to express circuit performance via posynomial equations or regression models using simulation data. Then, equation-based optimization methods such as Geometric Programming [3], [4] or Semidefinite Programming (SDP) relaxations [5] are applied to the convex or non-convex formulated problems to find an optimal solution. Although these methods are generally fast, developing accurate expressions for circuit performances is not easy, and the expressions deviate largely from the actual values. On the other hand, simulation-based methods employ black-box or learning-based optimization techniques to explore the design space. These methods perform guided exploration of the search space and target a global minimum using real evaluations from circuit simulators.

Traditionally, model-free optimization methods such as particle swarm optimization (PSO) [6] and advanced differential evolution [7] have been widely used. Although these methods have good convergence behavior, they are known to be sample-inefficient (i.e., SPICE simulation intensive). Recently, surrogate model-based and learning-based methods have become increasingly popular due to their efficiency in exploring the solution space. In surrogate model-based methods, Gaussian Process Regression (GPR) [8] is generally used for design space modeling, and the next design point is determined through model predictions. For example, the GASPAD method introduced GPR-guided evolutionary search into Radio Frequency (RF) IC synthesis [9]. The WEIBO method [10] proposed a GPR-based Bayesian Optimization algorithm in which a blend of weighted Expected Improvement (wEI) and the probability of feasibility is selected as the acquisition function to handle the constrained nature of analog sizing [11]. The main drawback of Bayesian Optimization methods is scalability, as GP modeling has cubic complexity in the number of samples, O(N^3).

Recently, reinforcement learning algorithms have been applied to this area as learning-based methods. GCN-RL [12] leverages Graph Neural Networks (GNN) and proposes a transferable framework. Despite reporting superior results over various methods and human designers, a) it requires thousands of simulations for convergence (without transfer learning), and b) it demands substantial engineering effort to determine the observation vector, select the architecture, and shape the reward. AutoCkt [13] is a sparse sub-sampling RL technique that optimizes circuit parameters by taking discrete actions in the solution space. AutoCkt shows better efficiency than random RL agents and Differential Evolution. Still, it requires training with thousands of SPICE simulations before deployment, which is costly.

In this paper, we introduce DNN-Opt, a two-stage deep learning black-box optimization scheme in which we merge the strengths of Reinforcement Learning (RL), Bayesian Optimization (BO), and population-based techniques in a novel way.
Fig. 1. DNN-Opt Framework.

The key features of the DNN-Opt framework are as follows.
• We tailor a two-stage Deep Neural Network (DNN) architecture for black-box optimization tasks, inspired by the actor-critic algorithms developed in the RL community.
• To leverage the convergence behavior of population-based methods, DNN-Opt adopts a population-based search space control mechanism.
• We introduce a recipe for extending our work to large industrial designs using sensitivity analysis. In collaboration with a design house, we demonstrate that our work can efficiently size large circuits with tens of thousands of devices in addition to small building blocks.

The rest of the paper is organized as follows. We formulate the analog circuit sizing problem in Section II and introduce DNN-Opt with its RL core and other details. In Section III, the performance of DNN-Opt is demonstrated on small building blocks and large industrial circuits. We also provide performance comparisons of DNN-Opt with other optimization methods. Conclusions are provided in Section IV.
II. DNN-OPT FRAMEWORK

A. Analog Circuit Sizing: Problem Formulation

We formulate the analog circuit sizing task succinctly as a constrained optimization problem:

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0 \quad \text{for } i = 1, \dots, m \end{aligned} \qquad (1)$$

where x ∈ D^d is the parameter vector and d is the number of design variables of the sizing task; thus, D^d is the design space. f_0(x) is the objective performance metric we aim to minimize. Without loss of generality, we denote the i-th constraint by f_i(x).
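For concreteness, consider sizing an amplifier for minimum power; with illustrative spec targets (60 dB DC gain, 1 MHz unity-gain bandwidth, chosen here for exposition rather than taken from the paper's benchmarks), Eq. 1 would take the form

$$\begin{aligned} \text{minimize} \quad & f_0(x) = \mathrm{Power}(x) \\ \text{subject to} \quad & f_1(x) = 60\,\mathrm{dB} - \mathrm{Gain}(x) \le 0 \\ & f_2(x) = 1\,\mathrm{MHz} - \mathrm{UGB}(x) \le 0 \end{aligned}$$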
B. DNN-Opt Core: RL Inspired Two-Stage DNN Architecture

The overall framework of DNN-Opt is shown in Figure 1. DNN-Opt comprises a two-stage deep neural network architecture that interacts with a circuit simulator during the optimization process. The flow starts from generated samples in the design space; then, a critic-network is used to predict circuit performances, and an actor-network proposes new design points. The two-stage training is inspired by the Deep Deterministic Policy Gradient (DDPG) algorithm [14], an RL actor-critic algorithm [15] developed for continuous action spaces. However, actor-critic algorithms are not directly applicable to analog circuit sizing, since it is not a Markov Decision Process (MDP) [16], which is a necessary condition for any RL problem. Therefore, we adapt the DDPG algorithm with significant modifications tailored to analog circuit sizing.

In the context of analog circuit sizing, we keep some of the RL notation but replace much of it for simplicity and clarity.

Design: A design is a set of circuit parameters, which we denote by x; it is a vector of size d where each element corresponds to a particular design variable. The optimization goal is to find the optimal x_opt which satisfies Eq. 1.

Population: A population is a set of multiple designs.

Design Population Matrix: We define a design population matrix as X ∈ R^{N×d}, where N is the population size. The parameter vector of the i-th design is a row of the design population matrix X, denoted x_i.

State Space: Our work maps optimization parameters (circuit design variables) to the state representation in RL notation. The state of the k-th design is s_k = x_k.

Action Space: Each action a_k in our new architecture corresponds to a change in the optimization parameter vector x_k, denoted a_k = Δx_k. An intuitive explanation of this choice is that an ideal action for an optimization task should propose a change in each design variable that leads to a better design.

Critic-Network: Originally, a critic-network parameterized by θ^Q approximates the return value of an MDP: Return = Q(s_t, a_t | θ^Q). We modify its role and use this network as a proxy in lieu of the expensive SPICE simulator. Our modified critic-network provides a vector-to-vector mapping by taking (x, Δx) ∈ D^{2d} as input and producing performance predictions Q(x, Δx | θ^Q) ∈ R^{m+1} at its output: one dimension for the objective specification and m for the constraint specifications.

Actor-Network: An actor-network parameterized by θ^μ takes a state as its input and determines an action to take: a_k = μ(s_k | θ^μ). In the context of analog circuit sizing, the actor-network provides the change in the design parameter vector for design k: Δx_k = a_k = μ(x_k | θ^μ).

Critic-Network Training: We utilize the critic-network to model the relationship between design variables and circuit performances. For effective training, we use a data augmentation technique to generate N^2 pseudo-samples (ps) from the original N samples. To generate pseudo-samples, we use two samples x_i and x_j and their corresponding spec vectors f(x_i) and f(x_j), as follows:

$$x^{ps}_{ij} = [x_i,\; \Delta x_{ij}] = [x_i,\; x_j - x_i], \qquad f^{ps}(x^{ps}_{ij}) = f(x_j) \qquad (2)$$
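A minimal NumPy sketch of the augmentation in Eq. 2 (the function and variable names are ours, not the authors' code):

```python
import numpy as np

def make_pseudo_samples(X, F):
    """Build N^2 pseudo-samples (Eq. 2) from N simulated designs.

    X: (N, d) design matrix; F: (N, m+1) simulated spec vectors f(x).
    Returns critic inputs (N^2, 2d) = [x_i, x_j - x_i] and
    targets (N^2, m+1) = f(x_j).
    """
    N = X.shape[0]
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    i, j = i.ravel(), j.ravel()
    inputs = np.hstack([X[i], X[j] - X[i]])  # (x_i, Δx_ij)
    targets = F[j]                           # f(x_i + Δx_ij) = f(x_j)
    return inputs, targets
```

Note that each pseudo-sample's target f(x_j) requires no extra simulation, since both endpoints of the pair were already simulated.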
This leads to a change in the input dimensionality of the critic-network from d to 2d, since we now use (x, Δx) instead of x or (x + Δx). Our experiments on Bayesmark [17] benchmark problems showed that using 2d inputs and training with pseudo-samples boosts the critic-network's accuracy significantly over a network trained with d inputs and the original samples.
For a batch of N_b pseudo-samples, the following Mean Squared Error (MSE) loss function is used to train the critic-network:

$$L(\theta^Q) = \frac{1}{N_b(m+1)} \sum_{k=1}^{N_b} \sum_{l=1}^{m+1} \left( Q(x_k, \Delta x_k)_l - f(x_k + \Delta x_k)_l \right)^2 \qquad (3)$$

where Q(x_k, Δx_k)_l is the critic-network's approximation of the k-th pseudo-sample's l-th performance, and f(x_k + Δx_k)_l is the SPICE-simulated value for the same design-performance pair. To clarify, we have SPICE simulation values for the pseudo-samples because of the way they are constructed.
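A compact PyTorch sketch of the critic and the objective in Eq. 3; the layer count and widths below are placeholders, since the paper only states that hyperparameters were chosen empirically:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Maps (x, Δx) ∈ R^{2d} to predicted performances in R^{m+1}."""
    def __init__(self, d, m, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, m + 1),
        )

    def forward(self, x, dx):
        return self.net(torch.cat([x, dx], dim=-1))

def critic_loss(critic, x, dx, f_target):
    """Eq. 3: MSE over all m+1 outputs, averaged over the batch."""
    return ((critic(x, dx) - f_target) ** 2).mean()
```

In training, `x` and `dx` would come from `make_pseudo_samples` above, and `f_target` is the corresponding simulated spec vector.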
Actor-Network Training: The actor-network is trained after the critic-network has been trained and its parameters are fixed. Training the actor-network corresponds to searching the design space for better designs. We define a Figure of Merit (FoM) function g(·), based on the performance vector, to objectively quantify how good a design is with respect to others:

$$g[f(x)] = w_0 \cdot f_0(x) + \sum_{i=1}^{m} \min\left(1,\; \max(0,\; w_i \cdot f_i(x))\right) \qquad (4)$$

where w_i is a weighting factor. Note that the max(·) clipping is used to equate designs once their constraints are met, and the min(·) clipping is used for practical purposes, to prevent a single constraint violation from dominating the g(·) value. We train the actor-network parameters using the g(·) function, replacing the SPICE simulation values f(·) with the critic-network predictions Q(x, Δx). We further use a population of "elite" solutions (es) of size N_es to restrict the search space of the actor-network. The population of elite solutions is a subset of the total population determined by the FoM ranking.
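A sketch of Eq. 4 in PyTorch (identifiers are ours). Evaluating g(·) on critic outputs keeps it differentiable, which is what allows it to serve directly as the actor's training signal:

```python
import torch

def fom(f, w):
    """Figure of Merit g[f(x)] (Eq. 4).

    f: (batch, m+1) performance vectors; f[:, 0] is the objective f_0(x)
       and f[:, 1:] the constraint values f_i(x) (feasible when <= 0).
    w: (m+1,) weighting factors.
    """
    objective = w[0] * f[:, 0]
    # Constraint terms are clipped to [0, 1]: met constraints contribute
    # zero, and no single violation can dominate g(·).
    penalty = torch.clamp(w[1:] * f[:, 1:], min=0.0, max=1.0).sum(dim=-1)
    return objective + penalty
```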
For a batch of N_b samples, the following loss function is used to train the actor-network:

$$L(\theta^\mu) = \frac{1}{N_b} \sum_{k=1}^{N_b} \left( g\left[Q(x_k, \mu(x_k \mid \theta^\mu))\right] + \lVert \lambda * viol_k \rVert^2 \right) \qquad (5)$$

where μ(x_k | θ^μ) is the parameter change vector Δx_k proposed by the actor-network, and (λ ∗ viol_k) is an element-wise vector multiplication in which the weighting coefficient λ is chosen to be very large, to prevent any boundary violation and keep the search within the restricted region. The total boundary violation viol_k for action k is defined as:

$$viol_k = \max\left(0,\; lb^{rest} - (x_k + \Delta x_k)\right) + \max\left(0,\; (x_k + \Delta x_k) - ub^{rest}\right) \qquad (6)$$

where lb^{rest} and ub^{rest} are the restriction boundary vectors for the design variables, determined by the population of elite solutions:

$$lb^{rest}_i = \min(\mathbf{x}_i), \qquad ub^{rest}_i = \max(\mathbf{x}_i), \qquad \forall i = 1, \dots, d$$

where x_i is the column vector of size N_es consisting of the i-th parameter of all designs in the elite population. The hyperparameters (number of layers, number of nodes, learning rate, etc.) of the actor and critic network architectures were found through empirical studies.
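A PyTorch sketch of Eqs. 5–6 (again with our own identifiers), assuming the `fom` helper above and an `actor` module that maps x to Δx:

```python
import torch

def actor_loss(actor, critic, x, lb_rest, ub_rest, w, lam):
    """Eq. 5: FoM of critic-predicted specs plus boundary penalty (Eq. 6)."""
    dx = actor(x)                      # proposed change Δx = μ(x | θ^μ)
    g = fom(critic(x, dx), w)          # FoM of predicted performances
    x_new = x + dx
    viol = (torch.clamp(lb_rest - x_new, min=0.0)      # below lb^rest
            + torch.clamp(x_new - ub_rest, min=0.0))   # above ub^rest
    penalty = (lam * viol).pow(2).sum(dim=-1)          # ||λ ∗ viol_k||^2
    return (g + penalty).mean()
```

Only θ^μ is updated during this step; the critic stays frozen and acts as a differentiable stand-in for SPICE.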
C. Sensitivity Analysis

We use sensitivity analysis to prune the design search space so that an optimized solution can be found efficiently. A blind search space exploration may waste circuit simulations during optimization. For example, in a classical seven-transistor Operational Amplifier (OpAmp) [4], power dissipation does not depend on the differential-pair devices once they are in saturation. Thus, if we want to size a circuit to reduce power, we should not make the device properties of the differential pair design variables. To use sensitivity analysis in practice for any generic circuit, we first traverse the circuit hierarchy and collect all unique device design variables, d. Then, we perform sensitivity analysis by perturbing each design variable around its nominal value and observing its impact on the objective and constraints, f_i. More formally, we compute the sensitivity S_ij as

$$S_{ij} = \frac{\delta f_i}{\delta d_j}, \qquad \forall i = 0, \dots, m;\; j = 1, \dots, d. \qquad (7)$$

We only need to consider design variables for which S_ij > thresh, where thresh is a user-defined number. Empirically, this analysis prunes the design search space effectively, allowing us to work on large-scale circuits.
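A one-at-a-time perturbation sketch of Eq. 7; the relative step size and the screening on the magnitude |S_ij| are our assumptions, since the paper only specifies a user-defined threshold:

```python
import numpy as np

def sensitive_variables(simulate, x_nominal, thresh, rel_step=0.01):
    """Keep variable j if any sensitivity |S_ij| (Eq. 7) exceeds thresh.

    simulate: maps a design vector to its (m+1,) performance vector
              (objective plus m constraints) via SPICE.
    """
    f_nom = simulate(x_nominal)
    keep = []
    for j in range(len(x_nominal)):
        x = x_nominal.copy()
        step = rel_step * x[j] if x[j] != 0 else rel_step
        x[j] += step
        S_j = (simulate(x) - f_nom) / step  # finite-difference column of S
        if np.any(np.abs(S_j) > thresh):
            keep.append(j)
    return keep
```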
We are now ready to present the overall framework of DNN-Opt.

D. DNN-Opt: Overall Framework

The overall framework of DNN-Opt is provided in Algorithm 1. As a prerequisite, we apply sensitivity analysis to a large design to reduce the number of design variables to a workable range. We then randomly sample N_init points from the design search space to build the initial population. In each optimization iteration t, the first step is to initialize the actor-critic parameters, followed by pseudo-sample generation. Next, the critic-network and the actor-network are trained. After this, an elite population is constructed based on the FoM of the total population (this elite population is updated across optimization iterations). The next query point is generated from the elite population X^es using the trained actor-critic as follows. We use every design x_i^es in the elite population as input to the actor-network. The actor-network's output, Δx_i^es = μ(x_i^es), is the proposed change in the design parameters in search of an optimal solution. With the imposed exploration noise N, a candidate design point is formed naturally as x_i^ca = x_i^es + μ(x_i^es) + N. At this step, we have exactly as many proposed candidates, X^ca = [x_1^ca, ..., x_{N_es}^ca], as the size of the elite population. Once the population pairs X^es and X^ca are formed, the next sample point for iteration t is selected using Eq. 8:

$$x^{sample}_t = x^{ca}_k \quad \text{for} \quad k = \arg\min_i \left( g\left[Q(x^{es}_i,\; x^{ca}_i - x^{es}_i)\right] \right) \qquad (8)$$

Algorithm 1 DNN-Opt Algorithm
Require: Dimensionality reduction with sensitivity analysis (for large designs)
1: Define total population X_tot = X_init
2: for t = 1, 2, ..., t_max do
3:   Initialize actor & critic network parameters θ^μ and θ^Q
4:   Generate pseudo-samples using existing designs X_tot → Eqn. 2
5:   Train critic-network → Eqn. 3
6:   Train actor-network → Eqn. 5
7:   Calculate FoM for each design by FoM = g[f(X_tot)]
8:   Build elite population X^es from the best N_es designs by FoM
9:   Form candidates X^ca = X^es + μ(X^es) + N (exploration noise)
10:  Select the next sample x_t^sample → Eqn. 8
11:  Simulate x_t^sample and add it to X_tot
12: end for
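Tying the pieces together, a schematic Python rendering of Algorithm 1; `init_networks`, `train_critic`, `train_actor`, and `g` are hypothetical wrappers around the sketches above (treating the networks as array-valued callables), and `simulate` is the SPICE interface:

```python
import numpy as np

def dnn_opt(simulate, x_init, n_es, t_max, noise_std, w):
    """Outer loop of Algorithm 1 (a sketch, not the authors' code)."""
    X = np.array(x_init)                        # 1: total population X_tot
    F = np.stack([simulate(x) for x in X])
    for t in range(t_max):                      # 2: optimization iterations
        actor, critic = init_networks(X.shape[1], F.shape[1] - 1)   # 3
        ps_in, ps_out = make_pseudo_samples(X, F)                   # 4 (Eq. 2)
        train_critic(critic, ps_in, ps_out)                         # 5 (Eq. 3)
        train_actor(actor, critic, X, w)                            # 6 (Eq. 5)
        fom_all = np.array([g(f, w) for f in F])                    # 7 (Eq. 4)
        es = X[np.argsort(fom_all)[:n_es]]      # 8: elite population
        cand = es + actor(es) + np.random.normal(0.0, noise_std, es.shape)
        scores = np.array([g(q, w) for q in critic(es, cand - es)])  # Eq. 8
        x_next = cand[np.argmin(scores)]        # 10: next query point
        X = np.vstack([X, x_next])              # 11: simulate and append
        F = np.vstack([F, simulate(x_next)])
    return X[np.argmin([g(f, w) for f in F])]   # best design found
```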
Fig. 3. The average FoM (lower is better) curve for 500 simulations.
Fig. 4. The average FoM (lower is better) curve for 500 simulations.
TABLE II
STATISTICS FOR DIFFERENT ALGORITHMS: FOLDED CASCODE OTA

Algorithm             DE      BO-wEI   GASPAD   DNN-Opt
Success rate          10/10   2/10     4/10     10/10
# of simulations      3200    >500     >500     132
Min power (mW)        0.75    0.91     0.72     0.62
Max power (mW)        1.53    1.62     1.75     0.77
Mean power (mW)       1.14    1.25     0.96     0.71
Modeling time (h)     NA      30       6.5      0.6
Simulation time (h)   54      2.7      2.7      2.7
Total runtime (h)     54      32.7     8.2      3.3