IEEE 2021 DQL Extracting Decision Tree
Abstract—Deep reinforcement learning (DRL) has achieved promising results on traffic signal control systems. However, due to the complexity of the decisions of deep neural networks, it is a great challenge to explain and visualize the policy of reinforcement learning (RL) agents. The decision tree can provide useful information for experts responsible for making reliable decisions. In this paper, we employ decision trees to extract models with readable interpretations from the expert policy obtained by DRL methods. We evaluate our methods via a single-intersection traffic signal control task on the Simulation of Urban MObility (SUMO) platform. The experimental results demonstrate that the extracted decision trees can be used to understand the learning process and the learned optimal policy of the DRL methods.

Index Terms—Deep reinforcement learning, decision tree, traffic signal control systems

I. INTRODUCTION

With increasing traffic volumes and limited road capacities, inefficient traffic light cycle control causes numerous problems, such as long delays and wasted energy. To address the traffic congestion problem, researchers have developed intelligent transportation systems (ITSs) that utilize machine learning techniques to control traffic flow based on collected traffic information [1], [2].

Recent developments in sensing and communication technologies have brought great progress in the field of ITSs, owing to the ability to collect traffic data in real time, especially in traffic signal control systems (TSCS) [3], [4]. Many algorithms have been proposed for systems operating in TSCS. Fuzzy logic methods were first proposed to relieve traffic congestion at a single intersection [5] and at multiple intersections [6]. Expert systems were designed to solve congested traffic tasks with a knowledge-based methodology [7]. Besides, techniques inspired by biological behavior, such as genetic algorithms [8], swarm optimization algorithms [9], and ant colony optimization algorithms [10], have been employed to solve traffic congestion. These methods need prior knowledge of the system model and must consider an extensive range of possible situations in advance to mimic the cognitive process of human experts in solving decision-making problems. Although most of these methods are good at explainability, their performance may not be sufficient to meet the requirements due to the complexity of high-dimensional problems.

In recent years, the notion of eXplainable Artificial Intelligence (XAI), and especially eXplainable reinforcement learning (RL), has attracted renewed interest with the recent successes of deep neural networks. There is a great demand for non-expert users to be able to trust and manage intelligent systems. A major factor contributing to the tension between performance and interpretability is the increasing opacity of machine learning models, especially neural networks defined by complex structures and a large number of parameters [11]. Currently, many state-of-the-art deep reinforcement learning (DRL) methods employ neural networks to approximate their functions. With stronger function approximators, DRL has shown great promise in various domains, e.g., complex high-dimensional games [12], [13] and high-dimensional continuous control tasks [14]. It may be acceptable in some cases to give up explanation capability in favor of RL performance, especially in video games or board games. However, in many other practical applications, especially medicine, law, finance, and transportation, assurance about the reasoning behind these complex models is necessary to reduce decision risk. Generally, the real context is simulated by oversimplifying complex dynamics when formulating the digital environment.

Compared to reinforcement learning, decision trees are regarded as models with readable interpretations for humans, since they visualize the decision path [15]. However, decision trees suffer from low accuracy because of their weak expressivity. The soft decision tree method creates a type of soft decision tree by distilling a trained neural network, which generalizes better than a tree learned directly from the training data [16]. Besides, verifiable reinforcement learning via policy extraction (VIPER) achieves performance equal to that of the original policy [17]. Hence, we employ the VIPER method to improve the interpretability of the trained DRL model for traffic signal control systems.
In this paper, we propose an algorithm for extracting decision trees from trained reinforcement learning agents in TSCS to improve interpretability. First, we employ the Deep Q-Network (DQN) algorithm to learn the expert policy in TSCS. Second, we utilize the expert policy model to collect state-action pairs by interacting with the TSCS. Third, we use decision tree methods, e.g., Iterative Dichotomiser 3 (ID3) and Classification and Regression Trees (CART), to fit the dataset of collected state-action pairs and thereby explain the expert policy. Finally, we evaluate the proposed method on a single-intersection traffic signal control task on the Simulation of Urban MObility (SUMO) platform.

II. PRELIMINARIES

A. Reinforcement learning

We model the traffic signal control task as a Partially Observable Markov Decision Process (POMDP). A POMDP is composed of six components: 1) a set of real states Z; 2) a set of observed states S; 3) a set of actions A; 4) a state transition function P; 5) a reward function R; and 6) a discount factor γ ∈ [0, 1). It can be represented as a tuple (Z, S, A, P, R, γ).

A policy, π : S × A → [0, 1], is defined as a mapping from the state space S to the action space A obtained by interacting with the environment, and we have

$$\sum_{a \in A} \pi(a \mid s) = 1, \quad \forall s \in S. \tag{1}$$

The goal is to obtain an optimal policy

$$\pi^{*} = \arg\max_{\pi} J(\pi), \tag{2}$$

which maximizes the discounted reward function

$$J(\pi) = \mathbb{E}_{\tau \sim \pi(\tau)}\big[r(\tau)\big] = \mathbb{E}_{\tau \sim \pi(\tau)}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right], \tag{3}$$

where τ = (s_0, a_0, s_1, a_1, ...) is the learning episode, $\pi(\tau) = p(s_0)\prod_{t=0}^{\infty}\pi(a_t \mid s_t)\,p(s_{t+1} \mid s_t, a_t)$, and r_t is the immediate reward received on the transition from s_t to s_{t+1} under action a_t.

Deep Q-Network (DQN) is a variation of the classic Q-learning algorithm which utilizes a deep neural network with parameters θ to approximate the Q-function Q(s, a) [12], i.e.,

$$Q(s, a) \approx Q(s, a; \theta). \tag{4}$$

It learns the action-value function Q* corresponding to the optimal policy by minimizing the loss

$$L(\theta) = \mathbb{E}_{s,a,r,s'}\!\left[\big(Q^{*}(s, a \mid \theta) - y\big)^{2}\right], \tag{5}$$

$$y = r + \gamma \max_{a'} Q^{*}(s', a'), \tag{6}$$

where y is the target Q-value, whose parameters are periodically updated with the most recent θ to help stabilize the learning process. In general, the agent trains the neural network so that it predicts the cumulative rewards for all actions. After training the network, we obtain the optimal policy by selecting the action with the largest Q-value.
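To make Eqs. (4)-(6) concrete, the following is a minimal PyTorch-style sketch of the DQN loss with a periodically updated target network. The network objects and batch layout are assumptions for illustration, not the exact implementation used in this work.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-minibatch DQN loss following Eqs. (5)-(6).

    `q_net` and `target_net` are assumed to map a batch of states to
    Q-values for every action; `batch` holds (s, a, r, s_next, done),
    where `done` is a 0/1 float mask. The target network's parameters
    are copied from `q_net` only periodically (not shown), which
    stabilizes learning.
    """
    s, a, r, s_next, done = batch
    # Q(s, a; theta) for the actions that were actually taken.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y = r + gamma * max_a' Q(s', a'), evaluated with the target network.
        y = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    return F.mse_loss(q_sa, y)
```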
B. Decision tree: ID3 and CART

Decision trees provide a white-box structure for approximating the target function, in which the learned function is represented by a binary tree. A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label. The regression or classification rule is generated as the tree is traversed down from the root to a leaf node. Decision tree algorithms split each node recursively. Iterative Dichotomiser 3 (ID3) [18] selects the attribute with the highest information gain as the split attribute, using the entropy function and the information gain:

$$\mathrm{Entropy} = -\sum_{i=1}^{m} p_i \log_2 p_i, \tag{7}$$

where p_i is the probability of class i and m is the number of classes of the target attribute.

The Classification and Regression Trees (CART) method [19] measures the divergence between the probability distributions of the target attribute's values with the Gini index:

$$\mathrm{Gini\ index} = 1 - \sum_{i=1}^{k} p_i^{2}, \tag{8}$$

where p_i is the probability of class i. The tree recursively selects the attribute with the lowest Gini index [20].
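The two impurity measures in Eqs. (7) and (8) can be computed directly from the empirical class distribution. The short NumPy sketch below illustrates those formulas; it is an illustration, not code from this work.

```python
import numpy as np

def entropy(labels):
    """Entropy of Eq. (7): -sum_i p_i * log2(p_i) over the class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gini_index(labels):
    """Gini index of Eq. (8): 1 - sum_i p_i^2 over the class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())
```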
III. EXTRACTING DECISION TREE FROM TRAINED DRL

In this section, we present the framework for extracting decision trees from trained DRL, with specific implementations of obtaining the expert policy model, getting trajectories from the environment, and extracting the decision tree.

A. Overall architecture

We focus on the interpretability of deep reinforcement learning (DRL) in the traffic signal control system. Reinforcement learning techniques promise to produce traffic signal controllers that perceive, learn, decide, and act on their own. However, they are unable to explain their decisions and actions to human users. We propose extracting a decision tree from the trained DRL model in the traffic signal control system to improve the interpretability of DRL.

The whole learning process of our method is shown in Fig. 1. First, we employ the DQN algorithm to learn the expert policy π* for traffic signal control. Considering that different quality levels of the expert policy may have different effects on the collected state-action pair dataset, we save the expert policy once the total reward reaches a high level with only tiny fluctuation. Next, after obtaining the expert policy, we employ the policy π* to collect data by interacting with the environment for a few episodes. To better explain the expert policy, we simulate all the cases in the same traffic signal control task. Finally, we employ the decision tree algorithm to fit the collected dataset and explain the DRL model, disregarding its inner processing or internal representations.

Fig. 1. Overall architecture of extracting decision trees from the trained deep Q-network (traffic light control task → deep Q-network → expert policy π* → collect the trajectories → dataset → extract the decision tree → explainable policy).

B. Obtaining the expert policy model

In this stage, we can employ any learning method to obtain the expert policy. For example, it could be created by any traffic signal control algorithm (e.g., evolutionary algorithms or artificial neural networks) or by a reinforcement learning algorithm. The core task of this stage is to use an existing method to obtain a traffic signal control policy. The expert policy, π* : S × A → [0, 1], defines how a model or agent interacts with the environment by mapping perceived environmental states to actions, with $\sum_{a \in A} \pi(a \mid s) = 1, \forall s \in S$. That is, at each step t, the traffic signal controller observes state s_t, queries the expert policy π* to obtain action a_t, receives a new observation s_{t+1} after taking action a_t, and repeats.

To obtain the expert policy, we employ DRL, specifically the DQN algorithm, which combines Q-learning and deep neural networks.¹ The DQN algorithm is widely used for discrete control problems. When the total reward of the agent gets close to the optimum without diverging, we consider the learned policy to be efficient and of sufficient quality; as sketched below, this convergence criterion also determines when the expert policy is saved.

¹ See Appendix A for more details of the DQN algorithm.
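One way to implement the rule of saving the expert policy when the total reward is high and stable is to monitor a sliding window of episode returns. The sketch below is an illustrative assumption; the window size and thresholds are placeholders rather than values reported in this work.

```python
from collections import deque

class StableRewardMonitor:
    """Signal when DQN training has converged well enough to save pi*.

    The policy is considered ready once the mean return over the last
    `window` episodes exceeds `target` while its standard deviation
    stays below `tol`. All three values are illustrative placeholders.
    """
    def __init__(self, window=20, target=-500.0, tol=25.0):
        self.returns = deque(maxlen=window)
        self.target = target
        self.tol = tol

    def update(self, episode_return):
        self.returns.append(episode_return)

    def should_save(self):
        if len(self.returns) < self.returns.maxlen:
            return False
        mean = sum(self.returns) / len(self.returns)
        std = (sum((r - mean) ** 2 for r in self.returns) / len(self.returns)) ** 0.5
        return mean >= self.target and std <= self.tol
```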
C. Getting trajectories

After obtaining the expert policy π*, we are still unable to understand why the traffic signal control system takes its decisions and actions. To better explain the expert policy, we employ a policy extraction method to transform the expert policy into a different format, which could be a smaller neural network or another representation such as a decision tree. There are many methods for replicating the expert policy. Since the VIPER method fits a single decision tree regardless of the internal structure of the expert policy, we employ the VIPER method to implement policy extraction.

The VIPER method obtains trajectories by executing the expert policy π* in the environment E. The agent observes state s_t and then takes action a_t by querying the expert policy π* at every time step. We obtain a state-action pair (s_t, a_t), which we save for fitting a decision tree. The agent then transitions to a new state s_{t+1} by executing the action a_t, and the process repeats until reaching the terminal state. We record the sequence of states and actions as the trajectory T = {(s_1, a_1), (s_2, a_2), ..., (s_t, a_t), s_terminal}. Thus we obtain a dataset of state-action pairs by interacting with the environment using the expert policy for each episode.
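The rollout step described above can be sketched as follows. The Gym-style env.reset()/env.step() interface and the expert_policy(state) callable are assumed wrappers around the traffic simulation, not the exact code used in this work.

```python
import numpy as np

def collect_trajectories(env, expert_policy, n_episodes=10):
    """Roll out the expert policy and record (state, action) pairs.

    Returns arrays X (states) and y (actions) that can later be used
    as features and labels when fitting a decision tree.
    """
    states, actions = [], []
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = expert_policy(state)   # query the expert policy pi*
            states.append(state)
            actions.append(action)
            state, reward, done, info = env.step(action)
    return np.array(states), np.array(actions)
```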
D. Extracting the decision tree

The dataset is then used to learn a decision tree with a supervised learning method, where the states and actions serve as features and labels, respectively. The decision tree is built by splitting the dataset with the CART and ID3 methods, starting from the root node of the tree and selecting the successor children.² To measure the divergence between the probability distributions of the features, we employ the Gini index as an impurity-based criterion. Besides, we utilize pre-pruning to prevent the generation of unimportant branches, which would overfit the training data and generalize poorly to test data. Compared to neural networks, a decision tree can explain how it arrives at a decision. The obtained decision tree is a binary tree in which the branch nodes represent conditions on the input features (i.e., the state space) and the leaf nodes represent the action labels.

² See Appendices B and C for more details of the ID3 and CART algorithms, respectively.
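A minimal scikit-learn sketch of this extraction step is shown below, using the Gini criterion and a max_depth limit as the pre-pruning rule; the depth value is illustrative and the helper name is ours.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def extract_tree(states, actions, max_depth=5):
    """Fit a CART-style tree to the expert's state-action pairs.

    States are the features and actions are the labels; the Gini
    criterion is the impurity measure and max_depth acts as the
    pre-pruning rule (the depth value here is illustrative).
    """
    tree = DecisionTreeClassifier(criterion="gini", max_depth=max_depth)
    tree.fit(states, actions)
    return tree

# Usage sketch: tree = extract_tree(X, y); print(export_text(tree))
```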
E. Integrated Algorithm

The integrated algorithm for extracting the decision tree from DRL is summarized in Algorithm 1. First, in the traffic signal control task, the agent obtains the expert policy π* by using the DQN algorithm (Line 3). Second, we employ the expert policy π* to generate the state-action pair dataset by interacting with the environment (Lines 4-12). Third, after obtaining the dataset, we employ the decision tree methods to fit the expert policy and improve the explainability of the DRL model (Lines 13 and 14).

Algorithm 1: Extracting Decision Tree from DRL
Input: Learning rate α; replay memory size M; ε-greedy exploration rate ε; minibatch size B; discount factor γ.
Output: Expert policy π*, decision tree.
1  Initialize parameters θ, θ' of the primary and target neural networks with random values.
2  Initialize the replay memory m to be empty.
3  π* ← expert policy obtained using the DQN algorithm.
4  for each episode, while the total reward keeps a high and stable performance do
5      Initialize state s.
6      for each step η of the episode, up to T_max do
7          Choose an action a according to the expert policy π*.
8          Take action a; observe reward r' and new state s'.
9          Store the state-action pair (s, a) into the dataset D.
10         s ← s'
11     end
12 end
13 Clean the data.
14 Learn the decision tree from D using the CART or ID3 algorithm.

Fig. 2. Cases A, B, C and D are the four possible actions.
IV. EXPERIMENT

In this section, we conduct experiments on a single-intersection traffic signal control task on the Simulation of Urban MObility (SUMO) platform [21]. The SUMO platform provides real-time visualization of the network and the simulated entities. It also offers various vehicle behaviour models, roads with several lanes, and traffic lights. Besides, SUMO supports built-in computational frameworks for RL and provides APIs for road network design, traffic volume simulation, and traffic light control. Specifically, SUMO can control the timing of traffic signals according to a given policy. In addition, real-world road networks can be imported from OpenStreetMap, and vehicles can enter the system from any position in the network.

• [...] an all-yellow phase, lasting t_y seconds, appears before the second action is taken. Here, we prefer a duration of t_y = 2 s to ensure a safe switch between two different consecutive phases.
• Reward setting: The reward is formulated as follows:

$$r_t^{h} = -\sum_{i_h \in \varepsilon,\; l \in L_{i_h}} \left( \mathrm{queue}_{t+\Delta t}[l] + a \cdot \mathrm{wait}_{t+\Delta t}[l] \right), \tag{9}$$

where a is a trade-off coefficient, and wait and queue measure the cumulative delay of the first vehicle and the queue length along each incoming lane, respectively. We set Δt = 5 s, which means the interaction between the RL agent and the traffic environment occurs every 5 s; a sketch of this computation is given below.
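As a sketch of how the reward in Eq. (9) could be computed at each interaction step, the helper below sums queue lengths and waiting times over the incoming lanes. The queue_length and waiting_time callables stand in for the corresponding simulator queries (with TraCI these would be lane-level measurements), and the coefficient value is illustrative.

```python
def compute_reward(incoming_lanes, queue_length, waiting_time, a=0.5):
    """Reward of Eq. (9): negative weighted sum of queue length and
    cumulative waiting time over all incoming lanes.

    `queue_length` and `waiting_time` are callables returning the
    measurements for a lane id; `a` is the trade-off coefficient
    (0.5 is an illustrative value, not one reported in the paper).
    """
    return -sum(queue_length(l) + a * waiting_time(l) for l in incoming_lanes)
```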
[...] the E-W lanes, the decision tree model gives action 2, which means executing the E-W straight phase to reduce the traffic waiting time. To present the explainability of the expert policy more visually, we set the max depth of the CART algorithm to 3. As shown in Fig. 4, we can easily understand the decision process in this reduced case.

Fig. 3. The performance of the tested methods implemented by DQN, ID3 and CART in a single-intersection traffic signal control task.

While lower complexity brings more understandability, the accuracy of the decision tree slips. As shown in Fig. 5, CART achieves better performance as its max depth increases. The decision tree with max depth 5 is specified by 190 parameters, which means this tree requires only about 13% of the parameters needed to specify the DQN. Besides, with fewer parameters to learn, the decision tree methods require less training time than the DQN. Generally, an appropriate tree depth setting gives a better explanation without losing performance.

Fig. 5. The performance of the CART methods implemented with different max tree depth settings in a single-intersection traffic signal control task.
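As an aside, a tree like the one in Fig. 4 can be rendered and roughly sized with scikit-learn. The sketch below assumes a counting convention of one feature index and one threshold per internal node plus one class-count vector per leaf; this is our assumption and not necessarily how the 190-parameter figure above was obtained.

```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

def show_and_size_tree(tree: DecisionTreeClassifier):
    """Plot an extracted tree (as in Fig. 4) and print a rough parameter count."""
    plot_tree(tree, filled=True)   # each box shows the split, gini, samples, value
    plt.show()

    n_internal = int((tree.tree_.children_left != -1).sum())
    n_leaves = int(tree.get_n_leaves())
    # Assumed convention: (feature index + threshold) per internal node,
    # one class-count vector per leaf.
    n_params = 2 * n_internal + n_leaves * tree.n_classes_
    print(f"internal nodes={n_internal}, leaves={n_leaves}, ~{n_params} parameters")
```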
V. CONCLUSION
Fig. 4. Demonstration of how a CART makes a decision given the information of the state; the max depth of the CART is set to 3. (Root node: state[49] ≤ 0.5, gini = 0.704, samples = 13852, value = [3713, 5347, 3612, 1180], class = action 1.)
REFERENCES

[4] S. G. Shelby, D. M. Bullock, D. Gettman, R. S. Ghaman, Z. A. Sabra, and N. Soyke, "An overview and performance evaluation of ACS Lite–a low cost adaptive signal control system," in Transp. Res. Board, vol. 190, 2008, pp. 130–137.
[5] C. P. Pappis and E. H. Mamdani, "A fuzzy logic controller for a traffic junction," IEEE Trans. Syst., Man, Cybern., vol. 7, no. 10, pp. 707–717, 1977.
[6] W. Choi, H. Yoon, K. Kim, I. Chung, and S. Lee, "A traffic light controlling FLC considering the traffic congestion," in Proc. AFSS Int. Conf. Fuzzy Syst. Springer, 2002, pp. 69–75.
[7] N. V. Findler, S. Surender, and Ş. Catrava, "On-line decisions about permitted/protected left-hand turns in distributed traffic signal control," Eng. Appl. Artif. Intell., vol. 10, no. 3, pp. 315–320, 1997.
[8] Z. Liu, "A survey of intelligence methods in urban traffic signal control," Int. J. Comput. Sci. Netw. Secur., vol. 7, no. 7, pp. 105–112, 2007.
[9] C. Dong, Z. Liu, and X. Liu, "Chaos-particle swarm optimization algorithm and its application to urban traffic control," Int. J. Comput. Sci. Netw. Secur., vol. 6, no. 1B, pp. 97–101, 2006.
[10] Y. Wen and T. Wu, "Reduced-order rolling horizon optimization of traffic control based on ant algorithm," J. Zhejiang Univ. Eng. Sci., vol. 39,
[12] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[13] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[14] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in Int. Conf. Mach. Learn. PMLR, 2015, pp. 1889–1897.
[15] Z. Wang, C. Chen, H.-X. Li, D. Dong, and T.-J. Tarn, "Incremental reinforcement learning with prioritized sweeping for dynamic environments," IEEE/ASME Trans. Mechatronics, vol. 24, no. 2, pp. 621–632, 2019.
[16] N. Frosst and G. Hinton, "Distilling a neural network into a soft decision tree," arXiv preprint arXiv:1711.09784, 2017.
[17] O. Bastani, Y. Pu, and A. Solar-Lezama, "Verifiable reinforcement learning via policy extraction," arXiv preprint arXiv:1805.08328, 2018.
[18] J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
[19] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Routledge, 2017.
[20] M. Bramer, Principles of Data Mining. Springer, 2007, vol. 180.
[21] M. Behrisch, L. Bieker, J. Erdmann, and D. Krajzewicz, "SUMO–simulation of urban mobility: an overview," in Proc. 3rd Int. Conf. Adv. Syst. Simul. ThinkMind, 2011.
[22] L. Prashanth and S. Bhatnagar, "Reinforcement learning with function approximation for traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 412–421, 2010.
[23] M. Aslani, M. S. Mesgari, and M. Wiering, "Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events," Transp. Res. Pt. C-Emerg. Technol., vol. 85, pp. 732–752, 2017.

Algorithm 3: Iterative Dichotomiser 3 (ID3)
Input: Attribute set A = {a_1, a_2, ..., a_d}; training set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}.
Output: A decision tree with node as the root.
1  Function TreeGenerate(D, A):
2      Generate a node for the tree.
3      if all examples in D belong to the same class C then
4          Label node with class C and return node.
5      else if A is empty then
6          Label node with the most common class in D and return node.
7      else
8          Let a* be the attribute in A that maximizes the information gain.
9          Set a* as the decision attribute for node.
10         for each possible value v of a* do
11             Add a branch below node, testing for a* = v.
12             Let D_v be the subset of examples in D with a* = v.
13             if D_v is empty then
14                 Add a leaf below the branch, labeled with the most common class in D.
15             else
16                 Add the subtree TreeGenerate(D_v, A \ {a*}) below the branch.
17             end
18         end
19     end
20 end

Algorithm 4: Classification and Regression Trees (CART)
Input: Attribute set A = {a_1, a_2, ..., a_d}; training set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}.
Output: A decision tree with node as the root.
1  Function TreeGenerate(D, A):
2      Generate a node for the tree.
3      if all stopping criteria have been met then
4          Label node with the most common class in D and return node.
5      else
6          Find the attribute a ∈ A that best splits D according to the impurity function.
7          Label node with a.
8          for each possible value v of a do
9              Let D_v be the subset of D for which a = v.
10             Grow the subtree TreeGenerate(D_v, A).
11             Connect the subtree to node with a branch labeled v.
12         end
13     end
14 end
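For concreteness, the split-selection step shared by Algorithms 3 and 4 can be written as a small Python function: ID3 picks the attribute with the highest information gain, while CART picks the one with the lowest weighted Gini index. This sketch reuses the entropy and gini_index helpers from the Section II-B example and assumes categorical attributes.

```python
import numpy as np

def best_split(X, y, criterion="gini"):
    """Choose the attribute that ID3/CART would split on.

    For 'entropy' (ID3) the attribute with the highest information gain
    is selected; for 'gini' (CART) the one with the lowest weighted Gini
    index of its children. X holds categorical attribute values;
    entropy() and gini_index() are the helpers defined in Section II-B.
    """
    impurity = entropy if criterion == "entropy" else gini_index
    parent = impurity(y)
    best_attr, best_score = None, None
    for j in range(X.shape[1]):
        values, counts = np.unique(X[:, j], return_counts=True)
        weighted = sum(c / len(y) * impurity(y[X[:, j] == v])
                       for v, c in zip(values, counts))
        score = parent - weighted if criterion == "entropy" else weighted
        better = best_score is None or (
            score > best_score if criterion == "entropy" else score < best_score)
        if better:
            best_attr, best_score = j, score
    return best_attr
```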