Reinforcement Learning Syllabus

Uploaded by

veeraanusuyacse

REINFORCEMENT LEARNING                                L T P C
                                                      3 0 0 3
Course Outcomes (COs):
Upon completion of this course, students will be able to:
CO1: Understand the basics of reinforcement learning (CDL1)
CO2: Apply deep learning architectures to solve the problem (CDL2)
CO3: Analyze reinforcement learning algorithms in uncertain conditions (CDL2)
CO4: Apply Dynamic Programming for Markov Decision Processes (CDL2)
CO5: Analyze Temporal Difference Methods (CDL2)

CO 1 – Understand the basics of reinforcement learning (CDL1)


Reinforcement Learning - A Preamble - Reinforcement Learning Frameworks: Problems and
Solutions - An Extended Example: Tic-Tac-Toe - Limitations and Scope

CO 2 – Apply deep learning architectures to solve the problem (CDL2)


Build and Train Neural Networks, Convolutional Neural Networks - Bandit Algorithms -
Deep Q-Learning - Deep Q-Network (DQN) - Double Deep Q-Network - Dueling DQN - Case
Study: Leveraging Neural Networks to predict machine failures that learns intelligent
behaviors from sensory data
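
As a minimal sketch of how the Double Deep Q-Network listed above differs from the plain Deep Q-Network, the two bootstrap targets can be compared directly. The function names, the plain-list Q-values, and the default discount are assumptions made for illustration; a real agent would obtain these values from its online and target networks.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from action evaluation is what reduces the overestimation that comes from taking a maximum over noisy estimates.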

CO 3 – Analyze reinforcement learning algorithms in uncertain conditions (CDL2)


Evolutionary Algorithms, Stochastic Policy Search, Reinforcement Algorithms - Improving
Policy Gradient Methods - Generalised Advantage Estimation - Policy Optimization method:
Trust Region Policy Optimization (TRPO) - Case Study: Deep Reinforcement Learning for
Robotics (Robotic arm / four-legged creature walk)
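
Generalised Advantage Estimation can be sketched as a backward pass over one trajectory. This is an illustrative sketch under stated assumptions: the function name is hypothetical, `values` holds one more entry than `rewards` (the last being the bootstrap value of the final state), and no episode boundary occurs inside the trajectory.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Return the GAE advantage for each time step of a single trajectory.

    rewards: list of length T
    values:  list of length T + 1 (value estimates, last one is the bootstrap)
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces each advantage to the one-step TD error, while `lam=1` gives the discounted sum of TD errors, so `lam` trades bias against variance.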

CO 4 – Apply Dynamic Programming for Markov Decision Processes (CDL2)


Dynamic Programming (DP): Overview of dynamic programming for MDPs, principle of
optimality, Policy Evaluation, Policy Improvement, Policy Iteration, Value Iteration,
Generalized Policy Iteration. Monte Carlo Methods for Prediction and Control
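
Value iteration, one of the dynamic programming topics above, can be sketched on a small tabular MDP. The dictionary layout `transitions[state][action] = [(probability, next_state, reward), ...]`, the function name, and the three-state example are all assumptions chosen for illustration; a state with no actions is treated as terminal.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Return (values, greedy_policy) for a tabular MDP."""

    def action_value(outcomes, values):
        # Expected one-step return of an action under the current values.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

    values = {s: 0.0 for s in transitions}
    while True:
        max_change = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(action_value(o, values) for o in actions.values())
            max_change = max(max_change, abs(best - values[s]))
            values[s] = best
        if max_change < tol:
            break

    policy = {
        s: max(actions, key=lambda a: action_value(actions[a], values))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical three-state example.
example_mdp = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"finish": [(1.0, "end", 10.0)]},
    "end": {},
}
example_values, example_policy = value_iteration(example_mdp)
```

Each sweep applies the Bellman optimality backup to every state; the greedy policy is read off once the values stop changing.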

CO 5 – Analyze Temporal Difference Methods (CDL2)


Temporal Difference Methods: TD Prediction, Optimality of TD(0), TD Control methods -
SARSA, Q-Learning and their variants
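
The two TD control methods named above differ only in their bootstrap target, which a short tabular sketch makes concrete. The function names, the `(state, action)` dictionary keys, and the default step size are assumptions for illustration; terminal states are assumed to have action values of zero.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action a2 actually taken in s2."""
    target = r + gamma * q[(s2, a2)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the greedy action in s2."""
    target = r + gamma * max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def make_q_table():
    """Action values default to 0.0 for unseen (state, action) pairs."""
    return defaultdict(float)
```

With the same transition, SARSA moves the estimate toward the value of the next action the behaviour policy chose, whereas Q-Learning moves it toward the best available next action.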
L: 45; TOTAL: 45 PERIODS
TEXT BOOKS:
1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT
Press 2020/Bradford Books 2018, Second Edition.
2. Hao Dong, Zihan Ding and Shanghang Zhang (Eds.), Deep Reinforcement Learning:
Fundamentals, Research and Applications, Springer, 2020.
REFERENCES
1. Warren B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified
Framework for Sequential Decisions, Wiley, 2022.
2. Csaba Szepesvari, Algorithms for Reinforcement Learning, Morgan & Claypool, 2010, First
edition.
