CS-866 Deep Reinforcement Learning
Introduction
Nazar Khan
Department of Computer Science
University of the Punjab
Introduction Supervised ML Unsupervised ML Reinforcement Learning
What is Deep Reinforcement Learning?
• Deep RL studies how to solve complex problems that require making a sequence of good decisions.
• These problems often live in high-dimensional state spaces:
  – Many variables must be considered simultaneously.
  – Example: In chess, the position of each piece defines the state; there are more possible states than atoms in the universe.
  – Example: In robotics, sensors may produce hundreds or thousands of readings per time step.
Examples of Sequential Decision-Making
• Making Tea: wait until the water is boiling, add tea leaves, adjust milk, control sweetness, simmer for flavor, strain before serving.
• Tic-Tac-Toe: sequences of moves, the opponent's responses, and planning ahead.
• Chess: a much more complex version of tic-tac-toe with an astronomical state space.
• Having a Conversation: listen to the other person, interpret context, choose a relevant response, maintain flow, achieve an agenda.
• Success comes from a sequence of decisions, not a single one. Each decision has an immediate consequence and a long-term consequence.
• An RL agent learns through trial-and-error.
State Spaces
Figure: Example boards for Tic-Tac-Toe and Chess.
• Tic-Tac-Toe: 3^9 = 19,683 possible boards.
• Chess: ≈ 10^47 possible states.
• Go: ≈ 10^170 possible states.
• Conversation: infinite possible states.
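The tic-tac-toe count can be checked directly, since each of the 9 cells holds one of three symbols (X, O, or empty); the chess and Go figures are order-of-magnitude estimates, recorded here only for comparison:

```python
# Each of the 9 tic-tac-toe cells is X, O, or empty: 3^9 boards
# (an upper bound; it includes some unreachable positions).
tic_tac_toe_boards = 3 ** 9
print(tic_tac_toe_boards)  # 19683

# Rough state-space sizes quoted in the slides, for scale.
chess_states = 10 ** 47
go_states = 10 ** 170
print(go_states > chess_states > tic_tac_toe_boards)  # True
```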
What is Deep Reinforcement Learning?
• Combination of deep learning + reinforcement learning
• Goal: learn optimal actions that maximize reward across all states
• Works in high-dimensional, interactive environments
Deep Learning
• Function approximation in high dimensions
• Uses deep neural networks
• Examples: speech recognition, image classification
Reinforcement Learning
• Learns from trial and error, not from fixed datasets
• Feedback comes from the environment (reward / punishment)
• Builds a policy: which action to take in each state
Where DRL Fits
                  Low-Dimensional        High-Dimensional
Static Dataset    Supervised Learning    Deep Supervised Learning
Interaction       Tabular RL             Deep RL
Applications of DRL
• Robotics: locomotion, manipulation, pancake flipping, helicopters
• Games: Chess, Go, Pac-Man, StarCraft
• Real-world: healthcare, finance, recommender systems, energy grids, ChatGPT
Four Related Fields
1. Psychology
• Conditioning: Pavlov's dog
• Operant conditioning (Skinner)
• Learning from reinforcement is a core AI idea
Four Related Fields
1. Psychology
Pavlov's dog: A natural reaction to food is that a dog salivates. By ringing a bell whenever the dog is given food, the dog learns to associate the sound with food. After enough trials, the dog starts salivating as soon as it hears the bell, presumably in anticipation of the food, whether it is there or not.
Four Related Fields
2. Mathematics
• Markov Decision Processes (MDPs)
• Optimization, planning, graph theory
• Symbolic AI: search, reasoning, theorem proving
Figure: Andrei Markov (1856-1922)
Four Related Fields
3. Engineering
• Known as optimal control in engineering.
• Focus on dynamical systems.
• Bellman and Pontryagin's work in optimal control laid the foundation of RL.
Figures: Two space vehicles docking; Richard Bellman (1920-1984); Lev Pontryagin (1908-1988)
Four Related Fields
4. Biology
• Connectionism: swarm intelligence, neural networks
• Nature-inspired algorithms: ant colony, evolutionary algorithms
Figures: Biological neuron; artificial neural network; Hinton, LeCun, Bengio
Three Paradigms of Machine Learning
• Machine Learning studies how to approximate functions f : X → Y from data.
• Often, functions are not known analytically.
• Instead, we learn them from observations.
• Three main paradigms:
  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning
Functions in AI
• A function transforms input x to output y: f(x) → y.
• More generally: f : X → Y, where X, Y can be discrete or continuous.
• Real-world functions may be stochastic: f : X → p(Y).
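A minimal sketch of the deterministic/stochastic distinction. Newton's second law serves as the deterministic example; the noisy-sensor model for the stochastic case is a hypothetical illustration, not from the slides:

```python
import random

def f_deterministic(x):
    """A given, exact function: Newton's second law, F = m * a."""
    m, a = x
    return m * a

def f_stochastic(x):
    """A stochastic function returns a sample from p(Y | x).
    Here (hypothetically): a noisy sensor reading around the true value x."""
    return x + random.gauss(0.0, 0.1)

print(f_deterministic((2.0, 9.8)))  # 19.6
print(f_stochastic(5.0))            # a different value near 5.0 on each call
```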
Given vs. Learned Functions
• Sometimes f is given exactly (laws of physics, explicit algorithms).
  – Example: Newton's 2nd Law F = m · a.
• Often, f is unknown and must be approximated from data.
• This is the domain of machine learning.
Supervised Learning
• Data: example pairs (x, y).
• Goal: learn a function f̂ that predicts y from x.
• Common tasks:
  – Regression: predict a continuous value.
  – Classification: predict a discrete category.
• Loss function measures prediction error, e.g. MSE or cross-entropy.
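The two losses named above can be written down directly; this is a minimal sketch, with illustrative example values:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared difference, used for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_prob):
    """Cross-entropy for classification: y_true is a one-hot label,
    y_prob a predicted probability distribution over the classes."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_prob) if t > 0)

print(mse([1.0, 2.0], [1.5, 2.0]))        # 0.125
print(cross_entropy([0, 1], [0.2, 0.8]))  # -ln(0.8) ≈ 0.223
```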
Example: Regression
Figure: Blue: data points. Red: learned linear function ŷ = ax + b.
Example: Classification
Figure: Images labelled as Cat or Dog (Cat, Cat, Dog, Cat, Dog, Dog).
Unsupervised Learning
• No labels: only input data x.
• Goal: find structure in data (clusters, latent variables).
• Examples:
  – k-means clustering
  – Principal Component Analysis (PCA)
  – Autoencoders
• Learns p(x) instead of p(y|x).
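As an illustration of finding clusters without labels, here is a minimal k-means sketch on 1-D data; the data points, the choice of 1-D values, and the fixed iteration count are illustrative assumptions:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
print(kmeans(data, k=2))  # two centroids, near 1.0 and 10.0
```

No labels are involved: the algorithm discovers the two groups purely from the structure of the inputs.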
Reinforcement Learning
• Third paradigm of machine learning
• Learns by interaction with the environment
• Data comes sequentially (one state at a time)
• Objective: learn a policy, a function mapping states to the best actions
Agent and Environment
Figure: Agent interacts with Environment to maximize reward.
• Agent: Learner/decision-maker
• Environment: Provides feedback and state transitions
• Goal: maximize long-term accumulated reward
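The agent-environment loop can be sketched with tabular Q-learning on a toy problem. The 5-state corridor environment, the +1 reward at the goal, and all hyperparameters below are illustrative assumptions, not from the slides:

```python
import random

# Hypothetical 5-state corridor: the agent starts in state 0 and
# receives reward +1 on reaching the goal state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left, move right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = max(0, min(GOAL, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# Tabular Q-learning: estimate the value of each (state, action) pair
# from trial-and-error interaction, then act greedily.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.5, 0.9
rng = random.Random(0)
for episode in range(500):
    s, done, steps = 0, False, 0
    while not done and steps < 100:
        a = rng.randrange(2)  # explore with a uniformly random behaviour policy
        s2, r, done = step(s, ACTIONS[a])
        # Move Q(s, a) towards the reward plus the discounted value of s2.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s, steps = s2, steps + 1

# The greedy policy learned from accumulated reward: always move right.
policy = [ACTIONS[max(range(2), key=lambda a: Q[s][a])] for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]
```

Note the feedback signal: the agent is never told the correct action, only a numeric reward, yet the policy it extracts maximizes long-term accumulated reward.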
Key Differences from Supervised/Unsupervised Learning
1. Interaction-based: No pre-collected dataset; data is generated dynamically via interaction between agent and environment
2. Reward signal: Partial numeric feedback, not full labels; UL has no labels, RL has reward, SL has complete labels
3. Sequential decision-making: Learns policies across multiple steps
• In RL there is no teacher or supervisor, and there is no static dataset.
• RL learns a policy for the environment by interacting with it and receiving rewards and punishments.
• SL can classify a set of images for you; UL can tell you which items belong together; RL can tell you the winning sequence of moves in a game of chess, or the action-sequence that robot-legs need to take in order to walk.
Supervised vs Reinforcement Learning
Concept     Supervised Learning       Reinforcement Learning
Inputs x    Full dataset              One state at a time
Labels y    Full (correct action)     Partial (numeric reward)
Table: Comparison of paradigms
Implications of RL Paradigm
• Data is generated step-by-step ⇒ suited for sequential problems
• Risk of circular feedback (the policy both selects actions and learns from them)
• RL can continue to learn indefinitely if the environment is challenging
• Examples: Chess, Go, robotics, conversational agents
Deep Reinforcement Learning
• Traditional RL: works on small, low-dimensional state spaces
• Many real-world problems: large, high-dimensional state spaces
• Deep RL = RL + Deep Learning
  – Handles large state spaces
  – Scales to complex tasks
• Key driver of recent breakthroughs in AI
Summary
• Deep RL = deep learning + reinforcement learning
• Solves sequential decision problems in high dimensions
• Rooted in psychology, mathematics, engineering, and biology
• Applications: robotics, games, healthcare, finance, any interactive setting