GOVT. RABIA BASRI GRADUATE COLLEGE (W)
WALTON ROAD, LAHORE
Presentation Topic: Q-learning Algorithm
Group no 6:
Samia Anwar(116)
Fatima Liaqat(105)
Mahnoor(122)
Nimra Mehboob(123)
Contents:
Reinforcement Learning Technique: Q-learning
Some important terms in Q-learning
Factors and Algorithm of Q-learning
Steps with examples
Advantages and disadvantages
Applications
What is reinforcement learning?
Reinforcement Learning (RL) is a branch of
machine learning.
RL allows machines to learn by interacting with an
environment and receiving feedback based on their
actions. This feedback comes in the form
of rewards or penalties.
Q-LEARNING:
In Q-learning, the Q stands for quality: how useful
an action is in a given state.
It is an off-policy, model-free and value-based
reinforcement learning algorithm.
The agent has to actively learn through the experience
of interactions with the environment.
Off-policy: the agent learns the value of the best
action in each state, even while it follows a different
policy (for example, random exploration) to choose
its actions.
Model-free: the agent learns the consequences of its
actions through experience, without being given the
transition and reward functions.
Value-based: the agent trains a value function to learn
which states and actions are more valuable, and
chooses its actions accordingly.
The agent uses trial and error to determine which
actions result in rewards (good outcomes) and which
result in penalties (bad outcomes, or negative rewards).
The agent's decisions improve over time as the
Q-table is updated.
Some important terms in Q-learning:
Factors of Q-learning:
Q-learning has two factors: the Q-function
(Bellman equation) and the Q-table.
1. Q-function (Bellman equation):
It is a recursive formula used to calculate the value of
an action in a given state and determine the optimal action.
Q(s,a) = R(s,a) + γ * max[Q(s’,a’)]
Where:
Q(s,a) is the Q-value for the given state and action pair.
R(s,a) is the immediate reward for taking action a in
state s.
γ (gamma) is the discount factor,
representing the importance of future rewards.
max[Q(s’,a’)] is the maximum Q-value for the next
state s’ over all possible actions a’.
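To make the arithmetic concrete, here is a minimal sketch of the formula above as a Python function. The function name q_value and the numbers used are hypothetical, chosen only for illustration.

```python
def q_value(reward, gamma, next_q_values):
    """Return R(s,a) + gamma * max over a' of Q(s',a')."""
    return reward + gamma * max(next_q_values)

# Hypothetical numbers, chosen only to illustrate the arithmetic.
reward = 0                     # R(s,a): no immediate reward for this move
gamma = 0.8                    # discount factor
next_q_values = [0, 100, 50]   # Q(s',a') for every action a' in the next state

# 0 + 0.8 * 100 = 80.0
print(q_value(reward, gamma, next_q_values))
```

A gamma close to 1 makes future rewards count almost as much as immediate ones, while a gamma close to 0 makes the agent focus on the immediate reward.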
2. Q-table:
The Q-table is a data structure that stores a Q-value
for every combination of state and action. The
Q-learning algorithm updates these values.
Number of rows = number of states
Number of columns = number of actions
Initially, every value in the Q-table is set to 0.
The agent uses the Q-table to take the best possible
action, based on the expected reward for each action
in each state of the environment.
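As a rough illustration, a Q-table can be stored as a list of rows, one row per state and one column per action. The sizes, the helper name best_action and the stored value below are assumptions made for this sketch only.

```python
n_states = 6    # rows: one per state (size chosen only for illustration)
n_actions = 6   # columns: one per action

# Every Q-value starts at 0.
q_table = [[0.0 for _ in range(n_actions)] for _ in range(n_states)]

def best_action(q_table, state):
    """Return the column index with the highest Q-value in the given row.

    If several actions tie, the first one is returned.
    """
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])

q_table[2][4] = 80.0            # pretend the agent has learned this value
print(best_action(q_table, 2))  # 4
```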
Q-Learning algorithm:
Steps to follow in the Q-learning algorithm:
Step 1: Create an initial Q-table with all values
initialized to 0.
Step 2: Choose an action and perform it.
Step 3: Get the value of the reward, calculate the
Q-value using the Bellman equation (Q-function), and
update the value in the table.
Step 4: Continue the same process until the table is
filled or an episode ends.
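The steps above can be sketched as a short training loop. This is only a sketch under stated assumptions: the 3-state environment, the function name train, the purely random choice of actions, and the convention that an action is named by the state it leads to are all hypothetical and not taken from the slides.

```python
import random

def train(rewards, goal, gamma=0.8, episodes=200, seed=0):
    """Fill a Q-table by random exploration.

    rewards[s][a] is the reward for moving from state s to state a;
    a negative entry means the move is not allowed (assumption).
    """
    rng = random.Random(seed)
    n = len(rewards)
    q = [[0.0] * n for _ in range(n)]                # Step 1: all zeros
    for _ in range(episodes):
        state = rng.randrange(n)                     # random starting state
        while True:
            valid = [a for a in range(n) if rewards[state][a] >= 0]
            action = rng.choice(valid)               # Step 2: choose and perform
            next_state = action
            # Step 3: reward plus discounted best value of the next state
            q[state][action] = rewards[state][action] + gamma * max(q[next_state])
            state = next_state
            if state == goal:                        # Step 4: episode ends
                break
    return q

# Hypothetical chain 0 - 1 - 2 with goal state 2.
R = [
    [-1,   0,  -1],
    [ 0,  -1, 100],
    [-1,   0, 100],
]
q = train(R, goal=2)
for row in q:
    print([round(v, 1) for v in row])
```

After training, the largest value in each row points to the move that leads toward the goal. Real implementations usually mix random exploration with choosing the best known action, which this sketch leaves out for brevity.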
Example:
Here, rooms are states (s) and doors are actions (a).
Suppose that we have 5 rooms in a building. We number the rooms from 0 to 4, and the
outside of the building can be thought of as one big room (5).
We can represent each room as a node (state) and each door as a link (action).
We want to get to room 5, so our goal state is room 5.
Important points:
Goal room: 5
Doors that lead directly to room 5 have a reward of 100.
Doors that are not directly connected to room 5 have a reward of 0.
Where there is no link between two nodes (rooms), the reward is -1 (invalid link).
Discount factor gamma: 0.8
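The reward rules above can be turned into a reward table and two sample updates. Note that the door layout in this sketch is an assumption: the floor plan is not given in the text, so the list doors below is hypothetical and only serves to show how the rewards 100, 0 and -1 are assigned.

```python
GOAL = 5
GAMMA = 0.8
N_ROOMS = 6   # rooms 0 to 4, plus the outside as room 5

# ASSUMPTION: hypothetical door layout, not taken from the slides.
doors = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]

# Start with -1 everywhere (no door), then fill in the doors both ways.
R = [[-1] * N_ROOMS for _ in range(N_ROOMS)]
for a, b in doors:
    for src, dst in ((a, b), (b, a)):
        R[src][dst] = 100 if dst == GOAL else 0
R[GOAL][GOAL] = 100   # ASSUMPTION: staying outside also counts as the goal

Q = [[0.0] * N_ROOMS for _ in range(N_ROOMS)]

def update(state, action):
    """Apply Q(s,a) = R(s,a) + gamma * max Q(s',a'), where s' = action."""
    Q[state][action] = R[state][action] + GAMMA * max(Q[action])
    return Q[state][action]

print(update(1, 5))  # 100 + 0.8 * 0   = 100.0
print(update(3, 1))  # 0   + 0.8 * 100 = 80.0
```

The second update shows how the reward of 100 spreads backwards: room 3 has no door to the goal, but moving to room 1 is still worth 80 because room 1 leads outside.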
Application:
References:
https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial
https://www.geeksforgeeks.org/q-learning-in-python/
https://youtu.be/QRMNPCsnSHk
https://youtu.be/3Rx2x2traxw
https://youtu.be/ibBEEZNQZtk
https://youtu.be/5MC8Wdo-hS8