Reinforcement Learning

Reinforcement Learning (RL) is a machine learning technique where an agent learns to make decisions through feedback from its actions in an environment, aiming to maximize positive rewards. It operates without labeled data, relying on exploration and exploitation strategies to improve performance over time. Key components of RL include the agent, environment, actions, states, rewards, and policies, with various approaches such as value-based, policy-based, and model-based methods to implement RL.


What is Reinforcement Learning?

o Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing the results of those actions. For each good action, the agent receives positive feedback, and for each bad action, the agent receives negative feedback or a penalty.
o In Reinforcement Learning, the agent learns automatically from feedback without any labeled data, unlike supervised learning.
o Since there is no labeled data, the agent is bound to learn from its own experience only.

o RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc.
o The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by collecting the maximum positive reward.
o The agent learns through trial and error, and based on this experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
o It is a core part of Artificial Intelligence, and many AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.
o Example: Suppose there is an AI agent within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
o The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by repeating them, it learns and explores the environment.
o The agent learns which actions lead to positive feedback (rewards) and which actions lead to negative feedback (penalties). For a reward, the agent gets a positive point, and for a penalty, it gets a negative point.
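The loop described above can be written as a short sketch. The following is a minimal illustration, assuming a hypothetical MazeEnvironment class whose reset and step methods, grid size, and reward values are made up for this example and do not come from any specific library:

import random

# Hypothetical maze environment used only for illustration; the method
# names (reset, step) follow a common RL convention, not a specific library.
class MazeEnvironment:
    def __init__(self, size=4, goal=(3, 3)):
        self.size = size
        self.goal = goal
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        # Actions: 0=up, 1=down, 2=left, 3=right
        x, y = self.state
        moves = {0: (x, y - 1), 1: (x, y + 1), 2: (x - 1, y), 3: (x + 1, y)}
        nx, ny = moves[action]
        # Stay in place if the move would leave the grid
        if 0 <= nx < self.size and 0 <= ny < self.size:
            self.state = (nx, ny)
        reward = 1 if self.state == self.goal else -0.1   # diamond found vs. step penalty
        done = self.state == self.goal
        return self.state, reward, done

env = MazeEnvironment()
state = env.reset()
done = False
while not done:
    action = random.choice([0, 1, 2, 3])      # take an action (random agent for now)
    state, reward, done = env.step(action)    # state changes, feedback is received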
Terms used in Reinforcement Learning
o Agent(): An entity that can perceive/explore the environment and act upon it.
o Environment(): The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
o Action(): Actions are the moves taken by an agent within the environment.
o State(): A situation returned by the environment after each action taken by the agent.
o Reward(): Feedback returned to the agent from the environment to evaluate the agent's action.
o Policy(): A strategy applied by the agent to decide the next action based on the current state.
o Value(): The expected long-term return with the discount factor, as opposed to the short-term reward.
o Q-value(): Mostly similar to the value, but it takes an additional parameter, the current action (a).
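To make the last two terms concrete, here is a small illustrative snippet (the Q-values in the table are invented for the example) showing that the value of a state under a greedy policy is simply the best Q-value available in that state:

# Illustrative Q-table for a single state with four actions (values are made up).
q_values = {"up": 0.2, "down": -0.1, "left": 0.0, "right": 0.7}

# The state value under a greedy policy is the best achievable Q-value,
# and the policy simply picks the action that attains it.
state_value = max(q_values.values())               # V(s) = max_a Q(s, a)
greedy_action = max(q_values, key=q_values.get)    # argmax_a Q(s, a)

print(state_value, greedy_action)  # 0.7 right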
Key Features of Reinforcement Learning
o In RL, the agent is not instructed about the environment and what actions need to be taken.

o It is based on a trial-and-error process.

o The agent takes the next action and changes states according to the feedback of the previous action.

o The agent may get a delayed reward.

o The environment is stochastic, and the agent needs to explore it to get the maximum positive reward.
Approaches to implement Reinforcement Learning
There are mainly three ways to implement reinforcement learning in ML:
1. Value-based:
The value-based approach tries to find the optimal value function, which gives the maximum value achievable at a state under any policy. The agent then expects the long-term return from any state s when following policy π (a minimal sketch of one such method is given after this list).
2. Policy-based:
The policy-based approach tries to find the optimal policy for the maximum future reward without using the value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward.
The policy-based approach has mainly two types of policy:
o Deterministic: The same action is produced by the policy (π) at any state.

o Stochastic: In this policy, the action is sampled according to a probability distribution that depends on the state.

3. Model-based: In the model-based approach, a virtual model is created for the environment, and the
agent explores that environment to learn it. There is no particular solution or algorithm for this
approach because the model representation is different for each environment.
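As a concrete instance of the value-based approach, the following is a minimal sketch of a tabular Q-learning update; the learning rate and discount factor values are arbitrary choices for illustration, not prescribed settings:

from collections import defaultdict

alpha, gamma = 0.1, 0.9            # learning rate and discount factor (arbitrary choices)
Q = defaultdict(float)             # Q[(state, action)] -> estimated long-term return

def q_learning_update(state, action, reward, next_state, actions):
    # Value-based learning: move the estimate of Q(s, a) toward the
    # immediate reward plus the discounted best value of the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])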
Elements of Reinforcement Learning
There are four main elements of Reinforcement Learning, which are given below:
1. Policy
2. Reward Signal
3. Value Function
4. Model of the environment
1) Policy: A policy defines the way an agent behaves at a given time. It maps the perceived states of the environment to the actions to be taken in those states. The policy is the core element of RL, as it alone can define the behavior of the agent. In some cases, it may be a simple function or a lookup table, whereas in other cases it may involve general computation such as a search process. It can be a deterministic or a stochastic policy:
For a deterministic policy: a = π(s)
For a stochastic policy: π(a | s) = P[A_t = a | S_t = s]
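The two forms of policy above can be sketched as simple Python functions; the action names and probabilities below are purely illustrative assumptions:

import random

# Deterministic policy: a = π(s) -- the same action is always returned for a given state.
def deterministic_policy(state):
    return "right" if state[0] < 3 else "down"

# Stochastic policy: π(a | s) = P[A_t = a | S_t = s] -- actions are sampled
# from a probability distribution that depends on the state.
def stochastic_policy(state):
    actions = ["up", "down", "left", "right"]
    probs = [0.1, 0.2, 0.1, 0.6]                 # illustrative probabilities
    return random.choices(actions, weights=probs)[0]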
2) Reward Signal: The goal of reinforcement learning is defined by the reward signal. At each state, the environment sends an immediate signal to the learning agent, and this signal is known as a reward signal. Rewards are given according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total reward it receives for good actions. The reward signal can change the policy; for example, if an action selected by the agent leads to a low reward, the policy may change to select other actions in the future.
3) Value Function: The value function gives information about how good a situation or action is and how much reward the agent can expect. A reward is the immediate signal for each good or bad action, whereas the value function specifies what is good for the future. The value function depends on the reward because, without reward, there could be no value. The goal of estimating values is to achieve more reward.
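A small worked example of the quantity the value function estimates: the discounted return obtained by summing future rewards, here with a made-up reward sequence and discount factor:

# Discounted return G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ...
# The value function V(s) is the expected value of this quantity from state s.
gamma = 0.9                       # discount factor (illustrative)
rewards = [-0.1, -0.1, -0.1, 1.0] # a made-up reward sequence ending at the goal

discounted_return = sum(gamma**k * r for k, r in enumerate(rewards))
print(round(discounted_return, 3))  # 0.458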
4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave. For example, if a state and an action are given, the model can predict the next state and reward.
The model is used for planning, which means it provides a way to choose a course of action by considering possible future situations before actually experiencing them. Approaches that solve RL problems with the help of a model are termed model-based approaches; comparatively, an approach without a model is called a model-free approach.
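A minimal sketch of such a model, assuming it is stored as a simple lookup table from (state, action) pairs to predicted (next state, reward) pairs; the helper names are hypothetical:

# A learned model stored as a lookup table: (state, action) -> (next_state, reward).
# A planner can query it to "imagine" outcomes without acting in the real environment.
model = {}

def update_model(state, action, next_state, reward):
    model[(state, action)] = (next_state, reward)

def plan_one_step(state, actions):
    # Pick the action whose predicted reward is highest among those the model has seen.
    known = [(a, model[(state, a)][1]) for a in actions if (state, a) in model]
    return max(known, key=lambda pair: pair[1])[0] if known else None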

What are Exploration and Exploitation in Reinforcement Learning?


Before giving a brief technical description of exploration and exploitation, let's first understand these terms in simple words. In reinforcement learning, whenever an agent faces a difficult choice between continuing the same work or trying something new at a specific time, the situation results in the exploration-exploitation dilemma, because the agent's knowledge about states, actions, rewards, and resulting states is always partial.
Now we will discuss exploitation and exploration in technical terms.
Exploitation in Reinforcement Learning
Exploitation is defined as a greedy approach in which the agent tries to get more reward by using its estimated values rather than the true (unknown) values. In this technique, the agent makes the best decision based on its current information.
Exploration in Reinforcement Learning
Unlike exploitation, in exploration the agent primarily focuses on improving its knowledge about each action instead of getting more immediate reward, so that it can gain long-term benefits. In this technique, the agent works on gathering more information to make the best overall decision.
Examples of Exploitation and Exploration in Machine Learning
Let's understand exploitation and exploration with some interesting real-world examples.
Coal mining:
Suppose two people, A and B, are digging in a coal mine in the hope of finding a diamond. Person B succeeds in finding a diamond before person A and walks off happily. Seeing this, person A gets a bit greedy and thinks he too might find a diamond at the same spot where person B was digging. This action performed by person A is called a greedy action, and the corresponding policy is known as a greedy policy. But person A did not know that a bigger diamond was buried at the place where he was originally digging, so the greedy policy fails in this situation.
In this example, person A only had knowledge of the place where person B was digging, with no knowledge of what lies at other spots or depths. In reality, the diamond could be buried at the place where he was digging initially or somewhere else entirely. Likewise, with only partial knowledge about how to get more reward, a reinforcement learning agent faces a dilemma: exploit the partial knowledge to receive some reward, or explore unknown actions that could result in a much larger reward.
Both techniques cannot be applied at the same time, but this trade-off can be managed by using the epsilon-greedy policy (explained below).
There are a few other examples of Exploitation and Exploration in Machine Learning as follows:
Example 1: Consider choosing a restaurant for an online food order, where you have two options. The first option is to choose your favorite restaurant from which you have ordered food in the past; this is called exploitation, because you only have information about that specific restaurant. The second option is to try a new restaurant to explore new varieties and tastes of food; this is called exploration. The food quality might be better at the familiar restaurant, but it is also possible that the food at the new restaurant is more delicious.
Example 2: Suppose there is a game-playing platform where you can play chess against a computer. To win, you have two choices: play the move you believe is best, or play an experimental move. Playing what you believe is the best move may work, but the new move might turn out to be more strategic. Here, the first choice is called exploitation, because it relies on your existing knowledge of the game, and the second choice is called exploration, because you are extending your knowledge by trying a new move.
Epsilon Greedy Policy
The epsilon-greedy policy is a technique to maintain a balance between exploitation and exploration. A very simple way to choose between them is to select randomly, exploiting most of the time with a little exploration.

In the epsilon-greedy strategy, an exploration rate, or epsilon (denoted ε), is initially set to 1. This exploration rate defines the probability that the agent explores the environment rather than exploiting it, and setting ε = 1 ensures that the agent starts by exploring.
As the agent learns more about the environment, epsilon decays at some defined rate, so exploration becomes less and less likely. In that case, the agent becomes greedy and mostly exploits the environment.
To decide whether the agent explores or exploits at each step, we generate a random number between 0 and 1 and compare it to epsilon. If the random number is greater than ε, the next action is decided by exploitation; otherwise, it is decided by exploration. In the case of exploitation, the agent takes the action with the highest Q-value for the current state.
if random_number > epsilon:
    # choose next action via exploitation: the action with the highest Q-value for the current state
    action = max(actions, key=lambda a: Q[(state, a)])
else:
    # choose next action via exploration: a random action
    action = random.choice(actions)
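Putting the pieces together, a self-contained sketch of epsilon-greedy action selection with epsilon decay might look as follows; the action set, Q-table access, and decay hyperparameters are illustrative assumptions rather than fixed parts of the method:

import random

# Self-contained epsilon-greedy sketch with exponential decay; the environment,
# Q-table, and hyperparameter values below are illustrative assumptions.
actions = ["up", "down", "left", "right"]
Q = {}                                   # Q[(state, action)] -> estimated value
epsilon, min_epsilon, decay = 1.0, 0.05, 0.995

def choose_action(state):
    if random.random() > epsilon:
        # Exploitation: pick the action with the highest known Q-value.
        return max(actions, key=lambda a: Q.get((state, a), 0.0))
    # Exploration: pick a random action to gather new information.
    return random.choice(actions)

def decay_epsilon():
    # Called after each episode so exploration becomes less likely over time.
    global epsilon
    epsilon = max(min_epsilon, epsilon * decay)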
