Reinforcement Learning

Reinforcement Learning (RL) is a machine learning technique where an agent learns to make decisions through feedback from its actions in an environment, aiming to maximize positive rewards. It operates without labeled data, relying on exploration and exploitation strategies to improve performance over time. Key components of RL include the agent, environment, actions, states, rewards, and policies, with various approaches such as value-based, policy-based, and model-based methods to implement RL.


What is Reinforcement Learning?

o Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing the results of those actions. For each good action, the agent receives positive feedback, and for each bad action, the agent receives negative feedback or a penalty.
o In Reinforcement Learning, the agent learns automatically from feedback without any labeled data, unlike supervised learning.
o Since there is no labeled data, the agent is bound to learn from its own experience only.

o RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc.
o The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by collecting the maximum positive reward.
o The agent learns through trial and error, and based on this experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
o It is a core part of Artificial Intelligence, and many AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.
o Example: Suppose there is an AI agent within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
o The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by repeating them, it learns and explores the environment.
o The agent learns which actions lead to positive feedback (rewards) and which actions lead to negative feedback (penalties). For a reward, the agent gets a positive point, and for a penalty, it gets a negative point.
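The loop described above can be written as a short sketch. The following is a minimal illustration, assuming a hypothetical MazeEnvironment class whose reset and step methods, grid size, and reward values are made up for this example and do not come from any specific library:

import random

# Hypothetical maze environment used only for illustration; the method
# names (reset, step) follow a common RL convention, not a specific library.
class MazeEnvironment:
    def __init__(self, size=4, goal=(3, 3)):
        self.size = size
        self.goal = goal
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        # Actions: 0=up, 1=down, 2=left, 3=right
        x, y = self.state
        moves = {0: (x, y - 1), 1: (x, y + 1), 2: (x - 1, y), 3: (x + 1, y)}
        nx, ny = moves[action]
        # Stay in place if the move would leave the grid
        if 0 <= nx < self.size and 0 <= ny < self.size:
            self.state = (nx, ny)
        reward = 1 if self.state == self.goal else -0.1   # diamond found vs. step penalty
        done = self.state == self.goal
        return self.state, reward, done

env = MazeEnvironment()
state = env.reset()
done = False
while not done:
    action = random.choice([0, 1, 2, 3])      # take an action (random agent for now)
    state, reward, done = env.step(action)    # state changes, feedback is received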
Terms used in Reinforcement Learning
o Agent(): An entity that can perceive/explore the environment and act upon it.
o Environment(): The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
o Action(): Actions are the moves taken by an agent within the environment.
o State(): A situation returned by the environment after each action taken by the agent.
o Reward(): Feedback returned to the agent from the environment to evaluate the agent's action.
o Policy(): A strategy applied by the agent to decide the next action based on the current state.
o Value(): The expected long-term return with the discount factor, as opposed to the short-term reward.
o Q-value(): Mostly similar to the value, but it takes an additional parameter, the current action (a).
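To make the last two terms concrete, here is a small illustrative snippet (the Q-values in the table are invented for the example) showing that the value of a state under a greedy policy is simply the best Q-value available in that state:

# Illustrative Q-table for a single state with four actions (values are made up).
q_values = {"up": 0.2, "down": -0.1, "left": 0.0, "right": 0.7}

# The state value under a greedy policy is the best achievable Q-value,
# and the policy simply picks the action that attains it.
state_value = max(q_values.values())               # V(s) = max_a Q(s, a)
greedy_action = max(q_values, key=q_values.get)    # argmax_a Q(s, a)

print(state_value, greedy_action)  # 0.7 right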
Key Features of Reinforcement Learning
o In RL, the agent is not instructed about the environment and what actions need to be taken.

o It is based on a trial-and-error process.

o The agent takes the next action and changes states according to the feedback of the previous action.

o The agent may get a delayed reward.

o The environment is stochastic, and the agent needs to explore it to get the maximum positive reward.
Approaches to implement Reinforcement Learning
There are mainly three ways to implement reinforcement learning in ML:
1. Value-based:
The value-based approach tries to find the optimal value function, which gives the maximum value achievable at a state under any policy. The agent then expects the long-term return from any state s when following policy π (a minimal sketch of one such method is given after this list).
2. Policy-based:
The policy-based approach tries to find the optimal policy for the maximum future reward without using the value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward.
The policy-based approach has mainly two types of policy:
o Deterministic: The same action is produced by the policy (π) at any state.

o Stochastic: In this policy, the action is sampled according to a probability distribution that depends on the state.

3. Model-based: In the model-based approach, a virtual model is created for the environment, and the
agent explores that environment to learn it. There is no particular solution or algorithm for this
approach because the model representation is different for each environment.
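As a concrete instance of the value-based approach, the following is a minimal sketch of a tabular Q-learning update; the learning rate and discount factor values are arbitrary choices for illustration, not prescribed settings:

from collections import defaultdict

alpha, gamma = 0.1, 0.9            # learning rate and discount factor (arbitrary choices)
Q = defaultdict(float)             # Q[(state, action)] -> estimated long-term return

def q_learning_update(state, action, reward, next_state, actions):
    # Value-based learning: move the estimate of Q(s, a) toward the
    # immediate reward plus the discounted best value of the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])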
Elements of Reinforcement Learning
There are four main elements of Reinforcement Learning, which are given below:
1. Policy
2. Reward Signal
3. Value Function
4. Model of the environment
1) Policy: A policy defines the way an agent behaves at a given time. It maps the perceived states of the environment to the actions to be taken in those states. The policy is the core element of RL, as it alone can define the behavior of the agent. In some cases, it may be a simple function or a lookup table, whereas in other cases it may involve general computation such as a search process. It can be a deterministic or a stochastic policy:
For a deterministic policy: a = π(s)
For a stochastic policy: π(a | s) = P[A_t = a | S_t = s]
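The two forms of policy above can be sketched as simple Python functions; the action names and probabilities below are purely illustrative assumptions:

import random

# Deterministic policy: a = π(s) -- the same action is always returned for a given state.
def deterministic_policy(state):
    return "right" if state[0] < 3 else "down"

# Stochastic policy: π(a | s) = P[A_t = a | S_t = s] -- actions are sampled
# from a probability distribution that depends on the state.
def stochastic_policy(state):
    actions = ["up", "down", "left", "right"]
    probs = [0.1, 0.2, 0.1, 0.6]                 # illustrative probabilities
    return random.choices(actions, weights=probs)[0]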
2) Reward Signal: The goal of reinforcement learning is defined by the reward signal. At each state, the environment sends an immediate signal to the learning agent, and this signal is known as a reward signal. Rewards are given according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total reward it receives for good actions. The reward signal can change the policy; for example, if an action selected by the agent leads to a low reward, the policy may change to select other actions in the future.
3) Value Function: The value function gives information about how good a situation or action is and how much reward the agent can expect. A reward is the immediate signal for each good or bad action, whereas the value function specifies what is good for the future. The value function depends on the reward because, without reward, there could be no value. The goal of estimating values is to achieve more reward.
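A small worked example of the quantity the value function estimates: the discounted return obtained by summing future rewards, here with a made-up reward sequence and discount factor:

# Discounted return G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ...
# The value function V(s) is the expected value of this quantity from state s.
gamma = 0.9                       # discount factor (illustrative)
rewards = [-0.1, -0.1, -0.1, 1.0] # a made-up reward sequence ending at the goal

discounted_return = sum(gamma**k * r for k, r in enumerate(rewards))
print(round(discounted_return, 3))  # 0.458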
4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave. For example, if a state and an action are given, the model can predict the next state and reward.
The model is used for planning, which means it provides a way to choose a course of action by considering possible future situations before actually experiencing them. Approaches that solve RL problems with the help of a model are termed model-based approaches; comparatively, an approach without a model is called a model-free approach.
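A minimal sketch of such a model, assuming it is stored as a simple lookup table from (state, action) pairs to predicted (next state, reward) pairs; the helper names are hypothetical:

# A learned model stored as a lookup table: (state, action) -> (next_state, reward).
# A planner can query it to "imagine" outcomes without acting in the real environment.
model = {}

def update_model(state, action, next_state, reward):
    model[(state, action)] = (next_state, reward)

def plan_one_step(state, actions):
    # Pick the action whose predicted reward is highest among those the model has seen.
    known = [(a, model[(state, a)][1]) for a in actions if (state, a) in model]
    return max(known, key=lambda pair: pair[1])[0] if known else None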

What are Exploration and Exploitation in Reinforcement Learning?


Before giving a brief technical description of exploration and exploitation, let's first understand these terms in simple words. In reinforcement learning, whenever an agent faces a difficult choice between continuing the same work or trying something new at a specific time, the situation results in the exploration-exploitation dilemma, because the agent's knowledge about states, actions, rewards, and resulting states is always partial.
Now we will discuss exploitation and exploration in technical terms.
Exploitation in Reinforcement Learning
Exploitation is defined as a greedy approach in which the agent tries to get more reward by using its estimated values rather than the true (unknown) values. In this technique, the agent makes the best decision based on its current information.
Exploration in Reinforcement Learning
Unlike exploitation, in exploration the agent primarily focuses on improving its knowledge about each action instead of getting more immediate reward, so that it can gain long-term benefits. In this technique, the agent works on gathering more information to make the best overall decision.
Examples of Exploitation and Exploration in Machine Learning
Let's understand exploitation and exploration with some interesting real-world examples.
Coal mining:
Suppose two people, A and B, are digging in a coal mine in the hope of finding a diamond. Person B succeeds in finding a diamond before person A and walks off happily. Seeing this, person A gets a bit greedy and thinks he too might find a diamond at the same spot where person B was digging. This action performed by person A is called a greedy action, and the corresponding policy is known as a greedy policy. But person A did not know that a bigger diamond was buried at the place where he was originally digging, so the greedy policy fails in this situation.
In this example, person A only had knowledge of the place where person B was digging, with no knowledge of what lies at other spots or depths. In reality, the diamond could be buried at the place where he was digging initially or somewhere else entirely. Likewise, with only partial knowledge about how to get more reward, a reinforcement learning agent faces a dilemma: exploit the partial knowledge to receive some reward, or explore unknown actions that could result in a much larger reward.
Both techniques cannot be applied at the same time, but this trade-off can be managed by using the epsilon-greedy policy (explained below).
There are a few other examples of Exploitation and Exploration in Machine Learning as follows:
Example 1: Consider choosing a restaurant for an online food order, where you have two options. The first option is to choose your favorite restaurant from which you have ordered food in the past; this is called exploitation, because you only have information about that specific restaurant. The second option is to try a new restaurant to explore new varieties and tastes of food; this is called exploration. The food quality might be better at the familiar restaurant, but it is also possible that the food at the new restaurant is more delicious.
Example 2: Suppose there is a game-playing platform where you can play chess against a computer. To win, you have two choices: play the move you believe is best, or play an experimental move. Playing what you believe is the best move may work, but the new move might turn out to be more strategic. Here, the first choice is called exploitation, because it relies on your existing knowledge of the game, and the second choice is called exploration, because you are extending your knowledge by trying a new move.
Epsilon Greedy Policy
The epsilon-greedy policy is a technique to maintain a balance between exploitation and exploration. A very simple way to choose between them is to select randomly, exploiting most of the time with a little exploration.

In the epsilon-greedy strategy, an exploration rate, or epsilon (denoted ε), is initially set to 1. This exploration rate defines the probability that the agent explores the environment rather than exploiting it, and setting ε = 1 ensures that the agent starts by exploring.
As the agent learns more about the environment, epsilon decays at some defined rate, so exploration becomes less and less likely. In that case, the agent becomes greedy and mostly exploits the environment.
To decide whether the agent explores or exploits at each step, we generate a random number between 0 and 1 and compare it to epsilon. If the random number is greater than ε, the next action is decided by exploitation; otherwise, it is decided by exploration. In the case of exploitation, the agent takes the action with the highest Q-value for the current state.
if random_number > epsilon:
    # choose next action via exploitation: the action with the highest Q-value for the current state
    action = max(actions, key=lambda a: Q[(state, a)])
else:
    # choose next action via exploration: a random action
    action = random.choice(actions)
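Putting the pieces together, a self-contained sketch of epsilon-greedy action selection with epsilon decay might look as follows; the action set, Q-table access, and decay hyperparameters are illustrative assumptions rather than fixed parts of the method:

import random

# Self-contained epsilon-greedy sketch with exponential decay; the environment,
# Q-table, and hyperparameter values below are illustrative assumptions.
actions = ["up", "down", "left", "right"]
Q = {}                                   # Q[(state, action)] -> estimated value
epsilon, min_epsilon, decay = 1.0, 0.05, 0.995

def choose_action(state):
    if random.random() > epsilon:
        # Exploitation: pick the action with the highest known Q-value.
        return max(actions, key=lambda a: Q.get((state, a), 0.0))
    # Exploration: pick a random action to gather new information.
    return random.choice(actions)

def decay_epsilon():
    # Called after each episode so exploration becomes less likely over time.
    global epsilon
    epsilon = max(min_epsilon, epsilon * decay)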
