
Unit II

Learning Process
LAYERS IN ANN:

An ANN is made of three kinds of layers, namely an input layer, an output layer, and one or more hidden layers. There must be a connection from the nodes in the input layer to the nodes in the hidden layer, and from each hidden-layer node to the nodes of the output layer. The input layer takes the data into the network.

In the figure there are three layers: an input layer, hidden layers, and an output layer. Inputs are presented to the input layer, and each node produces an output value via an activation function. The outputs of the input layer are used as inputs to the next hidden layer.

BRIEFLY EXPLAIN THE BASIC BUILDING BLOCKS OF ARTIFICIAL NEURAL NETWORKS.
Processing of ANN depends upon the following three building blocks:
1. Network Topology
2. Adjustments of Weights or Learning
3. Activation Functions
1. Network Topology: A network topology is the arrangement of a network along with its
nodes and connecting lines. According to the topology, ANN can be classified as the
following kinds:
A. Feedforward Network: It is a non-recurrent network having processing units/nodes arranged in layers, and all the nodes in a layer are connected with the nodes of the previous layer. The connections carry different weights. There is no feedback loop, meaning the signal can flow in only one direction, from input to output. It may be divided into the following two types:

 Single layer feedforward network: A feedforward ANN having only one weighted layer. In other words, the input layer is fully connected to the output layer.

 Multilayer feedforward network: A feedforward ANN having more than one weighted layer. As this network has one or more layers between the input and the output layer, these are called hidden layers.

B. Feedback Network: As the name suggests, a feedback network has feedback paths,
which means the signal can flow in both directions using loops. This makes it a non-
linear dynamic system, which changes continuously until it reaches a state of
equilibrium. It may be divided into the following types:
 Recurrent networks: They are feedback networks with closed loops. Following are
the two types of recurrent networks.

 Fully recurrent network: It is the simplest neural network architecture because all
nodes are connected to all other nodes and each node works as both input and output.

 Jordan network − It is a closed-loop network in which the output goes back to the input again as feedback.

2. Adjustments of Weights or Learning: Learning, in an artificial neural network, is the method of modifying the weights of the connections between the neurons of a specified network. Learning in ANN can be classified into three categories, namely supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning: As the name suggests, this type of learning is done under the supervision of a teacher. This learning process is dependent. During the training of ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired output vector. An error signal is generated if there is a difference between the actual output and the desired output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.
Unsupervised Learning: As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent. During the training of ANN under unsupervised learning, input vectors of similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs. There is no feedback from the environment as to what the desired output should be or whether it is correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns and features in the input data and the relation between the input data and the output.

Reinforcement Learning: As the name suggests, this type of learning is used to reinforce or strengthen the network based on some critic information. This learning process is similar to supervised learning, but we may have much less information. During the training of the network under reinforcement learning, the network receives some feedback from the environment. This makes it somewhat similar to supervised learning. However, the feedback obtained here is evaluative, not instructive, which means there is no teacher as in supervised learning. After receiving the feedback, the network adjusts its weights so as to obtain better critic information in the future.
3. Activation Functions: An activation function is a mathematical function that determines the output of each element (perceptron or neuron) in the neural network. It takes the input to a neuron and transforms it into an output, usually between zero and one or between -1 and +1. It may be thought of as the extra transformation applied over the net input to obtain the exact output. The following are some activation functions of interest:

i) Linear Activation Function: It is also called the identity function as it performs no input editing. It can be defined as F(x) = x.
ii) Sigmoid Activation Function: It is of two types, as follows −

 Binary sigmoidal function: This activation function performs input editing between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the higher the input, the higher the output. It can be defined as

F(x) = sigm(x) = 1 / (1 + exp(−x))

 Bipolar sigmoidal function: This activation function performs input editing between -1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than -1 or more than 1. It is also strictly increasing in nature, like the binary sigmoid function. It can be defined as

F(x) = sigm(x) = 2 / (1 + exp(−x)) − 1 = (1 − exp(−x)) / (1 + exp(−x))
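As a quick, concrete illustration, here is a minimal NumPy sketch of the two sigmoidal functions defined above; the function names and the sample inputs are illustrative choices, not part of the text.

```python
import numpy as np

def binary_sigmoid(x):
    # binary sigmoidal function: output bounded in (0, 1), strictly increasing
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    # bipolar sigmoidal function: output bounded in (-1, 1), strictly increasing
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

x = np.array([-2.0, 0.0, 2.0])
print(binary_sigmoid(x))   # approx. [0.119 0.5   0.881]
print(bipolar_sigmoid(x))  # approx. [-0.762 0.    0.762]
```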

WHAT IS A NEURAL NETWORK ACTIVATION FUNCTION?


In a neural network, inputs, which are typically real values, are fed into the neurons of the network. Each connection has a weight; the inputs are multiplied by these weights and fed into the activation function. Each neuron’s output is the input of the neurons in the next layer of the network, and so the inputs cascade through multiple activation functions until, eventually, the output layer generates a prediction. Neural networks rely on nonlinear activation functions; the derivative of the activation function helps the network learn through the backpropagation process.
SOME COMMON ACTIVATION FUNCTIONS INCLUDE THE FOLLOWING:

1. The sigmoid function has a smooth gradient and outputs values between zero and one. For very high or very low input values, the gradient becomes extremely small, so the network can be very slow to learn; this is called the vanishing gradient problem.
2. The TanH function is zero-centred, making it easier to model inputs that are strongly negative, strongly positive, or neutral.
3. The ReLU function is highly computationally efficient, but its gradient is zero for negative inputs, so it cannot learn from inputs that are zero or negative.
4. The Leaky ReLU function has a small positive slope in its negative region, enabling it to learn from zero or negative values.
5. The Parametric ReLU function allows the negative slope itself to be learned, using backpropagation to find the most effective slope for zero and negative input values.
6. Softmax is a special activation function used for output neurons. It normalizes the outputs for each class to between 0 and 1, and returns the probability that the input belongs to a specific class.
7. Swish is a newer activation function discovered by Google researchers. It performs better than ReLU with a similar level of computational efficiency.
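The following sketch shows how the ReLU variants and softmax described above can be written in NumPy; the 0.01 slope for Leaky ReLU is a common convention, and the sample inputs are illustrative.

```python
import numpy as np

def relu(x):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # small positive slope in the negative region
    return np.where(x > 0, x, slope * x)

def softmax(z):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-1.0, 0.5, 2.0])
print(relu(z))        # [0.  0.5 2. ]
print(leaky_relu(z))  # [-0.01  0.5   2.  ]
print(softmax(z))     # class probabilities summing to 1
```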
Learning in Neural Networks and Learning Rules
In this section we will discuss:
1. Introduction to Learning in Neural Networks
2. Learning Rules of Neurons in Neural Networks.
Introduction to Learning in Neural Networks:
The property which is of primary significance for a neural network is the ability of the
network to learn from its environment, and to improve its performance through learning.
The improvement in performance takes place over time in accordance with some
prescribed measure.
A neural network learns about its environment through an interactive process of
adjustments applied to its synaptic weights and bias levels. Ideally, the network becomes
more knowledgeable about its environment after each iteration of the learning process.
Many activities are associated with the notion of learning. Moreover, the process of learning is a matter of viewpoint, which makes it all the more difficult to agree on a precise definition of the term. For example, learning as viewed by a psychologist is quite different from learning in a classroom sense. Recognising that our particular interest is in neural networks, we use a definition of learning which is adapted from Mendel and McLaren (1970).
We define learning in the context of neural networks as:
Learning is a process by which the free parameters of a neural network are adapted
through a process of stimulation by the environment in which the network is embedded.
The type of learning is determined by the manner in which the parameter changes take
place.
This definition of the learning process implies the following sequence of events:
1. The neural network is stimulated by an environment.
2. The neural network undergoes changes in its free parameters as a result of this
stimulation.
3. The neural network responds in a new way to the environment because of the changes
which have occurred in its internal structure.
A prescribed set of well-defined rules for the solution of a learning problem is called a
learning algorithm. There is no unique learning algorithm for the design of neural
networks. Rather, we have a kit of tools represented by a diverse variety of learning
algorithms, each of which offers advantages of its own. Basically, learning algorithms
differ from each other in the way in which the adjustment to a synaptic weight of a
neuron is formulated.
Another factor to be considered is the manner in which a neural network (learning
machine), made up of a set of interconnected neurons, reacts to its environment. In this
latter context we speak of a learning paradigm which refers to a model of the
environment in which the neural network operates.
The five learning rules:
1. Error-correction learning,
2. Memory-based learning,
3. Hebbian learning,
4. Competitive learning, and
5. Boltzmann learning
are basic to the design of neural networks. Some of these algorithms require the use of a teacher and some do not; these are called supervised and unsupervised learning, respectively.
In the study of supervised learning, a key provision is a ‘teacher’ capable of supplying exact corrections to the network outputs when an error occurs. Such a method is not possible in biological organisms, which have neither the exact reciprocal nervous connections needed for the backpropagation of error corrections nor the nervous means for the imposition of behaviour from outside. Nevertheless, supervised learning has established itself as a powerful paradigm for the design of artificial neural networks. In contrast, self-organised (unsupervised) learning is motivated by neurobiological considerations.
Learning Rules of Neurons in Neural Networks:
The five basic learning rules of a neuron are:
1. Error-correction learning,
2. Memory-based learning,
3. Hebbian learning,
4. Competitive learning, and
5. Boltzmann learning.
Error-correction learning is rooted in optimum filtering, while memory-based learning and competitive learning are both inspired by neurobiological considerations. Boltzmann learning is different, being based on ideas borrowed from statistical mechanics. Two learning paradigms, learning with a teacher and learning without a teacher, including the credit-assignment problem so basic to the learning process, are also discussed.
1. Error-Correction Learning:
To illustrate our first learning rule, consider the simple case of a neuron k constituting the only computational node in the output layer of a feedforward neural network, as depicted in Fig. 11.21. Neuron k is driven by a signal vector x(n) produced by one or more layers of hidden neurons, which are themselves driven by an input vector (stimulus) applied to the source nodes (i.e., input layer) of the neural network.
The argument n denotes discrete time, or more precisely, the time step of an iterative process involved in adjusting the synaptic weights of neuron k. The output signal of neuron k is denoted y_k(n). This output signal, representing the only output of the neural network, is compared to a desired response or target output, denoted by d_k(n). Consequently, an error signal, denoted by e_k(n), is produced. By definition, we thus have

e_k(n) = d_k(n) − y_k(n)
The error signal e_k(n) actuates a control mechanism, the purpose of which is to apply a sequence of corrective adjustments to the synaptic weights of neuron k. The corrective adjustments are designed to make the output signal y_k(n) come closer to the desired response d_k(n) in a step-by-step manner.
This objective is achieved by minimizing a cost function or index of performance ε(n), defined in terms of the error signal e_k(n) as:

ε(n) = (1/2) e_k²(n)
That is, ε(n) is the instantaneous value of the error energy. The step-by-step adjustments to the synaptic weights of neuron k are continued until the system reaches a steady state (i.e., the synaptic weights are essentially stabilized). At that point the learning process is terminated.
The learning process described here is referred to as error-correction learning. In particular, minimisation of the cost function ε(n) leads to a learning rule commonly referred to as the delta rule or Widrow-Hoff rule, named in honor of its originators. Let ω_kj(n) denote the value of synaptic weight ω_kj of neuron k excited by element x_j(n) of the signal vector x(n) at time step n. According to the delta rule, the adjustment Δω_kj(n) applied to the synaptic weight ω_kj at time step n is defined by

Δω_kj(n) = η e_k(n) x_j(n)

where η is a positive constant which determines the rate of learning as we proceed from one step in the learning process to another. It is therefore natural that we refer to η as the learning-rate parameter.
In other words, the delta rule may be stated as:
The adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse in question.
The delta rule, as stated here, presumes that the error signal is directly measurable. For this measurement to be feasible we clearly need a supply of the desired response from some external source, which is directly accessible to neuron k. In other words, neuron k is visible to the outside world, as depicted in Fig. 11.21(a). From this figure we also observe that error-correction learning is in fact local in nature. This amounts to saying that the synaptic adjustments made by the delta rule are localised around neuron k.

Having computed the synaptic adjustment Δω_kj(n), the updated value of the synaptic weight ω_kj is given by equation (11.26):

ω_kj(n + 1) = ω_kj(n) + Δω_kj(n) …. (11.26)

Effectively, ω_kj(n) and ω_kj(n + 1) may be viewed as the old and new values of synaptic weight ω_kj, respectively. In computational terms we may also write:

ω_kj(n) = z⁻¹[ω_kj(n + 1)]

where z⁻¹ is the unit-delay operator. That is, z⁻¹ represents a storage element.
Fig. 11.21(b) shows a signal-flow graph representation of the error-correction learning process with regard to neuron k. The input signal x_j and the induced local field v_k of neuron k are referred to as the presynaptic and postsynaptic signals of the j-th synapse of neuron k, respectively. The figure also shows that error-correction learning is an example of a closed-loop feedback system. From control theory we know that the stability of such a system is determined by those parameters which constitute the feedback loops of the system. In this case there is a single feedback loop, and one of the parameters of interest is η, the learning rate. To ensure the stability and convergence of the iterative learning process, η must therefore be selected judiciously.
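To make the delta rule concrete, here is a minimal sketch that trains a single linear neuron by error-correction learning. The AND-gate data, the learning rate η = 0.1, and the epoch count are illustrative assumptions, not part of the text.

```python
import numpy as np

eta = 0.1                                                    # learning-rate parameter
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # signal vectors x(n)
d = np.array([0, 0, 0, 1], dtype=float)                      # desired responses d_k(n)

w = np.zeros(2)  # synaptic weights w_kj
b = 0.0          # bias

for epoch in range(50):
    for x_n, d_n in zip(X, d):
        y_n = w @ x_n + b     # actual output y_k(n)
        e_n = d_n - y_n       # error signal e_k(n) = d_k(n) - y_k(n)
        w += eta * e_n * x_n  # delta rule: dw_kj(n) = eta * e_k(n) * x_j(n)
        b += eta * e_n

print(w, b)  # weights settle near the least-squares fit to the AND data
```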
2. Memory-Based Learning:
In memory-based learning, all (or most) of the past experiences are explicitly stored in a large memory of correctly classified input-output examples {(x_i, d_i)}, i = 1, …, N, where x_i denotes an input vector and d_i denotes the corresponding desired response. Without loss of generality, we have restricted the desired response to be a scalar. For example, in a binary pattern classification problem there are two classes or hypotheses, denoted by ℰ₁ and ℰ₂, to be considered. In this example, the desired response d_i takes the value 0 (or -1) for class ℰ₁ and the value 1 for class ℰ₂. When classification of a test vector x_test (not seen before) is required, the algorithm responds by retrieving and analysing the training data in a “local neighbourhood” of x_test.
All memory-based learning algorithms involve two essential ingredients:
a. Criterion used for defining the local neighbourhood of the test vector x_test.
b. Learning rule applied to the training examples in the local neighbourhood of x_test.
The algorithms differ from each other in the way in which these two ingredients are
defined.
In a simple yet effective type of memory-based learning known as the nearest neighbour rule, the local neighbourhood is defined as the training example which lies in the immediate neighbourhood of the test vector x_test. In particular, the vector x'_N ∈ {x_1, x_2, …, x_N} is said to be the nearest neighbour of x_test if

min_i d(x_i, x_test) = d(x'_N, x_test)
where d(x_i, x_test) is the Euclidean distance between the vectors x_i and x_test. The class associated with the minimum distance, that is, with the vector x'_N, is reported as the classification of x_test. This rule is independent of the underlying distribution responsible for generating the training examples.
Cover and Hart (1967) have formally studied the nearest neighbour rule as a tool for
pattern classification.
The analysis is based on two assumptions:
a. The classified examples (x_i, d_i) are independently and identically distributed (iid), according to the joint probability distribution of the example (x, d).
b. The sample size N is infinitely large.
Under these two assumptions, it is shown that the probability of classification error incurred by the nearest neighbour rule is bounded above by twice the Bayes probability of error, that is, the minimum probability of error over all decision rules. In this sense, it may be said that half of the classification information in a training set of infinite size is contained in the nearest neighbour, which is a surprising result.
A variant of the nearest neighbour classifier is the k-nearest neighbour classifier,
which proceeds as:
a. Identify the k classified patterns which lie nearest to the test vector x_test, for some integer k.
b. Assign x_test to the class (hypothesis) which is most frequently represented among the k nearest neighbours of x_test (i.e., use a majority vote to make the classification).
Thus, the k-nearest neighbour classifier acts like an averaging device.
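A minimal sketch of the k-nearest neighbour classifier just described; the toy two-class data, k = 3, and the function name are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, d_train, x_test, k=3):
    """Assign x_test to the class most frequent among its k nearest
    training examples (Euclidean distance, majority vote)."""
    dists = np.linalg.norm(X_train - x_test, axis=1)  # d(x_i, x_test)
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest examples
    votes = Counter(d_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy data: two clusters labelled 0 and 1
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
d = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X, d, np.array([4.5, 5.0])))  # -> 1
```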
3. Hebbian Learning (Generalised Learning) – Supervised Learning:
Hebb’s postulate of learning is the oldest and the most famous of all learning rules; it is
named in honor of the neuropsychologist Hebb (1949).
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.
Hebb proposed this change as a basis of associative learning (at the cellular level), which
would result in an enduring modification in the activity pattern of a spatially distributed
“assembly of nerve cells”.
This statement is made in a neurobiological context. We may expand and rephrase
it as a two-part rule:
a. If two neurons on either side of a synapse are activated simultaneously (i.e.,
synchronously), then the strength of that synapse is selectively increased.
b. If two neurons on either side of a synapse are activated asynchronously, then that
synapse is selectively weakened or eliminated.
Such a synapse is called Hebbian synapse. More precisely, we define a Hebbian synapse
as a synapse which uses a time-dependent, highly local, and strongly interactive
mechanism to increase synaptic efficiency as a function of the correlation between the
presynaptic and postsynaptic activities.
From this definition we may deduce the following four key properties which
characterise a Hebbian synapse:
i. Time-Dependent Mechanism:
This mechanism refers to the fact that the modifications in a Hebbian synapse depend on the exact time of occurrence of the presynaptic and postsynaptic signals.
ii. Local Mechanism:
By its very nature, a synapse is the transmission site where information-bearing signals (representing ongoing activity in the presynaptic and postsynaptic units) are in spatiotemporal contiguity. This locally available information is used by a Hebbian synapse to produce a local synaptic modification which is input specific.
iii. Interactive Mechanism:
The occurrence of a change in a Hebbian synapse depends on signals on both sides of the
synapse. That is, a Hebbian form of learning depends on a “true interaction” between
presynaptic and postsynaptic signals in the sense that we cannot make a prediction from
either one of these two activities by itself.
iv. Conjunctional or Correlation Mechanism:
One interpretation of Hebb’s postulate of learning is that the condition for a change in
synaptic efficiency is the conjunction of presynaptic and postsynaptic signals. Thus,
according to this interpretation, the co-occurrence of presynaptic and postsynaptic
signals (within a short interval of time) is sufficient to produce the synaptic
modification. It is for this reason that a Hebbian synapse is sometimes referred to as a
conjunctional synapse or correlational synapse.
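The simplest weight update consistent with the correlational mechanism above is the activity-product rule Δω_kj = η y_k x_j. Below is a minimal sketch; the learning rate, random initial weights, and input signal are illustrative assumptions. Note that this pure Hebbian rule lets the weights grow without bound, which is why bounded variants (such as Oja's rule) are used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01
w = rng.normal(scale=0.1, size=3)  # small random initial weights w_kj
x = np.array([1.0, -1.0, 0.5])     # presynaptic activities x_j

for _ in range(100):
    y = w @ x            # postsynaptic activity y_k
    w = w + eta * y * x  # Hebbian update: dw_kj = eta * y_k * x_j

print(w)  # weights have grown along the direction correlated with x
```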
4. Competitive Learning – Unsupervised Learning:
In competitive learning, as the name implies, the output neurons of a neural network
compete among themselves to become active. Whereas in a neural network based on
Hebbian learning several output neurons may be active simultaneously, in competitive
learning only a single output neuron is active at any one time. It is this feature which
makes competitive learning highly suited to discover statistically salient features which
may be used to classify a set of input patterns.
There are three basic elements to a competitive learning rule:
i. A set of neurons which are all the same except for some randomly distributed synaptic
weights, and which therefore respond differently to a given set of input patterns.
ii. A limit imposed on the ‘strength’ of each neuron.
iii. A mechanism which permits the neurons to compete for the right to respond to a
given subset of inputs, such that only one output neuron or only one neuron per group, is
active (i.e., ‘on’) at a time. The neuron which wins the competition is called a winner-
takes-all neuron.
Accordingly, the individual neurons of the network learn to specialise on ensembles of
similar patterns; in so doing they become feature detectors for different classes of input
patterns.
In the simplest form of competitive learning, the neural network has a single layer of
output neurons, each of which is fully connected to the input nodes. The network may
include feedback connections among the neurons, as indicated in Fig. 11.22. In the
network architecture described herein, the feedback connections perform lateral
inhibition, with each neuron tending to inhibit the neuron to which it is laterally
connected. In contrast, the feedforward synaptic connections in the network of Fig. 11.22 are all excitatory.

For a neuron k to be the winning neuron, its induced local field v_k for a specified input pattern x must be the largest among all the neurons in the network. The output signal y_k of the winning neuron k is set equal to one; the output signals of all the neurons which lose the competition are set equal to zero.
We thus write:

y_k = 1 if v_k > v_j for all j, j ≠ k; y_k = 0 otherwise

where the induced local field v_k represents the combined action of all the forward and feedback inputs to neuron k.
Let ω_kj denote the synaptic weight connecting input node j to neuron k. Suppose that each neuron is allotted a fixed amount of synaptic weight (i.e., all synaptic weights are positive), which is distributed among its input nodes; that is, for all k,

Σ_j ω_kj = 1
A neuron then learns by shifting synaptic weights from its inactive to active input nodes.
If a neuron does not respond to a particular input pattern, no learning takes place in that
neuron.
If a particular neuron wins the competition, each input node of that neuron relinquishes some proportion of its synaptic weight, and the weight relinquished is then distributed equally among the active input nodes. According to the standard competitive learning rule, the change Δω_kj applied to synaptic weight ω_kj is defined by

Δω_kj = η (x_j − ω_kj) if neuron k wins the competition; Δω_kj = 0 if neuron k loses the competition

where η is the learning-rate parameter. This rule has the overall effect of moving the synaptic weight vector ω_k of the winning neuron k towards the input pattern x.
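A minimal sketch of the standard competitive (winner-takes-all) rule; the network size, learning rate, and random input patterns are illustrative assumptions. As in the text, the winner is the neuron with the largest induced local field, and only the winner's weights move toward the input.

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.1
W = rng.random((3, 2))             # rows are the (positive) weight vectors w_k
W /= W.sum(axis=1, keepdims=True)  # fixed total weight per neuron: sum_j w_kj = 1
X = rng.random((200, 2))           # input patterns

for x in X:
    v = W @ x                 # induced local fields v_k
    k = np.argmax(v)          # winning neuron: largest v_k
    W[k] += eta * (x - W[k])  # dw_kj = eta * (x_j - w_kj), winner only

print(W)  # each weight vector has moved toward a group of similar inputs
```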
5. Boltzmann Learning:
The Boltzmann learning rule, named in honor of Ludwig Boltzmann, is a stochastic learning algorithm derived from ideas rooted in statistical mechanics. A neural network designed on the basis of the Boltzmann learning rule is called a Boltzmann machine.
In a Boltzmann machine the neurons constitute a recurrent structure, and they operate in a binary manner: they are either in an ‘on’ state denoted by +1 or in an ‘off’ state denoted by -1. The machine is characterised by an energy function E, the value of which is determined by the particular states occupied by the individual neurons of the machine, as shown by

E = −(1/2) Σ_j Σ_k ω_kj x_k x_j,  j ≠ k
where x_j is the state of neuron j and ω_kj is the synaptic weight connecting neuron j to neuron k. The condition j ≠ k means simply that none of the neurons in the machine has self-feedback. The machine operates by choosing a neuron at random, say neuron k, at some step of the learning process, and then flipping the state of neuron k from state x_k to state −x_k at some temperature T with probability

P(x_k → −x_k) = 1 / (1 + exp(ΔE_k / T))

where ΔE_k is the energy change (i.e., the change in the energy function of the machine) resulting from such a flip. Note that T is not a physical temperature, but rather a pseudo-temperature, as in the stochastic model of a neuron. If this rule is applied repeatedly, the machine will reach thermal equilibrium.
The neurons of a Boltzmann machine partition into two functional groups:
a. Visible and
b. Hidden.
The visible neurons provide an interface between the network and the environment in
which it operates, whereas the hidden neurons always operate freely.
There are two modes of operation to be considered:
I. Clamped condition, in which the visible neurons are all clamped onto specific states
determined by the environment.
II. Free-running condition, in which all the neurons (visible and hidden) are allowed to
operate freely.
Let ρ⁺_kj denote the correlation between the states of neurons j and k with the network in its clamped condition, and ρ⁻_kj the correlation between the states of neurons j and k with the network in its free-running condition. Both correlations are averaged over all possible states of the machine when it is in thermal equilibrium.
Then, according to the Boltzmann learning rule, the change Δω_kj applied to the synaptic weight ω_kj from neuron j to neuron k is defined by:

Δω_kj = η (ρ⁺_kj − ρ⁻_kj), j ≠ k …. (11.35)

where η is the learning-rate parameter. Moreover, both ρ⁺_kj and ρ⁻_kj range in value from -1 to +1.
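A minimal sketch of the stochastic state-update (flip) step described above, run at a fixed pseudo-temperature T; the random symmetric weights, network size, and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0                  # symmetric weights w_kj = w_jk
np.fill_diagonal(W, 0.0)             # no self-feedback (j != k)
x = rng.choice([-1.0, 1.0], size=n)  # binary neuron states
T = 1.0                              # pseudo-temperature

for step in range(1000):
    k = rng.integers(n)
    # energy change if x_k flips to -x_k (from E = -1/2 sum w_kj x_k x_j):
    dE = 2.0 * x[k] * (W[k] @ x)
    if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
        x[k] = -x[k]  # accept the flip
print(x)  # after many steps the states approximate thermal equilibrium
```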

What is the Backpropagation Algorithm?
Backpropagation refers to the whole process encompassing both the calculation of the gradient and its use in stochastic gradient descent. Technically, backpropagation is used to calculate the gradient of the network’s error with respect to the network’s modifiable weights.
Backpropagation is an iterative, recursive and efficient procedure for computing the updated weights that improve the network until it can perform the task for which it is being trained. The derivatives of the activation function must be known at network design time for backpropagation to be applicable.
Backpropagation is widely used in neural network training. It works with multi-layer neural networks and discovers an internal representation of the input-output mapping. It is the standard method of training artificial neural networks: using the chain rule, it computes the gradient of the loss function with respect to all the weights in the network.
This gradient is used in a simple stochastic gradient descent algorithm to find weights that
minimize the error. The error propagates backward from the output nodes to the inner nodes.
The training algorithm of backpropagation involves four stages, which are as follows (a minimal sketch of all four appears after this list) −
 Initialization of weights − Small random values are assigned to the weights.
 Feed-forward − Each input unit X_i receives an input signal and transmits this signal to each of the hidden units Z_1, Z_2, …, Z_p. Each hidden unit computes its activation and sends its signal to each output unit. Each output unit computes its activation to form the response of the network to the given input pattern.
 Backpropagation of errors − Each output unit compares its activation Y_k with the target value T_k to determine the associated error for that unit. Based on this error, the factor δ_k (k = 1, …, m) is computed and used to distribute the error at output unit Y_k back to all units in the previous layer. Similarly, the factor δ_j (j = 1, …, p) is computed for each hidden unit Z_j.
 Updating of the weights and biases.
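Below is a minimal NumPy sketch of the four stages for a small fully connected network learning XOR with sigmoid units. The 2-4-1 architecture, learning rate, epoch count, and random seed are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # input patterns
T = np.array([[0], [1], [1], [0]], dtype=float)              # target values T_k

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# 1. Initialization of weights: small random values
V = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)  # input -> hidden
W = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)  # hidden -> output

eta = 0.5
for epoch in range(10000):
    # 2. Feed-forward
    Z = sigmoid(X @ V + b1)  # hidden activations Z_j
    Y = sigmoid(Z @ W + b2)  # output activations Y_k
    # 3. Backpropagation of errors (sigmoid derivative: y * (1 - y))
    delta_k = (T - Y) * Y * (1 - Y)          # output-layer factor
    delta_j = (delta_k @ W.T) * Z * (1 - Z)  # hidden-layer factor
    # 4. Update the weights and biases
    W += eta * Z.T @ delta_k; b2 += eta * delta_k.sum(axis=0)
    V += eta * X.T @ delta_j; b1 += eta * delta_j.sum(axis=0)

print(Y.round(2))  # should approach [[0], [1], [1], [0]]
```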
Types of Backpropagation
There are two types of backpropagation, as follows −
Static backpropagation − In this type of backpropagation, a static output is produced from the mapping of a static input. It is used to resolve static classification problems such as optical character recognition.
Recurrent backpropagation − In recurrent backpropagation, activations are fed forward until a specific determined value or threshold value is reached. After that value is reached, the error is evaluated and propagated backward.
Supervised and Unsupervised Learning
Supervised learning: Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher. Basically, supervised learning is when we teach or train the machine using data that is well labelled, which means some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labelled data.
For instance, suppose you are given a basket filled with different kinds of fruits. The first step is to train the machine with all the different fruits, one by one, like this:

 If the shape of the object is rounded with a depression at the top and is red in colour, then it will be labelled as Apple.
 If the shape of the object is a long curving cylinder with a green-yellow colour, then it will be labelled as Banana.
Now suppose that, after training on the data, you give the machine a new, separate fruit, say a banana from the basket, and ask it to identify it. Since the machine has already learned from the previous data, it now has to use that knowledge wisely. It will first classify the fruit by its shape and colour, confirm the fruit name as BANANA, and put it in the Banana category. Thus the machine learns from the training data (the basket containing fruits) and then applies that knowledge to the test data (the new fruit).
Supervised learning is classified into two categories of algorithms:
 Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
 Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
Supervised learning deals with, or learns from, “labelled” data. This implies that some data is already tagged with the correct answer.
Types:
 Regression
 Logistic Regression
 Classification
 Naive Bayes Classifiers
 K-NN (k nearest neighbours)
 Decision Trees
 Support Vector Machine
Advantages:
 Supervised learning allows collecting data and producing outputs from previous experience.
 It helps to optimize performance criteria with the help of experience.
 Supervised machine learning helps to solve various types of real-world computation problems.
 It performs classification and regression tasks.
 It allows estimating or mapping the result to a new sample.
 We have complete control over choosing the number of classes we want in the training data.
Disadvantages:
 Classifying big data can be challenging.
 Training for supervised learning needs a lot of computation time.
 Supervised learning cannot handle all complex tasks in machine learning.
 It requires a labelled data set.
 It requires a training process.

Unsupervised learning
Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data. Unlike supervised learning, no teacher is provided, which means no training will be given to the machine. Therefore, the machine is restricted to finding the hidden structure in unlabelled data by itself.
For instance, suppose the machine is given an image containing both dogs and cats that it has never seen before. It has no idea of the features of dogs and cats, so it cannot categorize them as ‘dogs’ and ‘cats’. But it can categorize them according to their similarities, patterns, and differences; i.e., it can easily divide the picture into two parts: the first part may contain all the pictures having dogs in them, and the second part may contain all the pictures having cats in them. The machine has learned nothing beforehand, which means there is no training data or examples. Unsupervised learning allows the model to work on its own to discover patterns and information that were previously undetected. It mainly deals with unlabelled data.
Unsupervised learning is classified into two categories of algorithms:
 Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behaviour.
 Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
Types of Unsupervised Learning:
Clustering
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Types:
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
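A minimal sketch of K-means, the second clustering algorithm in the list above; the synthetic two-cluster data and K = 2 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic data: two well-separated clusters around (0, 0) and (4, 4)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
K = 2
centroids = X[rng.choice(len(X), K, replace=False)]  # initialize from data points

for _ in range(20):
    # assignment step: each point joins its nearest centroid
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = np.argmin(dists, axis=1)
    # update step: move each centroid to the mean of its assigned points
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])

print(centroids)  # should land near (0, 0) and (4, 4)
```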
Supervised vs. Unsupervised Machine Learning:

| Parameters | Supervised machine learning | Unsupervised machine learning |
|---|---|---|
| Input Data | Algorithms are trained using labelled data. | Algorithms are used against data that is not labelled (unlabelled data). |
| Computational Complexity | Simpler method | Computationally complex |
| Accuracy | Highly accurate | Less accurate |
| No. of classes | Number of classes is known | Number of classes is not known |
| Data Analysis | Uses offline analysis | Uses real-time analysis of data |
| Algorithms used | Linear and logistic regression, random forest, support vector machine (SVM), neural network, etc. | K-means clustering, hierarchical clustering, Apriori algorithm, etc. |
| Output | Desired output is given. | Desired output is not given. |
| Training data | Uses training data to infer the model. | No training data is used. |
| Complex model | It is not possible to learn larger and more complex models than with supervised learning. | It is possible to learn larger and more complex models with unsupervised learning. |
| Model | We can test our model. | We cannot test our model. |
| Called as | Supervised learning is also called classification. | Unsupervised learning is also called clustering. |
| Example | Optical character recognition. | Finding a face in an image. |

Advantages of unsupervised learning:
 It does not require training data to be labelled.
 Dimensionality reduction can be easily accomplished using unsupervised learning.
 Capable of finding previously unknown patterns in data.
 Flexibility: Unsupervised learning is flexible in that it can be applied to a wide variety
of problems, including clustering, anomaly detection, and association rule mining.
 Exploration: Unsupervised learning allows for the exploration of data and the
discovery of novel and potentially useful patterns that may not be apparent from the
outset.
 Low cost: Unsupervised learning is often less expensive than supervised learning
because it doesn’t require labelled data, which can be time-consuming and costly to
obtain.

Disadvantages of unsupervised learning:
 It is difficult to measure accuracy or effectiveness due to the lack of predefined answers during training.
 The results often have lower accuracy.
 The user needs to spend time interpreting and labelling the classes which follow from the classification.
 Lack of guidance: Unsupervised learning lacks the guidance and feedback provided
by labelled data, which can make it difficult to know whether the discovered
patterns are relevant or useful.
 Sensitivity to data quality: Unsupervised learning can be sensitive to data quality,
including missing values, outliers, and noisy data.
 Scalability: Unsupervised learning can be computationally expensive, particularly
for large datasets or complex algorithms, which can limit its scalability.
