
MALLA REDDY INSTITUTE OF TECHNOLOGY

UNIT II
Artificial Neural Networks - Introduction
Artificial Neural Network (ANN) is a machine learning algorithm that emerged and evolved
from the idea of the biological neural networks of the human brain. An attempt to simulate the
workings of the human brain culminated in the emergence of ANN. ANN works in a way very
similar to biological neural networks, but does not exactly replicate their workings.

An ANN accepts only numeric and structured data as input. To handle unstructured and
non-numeric data formats such as images, text, and speech, Convolutional Neural Networks
(CNN) and Recurrent Neural Networks (RNN) are used respectively. In this unit, we
concentrate only on Artificial Neural Networks.

Structure of Artificial neurons and their functions


 A neural network with a single layer is called a perceptron; a multi-layer perceptron
is called an Artificial Neural Network.
 A neural network can have any number of layers, and each layer can have one or more
neurons or units. Every neuron in one layer is connected to every neuron in the adjacent
layers, and each layer can have a different activation function.
 Training an ANN consists of two phases: forward propagation and backpropagation.
Forward propagation involves multiplying the inputs by the weights, adding the bias,
applying the activation function, and propagating the result forward through the layers.
 Backpropagation is the most important step; it finds optimal parameters for the model
by propagating the error in the backward direction through the network layers.
Backpropagation requires an optimization function to find the optimal weights for the model.
 ANN can be applied to both regression and classification tasks by changing the
activation function of the output layer accordingly: sigmoid activation for binary
classification, softmax activation for multi-class classification, and linear activation
for regression (a minimal sketch follows this list).
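As a minimal illustration of the forward propagation just described, the sketch below pushes a toy input through one hidden layer and a sigmoid output neuron. The layer sizes, weights, and biases are assumed values chosen for illustration, not taken from these notes.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def forward_layer(inputs, weights, biases):
    # One dense layer: weighted sum of inputs plus bias, then sigmoid activation.
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs

# Assumed toy values: 2 inputs -> 2 hidden neurons -> 1 output neuron.
x = [0.5, -1.2]
hidden = forward_layer(x, weights=[[0.1, 0.4], [-0.3, 0.2]], biases=[0.0, 0.1])
output = forward_layer(hidden, weights=[[0.7, -0.5]], biases=[0.2])
print(output)  # a single value in (0, 1), as used for binary classification
```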
Neural network representation
The Neural Network Architecture
To implement the same problem space using a neural network, we need to create a neuron
based structure. Before jumping into the architecture, let’s take a look at some of the
components of a Neural Network.

1. The Input Layer — Represents the input variables plus the bias term. Hence if there
are n input variables, the size of the input layer is n + 1, where the + 1 is the bias term.
2. The Hidden Layer / Layers — These contain the neurons where all the mathematical
calculations are done. Note that a given neural network can have more than one neuron in
a hidden layer, as well as multiple hidden layers.
3. The Activation Function — Converts the output of a given layer before passing the
information on to consecutive layers. Activation functions are mathematical equations
that determine the output of a given neural network. The activation function is part of each
neuron in the hidden layers and determines the output relevant for prediction.
4. The Output Layer — Produces the final output prediction of the network.
5. Forward Propagation — Calculates the output of each iteration, from the input
layer through to the output layer.
6. Backward Propagation — Calculates revised weights (w1, w2, w3, and b1) after
each forward propagation by analysing the derivative of the cost function used to
optimize the model output.
7. Learning Rate — Determines the size of the change applied to each weight and
bias term after every backward propagation, i.e. it controls the speed at which the model
learns information about the data.

Fig2. Artificial Neuron Model


Biological Motivation
1. A neural network can be defined as a model of reasoning based on the human brain.
2. The brain consists of a densely interconnected set of nerve cells, or basic
information-processing units, called neurons.
3. The human brain incorporates nearly 10 billion neurons and 60 trillion connections,
synapses, between them.
4. By using multiple neurons simultaneously, the brain can perform its functions much
faster than the fastest computers in existence today.
5. Our brain can be considered as a highly complex, non-linear and parallel
information-processing system.
6. Information is stored and processed in a neural network simultaneously throughout
the whole network, rather than at specific locations.
7. In other words, in neural networks, both data and its processing are global rather than
local.

Learning is a fundamental and essential characteristic of biological neural networks.
1. The majority of neurons encode their activations or outputs as a series of
brief electrical pulses.
2. The neuron's cell body (soma) processes the incoming activations and
converts them into output activations.
3. The neuron's nucleus contains the genetic material in the form of DNA.
4. Dendrites are fibres which emanate from the cell body and provide the
receptive zones that receive activations from other neurons.
5. Axons are fibres acting as transmission lines that send activations to other
neurons.
6. The junctions that allow signal transmission between the axons and dendrites
are called synapses.

Fig3. Biological Neuron Model


Advantages and Applications of Neural Networks

1. Adaptive learning: An ANN is capable of learning how to perform tasks based on
the data given for training or initial experience.

2. Self-organization: An ANN can create its own organization or representation of the
information it receives during learning.

3. Real-time operation: ANN computations may be carried out in parallel. Special hardware
devices are being designed and manufactured to take advantage of this capability of
ANNs.

4. Fault tolerance via redundant information coding: partial destruction of a neural network
leads only to a corresponding degradation of performance, rather than total failure.
5. Air traffic control could be automated with the location, altitude, direction and speed of
each radar blip taken as input to the network. The output would be the air traffic
controller’s instruction in response to each blip.

6. Animal behaviour, predator/prey relationships and population cycles may be suitable for
analysis by neural networks.

7. Appraisal and valuation of property, buildings, automobiles, machinery, etc. should be an
easy task for a neural network.

8. Betting on horse races, stock markets, sporting events, etc. could be based on neural
network predictions.

9. Criminal sentencing could be predicted using a large sample of crime details as input and
the resulting sentences as output.
10. Complex physical and chemical processes that may involve the interaction of numerous
(possibly unknown) mathematical formulas could be modelled heuristically using a neural
network.

11. Data mining, cleaning and validation could be achieved by determining which records
suspiciously diverge from the pattern of their peers.

12. Direct mail advertisers could use neural network analysis of their databases to decide
which customers should be targeted, and avoid wasting money on unlikely targets.

13. Echo patterns from sonar, radar, seismic and magnetic instruments could be used to
predict their targets.

14. Econometric modelling based on neural networks should be more realistic than older
models based on classical statistics.
15. Employee hiring could be optimized if the neural networks were able to predict which
job applicant would show the best job performance.
16. Expert consultants could package their intuitive expertise in to a neural network to
automate their services.
17. Fraud detection regarding credit cards, insurance or faxes could be automated using a
neural network analysis of past incidents.

18. Handwriting and typewriting could be recognized by imposing a grid over the writing,
and then each square of the grid becomes an input to the neural network. This is called
"Optical Character Recognition."
19. Lake water levels could be predicted based upon precipitation patterns and river/dam
flows.

20. Machinery control could be automated by capturing the actions of experienced machine
operators into a neural network.

Appropriate problems for neural network learning


1. Instances are represented by many attribute-value pairs.
The target function to be learned is defined over instances that can be described by a vector
of predefined features, such as pixel values.
These input attributes may be highly correlated or independent of one another.
Input values can be any real values.

2. The target function output may be discrete-valued, real-valued, or a vector of several
real- or discrete-valued attributes.
For example, in a vehicle-steering system the output is a vector of attributes, each
corresponding to a recommendation regarding the steering direction.
The value of each output is some real number between 0 and 1, which in this case
corresponds to the confidence in predicting the corresponding steering direction.
We can also train a single network to output both the steering command and a suggested
acceleration, simply by concatenating the vectors that encode these two output
predictions.

3.The training examples may contain errors. ANN learning methods are quite robust to
noise in the training data.

4. Long training times are acceptable.
Network training algorithms typically require longer training times than, say, decision tree
learning algorithms.
Training times can range from a few seconds to many hours, depending on factors such as
the number of weights in the network, the number of training examples considered, and the
settings of various learning algorithm parameters.

5. Fast evaluation of the learned target function may be required.
Although ANN learning times are relatively long, evaluating the learned network in order
to apply it to a subsequent instance is typically very fast.
For example, a learned network may be evaluated several times per second to continually
update a vehicle's steering command as the vehicle drives forward.
6.The ability of humans to understand the learned target function is not important.
The weights learned by neural networks are often difficult for humans to interpret.
Learned neural networks are less easily communicated to humans than learned rules.

Perceptron
A Perceptron is a single-layer neural network; a multi-layer perceptron is called a neural network.

The Perceptron is a linear (binary) classifier used in supervised learning. It helps to classify the
given input data.

Fig4. Perceptron Model

There are two types of Perceptrons: Single layer and Multilayer.


1. Single layer Perceptrons can learn only linearly separable patterns.
2. Multilayer Perceptrons, or feedforward neural networks with two or more layers, have
greater processing power.
3. The Perceptron algorithm learns the weights for the input signals in order to draw a
linear decision boundary.
4. This enables you to distinguish between the two linearly separable classes +1 and -1.
5. Note: Supervised Learning is a type of Machine Learning used to learn models from
labeled training data. It enables output prediction for future or unseen data.
6. Let us focus on the Perceptron Learning Rule in the next section
Perceptron Learning Rule
Perceptron Learning Rule states that the algorithm would automatically learn the optimal weight
coefficients. The input features are then multiplied with these weights to determine if a neuron fires or
not.

Fig5. Perceptron Rule

The Perceptron receives multiple input signals, and if the sum of the input signals exceeds a
certain threshold, it either outputs a signal or does not return an output. In the context of
supervised learning and classification, this can then be used to predict the class of a sample.

In the next section, let us focus on the perceptron function.

Perceptron Function
The Perceptron is a function that maps its input "x", multiplied by the learned weight
coefficients, to an output value "f(x)":

f(x) = 1 if w · x + b > 0, and 0 otherwise

In this equation:


“w” = vector of real-valued weights
“b” = bias (an element that adjusts the boundary away from origin without any dependence
on the input value)
“x” = vector of input x values
Fig6. Perceptron with summation function
Inputs of a Perceptron
A Perceptron accepts inputs, moderates them with certain weight values, then applies the
transformation function to output the final result. The figure below shows a Perceptron with a
Boolean output.

A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc. It has only
two values: Yes and No or True and False. The summation function “∑” multiplies all inputs of “x”
by weights “w” and then adds them up as follows:

Activation Functions of Perceptron


The activation function applies a step rule (convert the numerical output into +1 or -1) to
check if the output of the weighting function is greater than zero or not.

Fig7. Perceptron Activation function

 Step function gets triggered above a certain value of the neuron output; else it outputs
zero.
 Sign Function outputs +1 or -1 depending on whether neuron output is greater than
zero or not.
 Sigmoid is the S-curve and outputs a value between 0 and 1.
For example:

If ∑ wixi> 0 => then final output “o” = 1 (issue bank loan)

Else, final output “o” = -1 (deny bank loan)
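The sketch below combines the decision rule used in this example with the Perceptron Learning Rule described earlier: the weighted sum is thresholded at zero to issue or deny the loan, and the weights are nudged whenever a training example is misclassified. The learning rate, epoch count, and the tiny two-feature dataset are assumptions made purely for illustration.

```python
def perceptron_output(x, weights, bias):
    # Sign-style activation: +1 (issue loan) if the weighted sum exceeds 0, else -1 (deny).
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else -1

def train_perceptron(data, labels, lr=0.1, epochs=10):
    # Perceptron learning rule: adjust weights only when a sample is misclassified.
    weights = [0.0] * len(data[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(data, labels):
            o = perceptron_output(x, weights, bias)
            if o != t:
                weights = [w + lr * (t - o) * xi for w, xi in zip(weights, x)]
                bias += lr * (t - o)
    return weights, bias

# Assumed toy data: two features per applicant, target classes +1 / -1.
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print([perceptron_output(x, w, b) for x in X])  # should reproduce the labels
```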


Training a Two-Layer MLP Network

Problems

1. For the network shown in Figure 1, calculate the net input to the output neuron.
Solution:
The given neural net consists of three input neurons and one output neuron

2. Calculate the net input for the network shown in Figure 2 with bias
included in the network

Solution:
3. Obtain the output of the neuron Y for the network shown in Figure 3
using activation functions as: (i) binary sigmoidal and (ii) bipolar
sigmoidal.

Solution:
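The figures for these problems are not reproduced here, so the inputs, weights, and bias in the sketch below are assumed values used only to show the calculation pattern: the net input is y_in = b + Σ xi wi, the binary sigmoid output is 1 / (1 + e^(-y_in)), and the bipolar sigmoid output is 2 / (1 + e^(-y_in)) - 1.

```python
import math

# Assumed example values (the original figures are not available).
x = [0.8, 0.6, 0.4]      # inputs x1, x2, x3
w = [0.1, 0.3, -0.2]     # weights w1, w2, w3
b = 0.35                 # bias

# Net input to the output neuron.
y_in = b + sum(xi * wi for xi, wi in zip(x, w))

# (i) binary sigmoidal activation, output in (0, 1)
binary_out = 1.0 / (1.0 + math.exp(-y_in))

# (ii) bipolar sigmoidal activation, output in (-1, 1)
bipolar_out = 2.0 / (1.0 + math.exp(-y_in)) - 1.0

print(y_in, binary_out, bipolar_out)
```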
Multilayer networks and the Back-propagation algorithm.

1. A multilayer perceptron is a feedforward neural network with one or more hidden layers.
2. The network consists of an input layer of source neurons, at least one middle or
hidden layer of computational neurons, and an output layer of computational
neurons.
3. The input signals are propagated in a forward direction on a layer-by-layer basis.
4. A hidden layer “hides” its desired output.
5. Neurons in the hidden layer cannot be observed through the input/output behaviour of
the network.
6. There is no obvious way to know what the desired output of the hidden layer should
be.

Fig8. Multilayer perceptron Network model

It has 3 layers including one hidden layer. If it has more than 1 hidden layer, it is called a
deep ANN.

An MLP is a typical example of a feedforward artificial neural network.

In Figure 8, the i-th activation unit in the l-th layer is denoted as a_i^(l).
The number of layers and the number of neurons are referred to as the hyperparameters of a
neural network, and these need tuning. Cross-validation techniques must be used to find ideal
values for them.

The weight adjustment during training is done via backpropagation. Deeper neural networks are
better at processing data; however, deeper layers can lead to the vanishing gradient problem,
and special algorithms are required to address it.

Input or Visible Layers

The bottom layer that takes input from your dataset is called the visible layer, because it is the
exposed part of the network. Often a neural network is drawn with a visible layer that has one
neuron per input value or column in your dataset. These are not neurons as described above,
but simply pass the input values through to the next layer.

Hidden Layers

Layers after the input layer are called hidden layers because they are not directly exposed to
the input. The simplest network structure is a single neuron in the hidden layer that
directly outputs the value.

Given increases in computing power and efficient libraries, very deep neural networks can be
constructed. Deep learning refers to having many hidden layers in a neural network. Such
networks are called deep because they would historically have been unimaginably slow to train,
but may take only seconds or minutes to train using modern techniques and hardware.

Output Layer

The final layer is called the output layer, and it is responsible for outputting a value, or a
vector of values, in the format required for the problem.

The choice of activation function in the output layer is strongly constrained by the type of
problem that you are modeling. For example:

 A regression problem may have a single output neuron and the neuron may have no
activation function.
 A binary classification problem may have a single output neuron and use a sigmoid
activation function to output a value between 0 and 1, representing the probability of
predicting class 1. This can be turned into a crisp class value by using a threshold of 0.5,
snapping values below the threshold to 0 and values at or above it to 1.
 A multi-class classification problem may have multiple neurons in the output layer,
one for each class. In this case a softmax activation function may be used to output a
probability of the network predicting each of the class values; selecting the output
with the highest probability produces a crisp class value (see the sketch after this list).
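The sketch below shows how the three output-layer choices above translate into concrete predictions; the raw output values and class scores are assumed for illustration.

```python
import math

def softmax(scores):
    # Normalizes raw scores into probabilities that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Regression: a single linear output is used directly.
regression_output = 42.7

# Binary classification: sigmoid output snapped to a crisp class at the 0.5 threshold.
sigmoid_output = 0.83
binary_class = 1 if sigmoid_output >= 0.5 else 0

# Multi-class classification: softmax over one score per class, then take the arg max.
class_scores = [1.2, 0.3, 2.5]
probabilities = softmax(class_scores)
multi_class = probabilities.index(max(probabilities))

print(binary_class, multi_class)
```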
Forward and backward passes
The forward and backward phases are repeated for a number of epochs. In each epoch, the
following occurs:
1. The inputs are propagated from the input to the output layer.
2. The network error is calculated.
3. The error is propagated from the output layer to the input layer.

Back-propagation neural network


1. Learning in a multilayer network proceeds the same way as for a perceptron.
2. A training set of input patterns is presented to the network.
3. The network computes its output pattern, and if there is an error (in other words, a
difference between the actual and desired output patterns) the weights are adjusted
to reduce this error.
4. In a back-propagation neural network, the learning algorithm has two phases.
5. First, a training input pattern is presented to the network input layer.
6. The network propagates the input pattern from layer to layer until the output pattern is
generated by the output layer.
7. If this pattern is different from the desired output, an error is calculated and then
propagated backwards through the network from the output layer to the input layer.
8. The weights are modified as the error is propagated.
Fig9. Back-propagation neural network model

Back-propagation Algorithm
1. Inputs X arrive through the preconnected path. Input the instance (x1, ..., xn) to the
network and compute the network outputs ok.
2. The input is modelled using real weights W. The weights are usually selected randomly.
3. Calculate the output for every neuron from the input layer, through the hidden layers, to the
output layer.
4. Calculate the error in the outputs:
For each output unit k: δk = ok(1 - ok)(tk - ok)
For each hidden unit h: δh = oh(1 - oh) Σk wh,k δk
Travel back from the output layer to the hidden layer to adjust the weights such that
the error is decreased.
5. Keep repeating the process until the desired output is achieved.

Steps to implement Back-propagation neural network


STEP 1. Initialize Network
1. Each neuron has a set of weights that need to be maintained.
2. One weight for each input connection and an additional weight for the bias.
3. We will need to store additional properties for a neuron during training, therefore we
will use a dictionary to represent each neuron and store properties by names such as
'weights' for the weights (a sketch is given below).
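A minimal sketch of this initialization, following the dictionary-per-neuron representation described above; the layer sizes used in the last lines are assumed example values.

```python
from random import random, seed

def initialize_network(n_inputs, n_hidden, n_outputs):
    # Each neuron is a dictionary; 'weights' holds one weight per input plus a trailing bias weight.
    hidden_layer = [{'weights': [random() for _ in range(n_inputs + 1)]}
                    for _ in range(n_hidden)]
    output_layer = [{'weights': [random() for _ in range(n_hidden + 1)]}
                    for _ in range(n_outputs)]
    return [hidden_layer, output_layer]

seed(1)
network = initialize_network(n_inputs=2, n_hidden=2, n_outputs=2)  # assumed sizes
for layer in network:
    print(layer)
```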
STEP 2. Forward Propagate
1. It is the technique we will need to generate predictions during training that will need
to be corrected, and it is the method we will need after the network is trained to make
predictions on new data.
2. We can break forward propagation down into three parts:
i. Neuron Activation.
ii. Neuron Transfer.
iii. Forward Propagation.


2.1. Neuron Activation
The first step is to calculate the activation of one neuron given an input.
The input could be a row from our training dataset, as in the case of the hidden layer. It
may also be the outputs from each neuron in the hidden layer, in the case of the output
layer.
2.2. Neuron Transfer
Once a neuron is activated, we need to transfer the activation to see what the neuron
output actually is.
Different transfer functions can be used.
It is traditional to use the sigmoid activation function, but you can also use the tanh
(hyperbolic tangent) function to transfer outputs.
More recently, the rectifier transfer function has been popular with large deep learning
networks
2.3. Forward Propagation

1. Forward propagating an input is straightforward.


2. We work through each layer of our network calculating the outputs for each
neuron.
3. All of the outputs from one layer become inputs to the neurons on the next layer.
4. Below is a function named forward_propagate() that implements the forward
propagation for a row of data from our dataset with our neural network.
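A sketch of the three parts named above, using the sigmoid transfer function and assuming (as in the initialization sketch earlier) that the last element of each neuron's weight list is the bias.

```python
import math

def activate(weights, inputs):
    # Neuron activation: weighted sum of the inputs plus the bias (the last weight).
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    # Sigmoid transfer of the activation into the neuron output.
    return 1.0 / (1.0 + math.exp(-activation))

def forward_propagate(network, row):
    # Push one row of data through every layer; each layer's outputs become the next layer's inputs.
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
```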

STEP 3.Back Propagate Error


1. The backpropagation algorithm is named for the way in which weights are trained.
2. Error is calculated between the expected outputs and the outputs forward
propagated from the network.
3. These errors are then propagated backward through the network from the output layer
to the hidden layer, assigning blame for the error and updating weights as they go.
4. The math for backpropagating error is rooted in calculus, but we will remain high
level in this section and focus on what is calculated and how rather than why the
calculations take this particular form.
This part is broken down into two sections:
i. Transfer Derivative.
ii. Error Backpropagation.

1. The first step is to calculate the error for each output neuron; this will give us our
error signal (input) to propagate backwards through the network.
2. The error for a given neuron can be calculated as follows:
3. error = (expected - output) * transfer_derivative(output)
4. Where expected is the expected output value for the neuron, output is the output
value of the neuron, and transfer_derivative() calculates the slope of the neuron's
output value (a sketch of these two parts is given below).
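A sketch of the two parts named above: the transfer derivative of the sigmoid, and the backward pass that assigns an error signal (stored here under a 'delta' key, an assumption of this sketch) to every neuron, starting from the output layer.

```python
def transfer_derivative(output):
    # Slope of the sigmoid, expressed in terms of the neuron's output.
    return output * (1.0 - output)

def backward_propagate_error(network, expected):
    # Walk the layers in reverse, computing an error signal ('delta') for each neuron.
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = []
        if i == len(network) - 1:
            # Output layer: error is (expected - output).
            for j, neuron in enumerate(layer):
                errors.append(expected[j] - neuron['output'])
        else:
            # Hidden layer: error is the weighted sum of the deltas from the next layer.
            for j in range(len(layer)):
                errors.append(sum(n['weights'][j] * n['delta'] for n in network[i + 1]))
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
```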

STEP 4. Train Network

1. The network is trained using stochastic gradient descent.


2. This involves multiple iterations of exposing a training dataset to the network and for
each row of data forward propagating the inputs, backpropagating the error and
updating the network weights.
3. This part is broken down into two sections:
i. Update Weights.
ii. Train Network.

4.1. Update Weights


1. Once errors are calculated for each neuron in the network via the back propagation
method above, they can be used to update weights.
2. Network weights are updated as follows:
weight = weight + learning_rate * error * input
3. Where weight is a given weight, learning_rate is a parameter that you must
specify, error is the error calculated by the backpropagation procedure for the
neuron, and input is the input value that caused the error (a sketch of the update and
the training loop is given below).
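A sketch of the weight update rule above, together with the surrounding stochastic-gradient-descent training loop. It relies on forward_propagate() and backward_propagate_error() from the earlier sketches, and it assumes each training row ends with an integer class label that is one-hot encoded before back-propagation.

```python
def update_weights(network, row, l_rate):
    # weight = weight + learning_rate * error (delta) * input, plus a separate bias update.
    for i, layer in enumerate(network):
        inputs = row if i == 0 else [neuron['output'] for neuron in network[i - 1]]
        for neuron in layer:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']  # bias term

def train_network(network, train, l_rate, n_epoch, n_outputs):
    # One epoch = forward propagate, back-propagate the error, update weights, for every row.
    for epoch in range(n_epoch):
        for row in train:
            inputs, label = row[:-1], row[-1]
            forward_propagate(network, inputs)
            expected = [0.0] * n_outputs
            expected[label] = 1.0            # assumes class labels are integers starting at 0
            backward_propagate_error(network, expected)
            update_weights(network, inputs, l_rate)
```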

STEP 5.Predict
1. We have already seen how to forward-propagate an input pattern to get an output.
2. We can use the output values themselves directly as the probability of a pattern
belonging to each output class.
3. It may be more useful to turn this output back into a crisp class prediction.
4. We can do this by selecting the class value with the larger probability. This is also
called the arg max function.
5. Below is a function named predict() that implements this procedure.
6. It returns the index in the network output that has the largest probability.
7. It assumes that class values have been converted to integers starting at 0.
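A sketch of predict(), reusing forward_propagate() from the earlier sketch and taking the arg max of the output values as the crisp class (class values are assumed to be integers starting at 0, as noted above).

```python
def predict(network, row):
    # Returns the index of the output neuron with the largest value (the arg max).
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))
```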
Remarks on the Back-Propagation algorithm

1. Convergence and local minima: gradient descent in backpropagation is guaranteed to
converge only towards some local minimum of the error, not necessarily to the global
minimum error. Despite this, BACKPROPAGATION is a highly effective function
approximation method in practice.
2. In many practical applications the problem of local minima has not been found to be
as severe as one might fear.
3. To develop some intuition here, consider that networks with large numbers of weights
correspond to error surfaces in very high dimensional spaces (one dimension per
weight).
4. In fact, the more weights in the network, the more dimensions that might provide
"escape routes" for gradient descent to fall away from the local minimum with
respect to this single weight.

Expressive capabilities of ANNs

1.Boolean functions:
Every boolean function can be represented by a network with two layers of units, although
the number of hidden units required may grow exponentially with the number of inputs.
2.Continuous functions:
Every bounded continuous function can be approximated with arbitrarily small error,
by network with two layers of units
3.Arbitrary functions:
Any function can be approximated to arbitrary accuracy by a network with three
layers of units
4.Hypothesis space search
Every possible assignment of network weights represents a syntactically distinct
hypothesis.
This hypothesis space is continuous, in contrast to that of decision tree learning.
5.Inductive bias
One can roughly characterize it as smooth interpolation between data points.
One interesting property of backpropagation is its ability to discover useful intermediate
representations at the hidden unit layers inside the network.
Because training examples constrain only the network inputs and outputs, the
weight-tuning procedure is free to set weights that define whatever hidden unit
representation is most effective at minimizing the squared error E.
An illustrative example: face recognition model

The learning task here involves classifying camera images of the faces of various people
in various poses. Images of 20 different people were collected, with
approximately 32 images per person, varying the person's expression (happy, sad,
angry, neutral), the direction in which they were looking (left, right, straight ahead,
up), and whether or not they were wearing sunglasses.

Fig10 . face recognition process

 As can be seen from the example images, there is also variation in the background
behind the person, the clothing worn by the person, and the position of the person's
face within the image.
 In total, 624 greyscale images were collected, each with a resolution of 120 x 128,
with each image pixel described by a greyscale intensity value between 0 (black) and
255 (white).

Fig11 . face recognition in neural network


Input Image process:

1. A variety of target functions can be learned from this image data.


2. For example, given an image as input we could train an ANN to output the
identity of the person, the direction in which the person is facing, the gender of the
person, whether or not they are wearing sunglasses, etc.
3. All of these target functions can be learned to high accuracy from this image data, and
the reader is encouraged to try out these experiments.

Fig12 . Face recognition input images

Fig13. Face recognition input images

 Learning an artificial neural network to recognize face pose. Here a 960 x 3 x 4


network is trained on grey-level images of faces (see top), to predict whether a
person is looking to their left, right, ahead, or up.
 After training on 260 such images, the network achieves an accuracy of 90% over a
separate test set.
 The learned network weights are shown after one weight-tuning iteration through the
training examples and after 100 iterations.
 Each output unit (left, straight, right, up) has four weights, shown by dark (negative)
and light (positive) blocks.
 The leftmost block corresponds to the weight w0, which determines the unit
threshold, and the three blocks to the right correspond to the weights on inputs from the
three hidden units.
 The weights from the image pixels into each hidden unit are also shown, with each
weight plotted in the position of the corresponding image pixel.

Design for Input Images:


 After training on a set of 260 images, classification accuracy over a separate test set is
90%.
 In contrast, the default accuracy achieved by randomly guessing one of the four
possible face directions is 25%.
Input encoding
1. Given that the ANN input is to be some representation of the image, one key
design choice is how to encode this image.
2. For example, we could pre-process the image to extract edges, regions of uniform
intensity, or other local image features, then input these features to the network.
3. One difficulty with this design option is that it would lead to a variable number of
features (e.g., edges) per image, whereas the ANN has a fixed number of input units
4. The design option chosen in this case was instead to encode the image as a fixed set
of 30 x 32 pixel intensity values, with one network input per pixel.
5. The pixel intensity values ranging from 0 to 255 were linearly scaled to range from
0 to 1 so that network inputs would have values in the same interval as the hidden unit
and output unit activations.
6. The 30 x 32 pixel image is, in fact, a coarse resolution summary of the original 120 x
128 captured image, with each coarse pixel intensity calculated as the mean of the
corresponding high-resolution pixel intensities
7. Using this coarse-resolution image reduces the number of inputs and network
weights to a much more manageable size, thereby reducing computational demands,
while maintaining sufficient resolution to correctly classify the images (a sketch of this
encoding is given below).
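A sketch of the encoding described above: a 120 x 128 greyscale image is reduced to 30 x 32 by averaging each 4 x 4 block of pixels, then linearly scaled from the 0-255 range to 0-1, giving 960 network inputs. The use of numpy and the randomly generated stand-in image are assumptions made for illustration.

```python
import numpy as np

def encode_image(image):
    # image: 120 x 128 array of greyscale intensities in the range 0..255.
    # Average each 4 x 4 block to obtain a coarse 30 x 32 summary image.
    coarse = image.reshape(30, 4, 32, 4).mean(axis=(1, 3))
    # Linearly scale intensities from [0, 255] to [0, 1] to match the unit activations.
    return (coarse / 255.0).flatten()   # 960 values, one per network input

img = np.random.randint(0, 256, size=(120, 128))   # stand-in for a captured face image
inputs = encode_image(img)
print(inputs.shape)   # (960,)
```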

Output encoding:

 The ANN must output one of four values indicating the direction in which the
person is looking (left, right, up, or straight).
 Note we could encode this four-way classification using a single output unit,
assigning outputs of, say, 0.2, 0.4, 0.6, and 0.8 to encode these four possible
values. Instead, we use four distinct output units, each representing one of the four
possible face directions, with the highest-valued output taken as the network
prediction.
Network graph structure
1. Another design choice we face is how many units to include in the
network and how to interconnect them.
2. The most common network structure is a layered network with feed forward
connections from every unit in one layer to every unit in the next.
3. In the current design we chose this standard structure, using two layers of sigmoid
units (one hidden layer and one output layer)
4. It is common to use one or two layers of sigmoid units and, occasionally, three
layers.
5. It is not common to use more layers than this, because training times become very
long and because networks with two or three layers of sigmoid units can already
represent a rich class of target functions.
6. Given our choice of a layered feedforward network with one hidden layer, how
many hidden units should we include?
7. In the results reported in Figure, only three hidden units were used, yielding a test
set accuracy of 90%.
8. In other experiments 30 hidden units were used, yielding a test set accuracy one
to two percent higher.
9. Although the generalization accuracy varied only a small amount between these
two experiments, the second experiment required significantly more training time.
10. Using 260 training images, the training time was approximately 1 hour on a Sun
Sparc5 workstation for the 30 hidden unit network, compared to approximately
5 minutes for the 3 hidden unit network

Advanced topics in artificial neural networks:


Alternative Error Functions
As noted earlier, gradient descent can be performed for any function E that is differentiable
with respect to the parameterized hypothesis space. While the basic Backpropagation
algorithm defines E in terms of the sum of squared errors of the network, other
definitions have been suggested in order to incorporate other constraints into the weight-
tuning rule. For each new definition of E a new weight-tuning rule for gradient descent must
be derived.
Examples of alternative definitions of E include the following.

Adding a penalty term for weight magnitude. This causes the gradient descent search to seek
weight vectors with small magnitudes, thereby reducing the risk of overfitting. One way to do
this is to redefine E as shown below.
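One common definition, assuming the usual weight-decay form (the sum of squared output errors plus a penalty proportional to the sum of squared weights), is

E(w) = 1/2 Σ_{d∈D} Σ_{k∈outputs} (t_kd - o_kd)^2 + γ Σ_{j,i} (w_ji)^2

where γ is a small constant controlling how strongly large weights are penalized.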

This yields a weight update rule identical to the standard Backpropagation rule, except that
each weight is multiplied by the constant (1 - 2γη) upon each iteration, where γ is the
weight-decay constant and η is the learning rate.

Adding a term for errors in the slope, or derivative of the target function. In some cases,
training information may be available regarding desired derivatives of the target function, as
well as desired values

In both of these systems the error function is modified to add a term measuring the
discrepancy between these training derivatives and the actual derivatives of the learned
network.
Minimizing the cross entropy of the network with respect to the target values. Consider
learning a probabilistic function, such as predicting whether a loan applicant will pay back a
loan based on attributes such as the applicant's age and bank balance. Although the training
examples exhibit only boolean target
values (either a 1 or 0, depending on whether this applicant paid back the loan), the
underlying target function might be best modeled by outputting the probability that the given
applicant will repay the loan, rather than attempting to output the actual 1 and 0 value for
each input instance

Given such situations in which we wish for the network to output probability estimates, it can
be shown that the best (i.e., maximum likelihood) probability estimates are given by the
network that minimizes the cross entropy, defined as
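Assuming the standard form for boolean targets t_d and network outputs o_d over the set D of training examples, the cross entropy is

- Σ_{d∈D} [ t_d log o_d + (1 - t_d) log(1 - o_d) ]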

Recurrent Networks
Recurrent networks are artificial neural networks that apply to time series data and that use
outputs of network units at time t as the input to other units at time t + 1. In this way, they
support a form of directed cycles in the network. To illustrate, consider the time series
prediction task of predicting the next day's stock market average y(t + 1 ) based on the current
day's economic indicators x(t). Given a time series of such data, one obvious approach is to
train a feedforward network to predict y(t + 1 ) as its output, based on the input values x(t)
One limitation of such a network is that the prediction of y(t + 1) depends only on x(t) and
cannot capture possible dependencies of y(t + 1) on earlier values of x. This might be
necessary, for example, if tomorrow's stock market average y(t + 1) depends on the difference
between today's economic indicator values x(t) and yesterday's values x(t - 1).
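A minimal sketch of this kind of recurrence (a simple Elman-style step, where the hidden state at time t feeds back into the computation at time t + 1). The weight matrices, dimensions, and random stand-in series are assumptions for illustration, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8                          # assumed sizes of x(t) and the hidden state
W_xh = rng.normal(size=(n_hidden, n_in))       # input -> hidden weights
W_hh = rng.normal(size=(n_hidden, n_hidden))   # hidden(t-1) -> hidden(t) weights (the recurrent loop)
W_hy = rng.normal(size=(1, n_hidden))          # hidden -> predicted y(t + 1)

def step(x_t, h_prev):
    # The hidden state at time t depends on the current input and the previous hidden state.
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    y_next = W_hy @ h_t                        # prediction of the next value y(t + 1)
    return h_t, y_next

h = np.zeros(n_hidden)
series = [rng.normal(size=n_in) for _ in range(5)]   # stand-in for daily economic indicators x(t)
for x_t in series:
    h, y_pred = step(x_t, h)
print(y_pred)
```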
