Face Recognition with Deep Learning
Presented By:
Youssef Mamdouh Youssef Ali Wagdy Salaheldin
Supervised By
[Link]. Gamal [Link]
June - 2019
Abstract
Acknowledgment
Table of Contents
Abstract 2
Acknowledgment 3
Chapter 1: Introduction
1.1 What is Machine Learning? 9
1.2 How does machine learning work? 11
• Supervised ML
• Unsupervised ML
• Semi-supervised ML
• Reinforcement learning
1.5 Real-world machine learning use cases 21
2.2.2 Perceptron 28
2.2.3 ANN 34
• Advantages of Backpropagation
• Disadvantages of Backpropagation
2.4 CNN 48
2.4.6 Sigmoid 53
2.4.7 Tanh 53
2.4.8 ReLU 54
2.5 Face Recognition 54
3.2 Preprocessing 61
Chapter 4: Experiments
4.1 Architecture model 62
4.2 Experiment 1 63
4.3 Experiment 2 65
4.4 Experiment 3 67
4.5 Experiment 4 70
4.6 Experiment 5 72
4.7 Experiment 6 75
4.8 Experiment 7 78
4.9 Experiment 8 80
4.10 Experiment 9 83
5.2 Summary 86
List of figures
Figure 1: Artificial Intelligence diagram 9
Figure 2: How does machine learning work? 11
Figure 3: Types of Machine learning 12
Figure 4: Neural Networks 23
Figure 5: Basic components of Perceptron 29
Figure 6: How does Perceptron work? 30
Figure 7: Biological Neural Network 34
Figure 8: Artificial Neural Network 34
Figure 9: ANN layers 36
Figure 10: How do artificial neural networks work? 39
Figure 11: How Backpropagation Algorithm Works 42
Figure 12: Convolution operation 48
Figure 13: Formula for Convolution Layer 49
Figure 14: Convolution layer operation 50
Figure 15: Pooling layer operation 52
Figure 16: Formula for Padding Layer 52
Figure 17: Face recognition techniques 56
Figure 18: Holistic matching 57
Figure 19: Feature-based 58
Figure 20: Model Based 59
Figure 21: Bad architectural model 62
Figure 22: Best architectural model 63
Figure 23: Experiment 1 class 86 64
Figure 24: Experiment 1 class 112 64
Figure 25: Experiment 2 class 108 66
Figure 26: Experiment 2 class 96 66
Figure 27: Experiment 3 class 60 68
Figure 28: Experiment 3 class 44 69
Figure 29: Experiment 4 class 43 71
Figure 30: Experiment 4 class 10 71
Figure 31: Experiment 5 class 51 73
Figure 32: Experiment 5 class 30 74
Figure 33: Experiment 6 class 83 76
Figure 34: Experiment 6 class 121 77
Figure 35: Experiment 7 class 34 79
Figure 36: Experiment 7 class 23 79
Figure 37: Experiment 8 class 3 81
Figure 38: Experiment 8 class 84 82
Figure 39: Experiment 9 class 130 84
Figure 40: Experiment 9 class 151 85
Chapter 1
Introduction
1.1 What is Machine Learning?
Machine learning (ML) is a discipline of artificial intelligence (AI) that
provides machines with the ability to automatically learn from data and
past experiences while identifying patterns to make predictions with
minimal human intervention.
While machine learning is not a new concept, with roots reaching back to World War II and the effort to break the Enigma machine's cipher, the ability to apply complex mathematical calculations automatically to growing volumes and varieties of available data is a relatively recent development.
Today, with the rise of big data, IoT, and ubiquitous computing, machine
learning has become essential for solving problems across numerous areas,
such as:
1.2 How does machine learning work?
Machine learning algorithms are fit to a training dataset to create a model. As new input data is introduced to the trained ML algorithm, it uses the developed model to make a prediction.
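As a minimal illustration of this train-then-predict workflow, here is a hypothetical sketch using scikit-learn and a toy dataset; the library and model choice are assumptions for illustration only and are not part of this project.

```python
# Hypothetical sketch: fit a model on training data, then predict on new input.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier()      # the training step: fit the algorithm to the data
model.fit(X_train, y_train)

predictions = model.predict(X_test)   # new input -> predictions from the trained model
print("Accuracy:", model.score(X_test, y_test))
```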
1.3 Why is Machine learning important?
Machine learning is important because it gives companies and
enterprises a view of trends in customer behavior and business
operational patterns, as well as supports the development of new
products. Many of today's leading companies, such as Facebook,
Google, and Uber, make machine learning a central part of their operations. It has become a significant competitive differentiator for
many companies.
1.4 Types of Machine Learning
Machine learning algorithms can be trained in many ways, with each
method having its pros and cons. Based on these methods and ways of
learning, machine learning is broadly categorized into four main types:
1. Supervised machine learning
This type of ML involves supervision, where machines are trained
on labeled datasets and enabled to predict outputs based on the
provided training.
The labeled dataset specifies that some input and output
parameters are already mapped. Hence, the machine is trained
with the input and corresponding output. A device is made to
predict the outcome using the test dataset in subsequent phases.
For example, consider an input dataset of parrot and crow images.
Initially, the machine is trained to understand the pictures,
including the parrot and crow’s color, eyes, shape, and size. Post-
training, an input picture of a parrot is provided, and the machine is
expected to identify the object and predict the output.
The trained machine checks for the various features of the object,
such as color, eyes, shape, etc., in the input picture, to make a
final prediction.
This is the process of object identification in supervised machine
learning.
The primary objective of the supervised learning technique is to map the
input variable (a) with the output variable (b). Supervised machine
learning is further classified into two broad categories:
• Classification: These refer to algorithms that address
classification problems where the output variable is categorical;
for example, yes or no, true or false, male or female, etc. Real-
world applications of this category are evident in spam
detection and email filtering.
Problems and Issues in Supervised learning:
Before we get started, we must know about how to pick a good machine
learning algorithm for the given dataset.
To intelligently pick an algorithm to use for a supervised learning task,
we must consider the following factors:
1. Heterogeneity of Data: Many algorithms, such as neural networks and support vector machines, require their feature vectors to be homogeneous, numeric, and normalized. Algorithms that employ distance metrics are very sensitive to this, so if the data is heterogeneous, these methods should not be the first choice. Decision trees, by contrast, handle heterogeneous data very easily.
4. Bias-Variance Tradeoff: A learning algorithm is biased for a particular input x if, when trained on different training sets, it is systematically incorrect when predicting the correct output for x, whereas a learning algorithm has high variance for a particular input x if it predicts different output values when trained on different training sets. The prediction error of a learned classifier is related to the sum of the bias and the variance of the learning algorithm, and neither should be high, since either will drive the prediction error up. A key feature of many machine learning algorithms is that this balance between bias and variance can be tuned, either automatically or manually via bias parameters, and using such algorithms helps resolve this trade-off.
6. Overfitting: The programmer should be aware that the output values may contain inherent noise resulting from human or sensor errors. In this case, the algorithm must not attempt to infer a function that exactly matches all the data. Fitting the data too closely causes overfitting, after which the model answers perfectly for all training examples but has a very high error on unseen samples. Practical ways of preventing this are stopping the learning process early (early stopping) and applying filters to the data in a pre-learning phase to remove noise.

Only after considering all these factors can we pick a supervised learning algorithm that works for the dataset we are working on. For example, if we were working with a dataset consisting of heterogeneous data, then decision trees would fare better than other algorithms. If the input space of the dataset had 1,000 dimensions, it would be better to first perform PCA on the data before using a supervised learning algorithm on it.
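As a hedged illustration of stopping the learning process early, the sketch below uses Keras' EarlyStopping callback on a toy model; the data, model, and library choice are assumptions made purely for illustration and are not this project's training code.

```python
# Hypothetical sketch of early stopping: training halts once validation error stops improving.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

X = np.random.rand(200, 10)                      # toy data standing in for a real dataset
y = (X.sum(axis=1) > 5).astype(int)

model = Sequential([Dense(16, activation="relu", input_shape=(10,)),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = EarlyStopping(monitor="val_loss",   # watch the validation error
                           patience=5,           # stop after 5 epochs with no improvement
                           restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop], verbose=0)
```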
2. Unsupervised machine learning
Unsupervised learning refers to a learning technique that’s devoid
of supervision. Here, the machine is trained using an unlabeled
dataset and is enabled to predict the output without any
supervision.
An unsupervised learning algorithm aims to group the unsorted
dataset based on the input’s similarities, differences, and patterns.
For example, consider an input dataset of images of a fruit-filled
container. Here, the images are not known to the machine learning
model. When we input the dataset into the ML model, the task of
the model is to identify the pattern of objects, such as color, shape,
or differences seen in the input images and categorize them.
Upon categorization, the machine then predicts the output as it
gets tested with a test dataset.
Unsupervised machine learning is further classified into two types:
• Clustering: The clustering technique refers to grouping
objects into clusters based on parameters such as
similarities or differences between objects. For example,
grouping customers by the products they purchase.
Some well-known clustering algorithms include the K-Means Clustering Algorithm, the Mean-Shift Algorithm, and the DBSCAN Algorithm; dimensionality-reduction techniques such as Principal Component Analysis and Independent Component Analysis are often used alongside them (a short K-Means example follows this list).
• Association: Association learning refers to identifying typical
relations between the variables of a large dataset. It
determines the dependency of various data items and maps
associated variables. Typical applications include web usage
mining and market data analysis.
Popular algorithms for mining association rules include the Apriori Algorithm, the Eclat Algorithm, and the FP-Growth Algorithm.
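To make the clustering idea concrete, here is a minimal, hypothetical K-Means example using scikit-learn; the library choice and the toy data are illustrative assumptions, not part of this report's own code.

```python
# Hypothetical K-Means sketch: group unlabeled points into clusters by similarity.
import numpy as np
from sklearn.cluster import KMeans

# Toy, unlabeled "customer" data: [items purchased per month, average basket value]
data = np.array([[2, 15], [3, 18], [2, 14],
                 [20, 120], [22, 130], [19, 110]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_)           # cluster assignment for each customer
print(kmeans.cluster_centers_)  # the two cluster centers
```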
Disadvantages of unsupervised learning
• The spectral classes do not necessarily represent the features on
the ground.
• It does not consider spatial relationships in the data.
• It can take time to interpret the spectral classes.
3. Semi-supervised learning
Semi-supervised learning comprises characteristics of both
supervised and unsupervised machine learning. It uses the
combination of labeled and unlabeled datasets to train its
algorithms. Using both types of datasets, semi-supervised learning
overcomes the drawbacks of the options mentioned above.
Consider an example of a college student.
A student learning a concept under a teacher’s supervision in
college is termed supervised learning. In unsupervised learning, a
student self-learns the same concept at home without a teacher’s
guidance. Meanwhile, a student revising the concept after learning
under the direction of a teacher in college is a semi-supervised
form of learning.
Disadvantages of semi-supervised learning
• Iteration results are not stable.
• It is not applicable to network-level data.
• It has low accuracy.
4. Reinforcement learning
Reinforcement learning is a feedback-based process. Here, the AI component automatically explores its surroundings by trial and error, takes actions, learns from experience, and improves its performance.
The component is rewarded for each good action and penalized
for every wrong move. Thus, the reinforcement learning
component aims to maximize the rewards by performing good
actions.
Unlike supervised learning, reinforcement learning lacks labeled
data, and the agents learn via experiences only. Consider video
games.
Here, the game specifies the environment, and each move of the
reinforcement agent defines its state. The agent is entitled to
receive feedback via punishment and rewards, thereby affecting
the overall game score.
The ultimate goal of the agent is to achieve a high score.
Reinforcement learning is applied across different fields such as
game theory, information theory, and multi-agent systems.
Reinforcement learning is further divided into two types of methods
or algorithms:
o Positive reinforcement learning: This refers to adding a
reinforcing stimulus after a specific behavior of the agent,
which makes it more likely that the behavior may occur again
in the future, e.g., adding a reward after a behavior.
o Negative reinforcement learning: Negative reinforcement
learning refers to strengthening a specific behavior that
avoids a negative outcome.
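As a small, hypothetical illustration of the reward-driven updates described above, the sketch below applies the standard Q-learning rule to a made-up three-state environment; all names and numbers are illustrative assumptions.

```python
# Hypothetical Q-learning sketch: the agent's value table is nudged toward
# actions that earn rewards and away from those that are penalized.
import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))          # value of each (state, action) pair
alpha, gamma = 0.1, 0.9                      # learning rate and discount factor

def update(state, action, reward, next_state):
    # Standard Q-learning update: move Q toward reward + discounted best future value.
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])

# One illustrative step: in state 0, action 1 earned a reward of +1 and led to state 2.
update(state=0, action=1, reward=1.0, next_state=2)
print(Q)
```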
Disadvantages of reinforcement learning
• Too much reinforcement learning can lead to an overload of states
which can diminish the results.
• This algorithm is not preferable for solving simple problems.
• This algorithm needs a lot of data and a lot of computation.
• The curse of dimensionality limits reinforcement learning for real
physical systems.
1.5 Real-world machine learning use cases

Here are just a few examples of machine learning you might encounter every day:
Recommendation engines: Using past consumption behavior data, AI
algorithms can help to discover data trends that can be used to develop
more effective cross-selling strategies. This is used to make relevant
add-on recommendations to customers during the checkout process for
online retailers.
Chapter 2
Neural Network and Deep Learning
2.1 Neural Networks
A neural network is a series of algorithms that attempts to recognize underlying relationships in a set of data through a process that imitates the way the human brain operates. In this sense, neural networks refer to systems of neurons, either organic or artificial in nature.

For example, when a human faces some sort of unexpected event, the eye detects the object and sends signals to the brain, which then responds based on the intensity of the signal. These signals are transmitted to the brain via cells called neurons. Each neuron takes the signal and passes it on to the next neuron until it reaches the brain.

Neural networks can adapt to changing input, so the network produces the best possible result without needing to redesign the output criteria.
2.1.1 Why are neural networks important?
Neural networks can help computers make intelligent decisions with
limited human assistance. This is because they can learn and model the
relationships between input and output data that are nonlinear and
complex. For instance, they can do the following tasks.
Make generalizations and inferences
Neural networks can comprehend unstructured data and make general
observations without explicit training. For instance, they can recognize
that two different input sentences have a similar meaning:
Can you tell me how to make the payment?
How do I transfer money?
A neural network would know that both sentences mean the same thing.
Or it would be able to broadly recognize that Baxter Road is a place, but
Baxter Smith is a person’s name.
Reveal hidden relationships and patterns
Neural networks can analyze raw data more deeply and reveal new
insights for which they might not have been trained. For example,
consider a pattern recognition neural network that analyses consumer
purchases. By comparing the buying patterns of numerous users, the
neural network can suggest new items that might interest a specific
consumer.
Create autonomous, self-learning systems
Neural networks can learn and improve over time based on user
behavior. For example, consider a neural network that automatically
corrects or suggests words by analyzing your typing behavior. Let us
assume that the model was trained in the English language and can
spell-check English words. However, if you frequently type non-English
words, like danke, the neural network can automatically learn and
correct these words too.
Learn and model highly volatile data
Some datasets, such as loan repayment amounts in a bank, can have
large variations. Neural networks can model such data as well. For
example, they can analyze financial transactions and flag some of them
for fraud detection. They can also process complex data that holds the
key to difficult biological problems like protein folding, DNA analysis, and
more.
2.1.2 What are neural networks used for?
Neural networks have several use cases across many industries, such
as the following:
• Medical diagnosis by medical image classification
• Targeted marketing by social network filtering and behavioral data
analysis
• Financial predictions by processing historical data of financial
instruments
• Electrical load and energy demand forecasting
• Process and quality control
• Chemical compound identification
We give four of the important applications of neural networks below.
Computer vision
Computer vision is the ability of computers to extract information and
insights from images and videos. With neural networks, computers can
distinguish and recognize images similar to humans. Computer vision
has several applications, such as the following:
Visual recognition in self-driving cars so they can recognize road signs
and other road users.
Content moderation to automatically remove unsafe or inappropriate
content from image and video archives.
Facial recognition to identify faces and recognize attributes like open
eyes, glasses, and facial hair.
Image labeling to identify brand logos, clothing, safety gear, and other
image details.
Speech recognition
Neural networks can analyze human speech despite varying speech
patterns, pitch, tone, language, and accent. Virtual assistants like
Amazon Alexa and automatic transcription software use speech
recognition to do tasks like these:
Assist call center agents and automatically classify calls
Convert clinical conversations into documentation in real time
Accurately subtitle videos and meeting recordings for wider content
reach
Natural language processing
Natural language processing (NLP) is the ability to process natural,
human-created text. Neural networks help computers gather insights and
meaning from text data and documents. NLP has several use cases,
including in these functions:
Automated virtual agents and chatbots
Automatic organization and classification of written data
Business intelligence analysis of long-form documents like emails and
forms
Indexing of key phrases that indicate sentiment, like positive and
negative comments on social media
Document summarization and article generation for a given topic
Recommendation engines
Neural networks can track user activity to develop personalized
recommendations. They can also analyze all user behavior and discover
new products or services that interest a specific user. For example,
Curalate, a Philadelphia-based startup, helps brands convert social
media posts into sales. Brands use Curalate’s intelligent product tagging
(IPT) service to automate the collection and curation of user-generated
social content. IPT uses neural networks to automatically find and
recommend products relevant to the user’s social media activity.
Consumers don't have to hunt through online catalogs to find a specific
product from a social media image. Instead, they can use Curalate’s
auto product tagging to purchase the product with ease.
2.1.3 How do neural networks work?
The human brain is the inspiration behind neural network architecture.
Human brain cells, called neurons, form a complex, highly
interconnected network and send electrical signals to each other to help
humans process information. Similarly, an artificial neural network is
made of artificial neurons that work together to solve a problem. Artificial
neurons are software modules, called nodes, and artificial neural
networks are software programs or algorithms that, at their core, use
computing systems to solve mathematical calculations.
Simple neural network architecture
A basic neural network has interconnected artificial neurons in three
layers:
Input Layer
Information from the outside world enters the artificial neural network
from the input layer. Input nodes process the data, analyze or categorize
it, and pass it on to the next layer.
Hidden Layer
Hidden layers take their input from the input layer or other hidden layers.
Artificial neural networks can have a large number of hidden layers.
Each hidden layer analyzes the output from the previous layer,
processes it further, and passes it on to the next layer.
Output Layer
The output layer gives the final result of all the data processing by the
artificial neural network. It can have single or multiple nodes. For
instance, if we have a binary (yes/no) classification problem, the output
layer will have one output node, which will give the result as 1 or 0.
However, if we have a multi-class classification problem, the output layer
might consist of more than one output node.
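To make this three-layer structure concrete, here is a minimal, hypothetical Keras model with one input layer, one hidden layer, and a single-node output layer for a yes/no problem; the layer sizes and library choice are illustrative assumptions only.

```python
# Hypothetical sketch of the basic input -> hidden -> output structure described above.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

model = Sequential([
    Input(shape=(8,)),                # input layer: 8 features from the outside world
    Dense(16, activation="relu"),     # hidden layer: processes the previous layer's output
    Dense(1, activation="sigmoid"),   # output layer: one node for a binary (yes/no) result
])
model.summary()
```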
2.2 Traditional Neural Networks
2.2.1 Warren McCulloch

In their 1943 paper, McCulloch and Pitts attempted to demonstrate that a Turing machine program could be implemented in a finite network of formal neurons (in the event, the Turing machine contains their model of the brain), and that the neuron was the basic logic unit of the brain.
In the 1947 paper they offered approaches to designing "nervous
nets" to recognize visual inputs despite changes in orientation or size.
From 1952 McCulloch worked at the Research Laboratory of
Electronics at MIT, working primarily on neural network modelling. His
team examined the visual system of the frog in consideration of
McCulloch's 1947 paper, discovering that the eye provides the brain
with information that is already, to a degree, organized and
interpreted, instead of simply transmitting an image.
2.2.2 Perceptron
Perceptron was introduced by Frank Rosenblatt in 1957. He
proposed a Perceptron learning rule based on the original MCP
neuron. A Perceptron is an algorithm for supervised learning of binary
classifiers. This algorithm enables neurons to learn and to process elements in the training set one at a time.
What is the Perceptron model in Machine Learning?
The Perceptron is a machine learning algorithm for the supervised learning of binary classification tasks.
Further, Perceptron is also understood as an Artificial Neuron or neural
network unit that helps to detect certain input data computations in
business intelligence.
The Perceptron model is also regarded as one of the simplest types of artificial neural networks; it is a supervised learning algorithm for binary classifiers. We can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
Basic Components of Perceptron
Frank Rosenblatt designed the perceptron model as a binary classifier built from three main components (input nodes, weights and bias, and an activation function), as illustrated in Figure 5.
How does Perceptron work?
In Machine Learning, Perceptron is considered as a single-layer neural
network that consists of four main parameters named input values (Input
nodes), weights and Bias, net sum, and an activation function.
The perceptron model begins by multiplying all input values by their weights and then adding these products together to create the weighted sum. This weighted sum is then passed to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.
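A minimal sketch of this computation, with made-up inputs, weights, and bias chosen only for illustration, looks like this:

```python
# Hypothetical perceptron forward pass: weighted sum of inputs plus bias,
# followed by a step activation function.
def step(z):
    return 1 if z >= 0 else 0            # the step ("threshold") activation 'f'

def perceptron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(weighted_sum)

# Example: two inputs with illustrative weights and bias.
print(perceptron([1.0, 0.0], weights=[0.6, 0.4], bias=-0.5))  # -> 1
```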
There are two types of Perceptrons: Single layer and Multilayer.
Multi-Layered Perceptron Model:
Like a single-layer perceptron model, a multi-layer perceptron model
also has the same model structure but has a greater number of hidden
layers.
The multi-layer perceptron model is typically trained with the Backpropagation algorithm, which executes in two stages:

Forward Stage: Activation propagates from the input layer through the hidden layers and terminates at the output layer.

Backward Stage: In the backward stage, weight and bias values are modified as required by the model. The error between the actual and the desired output is propagated backward from the output layer toward the input layer.

Hence, a multi-layer perceptron model can be seen as a stack of neural network layers in which the activation function need not be linear, unlike in a single-layer perceptron model. Instead, activation functions such as sigmoid, tanh, or ReLU can be used.
A multi-layer perceptron model has greater processing power and can
process linear and non-linear patterns. Further, it can also implement
logic gates such as AND, OR, XOR, NAND, NOT, XNOR, NOR.
Advantages of Multi-Layer Perceptron:
A multi-layered perceptron model can be used to solve complex non-
linear problems.
It works well with both small and large input data.
It helps us to obtain quick predictions after the training.
It helps to obtain the same accuracy ratio with large as well as small
data.
Disadvantages of Multi-Layer Perceptron:
In a multi-layer perceptron, computations are difficult and time-consuming.
In a multi-layer perceptron, it is difficult to predict how much each independent variable affects the dependent variable.
The model functioning depends on the quality of the training.
Characteristics of Perceptron
The perceptron model has the following characteristics.
• Perceptron is a machine learning algorithm for supervised learning
of binary classifiers.
• In Perceptron, the weight coefficient is automatically learned.
• Initially, weights are multiplied with input features, and the decision
is made whether the neuron is fired or not.
• The activation function applies a step rule to check whether the weighted sum is greater than zero.
• The linear decision boundary is drawn, enabling the distinction
between the two linearly separable classes +1 and -1.
• If the weighted sum of all input values exceeds the threshold value, the perceptron produces an output signal; otherwise, no output is produced.
2.2.3 ANN
What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from Biological neural
networks that develop the structure of a human brain. Similar to the
human brain that has neurons interconnected to one another, artificial
neural networks also have neurons that are interconnected to one
another in various layers of the networks. These neurons are known as
nodes.
Figure 7 illustrates the typical structure of a biological neural network, and Figure 8 shows what a typical artificial neural network looks like.
Dendrites from Biological Neural Network represent inputs in Artificial
Neural Networks, cell nucleus represents Nodes, synapse represents
Weights, and Axon represents Output.
An artificial neural network is an attempt, in the field of artificial intelligence, to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner.
The artificial neural network is designed by programming computers to
behave simply like interconnected brain cells.
The human brain contains on the order of 100 billion neurons, and each neuron forms somewhere between 1,000 and 100,000 connections. In the human brain, data is stored in a distributed manner, and we can retrieve more than one piece of this data from memory in parallel when necessary. In this sense, the human brain behaves like an extremely powerful parallel processor.
We can understand the artificial neural network with an example: consider a digital logic gate that takes inputs and gives an output, such as an "OR" gate with two inputs. If one or both inputs are "On," the output is "On." If both inputs are "Off," the output is "Off." Here the output depends only on the input. Our brain does not perform the same task: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."
The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the types of layers available in an artificial neural network.
Artificial Neural Network primarily consists of three layers:
Input Layer:
As the name suggests, it accepts inputs in several different formats
provided by the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden
layer, which finally results in output that is conveyed using this layer.
The artificial neural network takes the inputs, computes the weighted sum of the inputs, and adds a bias. This computation is represented in the form of a transfer function.
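In symbols (a standard formulation, not specific to this report), this computation for a single neuron can be written as

$$ z = \sum_{i} w_i x_i + b, \qquad y = f(z), $$

where the $x_i$ are the inputs, the $w_i$ are the weights, $b$ is the bias, and $f$ is the transfer (activation) function.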
Disadvantages of Artificial Neural Networks (ANN)
1. Hardware dependence:
1. Artificial neural networks require processors with parallel processing power, by virtue of their structure.
2. For this reason, their realization depends on the availability of suitable hardware.
2. Unexplained functioning of the network:
1. This is the most important problem of ANNs.
2. When an ANN produces a solution, it gives no clue as to why or how it arrived at it.
3. This reduces trust in the network.
4. The difficulty of showing the problem to the network:
1. ANNs can only work with numerical information.
2. Problems have to be translated into numerical values before being introduced to the ANN.
3. The chosen encoding mechanism will directly influence the performance of the network.
4. This depends on the user's ability.
How do artificial neural networks work?
Artificial Neural Network can be best represented as a weighted directed
graph, where the artificial neurons form the nodes.
The connections between neuron outputs and neuron inputs can be viewed as directed edges with weights.

The artificial neural network receives the input signal from the external source in the form of a pattern or an image, represented as a vector. These inputs are then mathematically denoted by the notation x(n) for each of the n inputs.
2.2.4 Backpropagation
Proper tuning of the weights allows you to reduce error rates and make
the model reliable by increasing its generalization.
How the Backpropagation Algorithm Works

The backpropagation algorithm in a neural network calculates the gradient of the loss function with respect to a single weight by the chain rule. The final step is to travel back from the output layer to the hidden layers, adjusting the weights so that the error is decreased.
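A minimal numerical sketch of one such backward pass on a tiny made-up network is shown below; the toy data, weights, and learning rate are assumptions chosen only for illustration.

```python
# Hypothetical sketch of one backpropagation step on a tiny 2-1-1 network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1])          # one training example (2 inputs)
t = 1.0                           # its target output
w1 = np.array([0.4, -0.2])        # input -> hidden weights
w2 = 0.3                          # hidden -> output weight
lr = 0.5                          # learning rate

# Forward pass
h = sigmoid(w1 @ x)               # hidden activation
y = sigmoid(w2 * h)               # network output
error = 0.5 * (t - y) ** 2

# Backward pass (chain rule): gradient of the error with respect to each weight
dy = -(t - y) * y * (1 - y)       # error signal at the output neuron
grad_w2 = dy * h
grad_w1 = dy * w2 * h * (1 - h) * x

# Travel back from the output layer and adjust the weights to reduce the error
w2 -= lr * grad_w2
w1 -= lr * grad_w1
print("updated weights:", w1, w2, "error:", error)
```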
Types of Backpropagation Networks
• Static Back-propagation
• Recurrent Backpropagation
Static back-propagation:
It is one kind of backpropagation network which produces a
mapping of a static input for static output. It is useful to solve static
classification issues like optical character recognition.
Recurrent Backpropagation:
Recurrent backpropagation is fed forward until a fixed value is achieved. After that, the error is computed and propagated backward.

The main difference between the two methods is that the mapping is static (and rapid) in static back-propagation, while it is non-static in recurrent backpropagation.
Advantages of Backpropagation:
• Backpropagation is fast, simple and easy to program
• It has no parameters to tune apart from the number of inputs
• It is a flexible method as it does not require prior knowledge about
the network
• It is a standard method that generally works well
• It does not need any special mention of the features of the function
to be learned.
2.3 Deep Learning
Deep learning is a subset of machine learning, which is essentially a
neural network with three or more layers.
These neural networks attempt to simulate the behavior of the human
brain—albeit far from matching its ability—allowing it to “learn” from large
amounts of data.
While a neural network with a single layer can still make approximate
predictions, additional hidden layers can help to optimize and refine for
accuracy.
Deep learning drives many artificial intelligence (AI) applications and
services that improve automation, performing analytical and physical
tasks without human intervention.
Deep learning technology lies behind everyday products and services
(such as digital assistants, voice-enabled TV remotes, and credit card
fraud detection) as well as emerging technologies (such as self-driving
cars).
2.3.1 How deep learning works
Deep learning neural networks, or artificial neural networks, attempt to
mimic the human brain through a combination of data inputs, weights,
and bias. These elements work together to accurately recognize,
classify, and describe objects within the data.
Deep neural networks consist of multiple layers of interconnected nodes,
each building upon the previous layer to refine and optimize the
prediction or categorization.
This progression of computations through the network is called forward
propagation. The input and output layers of a deep neural network are
called visible layers.
The input layer is where the deep learning model ingests the data for
processing, and the output layer is where the final prediction or
classification is made.
2.3.2 Deep learning applications
Real-world deep learning applications are a part of our daily lives, but in
most cases, they are so well-integrated into products and services that
users are unaware of the complex data processing that is taking place in
the background. Some of these examples include the following:
Law enforcement
Deep learning algorithms can analyze and learn from transactional data
to identify dangerous patterns that indicate possible fraudulent or
criminal activity. Speech recognition, computer vision, and other deep
learning applications can improve the efficiency and effectiveness of
investigative analysis by extracting patterns and evidence from sound
and video recordings, images, and documents, which helps law
enforcement analyze large amounts of data more quickly and accurately.
Financial services
Customer service
Healthcare
2.4 CNN
A CNN deals with large images better than other neural networks: it takes the image as input and applies filters to the image to reduce its size, detecting the most important (unique) features of the image. It keeps doing the same thing until it has captured all the unique features. The filter size is usually a 3x3 matrix (it can be bigger).

CNNs are distinguished from other neural networks by their superior performance with image data.
During the forward pass, the kernel slides across the height and width of the image, producing a representation of each receptive region. This produces a two-dimensional representation of the image known as an activation map, which gives the response of the kernel at each spatial position of the image.
The sliding size of the kernel is called a stride.
If we have an input of size W x W x D and Dout number of kernels with a
spatial size of F with stride S and amount of padding P, then the size of
output volume can be determined by the following formula:
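The standard output-size formula (reconstructed here, since the original figure is not reproduced in this text) is

$$ W_{out} = \frac{W - F + 2P}{S} + 1, $$

and the resulting output volume has size $W_{out} \times W_{out} \times D_{out}$.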
2.4.2 Motivation behind Convolution
Convolution leverages three important ideas that motivated computer
vision researchers: sparse interaction, parameter sharing, and
equivariant representation. Let’s describe each one of them in detail.
Traditional neural network layers use multiplication by a matrix of parameters describing the interaction between each input and output unit. This means that every output unit interacts with every input unit. Convolutional neural networks, however, have sparse interactions. This is achieved by making the kernel smaller than the input: an image can have thousands or millions of pixels, but while processing it with a kernel we can detect meaningful information that spans only tens or hundreds of pixels.

This means we need to store fewer parameters, which not only reduces the memory requirements of the model but also improves its statistical efficiency.
If computing one feature at a spatial point (x1, y1) is useful, then it should also be useful at some other spatial point, say (x2, y2). This means that for a single two-dimensional slice, i.e., for creating one activation map, neurons are constrained to use the same set of weights. In a traditional neural network, each element of the weight matrix is used once and then never revisited, while a convolutional network has shared parameters: to compute the output, the weights applied at one input location are the same as the weights applied elsewhere.

Due to parameter sharing, the layers of a convolutional neural network have the property of equivariance to translation: if we shift the input, the output shifts in the same way.
2.4.3 Pooling Layer
The pooling layer replaces the output of the network at certain locations
by deriving a summary statistic of the nearby outputs.
This helps in reducing the spatial size of the representation, which
decreases the required amount of computation and weights.
The pooling operation is processed on every slice of the representation
individually.
There are several pooling functions such as the average of the
rectangular neighborhood, L2 norm of the rectangular neighborhood,
and a weighted average based on the distance from the central pixel.
However, the most popular process is max pooling, which reports the
maximum output from the neighborhood.
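For a W x W x D input and a pooling window of spatial size F applied with stride S, the standard output width (a standard formula, corresponding to the figure originally shown here) is

$$ W_{out} = \frac{W - F}{S} + 1. $$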
This will yield an output volume of size Wout x Wout x D.
In all cases, pooling provides some translation invariance which means
that an object would be recognizable regardless of where it appears on
the frame.
2.4.4 Fully Connected Layer
Neurons in this layer have full connectivity with all neurons in the
preceding and succeeding layers, as in a regular fully connected neural network (FCNN). This is why the layer can be computed as usual by a matrix multiplication followed by the addition of a bias.
The FC layer helps to map the representation between the input and the
output.
2.4.8 ReLU
The Rectified Linear Unit (ReLU) has become very popular in the last few years. It computes the function f(x) = max(0, x). In other words, the activation is simply thresholded at zero.
In comparison to sigmoid and tanh, ReLU is more reliable and
accelerates the convergence by six times.
Unfortunately, a drawback is that ReLU units can be fragile during training. A large gradient flowing through a unit can update its weights in such a way that the neuron never activates again (the "dying ReLU" problem). However, this can usually be managed by setting a proper learning rate.
2.5 Face Recognition

Many people are familiar with face recognition technology through the FaceID used to unlock iPhones (however, this is only one application of face recognition). Typically, this kind of facial recognition does not rely on a massive database of photos to determine an individual's identity; it simply identifies and recognizes one person as the sole owner of the device, while limiting access to others.
Facial technology systems can vary, but in general, they tend to operate
as follows:
The camera detects and locates the image of a face, either alone or in a
crowd. The image may show the person looking straight ahead or in
profile.
2.8.2 Face Recognition Techniques:
Face recognition is a challenging yet interesting problem, in that it has attracted researchers from different backgrounds, such as psychology, pattern recognition, neural networks, computer vision, and computer graphics.
Holistic matching
Feature-based
Here, local features such as the eyes, nose, and mouth are first extracted, and their locations, geometry, and appearance are fed into a structural classifier. A challenge for feature extraction methods is feature "restoration": this occurs when the system tries to retrieve features that are invisible due to large variations, e.g. head pose, as when matching a frontal image with a profile image.
Different extraction methods:
Generic methods based on edges, lines, and curves
Feature-template-based methods
Structural matching methods
Model Based
Hybrid Methods
Chapter 3
Face Recognition by using CNN
3.1 Dataset
Our dataset consists of 153 classes; each class contains 20 images of the same person with the same background, taken from different angles and with different facial expressions.
3.2 Preprocessing
Image enhancement was applied to the images to improve the quality of the data and the information obtained from them, so that more features could be extracted. Some of the techniques used are image augmentation, rescaling, and resizing.
Chapter 4
Experiments

4.1 Architecture model

We initially built some models, but they weren't able to give us the required accuracy. For example, the model shown in Figure 21 was designed using 1 block and started with 32 filters, with activation functions 'relu' and 'softmax'.

After many trials and modifications, we were able to come up with the best model.
Best Architecture Model:

This model was designed using 3 blocks, starting with 32 filters for the first block, 64 for the second block, and 64 for the third block, with stride (2,2) for each block and activation functions 'relu' and 'softmax'.
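A hypothetical Keras sketch consistent with this description is shown below. The kernel size, pooling layers, dense output head, and number of classes are assumptions, since only the filter counts, strides, and activation functions are stated above.

```python
# Hypothetical sketch of the 3-block architecture described above (32/64/64 filters,
# stride (2,2), ReLU in the blocks, softmax at the output). Kernel size, pooling, and
# the dense head are assumptions; num_classes stands in for the number of persons.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 153  # e.g., one class per person in the dataset

model = Sequential([
    Conv2D(32, (3, 3), strides=(2, 2), activation="relu", input_shape=(256, 256, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), strides=(2, 2), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), strides=(2, 2), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```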
4.2 Experiment 1
Dataset:
First, a simple database that contains 113 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 75% of them in the train folder and the rest in the test
folder automatically.
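A hypothetical sketch of such a splitting function is shown below; the directory names and layout are assumptions based on the description above, not the project's actual code.

```python
# Hypothetical sketch: create train/test folders with one sub-folder per person and
# randomly copy ~75% of each person's images into train and the rest into test.
import os, random, shutil

def split_dataset(source_dir, target_dir, train_ratio=0.75, seed=42):
    random.seed(seed)
    for person in os.listdir(source_dir):
        images = os.listdir(os.path.join(source_dir, person))
        random.shuffle(images)
        n_train = int(len(images) * train_ratio)
        for split, subset in (("train", images[:n_train]), ("test", images[n_train:])):
            dest = os.path.join(target_dir, split, person)
            os.makedirs(dest, exist_ok=True)
            for img in subset:
                shutil.copy(os.path.join(source_dir, person, img), dest)

# Example usage (paths are placeholders):
# split_dataset("dataset/raw", "dataset")
```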
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
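A hypothetical sketch of this generator setup is shown below; the directory names and the specific augmentations are assumptions, while the target size of 256 x 256 and batch size of 10 follow the values stated above.

```python
# Hypothetical sketch of the ImageDataGenerator setup described above.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True, zoom_range=0.1)
test_gen = ImageDataGenerator(rescale=1.0 / 255)

train_data = train_gen.flow_from_directory(
    "dataset/train", target_size=(256, 256), batch_size=10, class_mode="categorical")
test_data = test_gen.flow_from_directory(
    "dataset/test", target_size=(256, 256), batch_size=10, class_mode="categorical")
```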
Results:
1)
2)
4.3 Experiment 2
Dataset:
First, a simple database that contains 113 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 50% of them in the train folder and the rest in the test
folder automatically.
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
Results:
1)
4.4 Experiment 3

Results:
1)
2)
4.5 Experiment 4
Dataset:
First, a simple database that contains 132 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 75% of them in the train folder and the rest in the test
folder automatically.
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
Results:
1)
2)
4.6 Experiment 5
Dataset:
First, a simple database that contains 132 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 50% of them in the train folder and the rest in the test
folder automatically.
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
Results:
1)
2)
4.7 Experiment 6
Dataset:
First, a simple database that contains 132 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 25% of them in the train folder and the rest in the test
folder automatically.
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
Results:
1)
2)
4.8 Experiment 7
Dataset:
First, a simple database that contains 152 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 75% of them in the train folder and the rest in the test
folder automatically.
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
Results:
1)
2)
4.9 Experiment 8
Dataset:
First, a simple database that contains 152 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 50% of them in the train folder and the rest in the test
folder automatically.
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
Results:
1)
2)
4.10 Experiment 9
Dataset:
First, a simple database that contains 152 persons is used; each person has 20 images taken in different positions. The images are divided into two folders: train and test.
A simple function was implemented to create train and test folders and
inside each folder a folder for each person.
Another function was implemented to divide the images for each person
randomly with 25% of them in the train folder and the rest in the test
folder automatically.
Preprocessing:
We used the image data generator to perform some augmentations.
Each person is categorized into a class with target size = 256 x 256 and
batch size = 10.
We pushed the images into a NumPy array to make it easier to deal with
the values obtained from the images when dealing with the prediction
method.
Train test split is a model validation procedure that allows you to
simulate how a model would perform on new/unseen data.
Results:
1)
2)
Chapter 5
5.1 Problems faced throughout the project:
1. Installing the required libraries with the versions needed for our system.
2. Kernel death: this occurred whenever we used a Python version other than 3.6.
3. Overfitting: we solved this problem by adding strides to our model.
4. GPU: Anaconda wasn't detecting the GPU on some of our laptops.
5.2 Summary
To conclude, our project is face recognition with deep learning. For our model, we constructed an architecture and trained our dataset with it using different 'train' and 'test' percentages until we got the best results possible.

From our experience with this project during this academic year, we have found that deep learning is a technology going through continuous updates and upgrades. During our research in papers and articles, we looked for the architecture that fits best with a certain case. No architecture is agreed upon to be the "best" for a specific case. What happens is that developers experiment with new techniques and try several methods to reach the best result with the resources available to them. And this is what we have tried to do.

This leads to the next conclusion, which is that no model can reach 100% accuracy. For this to happen, the dataset would have to cover every single aspect of what the model should understand and know. This is hard, especially since we have only worked with Anaconda, our personal computers, and the lab computers provided by the university. The resources would have to be very large to handle both the storage and the processing of the models. However, even if such resources were available and such a dataset existed, no model would reach perfection in its predictions; it would only produce faster results and come very close to great predictions.
5.3 Future Work

There are some ideas that we want to implement in the future. For example, we want to make our system able to match a person to an image even if the image is incomplete, and to identify the owner of an image even if one of the facial features is hidden, such as the mouth or the eyes. We want to make some adjustments so that we can get the desired results with the right accuracy. We could then use this model in real-world applications such as criminal investigation systems.