

FACE RECOGNITION WITH DEEP LEARNING

Research · June 2019


DOI: 10.13140/RG.2.2.35787.05920


2 authors:

Youssef Mamdouh Youssef Abdelkader Ahmed, Pharos University
Ali Wagdy, Cairo University

All content following this page was uploaded by Youssef Mamdouh Youssef Abdelkader Ahmed on 13 November 2022.



FACE RECOGNITION WITH DEEP LEARNING USING PYTHON

Presented By:
Youssef Mamdouh Youssef
Ali Wagdy Salaheldin

Supervised By:
Dr. Gamal Behery

June - 2019
Abstract

Face recognition is the problem of identifying and verifying people in a
photograph by their face. Humans perform this task trivially, even under
varying light and when faces are changed by age or obscured by
accessories and facial hair. Nevertheless, it remained a challenging
computer vision problem for decades until recently.
Deep learning methods can leverage very large datasets of faces and
learn rich and compact representations of faces, allowing modern
models first to match and later to outperform the face recognition
capabilities of humans.
This project examines the problem of face recognition and how
deep learning methods can achieve superhuman performance.
A system based on deep neural networks was designed using the
Convolutional Neural Network (CNN) algorithm. First, the system was
tested on a database of 113 people with 20 images each, all male.
Second, it was tested on a database of 132 people with 20 images each,
categorized into male and female. Lastly, it was tested on a database of
152 people with 20 images each, categorized into male, female and male
staff. Each dataset was divided into two groups: the training group and
the test and prediction group.
After many experiments, the architecture of the neural network was
stable. The network efficiency was first tested with 75% of the data used
for training and 25% for prediction. The network efficiency was 99.76%.
The split was then changed to 50% training data and 50% prediction
data. The network efficiency was 99.73%.
The split was then changed to 25% training data and 75% prediction
data. The network efficiency was 96.28%.
The same experiments were then performed on each of the databases.
These results indicate the efficiency of the system.
Finally, a real application of the system was built, a door, which
worked very efficiently.
Acknowledgment

It has been a great opportunity to gain a lot of experience in real-time
projects, followed by the knowledge of how to actually design and
analyze real projects.
For that, we want to thank all the people who made it possible for
students like us.
We would like to express our deepest gratitude to our supervisor,
Dr. Gamal Behery, for his patience and guidance throughout the project.
Moreover, it is our duty to thank all the testing committee members for
their generous discussions and encouragement.
Finally, we would like to thank all the people who helped, supported
and encouraged us to successfully finish the project, whether they were
in the university or in the industry.
Table of contents
Abstract
Acknowledgment

Chapter 1: Introduction
1.1 What is Machine Learning?
1.2 How does machine learning work?
1.3 Why is machine learning important?
1.4 Types of machine learning
• Supervised ML
• Unsupervised ML
• Semi-supervised ML
• Reinforcement learning
1.5 Real-world machine learning use cases

Chapter 2: Neural Networks and Deep Learning
2.1 Neural network
2.1.1 Why are neural networks important?
2.1.2 What are neural networks used for?
2.1.3 How do neural networks work?
2.2 Traditional Neural Networks
2.2.1 Warren McCulloch
2.2.2 Perceptron
• What is the Perceptron model in Machine Learning?
• Basic components of perceptron
• How does perceptron work?
• Single layer perceptron model
• Multi-layer perceptron model
• Characteristics of Perceptron
2.2.3 ANN
• What is Artificial Neural Network?
• The architecture of an artificial neural network
• Advantages of Artificial Neural Networks (ANN)
• Disadvantages of Artificial Neural Networks (ANN)
• How do artificial neural networks work?
• Types of Artificial Neural Network
2.2.4 Backpropagation
• Why We Need Backpropagation?
• How Backpropagation Algorithm Works
• Types of Backpropagation Networks
• Advantages of Backpropagation
• Disadvantages of Backpropagation
2.3 Deep learning
2.3.1 How deep learning works
2.3.2 Deep learning applications
2.4 CNN
2.4.1 Convolution Layer
2.4.2 Motivation behind Convolution
2.4.3 Pooling Layer
2.4.4 Fully Connected Layer
2.4.5 Non-Linearity Layers
2.4.6 Sigmoid
2.4.7 Tanh
2.4.8 ReLU
2.5 Face Recognition
2.5.1 What is Facial Recognition
2.5.2 Face Recognition Techniques

Chapter 3: Face Recognition by using CNN
3.1 Dataset
3.2 Preprocessing
3.3 Here is how the procedure work

Chapter 4: Experiments
4.1 Architecture model
4.2 Experiment 1
4.3 Experiment 2
4.4 Experiment 3
4.5 Experiment 4
4.6 Experiment 5
4.7 Experiment 6
4.8 Experiment 7
4.9 Experiment 8
4.10 Experiment 9

Chapter 5: Problems, Conclusion and Future Work
5.1 Problems faced throughout the project
5.2 Summary
5.3 Future work

References

List of figures
Figure 1: Artificial Intelligence diagram
Figure 2: How does machine learning work?
Figure 3: Types of Machine learning
Figure 4: Neural Networks
Figure 5: Basic components of Perceptron
Figure 6: How does Perceptron work?
Figure 7: Biological Neural Network
Figure 8: Artificial Neural Network
Figure 9: ANN layers
Figure 10: How do artificial neural networks work?
Figure 11: How Backpropagation Algorithm Works
Figure 12: Convolution operation
Figure 13: Formula for Convolution Layer
Figure 14: Convolution layer operation
Figure 15: Pooling layer operation
Figure 16: Formula for Padding Layer
Figure 17: Face recognition techniques
Figure 18: Holistic matching
Figure 19: Feature-based
Figure 20: Model Based
Figure 21: Bad architectural model
Figure 22: Best architectural model
Figure 23: Experiment 1 class 86
Figure 24: Experiment 1 class 112
Figure 25: Experiment 2 class 108
Figure 26: Experiment 2 class 96
Figure 27: Experiment 3 class 60
Figure 28: Experiment 3 class 44
Figure 29: Experiment 4 class 43
Figure 30: Experiment 4 class 10
Figure 31: Experiment 5 class 51
Figure 32: Experiment 5 class 30
Figure 33: Experiment 6 class 83
Figure 34: Experiment 6 class 121
Figure 35: Experiment 7 class 34
Figure 36: Experiment 7 class 23
Figure 37: Experiment 8 class 3
Figure 38: Experiment 8 class 84
Figure 39: Experiment 9 class 130
Figure 40: Experiment 9 class 151

Chapter 1
Introduction
1.1 What is Machine Learning?
Machine learning (ML) is a discipline of artificial intelligence (AI) that
provides machines with the ability to automatically learn from data and
past experiences while identifying patterns to make predictions with
minimal human intervention.

Machine learning methods enable computers to operate autonomously
without explicit programming. ML applications are fed with new data, and
they can independently learn, grow, develop, and adapt.

Machine learning derives insightful information from large volumes of
data by leveraging algorithms to identify patterns and learn in an iterative
process. ML algorithms use computational methods to learn directly from
data instead of relying on any predetermined equation that may serve as a
model.

The performance of ML algorithms adaptively improves as the number of
samples available during the learning process increases. For
example, deep learning is a sub-domain of machine learning that trains
computers to imitate natural human traits like learning from examples.
Given enough data, it often performs better than conventional ML
algorithms.

While machine learning is not a new concept (its roots are often traced
back to World War II and the codebreaking work on the Enigma machine),
the ability to apply complex mathematical calculations automatically to
growing volumes and varieties of available data is a relatively recent
development.

Today, with the rise of big data, IoT, and ubiquitous computing, machine
learning has become essential for solving problems across numerous areas,
such as:

• Computational finance (credit scoring, algorithmic trading)
• Computer vision (facial recognition, motion tracking, object
detection)
• Computational biology (DNA sequencing, brain tumor detection,
drug discovery)
• Automotive, aerospace, and manufacturing (predictive
maintenance)
• Natural language processing (voice recognition)

1.2 How does machine learning work?
Machine learning algorithms are trained on a training dataset to create a
model. When new input data is introduced to the trained model, it
uses what it has learned to make a prediction.

1.3 Why is machine learning important?
Machine learning is important because it gives companies and
enterprises a view of trends in customer behavior and business
operational patterns, and supports the development of new
products. Many of today's leading companies, such as Facebook,
Google and Uber, make machine learning a central part of their
operations. It has become a significant competitive differentiator for
many companies.
1.4 Types of Machine Learning
Machine learning algorithms can be trained in many ways, with each
method having its pros and cons. Based on these methods and ways of
learning, machine learning is broadly categorized into four main types:

1. Supervised machine learning
This type of ML involves supervision: machines are trained
on labeled datasets and then predict outputs based on that
training.
A labeled dataset is one in which some inputs are already mapped
to their correct outputs. Hence, the machine is trained
with the input and corresponding output. In subsequent phases, the
machine predicts the outcome for a test dataset.
For example, consider an input dataset of parrot and crow images.
Initially, the machine is trained to understand the pictures,
including the parrot's and crow's color, eyes, shape, and size. After
training, an input picture of a parrot is provided, and the machine is
expected to identify the object and predict the output.
The trained machine checks the various features of the object,
such as color, eyes, shape, etc., in the input picture to make a
final prediction.
This is the process of object identification in supervised machine
learning.
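
The parrot and crow example can be illustrated with a minimal sketch of a supervised classifier. It is not the method used in this project: it assumes each picture has already been reduced to two hypothetical numeric features (a "greenness" score and a body length), and it uses a simple nearest-centroid rule. All feature values and function names are made up for illustration.

```python
# Hypothetical labeled data: (greenness 0-1, body length in cm) -> label.
# The numbers are invented for illustration only.
training_data = [
    ((0.90, 33.0), "parrot"),
    ((0.80, 35.0), "parrot"),
    ((0.85, 30.0), "parrot"),
    ((0.10, 47.0), "crow"),
    ((0.05, 50.0), "crow"),
    ((0.15, 45.0), "crow"),
]


def train(samples):
    """Learn one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the input features."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training_data)              # training phase on labeled data
prediction = predict(model, (0.88, 32.0))  # prediction for an unseen picture
print(prediction)
```

The training phase only sees labeled examples; the prediction phase receives features without a label, which mirrors the two phases described above.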

The primary objective of the supervised learning technique is to map the
input variable (a) to the output variable (b). Supervised machine
learning is further classified into two broad categories:
• Classification: These are algorithms that address
classification problems, where the output variable is categorical;
for example, yes or no, true or false, male or female, etc.
Real-world applications of this category are evident in spam
detection and email filtering.

Some known classification algorithms include the Random
Forest Algorithm, Decision Tree Algorithm, Logistic
Regression Algorithm, and Support Vector Machine
Algorithm.

• Regression: Regression algorithms handle regression problems,
where the output variable is continuous. The simplest models
assume a linear relationship between input and output variables.
Examples include weather prediction, market trend analysis,
etc.

Popular regression algorithms include the Simple Linear
Regression Algorithm, Multivariate Regression Algorithm,
Decision Tree Algorithm, and Lasso Regression.
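
As an illustration of the regression category, the following is a minimal sketch of simple linear regression fitted by ordinary least squares. The data points are hypothetical and chosen to lie exactly on the line y = 2x + 1, and the function name fit_line is ours.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data lying on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # continuous prediction for a new input
print(slope, intercept, predicted)
```

Unlike the classifier, the output here is a continuous number rather than a category.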

Problems and Issues in Supervised learning:
Before we get started, we must know how to pick a good machine
learning algorithm for the given dataset.
To intelligently pick an algorithm for a supervised learning task,
we must consider the following factors:
1. Heterogeneity of Data: Many algorithms, such as neural networks
and support vector machines, work best when their feature vectors
are homogeneous, numeric and normalized. Algorithms that
employ distance metrics are especially sensitive to this, so if
the data is heterogeneous, these methods should not be the
first choice. Decision trees can handle heterogeneous data
very easily.

2. Redundancy of Data: If the data contains redundant
information, i.e. highly correlated values, then distance-based
methods perform poorly because of numerical
instability. In this case, some form of regularization can be
applied to prevent this situation.

3. Dependent Features: If there is some dependence between
the features, then algorithms that model complex
interactions, like neural networks and decision trees, fare better
than other algorithms.

4. Bias-Variance Tradeoff: A learning algorithm is biased for a
particular input x if, when trained on each of several different
training sets, it is systematically incorrect when predicting the
correct output for x, whereas a learning algorithm has high variance
for a particular input x if it predicts different output values when
trained on different training sets. The prediction error of a
learned classifier is related to the sum of the bias and the variance
of the learning algorithm, so neither can be high without making
the prediction error high. Many machine learning algorithms can
tune the balance between bias and variance automatically, or
expose a parameter that can be tuned manually, and using such
algorithms helps resolve this situation.

5. Curse of Dimensionality: If the problem has an input space
with a large number of dimensions, but the problem only
depends on a low-dimensional subspace of that input space,
the machine learning algorithm can be confused by
the huge number of dimensions and hence the variance of the
algorithm can be high. In practice, if the data scientist can
manually remove irrelevant features from the input data, this is
likely to improve the accuracy of the learned function. In
addition, there are many algorithms for feature selection that
seek to identify the relevant features and discard the irrelevant
ones, as well as unsupervised techniques such as Principal
Component Analysis (PCA) that reduce the dimensionality.

6. Overfitting: The programmer should know that the output
values may contain inherent noise resulting from human or
sensor errors. In this case, the algorithm must not attempt to
infer a function that exactly matches all the data. Fitting the
data too closely causes overfitting, after which the model will
answer perfectly for all training examples but will have a very
high error for unseen samples. A practical way of preventing
this is stopping the learning process early, as well as applying
filters to the data in the pre-learning phase to remove noise.

Only after considering all these factors can we pick a supervised
learning algorithm that works for the dataset we are working on. For
example, if we were working with a dataset consisting of
heterogeneous data, then decision trees would fare better than
other algorithms. If the input space of the dataset we were
working on had 1000 dimensions, then it is better to first perform
PCA on the data before using a supervised learning algorithm
on it.
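
The idea of stopping the learning process early, mentioned under overfitting, can be sketched as follows. This is an illustrative assumption about how such a rule is commonly written, not the procedure used in this project: training is assumed to report one validation loss per epoch, and the patience parameter and the loss values are hypothetical.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the index of the epoch whose model should be kept.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # error on unseen data keeps rising: stop training
    return best_epoch


# Hypothetical validation losses: they fall, then rise as overfitting begins.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
best = early_stopping_epoch(losses)
print(best)
```

The loss is measured on data held out from training, so a rising value signals that the model has started to fit noise in the training examples.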

Disadvantages of supervised learning


• Classes may not match spectral classes.
• Varying consistency in classes.
• Cost and time are involved in selecting training data.

2. Unsupervised machine learning
Unsupervised learning refers to a learning technique that’s devoid
of supervision. Here, the machine is trained using an unlabeled
dataset and is enabled to predict the output without any
supervision.
An unsupervised learning algorithm aims to group the unsorted
dataset based on the input’s similarities, differences, and patterns.
For example, consider an input dataset of images of a fruit-filled
container. Here, the images are not known to the machine learning
model. When we input the dataset into the ML model, the task of
the model is to identify the pattern of objects, such as color, shape,
or differences seen in the input images and categorize them.
Upon categorization, the machine then predicts the output as it
gets tested with a test dataset.
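
Grouping unlabeled inputs by similarity can be sketched with a small k-means loop. As a simplifying assumption, each fruit is described by a single hypothetical feature (its weight in grams) instead of an image, and the two starting centroids are chosen by hand.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical fruit weights in grams; no labels are provided.
weights = [110, 120, 115, 300, 310, 305]
centroids, clusters = k_means_1d(weights, centroids=[100, 200])
print(centroids, clusters)
```

No label is ever supplied: the two groups emerge only from how close the values are to each other.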
Unsupervised machine learning is further classified into two types:
• Clustering: The clustering technique refers to grouping
objects into clusters based on parameters such as
similarities or differences between objects. For example,
grouping customers by the products they purchase.
Some known clustering algorithms include the K-Means
Clustering Algorithm, Mean-Shift Algorithm, and DBSCAN
Algorithm. Principal Component Analysis and Independent
Component Analysis are related unsupervised techniques,
used for dimensionality reduction and signal separation
rather than clustering.
• Association: Association learning refers to identifying typical
relations between the variables of a large dataset. It
determines the dependency of various data items and maps
associated variables. Typical applications include web usage
mining and market data analysis.
Popular association rule algorithms include the
Apriori Algorithm, Eclat Algorithm, and FP-Growth Algorithm.
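
Association learning is usually described in terms of two measures, support and confidence, which algorithms such as Apriori build on. The sketch below computes both for a hypothetical set of shopping baskets; it is not an implementation of Apriori itself, and the basket contents are invented.

```python
# Hypothetical market-basket data: each set is one customer's purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often `consequent` appears in transactions that contain `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


bread_milk_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(bread_milk_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets containing bread also contain milk (confidence of about 0.67 for the rule "bread implies milk").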

Disadvantages of unsupervised learning
• The spectral classes do not necessarily represent the features on
the ground.
• It does not consider spatial relationships in the data.
• It can take time to interpret the spectral classes.

3. Semi-supervised learning
Semi-supervised learning comprises characteristics of both
supervised and unsupervised machine learning. It uses the
combination of labeled and unlabeled datasets to train its
algorithms. Using both types of datasets, semi-supervised learning
overcomes the drawbacks of the options mentioned above.
Consider an example of a college student.
A student learning a concept under a teacher’s supervision in
college is termed supervised learning. In unsupervised learning, a
student self-learns the same concept at home without a teacher’s
guidance. Meanwhile, a student revising the concept after learning
under the direction of a teacher in college is a semi-supervised
form of learning.
Disadvantages of semi-supervised learning
• Iteration results are not stable.
• It is not applicable to network-level data.
• It has low accuracy.

4. Reinforcement learning
Reinforcement learning is a feedback-based process. Here, the AI
component explores its surroundings by trial and error, takes
action, learns from experience, and improves performance.
The component is rewarded for each good action and penalized
for every wrong move. Thus, the reinforcement learning
component aims to maximize its rewards by performing good
actions.
Unlike supervised learning, reinforcement learning lacks labeled
data, and the agents learn via experience only. Consider video
games.
Here, the game specifies the environment, and each move of the
reinforcement agent changes its state. The agent
receives feedback via punishments and rewards, which affect
the overall game score.
The ultimate goal of the agent is to achieve a high score.
Reinforcement learning is applied across different fields such as
game theory, information theory, and multi-agent systems.
Reinforcement learning is further divided into two types of methods
or algorithms:
o Positive reinforcement learning: This refers to adding a
reinforcing stimulus after a specific behavior of the agent,
which makes it more likely that the behavior may occur again
in the future, e.g., adding a reward after a behavior.
o Negative reinforcement learning: Negative reinforcement
learning refers to strengthening a specific behavior that
avoids a negative outcome.
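
The reward-driven loop described in this section can be sketched with tabular Q-learning on a tiny, hypothetical environment: a corridor of five positions where reaching the last position earns a reward of 1. To keep the result reproducible, this sketch tries every state and action in a fixed order rather than exploring at random as a real agent would; the environment, constants and names are all assumptions made for illustration.

```python
N_STATES = 5                       # positions 0..4; position 4 is the goal
ACTIONS = {"left": -1, "right": +1}
GAMMA = 0.9                        # discount factor for future rewards
ALPHA = 1.0                        # learning rate (1.0 suits a deterministic world)


def step(state, action):
    """Environment: move, stay inside the corridor, reward 1 on reaching the goal."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


# Q-table: estimated long-term reward for each (state, action) pair.
q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

for _ in range(20):                            # repeated sweeps of experience
    for state in range(N_STATES - 1):          # the goal state is terminal
        for action in ACTIONS:
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + GAMMA * max(q[next_state].values())
            q[state][action] += ALPHA * (target - q[state][action])

# Greedy policy: in each state, pick the action with the highest estimate.
policy = {s: max(q[s], key=q[s].get) for s in range(N_STATES - 1)}
print(policy)
```

The agent is never told that "right" is correct; it learns this only because moving right eventually leads to the reward.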

Disadvantages of reinforcement learning
• Too much reinforcement learning can lead to an overload of states
which can diminish the results.
• This algorithm is not preferable for solving simple problems.
• This algorithm needs a lot of data and a lot of computation.
• The curse of dimensionality limits reinforcement learning for real
physical systems.

1.5 Real-world machine learning use cases

Here are just a few examples of machine learning you might encounter
every day:

Speech recognition: Also known as automatic speech recognition
(ASR), computer speech recognition, or speech-to-text, this is a
capability that uses natural language processing (NLP) to convert
human speech into a written format. Many mobile devices incorporate
speech recognition into their systems to conduct voice search (e.g.
Siri) or to make texting more accessible.

Customer service: Online chatbots are replacing human agents along
the customer journey. They answer frequently asked questions (FAQs)
on topics such as shipping, or provide personalized advice,
cross-selling products or suggesting sizes for users, changing the way
we think about customer engagement across websites and social media
platforms. Examples include messaging bots on e-commerce sites
with virtual agents, messaging apps such as Slack and Facebook
Messenger, and tasks usually done by virtual assistants and voice
assistants.

Computer vision: This AI technology enables computers and systems
to derive meaningful information from digital images, videos and other
visual inputs, and to take action based on those inputs. This ability
to provide recommendations distinguishes it from image recognition
tasks. Powered by convolutional neural networks, computer vision has
applications in photo tagging in social media, radiology imaging in
healthcare, and self-driving cars in the automotive industry.

Recommendation engines: Using past consumption behavior data, AI
algorithms can help to discover data trends that can be used to develop
more effective cross-selling strategies. This is used to make relevant
add-on recommendations to customers during the checkout process for
online retailers.

Automated stock trading: Designed to optimize stock portfolios,
AI-driven high-frequency trading platforms make thousands or even
millions of trades per day without human intervention.

Chapter 2
Neural Networks and Deep Learning
2.1 Neural Networks
A neural network is a series of algorithms that attempts to recognize
underlying relationships in a set of data through a process that imitates
the way the human brain operates. In this sense, neural networks refer
to systems of neurons, either organic or artificial in nature.
For example, when a person faces some unexpected event, the
eye detects the object and sends signals to the brain, which
responds based on the intensity of the signal. These signals are
transferred to the brain via neurons. Each neuron takes
the signal and passes it to the next neuron until it reaches the brain.
Neural networks can adapt to changing input, so the network produces
the best possible result without needing to redesign the output criteria.

2.1.1 Why are neural networks important?
Neural networks can help computers make intelligent decisions with
limited human assistance. This is because they can learn and model the
relationships between input and output data that are nonlinear and
complex. For instance, they can do the following tasks.
Make generalizations and inferences
Neural networks can comprehend unstructured data and make general
observations without explicit training. For instance, they can recognize
that two different input sentences have a similar meaning:
Can you tell me how to make the payment?
How do I transfer money?
A neural network would know that both sentences mean the same thing.
Or it would be able to broadly recognize that Baxter Road is a place, but
Baxter Smith is a person’s name.
Reveal hidden relationships and patterns
Neural networks can analyze raw data more deeply and reveal new
insights for which they might not have been trained. For example,
consider a pattern recognition neural network that analyses consumer
purchases. By comparing the buying patterns of numerous users, the
neural network can suggest new items that might interest a specific
consumer.
Create autonomous, self-learning systems
Neural networks can learn and improve over time based on user
behavior. For example, consider a neural network that automatically
corrects or suggests words by analyzing your typing behavior. Let us
assume that the model was trained in the English language and can
spell-check English words. However, if you frequently type non-English
words, like danke, the neural network can automatically learn and
correct these words too.
Learn and model highly volatile data
Some datasets, such as loan repayment amounts in a bank, can have
large variations. Neural networks can model such data as well. For
example, they can analyze financial transactions and flag some of them
for fraud detection. They can also process complex data that holds the
key to difficult biological problems like protein folding, DNA analysis, and
more.
2.1.2 What are neural networks used for?
Neural networks have several use cases across many industries, such
as the following:
• Medical diagnosis by medical image classification
• Targeted marketing by social network filtering and behavioral data
analysis
• Financial predictions by processing historical data of financial
instruments
• Electrical load and energy demand forecasting
• Process and quality control
• Chemical compound identification
We give four of the important applications of neural networks below.

Computer vision
Computer vision is the ability of computers to extract information and
insights from images and videos. With neural networks, computers can
distinguish and recognize images much as humans do. Computer vision
has several applications, such as the following:
• Visual recognition in self-driving cars so they can recognize road
signs and other road users.
• Content moderation to automatically remove unsafe or inappropriate
content from image and video archives.
• Facial recognition to identify faces and recognize attributes like
open eyes, glasses, and facial hair.
• Image labeling to identify brand logos, clothing, safety gear, and
other image details.

Speech recognition
Neural networks can analyze human speech despite varying speech
patterns, pitch, tone, language, and accent. Virtual assistants like
Amazon Alexa and automatic transcription software use speech
recognition to do tasks like these:
• Assist call center agents and automatically classify calls
• Convert clinical conversations into documentation in real time
• Accurately subtitle videos and meeting recordings for wider content
reach

Natural language processing
Natural language processing (NLP) is the ability to process natural,
human-created text. Neural networks help computers gather insights and
meaning from text data and documents. NLP has several use cases,
including in these functions:
• Automated virtual agents and chatbots
• Automatic organization and classification of written data
• Business intelligence analysis of long-form documents like emails
and forms
• Indexing of key phrases that indicate sentiment, like positive and
negative comments on social media
• Document summarization and article generation for a given topic

Recommendation engines
Neural networks can track user activity to develop personalized
recommendations. They can also analyze all user behavior and discover
new products or services that interest a specific user. For example,
Curalate, a Philadelphia-based startup, helps brands convert social
media posts into sales. Brands use Curalate’s intelligent product tagging
(IPT) service to automate the collection and curation of user-generated
social content. IPT uses neural networks to automatically find and
recommend products relevant to the user’s social media activity.
Consumers don't have to hunt through online catalogs to find a specific
product from a social media image. Instead, they can use Curalate’s
auto product tagging to purchase the product with ease.

2.1.3 How do neural networks work?
The human brain is the inspiration behind neural network architecture.
Human brain cells, called neurons, form a complex, highly
interconnected network and send electrical signals to each other to help
humans process information. Similarly, an artificial neural network is
made of artificial neurons that work together to solve a problem. Artificial
neurons are software modules, called nodes, and artificial neural
networks are software programs or algorithms that, at their core, use
computing systems to perform mathematical calculations.
Simple neural network architecture
A basic neural network has interconnected artificial neurons in three
layers:
Input Layer
Information from the outside world enters the artificial neural network
from the input layer. Input nodes process the data, analyze or categorize
it, and pass it on to the next layer.
Hidden Layer
Hidden layers take their input from the input layer or other hidden layers.
Artificial neural networks can have a large number of hidden layers.
Each hidden layer analyzes the output from the previous layer,
processes it further, and passes it on to the next layer.
Output Layer
The output layer gives the final result of all the data processing by the
artificial neural network. It can have single or multiple nodes. For
instance, if we have a binary (yes/no) classification problem, the output
layer will have one output node, which gives the result as 1 or 0.
However, if we have a multi-class classification problem, the output layer
typically has more than one output node, for example one per class.
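
The flow of data through the three layers can be sketched as a single forward pass. This is a minimal illustration, not the network used in this project: the layer sizes, the sigmoid activation and all weight and bias values are hypothetical and hand-picked, whereas a trained network would learn them from data.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters (assumed values, not learned).
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]   # 2 inputs -> 2 hidden nodes
hidden_biases = [0.0, -0.1]
output_weights = [[1.2, -0.7]]               # 2 hidden nodes -> 1 output node
output_biases = [0.05]


def forward(inputs):
    """Input layer -> hidden layer -> single output node."""
    hidden = dense(inputs, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)[0]


probability = forward([1.0, 0.0])
label = 1 if probability >= 0.5 else 0   # binary (yes/no) decision
print(probability, label)
```

With a single output node the result is thresholded into 1 or 0, as in the binary case described above; a multi-class network would instead compare several output nodes.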

2.2 Traditional Neural Networks
2.2.1 Warren McCulloch
In their 1943 paper, McCulloch and Pitts attempted to demonstrate that
a Turing machine program could be implemented in a finite network
of formal neurons (in the event, the Turing machine contains their
model of the brain), and that the neuron was the base logic unit of the
brain.
In their 1947 paper, they offered approaches to designing "nervous
nets" to recognize visual inputs despite changes in orientation or size.
From 1952, McCulloch worked at the Research Laboratory of
Electronics at MIT, primarily on neural network modelling. His
team examined the visual system of the frog in light of
McCulloch's 1947 paper, discovering that the eye provides the brain
with information that is already, to a degree, organized and
interpreted, instead of simply transmitting an image.
2.2.2 Perceptron
The Perceptron was introduced by Frank Rosenblatt in 1957. He
proposed a Perceptron learning rule based on the original MCP
(McCulloch-Pitts) neuron. A Perceptron is an algorithm for supervised
learning of binary classifiers. The algorithm enables a neuron to learn
by processing the elements of the training set one at a time.
What is the Perceptron model in Machine Learning?
The Perceptron is a Machine Learning algorithm for supervised learning
of binary classification tasks.
It can also be understood as an artificial neuron, or neural network
unit, that detects features in its input data.
The Perceptron model is one of the simplest types of Artificial Neural
Networks.
We can consider it a single-layer neural network with four main
parts: input values, weights and bias, a net sum, and an activation
function.

Basic Components of Perceptron
Frank Rosenblatt designed the perceptron model as a binary
classifier that contains three main components. These are as follows:

• Input Nodes or Input Layer:
This is the primary component of the Perceptron, which accepts the
initial data into the system for further processing. Each input node
holds a real numerical value.
• Weight and Bias:
The weight parameter represents the strength of the connection
between units and is one of the most important parameters of the
Perceptron.
The larger the magnitude of a weight, the more influence the
associated input has on the output. The bias can be
considered the intercept term in a linear equation.
• Activation Function:
This is the final component, which determines whether the neuron
will fire or not. In the Perceptron, the activation function
is primarily a step function.

How does Perceptron work?
In Machine Learning, the Perceptron is considered a single-layer neural
network that consists of four main parts: input values (input
nodes), weights and bias, a net sum, and an activation function.
The perceptron model begins by multiplying each input value by its
weight and then adding these products, together with the bias, to create
the weighted sum.
This weighted sum is then passed to the activation function 'f' to obtain
the output. In the basic Perceptron, this activation function is a step
function.
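The computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's implementation: the function names, the threshold of zero, and the particular weights and bias (chosen by hand so that the unit behaves like an AND gate) are all assumptions made for the example.

```python
def step(z, threshold=0.0):
    """Step activation: output 1 when the net sum reaches the threshold, else 0."""
    return 1 if z >= threshold else 0


def perceptron_output(inputs, weights, bias):
    """Multiply each input by its weight, add the bias, apply the step function."""
    net_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(net_sum)


# Hypothetical hand-picked parameters that make the unit act like an AND gate.
and_weights = [1.0, 1.0]
and_bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron_output([a, b], and_weights, and_bias))
```

Only the input pair (1, 1) produces a net sum above zero with these values, so it is the only case in which the neuron fires.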

There are two types of Perceptrons: Single layer and Multilayer.

Single Layer Perceptron Model:

This is one of the simplest types of Artificial Neural Network (ANN).
A single-layer perceptron model consists of a feed-forward network and
includes a threshold transfer function.
The main objective of the single-layer perceptron model is to classify
linearly separable objects with binary outcomes.
A single-layer perceptron starts with no prior knowledge, so it begins
with randomly initialized weights.
It then computes the weighted sum of all inputs. If this sum is more
than a pre-determined threshold value, the model is activated and
outputs +1.
If the output matches the desired value for the training example, the
performance of the model is considered satisfactory, and the weights
do not change.
If the output differs from the desired value, the weights are adjusted
to reduce the error, and the process is repeated for the next training
example.
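The training procedure for a single-layer perceptron can be sketched as follows. This is an illustrative version of the classic perceptron learning rule, not code from the project. It assumes labels of +1 and -1, a learning rate of 0.1, and weights initialized to zero (instead of randomly) so that the run is reproducible. The OR-gate data set is a hypothetical example of a linearly separable problem.

```python
def predict(x, weights, bias):
    """Return +1 if the weighted sum plus bias is positive, otherwise -1."""
    net = sum(xi * wi for xi, wi in zip(x, weights)) + bias
    return 1 if net > 0 else -1


def train_perceptron(samples, labels, lr=0.1, max_epochs=100):
    """Perceptron learning rule: change the weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])  # zero start is an assumption for reproducibility
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(samples, labels):
            if predict(x, weights, bias) != target:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + lr * target * xi for w, xi in zip(weights, x)]
                bias += lr * target
                errors += 1
        if errors == 0:  # every sample classified correctly: stop
            break
    return weights, bias


# OR gate: linearly separable, so the rule is guaranteed to converge.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(x, weights, bias) for x in samples])
```

If the same function is given XOR labels, the loop never reaches zero errors and simply stops after `max_epochs`, because XOR is not linearly separable.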
"Single-layer perceptron can learn only linearly separable patterns."

Multi-Layered Perceptron Model:
Like a single-layer perceptron model, a multi-layer perceptron model
has the same basic structure but adds one or more hidden
layers.
The multi-layer perceptron model is trained with the Backpropagation
algorithm, which executes in two stages as follows:
Forward Stage: Activations are computed starting from the input layer
and propagated forward until they reach the output layer.
Backward Stage: In the backward stage, weight and bias values are
modified to reduce the error. The error between the actual output and
the desired output is propagated backward from the output layer
to the input layer.
Hence, a multi-layer perceptron can be considered an artificial neural
network with multiple layers in which the activation function does not
have to be a linear threshold function, as it is in a single-layer
perceptron model. Instead, the activation function can be a non-linear
function such as sigmoid, TanH, or ReLU.
A multi-layer perceptron model has greater processing power and can
process both linear and non-linear patterns. It can also implement
logic gates such as AND, OR, XOR, NAND, NOT, XNOR, and NOR.
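As an illustration of why the hidden layer matters, the sketch below builds XOR, which a single-layer perceptron cannot represent, from three threshold neurons arranged in two layers. The weights and biases are chosen by hand for the example (they are assumptions, not learned values), and the function names are hypothetical.

```python
def step(z):
    """Threshold activation: 1 if the net sum is positive, otherwise 0."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """A single threshold neuron: weighted sum plus bias, then step."""
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)


def xor_mlp(a, b):
    """Two-layer network: hidden units compute OR and NAND, output computes AND."""
    h_or = neuron([a, b], [1.0, 1.0], -0.5)
    h_nand = neuron([a, b], [-1.0, -1.0], 1.5)
    return neuron([h_or, h_nand], [1.0, 1.0], -1.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))
```

Each hidden neuron draws one straight decision line, and the output neuron combines the two half-planes, which is what makes the non-linear XOR pattern reachable.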
Advantages of Multi-Layer Perceptron:
A multi-layered perceptron model can be used to solve complex non-
linear problems.
It works well with both small and large input data.
It helps us to obtain quick predictions after the training.
It helps to obtain the same accuracy ratio with large as well as small
data.
Disadvantages of Multi-Layer Perceptron:
In a multi-layer perceptron, computations are difficult and time-consuming.
In a multi-layer perceptron, it is difficult to determine how much each
independent variable affects the dependent variable.
The quality of the model depends on the quality of the training.

Characteristics of Perceptron
The perceptron model has the following characteristics.
• Perceptron is a machine learning algorithm for supervised learning
of binary classifiers.
• In Perceptron, the weight coefficient is automatically learned.
• Initially, weights are multiplied with input features, and a decision
is made whether the neuron fires or not.
• The activation function applies a step rule to check whether the
weighted sum is greater than zero.
• A linear decision boundary is drawn, enabling the distinction
between the two linearly separable classes +1 and -1.
• If the weighted sum of all input values is more than the threshold
value, the neuron produces an output signal; otherwise, no output is
produced.

2.2.3 ANN
What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from the biological neural
networks that make up the structure of the human brain. Just as the
human brain has neurons interconnected with one another, artificial
neural networks have neurons that are interconnected with one
another in the various layers of the network. These neurons are known as
nodes.

The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.

Dendrites in a Biological Neural Network correspond to inputs in Artificial
Neural Networks, the cell nucleus corresponds to nodes, synapses
correspond to weights, and the axon corresponds to the output.
An Artificial Neural Network is a technique in the field of Artificial
Intelligence that attempts to mimic the network of neurons that makes up
a human brain, so that computers can understand things and make
decisions in a human-like manner.
The artificial neural network is designed by programming computers to
behave like interconnected brain cells.
There are roughly 86 billion neurons in the human brain. Each neuron
has somewhere between 1,000 and 100,000 connection points.
In the human brain, data is stored in a distributed manner,
and we can retrieve more than one piece of this data from our memory
in parallel when necessary.
We can say that the human brain is made up of incredibly powerful
parallel processors.
We can understand the artificial neural network with an example.
Consider a digital logic gate that takes inputs and gives an output,
such as an "OR" gate, which takes two inputs. If one or both of the inputs
are "On," then the output is "On." If both inputs are "Off," then the
output is "Off." Here the output depends only on the input.
Our brain does not work in this fixed way. The relationship between
outputs and inputs keeps changing because the neurons in our brain
are "learning."
The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we have
to understand what a neural network consists of. A neural network
consists of a large number of artificial neurons, termed units, arranged
in a sequence of layers. Let us look at the various types of layers in an
artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:
As the name suggests, it accepts inputs in several different formats
provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It
performs all the calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations in the hidden
layers, which finally results in the output conveyed by this layer.
The artificial neural network takes the inputs, computes their weighted
sum, and adds a bias. This computation is represented in the
form of a transfer function:

z = w1*x1 + w2*x2 + ... + wn*xn + b

The weighted total is then passed as an input to an activation
function to produce the output. Activation functions decide whether a
node should fire or not. Only the nodes that fire pass their signal on to
the output layer. Different activation functions are available, and the
choice depends on the sort of task we are performing.
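The flow from input layer through hidden layer to output layer can be sketched with NumPy as below. This is only an illustration under stated assumptions: the layer sizes (3 inputs, 4 hidden nodes, 1 output), the random weights, the zero biases, and the choice of sigmoid as the activation function are all made up for the example.

```python
import numpy as np


def sigmoid(z):
    """Squash a weighted total into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, layers):
    """Pass x through each layer: weighted sum plus bias, then activation."""
    activation = x
    for weights, bias in layers:
        activation = sigmoid(weights @ activation + bias)
    return activation


rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # hidden layer: 3 inputs -> 4 nodes
    (rng.normal(size=(1, 4)), np.zeros(1)),  # output layer: 4 nodes -> 1 output
]

x = np.array([0.5, -1.2, 3.0])  # hypothetical input vector
output = forward(x, layers)
print(output)
```

Each row of a weight matrix holds the incoming weights of one node, so `weights @ activation + bias` evaluates the transfer function for every node of a layer at once.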

Advantages of Artificial Neural Networks (ANN)

1. Problems in ANN are represented by attribute-value pairs.
2. ANNs are used for problems where the output of the target function
may be discrete-valued, real-valued, or a vector of several real- or
discrete-valued attributes.
3. ANN learning methods are quite robust to noise in the training
data. The training examples may contain errors without substantially
affecting the final output.
4. ANNs are used where fast evaluation of the learned target function
is required.
5. ANNs can tolerate long training times, which depend on factors such
as the number of weights in the network, the number of training
examples considered, and the settings of various learning
algorithm parameters.

Disadvantages of Artificial Neural Networks (ANN)

1. Hardware dependence:
   1. By their structure, Artificial Neural Networks require processors
      with parallel processing power.
   2. For this reason, their realization depends on the available
      equipment.
2. Unexplained functioning of the network:
   1. This is the most important problem of ANNs.
   2. When an ANN produces a solution, it does not give a clue
      as to why and how.
   3. This reduces trust in the network.
3. Assurance of proper network structure:
   1. There is no specific rule for determining the structure of
      artificial neural networks.
   2. The appropriate network structure is achieved through
      experience and trial and error.
4. The difficulty of showing the problem to the network:
   1. ANNs can work only with numerical information.
   2. Problems have to be translated into numerical values before
      being introduced to the ANN.
   3. The representation chosen will directly influence the
      performance of the network.
   4. This depends on the user's ability.
5. The duration of training is unknown:
   1. Training is considered complete when the network's error on the
      sample has been reduced to a certain value.
   2. Reaching this value does not guarantee optimum results.

How do artificial neural networks work?
An Artificial Neural Network can be best represented as a weighted
directed graph, where the artificial neurons form the nodes.
The connections between neuron outputs and neuron inputs can be
viewed as the directed edges with weights.
The Artificial Neural Network receives the input signal from an external
source, such as a pattern or an image, in the form of a vector.
These inputs are then denoted mathematically by x(n) for each of the
n inputs.

Afterward, each input is multiplied by its corresponding weight
(these weights are the details utilized by the artificial neural network to
solve a specific problem). In general terms, these weights
represent the strength of the interconnection between neurons inside the
artificial neural network. All the weighted inputs are summed inside
the computing unit.
A bias is added to the weighted sum so that the output can be non-zero
even when the weighted sum is zero, which shifts the system's response.
The bias can be treated as an extra input fixed at 1 with its own weight.
The total of the weighted inputs can grow without bound. To keep
the response within the limits of the desired value, a certain maximum
value is benchmarked, and the total of weighted inputs is passed through
the activation function.
Types of Artificial Neural Network:
There are various types of Artificial Neural Networks (ANN). Depending
upon the human brain neuron and network functions that they model,
artificial neural networks perform tasks in a similar way. The majority of
artificial neural networks have some similarities with their more complex
biological counterpart and are very effective at their intended tasks,
for example, segmentation or classification.
Feedback ANN:
In this type of ANN, the output returns into the network to accomplish the
best-evolved results internally. As per the University of
Massachusetts, Lowell Centre for Atmospheric Research, feedback
networks feed information back into themselves and are well suited to
solving optimization problems. Internal system error corrections utilize
feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input
layer, an output layer, and at least one layer of neurons. Through
assessment of its output by reviewing its input, the strength of the
network can be observed from the group behavior of the connected
neurons, and the output is decided. The primary advantage of this
network is that it learns to evaluate and recognize input
patterns.

2.2.4 Backpropagation

Backpropagation is the essence of neural network training. It is the
method of refining the weights of a neural network based on the error
rate obtained in the previous epoch (i.e., iteration).

Proper tuning of the weights allows you to reduce error rates and make
the model reliable by increasing its generalization.

Backpropagation is short for "backward propagation of errors." It is a
standard method of training artificial neural networks. This method
calculates the gradient of a loss function with respect to all the
weights in the network.

Why We Need Backpropagation?


Backpropagation helps to adjust the weights of the
network connections to minimize the difference between the actual
output and the desired output of the net, which is measured by a loss
function.
• Helps to simplify the network structure by identifying the weighted
links that have the least effect on the trained network.
• This method is especially applicable in deep neural networks,
which work on error-prone projects like speech and image
recognition.
• It functions with multiple inputs using chain rules and power rules.
• It is used to calculate the gradient of the loss function with respect
to all the weights in the network.
• Minimizes the loss function by updating the weights with a
gradient-based optimization method.
• Modifies the weights of the connected nodes during the process of
training to produce "learning".
• This method is iterative, recursive, and efficient.

How Backpropagation Algorithm Works
The backpropagation algorithm calculates the gradient of the loss
function with respect to each weight by the chain rule.

It efficiently computes the gradient one layer at a time, unlike a naive
direct computation. It computes the gradient, but it does not define how
the gradient is used. It generalizes the computation in the delta rule.

Consider the following backpropagation neural network example
diagram to understand:

1. Inputs X arrive through the pre-connected path.
2. The input is modeled using real weights W. The weights are usually
randomly selected.
3. Calculate the output for every neuron from the input layer, to the
hidden layers, to the output layer.
4. Calculate the error in the outputs:

ErrorB = Actual Output - Desired Output

5. Travel back from the output layer to the hidden layer to adjust the
weights such that the error is decreased.

Keep repeating the process until the desired output is achieved.
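The five steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the project's training code: the network has one hidden layer of 3 sigmoid units, the loss is half the mean squared error, the weights are updated by plain gradient descent with a learning rate of 0.5, and the XOR data set is a hypothetical toy example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, params):
    """Step 3: compute the output of every layer, from input to output."""
    W1, b1, W2, b2 = params
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return hidden, output


def loss(output, Y):
    """Half the mean squared error between actual and desired output."""
    return 0.5 * np.mean(np.sum((output - Y) ** 2, axis=1))


def backward(X, Y, params):
    """Steps 4 and 5: propagate the error backward with the chain rule."""
    W1, b1, W2, b2 = params
    hidden, output = forward(X, params)
    n = X.shape[0]
    # Error at the output (actual - desired) times the sigmoid derivative.
    delta_out = (output - Y) * output * (1 - output) / n
    grad_W2 = hidden.T @ delta_out
    grad_b2 = delta_out.sum(axis=0)
    # Push the error back through W2 to the hidden layer.
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ delta_hidden
    grad_b1 = delta_hidden.sum(axis=0)
    return [grad_W1, grad_b1, grad_W2, grad_b2]


def train(X, Y, params, lr=0.5, steps=2000):
    """Repeat forward and backward passes, moving weights against the gradient."""
    for _ in range(steps):
        grads = backward(X, Y, params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # step 1: inputs
Y = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs
# Step 2: randomly selected weights, zero biases.
params = [rng.normal(size=(2, 3)), np.zeros(3),
          rng.normal(size=(3, 1)), np.zeros(1)]

initial_loss = loss(forward(X, params)[1], Y)
trained = train(X, Y, params)
final_loss = loss(forward(X, trained)[1], Y)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Note that `backward` only computes gradients; how they are used (here, a fixed-step gradient descent update in `train`) is a separate design choice, which matches the remark that backpropagation does not define how the gradient is used.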

Types of Backpropagation Networks
• Static Back-propagation
• Recurrent Backpropagation

Static back-propagation:
It is one kind of backpropagation network that produces a
mapping of a static input to a static output. It is useful for solving
static classification problems like optical character recognition.

Recurrent Backpropagation:
In recurrent backpropagation, activations are fed forward until a
fixed value is achieved. After that, the error is computed and
propagated backward.

The main difference between these methods is that the
mapping is rapid in static back-propagation, while it is non-static in
recurrent backpropagation.

Advantages of Backpropagation:
• Backpropagation is fast, simple and easy to program
• It has no parameters to tune apart from the numbers of input
• It is a flexible method as it does not require prior knowledge about
the network
• It is a standard method that generally works well
• It does not need any special mention of the features of the function
to be learned.

Disadvantages of using Backpropagation

• The actual performance of backpropagation on a specific problem
is dependent on the input data.
• Back propagation algorithm in data mining can be quite sensitive
to noisy data
• You need to use the matrix-based approach for backpropagation
instead of mini-batch.

2.3 Deep Learning
Deep learning is a subset of machine learning that is essentially based
on neural networks with three or more layers.
These neural networks attempt to simulate the behavior of the human
brain, albeit far from matching its ability, allowing them to "learn" from
large amounts of data.
While a neural network with a single layer can still make approximate
predictions, additional hidden layers can help to optimize and refine for
accuracy.
Deep learning drives many artificial intelligence (AI) applications and
services that improve automation, performing analytical and physical
tasks without human intervention.
Deep learning technology lies behind everyday products and services
(such as digital assistants, voice-enabled TV remotes, and credit card
fraud detection) as well as emerging technologies (such as self-driving
cars).
2.3.1 How deep learning works
Deep learning neural networks, or artificial neural networks, attempt to
mimic the human brain through a combination of data inputs, weights,
and bias. These elements work together to accurately recognize,
classify, and describe objects within the data.
Deep neural networks consist of multiple layers of interconnected nodes,
each building upon the previous layer to refine and optimize the
prediction or categorization.
This progression of computations through the network is called forward
propagation. The input and output layers of a deep neural network are
called visible layers.
The input layer is where the deep learning model ingests the data for
processing, and the output layer is where the final prediction or
classification is made.

2.3.2 Deep learning applications

Real-world deep learning applications are a part of our daily lives, but in
most cases, they are so well-integrated into products and services that
users are unaware of the complex data processing that is taking place in
the background. Some of these examples include the following:

Law enforcement

Deep learning algorithms can analyze and learn from transactional data
to identify dangerous patterns that indicate possible fraudulent or
criminal activity. Speech recognition, computer vision, and other deep
learning applications can improve the efficiency and effectiveness of
investigative analysis by extracting patterns and evidence from sound
and video recordings, images, and documents, which helps law
enforcement analyze large amounts of data more quickly and accurately.

Financial services

Financial institutions regularly use predictive analytics to drive
algorithmic trading of stocks, assess business risks for loan approvals,
detect fraud, and help manage credit and investment portfolios for
clients.

Customer service

Many organizations incorporate deep learning technology into their
customer service processes. Chatbots, used in a variety of applications,
services, and customer service portals, are a straightforward form of AI.
Traditional chatbots use natural language and even visual recognition,
commonly found in call center-like menus. However, more sophisticated
chatbot solutions attempt to determine, through learning, if there are
multiple responses to ambiguous questions. Based on the responses it
receives, the chatbot then tries to answer these questions directly or
route the conversation to a human user.

Virtual assistants like Apple's Siri, Amazon Alexa, or Google Assistant
extend the idea of a chatbot by enabling speech recognition
functionality. This creates a new method to engage users in a
personalized way.

Healthcare

The healthcare industry has benefited greatly from deep learning
capabilities ever since the digitization of hospital records and images.
Image recognition applications can support medical imaging specialists
and radiologists, helping them analyze and assess more images in less
time.

2.4 CNN
A CNN deals with larger images better than other neural networks. It
takes the image as an input and applies filters to the image to reduce
its size while detecting the most important (unique) features of
the image. It repeats this process layer by layer until it has extracted
the distinguishing features. The filter is usually a 3x3 matrix (but can
be bigger).
CNNs are distinguished from other neural networks by their superior
performance with images.

They have three main types of layers:

1- Convolutional layer
2- Pooling layer
3- Fully-connected (FC) layer

2.4.1 Convolution Layer


The convolution layer is the core building block of the CNN. It carries the
main portion of the network’s computational load.
This layer performs a dot product between two matrices, where one
matrix is the set of learnable parameters otherwise known as a kernel,
and the other matrix is the restricted portion of the receptive field.
The kernel is spatially smaller than an image but is more in-depth. This
means that, if the image is composed of three (RGB) channels, the
kernel height and width will be spatially small, but the depth extends up
to all three channels.

During the forward pass, the kernel slides across the height and width of
the image, producing the image representation of each receptive region.
This produces a two-dimensional representation of the image known as
an activation map that gives the response of the kernel at each spatial
position of the image.
The sliding size of the kernel is called a stride.
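The sliding dot product described above can be sketched with NumPy as follows. This is an illustrative, unoptimized version for a single kernel without padding. The function name, the 5 x 5 x 3 test image, and the averaging kernel are assumptions made for the example. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np


def conv2d_single_kernel(image, kernel, stride=1):
    """Slide one kernel over the image and return its 2D activation map.

    image:  array of shape (H, W, D)
    kernel: array of shape (F, F, D), spanning all D channels
    """
    height, width, _ = image.shape
    f = kernel.shape[0]
    out_h = (height - f) // stride + 1
    out_w = (width - f) // stride + 1
    activation_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            region = image[top:top + f, left:left + f, :]  # receptive region
            activation_map[i, j] = np.sum(region * kernel)  # dot product
    return activation_map


image = np.arange(5 * 5 * 3, dtype=float).reshape(5, 5, 3)  # toy RGB image
kernel = np.ones((3, 3, 3)) / 27.0                          # averaging kernel
result = conv2d_single_kernel(image, kernel, stride=2)
print(result.shape)
```

With a 5 x 5 input, a 3 x 3 kernel, and a stride of 2, the kernel fits at two positions along each axis, so the activation map is 2 x 2.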
If we have an input of size W x W x D and Dout kernels with a
spatial size of F, stride S, and amount of padding P, then the size of the
output volume can be determined by the following formula:

Wout = (W - F + 2P) / S + 1

This will yield an output volume of size Wout x Wout x Dout.
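A small helper makes the output-size arithmetic concrete. It applies the standard relation Wout = (W - F + 2P) / S + 1. The function name and the example layer sizes are assumptions chosen for illustration.

```python
def conv_output_size(w, f, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    span = w - f + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("kernel, stride and padding do not tile the input evenly")
    return span // stride + 1


# A 3x3 kernel with padding 1 and stride 1 keeps a 32x32 input at 32x32.
print(conv_output_size(32, 3, stride=1, padding=1))  # 32

# An 11x11 kernel with stride 4 and no padding maps 227 to 55.
print(conv_output_size(227, 11, stride=4))  # 55
```

Raising an error when the stride does not divide the span evenly is a design choice of this sketch; some libraries instead round down silently.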

2.4.2 Motivation behind Convolution
Convolution leverages three important ideas that motivated computer
vision researchers: sparse interaction, parameter sharing, and
equivariant representation. Let’s describe each one of them in detail.
Traditional neural network layers use matrix multiplication by a matrix of
parameters describing the interaction between each input and output
unit. This means that every output unit interacts with every input unit.
However, convolutional neural networks have sparse interaction. This is
achieved by making the kernel smaller than the input. For example, an
image can have thousands or millions of pixels, but while processing it
using a kernel we can detect meaningful information that occupies only
tens or hundreds of pixels.
This means that we need to store fewer parameters, which not only
reduces the memory requirement of the model but also improves its
statistical efficiency.
Parameter sharing rests on the idea that if computing one feature at a
spatial point (x1, y1) is useful, then it should also be useful at some
other spatial point, say (x2, y2).
It means that for a single two-dimensional slice, i.e., for creating one
activation map, neurons are constrained to use the same set of weights.
In a traditional neural network, each element of the weight matrix is used
once and then never revisited, while a convolutional network has shared
parameters, i.e., for getting the output, the weights applied to one input
are the same as the weights applied elsewhere.
Due to parameter sharing, the layers of a convolutional neural network
have the property of equivariance to translation. This means that if we
shift the input in some way, the output shifts in the same way.
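The saving from sparse interaction and parameter sharing can be made concrete with a short parameter count. The layer sizes below (a 32 x 32 RGB input, 16 output feature maps, 3 x 3 kernels) are hypothetical values picked for the example, and the helper names are assumptions.

```python
def dense_params(n_inputs, n_outputs):
    """Fully connected layer: one weight per input-output pair, plus biases."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_size, in_channels, n_kernels):
    """Convolution layer: each kernel is shared across all spatial positions."""
    return kernel_size * kernel_size * in_channels * n_kernels + n_kernels


height, width, channels = 32, 32, 3
n_maps = 16

# Dense layer producing 16 maps of the same spatial size as the input.
dense = dense_params(height * width * channels, height * width * n_maps)
# Convolution layer producing 16 activation maps with 3x3 kernels.
conv = conv_params(3, channels, n_maps)

print("dense:", dense)  # 50348032
print("conv: ", conv)   # 448
```

The convolution layer needs only 448 parameters because the same 3 x 3 x 3 weights are reused at every position of the image.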

2.4.3 Pooling Layer
The pooling layer replaces the output of the network at certain locations
by deriving a summary statistic of the nearby outputs.
This helps in reducing the spatial size of the representation, which
decreases the required amount of computation and weights.
The pooling operation is processed on every slice of the representation
individually.
There are several pooling functions such as the average of the
rectangular neighborhood, L2 norm of the rectangular neighborhood,
and a weighted average based on the distance from the central pixel.
However, the most popular process is max pooling, which reports the
maximum output from the neighborhood.

If we have an activation map of size W x W x D, a pooling kernel of
spatial size F, and stride S, then the size of the output volume can be
determined by the following formula:

Wout = (W - F) / S + 1

This will yield an output volume of size Wout x Wout x D.
In all cases, pooling provides some translation invariance which means
that an object would be recognizable regardless of where it appears on
the frame.
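Max pooling can be sketched with NumPy as below. The output size follows Wout = (W - F) / S + 1, and each depth slice is pooled independently. The function name, the 2 x 2 window with stride 2, and the 4 x 4 toy activation map are assumptions made for the example.

```python
import numpy as np


def max_pool2d(activation_map, f=2, stride=2):
    """Max pooling over an activation map of shape (W, W, D)."""
    height, width, depth = activation_map.shape
    out_h = (height - f) // stride + 1
    out_w = (width - f) // stride + 1
    pooled = np.zeros((out_h, out_w, depth))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = activation_map[top:top + f, left:left + f, :]
            # Keep the largest response in the neighborhood, per depth slice.
            pooled[i, j, :] = window.max(axis=(0, 1))
    return pooled


activation_map = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool2d(activation_map)
print(pooled[:, :, 0])
# [[ 5.  7.]
#  [13. 15.]]
```

The 4 x 4 map shrinks to 2 x 2, and each output value is the maximum of one non-overlapping 2 x 2 block.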
2.4.4 Fully Connected Layer
Neurons in this layer have full connectivity with all neurons in the
preceding and succeeding layers, as in a regular fully connected neural
network (FCNN). This is why it can be computed as usual by a matrix
multiplication followed by a bias offset.
The FC layer helps to map the representation between the input and the
output.

2.4.5 Non-Linearity Layers


Since convolution is a linear operation and images are far from linear,
non-linearity layers are often placed directly after the convolutional layer
to introduce non-linearity to the activation map.
There are several types of non-linear operations, the popular ones
being:
2.4.6 Sigmoid
The sigmoid non-linearity has the mathematical form σ(κ) = 1/(1+e^(-κ)). It
takes a real-valued number and "squashes" it into a range between 0
and 1.
However, a very undesirable property of sigmoid is that when the
activation is at either tail, the gradient becomes almost zero. If the local
gradient becomes very small, then in backpropagation it will effectively
"kill" the gradient. Also, because the output of sigmoid is not
zero-centered, if the data coming into a neuron is always positive, then
the gradients on its weights will be either all positive or all negative,
resulting in a zig-zag dynamic of gradient updates for the weights.
2.4.7 Tanh
Tanh squashes a real-valued number to the range [-1, 1]. Like sigmoid,
the activation saturates, but — unlike the sigmoid neurons — its output
is zero centered.

2.4.8 ReLU
The Rectified Linear Unit (ReLU) has become very popular in the last
few years. It computes the function ƒ(κ) = max(0, κ). In other words, the
activation is simply thresholded at zero.
In comparison to sigmoid and tanh, ReLU is more reliable and has been
reported to accelerate convergence by a factor of about six.
Unfortunately, a drawback is that ReLU can be fragile during training. A
large gradient flowing through a ReLU neuron can update its weights in
such a way that the neuron never activates again and therefore never gets
updated further. However, this can be mitigated by setting
a proper learning rate.
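The three non-linearities and their gradients can be compared side by side with a short NumPy sketch. The function names and the sample inputs are assumptions made for the example. The point is to make visible the saturation of sigmoid and tanh at the tails and the zero gradient of ReLU for negative inputs.

```python
import numpy as np


def sigmoid(k):
    return 1.0 / (1.0 + np.exp(-k))


def sigmoid_grad(k):
    s = sigmoid(k)
    return s * (1.0 - s)


def tanh_grad(k):
    return 1.0 - np.tanh(k) ** 2


def relu(k):
    return np.maximum(0.0, k)


def relu_grad(k):
    # Gradient is 1 for positive inputs and 0 otherwise.
    return (k > 0).astype(float)


k = np.array([-10.0, 0.0, 10.0])
print("sigmoid     :", sigmoid(k))
print("sigmoid grad:", sigmoid_grad(k))  # close to zero at both tails
print("tanh        :", np.tanh(k))       # zero-centered output
print("tanh grad   :", tanh_grad(k))     # also saturates at the tails
print("relu        :", relu(k))
print("relu grad   :", relu_grad(k))     # zero for negative inputs
```

A ReLU neuron whose inputs are always negative receives a gradient of zero on every example, which is the "dead neuron" situation described above.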

2.5 Face Recognition


2.5.1 What is Facial Recognition?
Facial recognition is a way of identifying or confirming an individual’s
identity using their face. Facial recognition systems can be used to
identify people in photos, videos, or in real-time.

Facial recognition is a category of biometric security. Other forms of
biometric software include voice recognition, fingerprint recognition, and
eye retina or iris recognition. The technology is mostly used for security
and law enforcement, though there is increasing interest in other areas
of use.

Many people are familiar with face recognition technology through the
FaceID used to unlock iPhones (however, this is only one application of
face recognition). Typically, facial recognition does not rely on a massive
database of photos to determine an individual’s identity — it simply
identifies and recognizes one person as the sole owner of the device,
while limiting access to others.

Beyond unlocking phones, facial recognition works by matching the
faces of people walking past special cameras to images of people on a
watch list. The watch lists can contain pictures of anyone, including
people who are not suspected of any wrongdoing, and the images can
come from anywhere, even from our social media accounts.

Facial recognition systems can vary, but in general, they tend to operate
as follows:

Step 1: Face detection

The camera detects and locates the image of a face, either alone or in a
crowd. The image may show the person looking straight ahead or in
profile.

Step 2: Face analysis

Next, an image of the face is captured and analyzed. Most facial
recognition technology relies on 2D rather than 3D images because it
can more conveniently match a 2D image with public photos or those in
a database. The software reads the geometry of your face. Key factors
include the distance between your eyes, the depth of your eye sockets,
the distance from forehead to chin, the shape of your cheekbones, and
the contour of the lips, ears, and chin. The aim is to identify the facial
landmarks that are key to distinguishing your face.

Step 3: Converting the image to data

The face capture process transforms analog information (a face) into a
set of digital information (data) based on the person's facial features.
The analysis of your face is essentially turned into a mathematical
formula. The resulting numerical code is called a faceprint. In the same
way that thumbprints are unique, each person has their own faceprint.

Step 4: Finding a match


Your faceprint is then compared against a database of other known
faces. For example, the FBI has access to up to 650 million photos,
drawn from various state databases. On Facebook, any photo tagged
with a person’s name becomes a part of Facebook's database, which
may also be used for facial recognition. If your faceprint matches an
image in a facial recognition database, then a determination is made.
Of all the biometric measurements, facial recognition is considered the
most natural. Intuitively, this makes sense, since we typically recognize
ourselves and others by looking at faces, rather than thumbprints and
irises. It is estimated that over half of the world's population is touched
by facial recognition technology regularly.

2.5.2 Face Recognition Techniques:
Face recognition is a challenging yet interesting problem that has
attracted researchers from different backgrounds such as psychology,
pattern recognition, neural networks, computer vision, and computer
graphics.

Holistic matching

In this approach, complete face region is taken into account as input


data into face catching system. One of the best example of holistic
methods are Eigenfaces, PCA, Linear Discriminant Analysis and
independent component analysis etc.

This approach covers face recognition as a two-dimensional recognition


problem.
Insert a set of images into a database, these images are named as the
training set because they will be used when we compare images and
create the eigenfaces.
Eigenfaces are made by extracting characteristic features from the
faces. The input images are normalized so that the eyes and mouths line
up, and then resized to the same dimensions. The eigenfaces can then be
extracted from the image data using a mathematical tool called PCA.
Each image is now represented as a vector of weights, and the system is
ready to accept queries. The weights of an incoming unknown image are
computed and compared with the weights of the images already in the
system.
The input image is identified by finding the image in the database
whose weights are closest to its own. If the distance between the two
sets of weights is over a given threshold, the input image is
considered unidentified.
Otherwise, the database image with the closest weights is returned to
the user as a hit.
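
The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the weight vectors are assumed to have been computed by PCA already, distance is assumed to be Euclidean, and the names `find_match`, `database` and `threshold` are hypothetical.

```python
import math


def find_match(query_weights, known_weights, threshold):
    """Return the label of the closest stored face, or None if unidentified.

    query_weights: weight vector of the unknown image (list of floats).
    known_weights: dict mapping a label to its stored weight vector.
    threshold:     largest Euclidean distance still accepted as a match.
    """
    best_label, best_distance = None, math.inf
    for label, weights in known_weights.items():
        distance = math.dist(query_weights, weights)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    # Reject the query when even the closest face is too far away.
    if best_distance > threshold:
        return None
    return best_label


# Toy 3-component weight vectors (made-up values for illustration only).
database = {
    "person_a": [0.9, 0.1, 0.3],
    "person_b": [0.2, 0.8, 0.5],
}
print(find_match([0.85, 0.15, 0.3], database, threshold=0.5))  # person_a
print(find_match([5.0, 5.0, 5.0], database, threshold=0.5))    # None
```

The threshold trades false matches against false rejections, so in practice it would be tuned on images that were not part of the training set.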

Feature-based

Here, local features such as the eyes, nose, and mouth are first
extracted, and their locations, geometry, and appearance are fed into a
structural classifier. A challenge for feature extraction methods is
feature "restoration": the system tries to recover features that are
not visible because of large variations, e.g. in head pose when
matching a frontal image with a profile image.
Different extraction methods:
Generic methods based on edges, lines, and curves
Feature-template-based methods
Structural matching methods

Model Based

The model-based approach tries to model a face. A new sample is
introduced to the model, and the parameters of the model are used to
recognize the image. Model-based methods can be classified as 2D or
3D.

Hybrid Methods

These methods use a combination of holistic and feature extraction
methods. Generally, 3D images are used. The image of a face is
captured in 3D to record the curves of the eye sockets or the shape of
the chin or forehead. Even a face in profile can be used, because the
system uses depth and an axis of measurement, which gives it enough
information to construct a full face. The 3D system includes Detection,
Position, Measurement, Representation and Matching.
1. Detection - Capturing a face by scanning a photograph or photographing
a person's face in real time.
2. Position - Determining the location, size and angle of the head.
3. Measurement - Assigning measurements to each curve of the face to
make a template.
4. Representation - Converting the template into a numerical
representation of the face.
5. Matching - Comparing the received data with faces in the database. The
3D image that is to be compared with an existing 3D image must have
no alterations.

Chapter 3
Face Recognition by using CNN
3.1 Dataset

Our dataset consists of 153 classes. Each class contains 20 images taken
against the same background but from different angles and with
different facial expressions.

3.2 Preprocessing
Image enhancement was applied to the images to improve the quality of
the data so that more features can be extracted from them.
The techniques used include image augmentation, rescaling, and
resizing.
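
The three techniques named above can be illustrated on a tiny grayscale image stored as nested lists. This is a simplified sketch, not the project's pipeline: the report does not say which augmentations were applied, so the horizontal flip is an assumed example, and the helper names `rescale`, `resize_nearest` and `flip_horizontal` are hypothetical.

```python
def rescale(image):
    """Rescaling: map 8-bit pixel values (0-255) into the range 0.0-1.0."""
    return [[pixel / 255 for pixel in row] for row in image]


def resize_nearest(image, new_height, new_width):
    """Resizing: change the image dimensions by nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // new_height][c * width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def flip_horizontal(image):
    """Augmentation (assumed example): mirror the image left to right."""
    return [row[::-1] for row in image]


# A 2 x 2 grayscale "image" with made-up pixel values.
image = [[0, 255],
         [51, 102]]

scaled = rescale(image)
enlarged = resize_nearest(image, 4, 4)
mirrored = flip_horizontal(image)
print(scaled)
print(enlarged)
print(mirrored)
```

Rescaling keeps the inputs in a small numeric range, resizing gives every image the same shape, and augmentation produces extra training variants from the same photograph.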

3.3 How the train/test split procedure works:


1. Make sure your data is arranged in a format suitable for a train/test
split. In scikit-learn, this consists of separating your full dataset
into Features and Target.
2. Split the dataset into two pieces: a training set and a testing set.
This consists of randomly selecting about 75% (you can vary this)
of the rows and putting them into your training set, and putting the
remaining 25% into your test set. Note that the colors in “Features”
and “Target” indicate where their data will go (“X_train”, “X_test”,
“y_train”, “y_test”) for a particular train/test split.
3. Train the model on the training set. This is “X_train” and “y_train”
in the image.
4. Test the model on the testing set (“X_test” and “y_test” in the
image) and evaluate the performance.
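
Steps 1 and 2 above can be sketched with the standard library alone. scikit-learn offers `sklearn.model_selection.train_test_split` for this purpose; the function below, `simple_train_test_split`, is a hypothetical pure-Python stand-in written only to show the idea, and the sample data is made up.

```python
import random


def simple_train_test_split(features, target, test_size=0.25, seed=None):
    """Randomly split paired features/target lists into train and test parts."""
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)       # random row order
    n_test = round(len(indices) * test_size)   # e.g. 25% of the rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [target[i] for i in train_idx]
    y_test = [target[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# 20 dummy samples: the feature is a number, the target is its parity.
features = list(range(20))
target = [value % 2 for value in features]
X_train, X_test, y_train, y_test = simple_train_test_split(features, target, seed=0)
print(len(X_train), len(X_test))  # prints: 15 5
```

Shuffling the row indices rather than the rows themselves keeps every feature paired with its own target in both parts.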

Chapter 4

4.1 Architecture Model

We initially built several models, but they did not give us the
required accuracy. For example:

This model was designed using 1 block with 32 filters and the
activation functions ‘relu’ and ‘softmax’.

After many trials and modifications, we arrived at the best model.

Best Architecture Model:
This model was designed using 3 blocks, with 32 filters in the first
block, 64 in the second block and 64 in the third block, a stride of
(2,2) in each block, and the activation functions ‘relu’ and ‘softmax’.
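
To see what a stride of (2,2) does to the image size, the arithmetic below follows a 256 x 256 input (the target size used in the experiments) through the three blocks. It is a sketch under assumptions the text does not state: each block is taken to be a single convolution with a 3 x 3 kernel, no padding and no pooling. The function name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel=3, stride=2, padding=0):
    """Spatial size after one convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


filters_per_block = [32, 64, 64]   # filter counts given in the text
size = 256                          # input images are 256 x 256
for block, filters in enumerate(filters_per_block, start=1):
    size = conv_output_size(size)   # assumed 3 x 3 kernel, no padding
    print(f"block {block}: {size} x {size} x {filters}")

# Number of values handed to the classifier layers after flattening.
flattened = size * size * filters_per_block[-1]
print("flattened length:", flattened)
```

Under these assumptions the feature map shrinks from 256 to 127, 63 and finally 31 pixels per side, so the strides roughly halve the resolution at every block and keep the number of values reaching the final layers small.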

4.2 Experiment 1
Dataset:
A simple database that contains 113 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 75% of them in the train folder and the
rest in the test folder.
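
The two helper functions described above might look like the following sketch. It is not the authors' implementation: the function name `split_dataset`, the folder names and the file names are hypothetical, and the demonstration uses empty placeholder files in a temporary directory instead of real photographs.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_dataset(source_dir, output_dir, train_fraction=0.75, seed=None):
    """Copy each person's images into output_dir/train/<person> and output_dir/test/<person>."""
    rng = random.Random(seed)
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    for person_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        images = sorted(f for f in person_dir.iterdir() if f.is_file())
        rng.shuffle(images)                                # random assignment
        n_train = round(len(images) * train_fraction)
        for index, image in enumerate(images):
            subset = "train" if index < n_train else "test"
            target_dir = output_dir / subset / person_dir.name
            target_dir.mkdir(parents=True, exist_ok=True)  # one folder per person
            shutil.copy2(image, target_dir / image.name)


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    source, output = Path(tmp) / "faces", Path(tmp) / "split"
    for person in ("person_000", "person_001"):
        (source / person).mkdir(parents=True)
        for i in range(20):
            (source / person / f"img_{i:02d}.jpg").touch()
    split_dataset(source, output, train_fraction=0.75, seed=42)
    train_count = len(list((output / "train" / "person_000").iterdir()))
    test_count = len(list((output / "test" / "person_000").iterdir()))
    print(train_count, test_count)  # prints: 15 5
```

Splitting inside each person's folder, rather than across the whole dataset, guarantees that every person appears in both the train and the test folder.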

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.
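
The figures for this experiment (113 people, 20 images each, a 75% training share and a batch size of 10) determine how many images and batches the generator produces. The short calculation below is only a worked example of that arithmetic; the variable names are hypothetical.

```python
import math

people = 113            # classes in this experiment
images_per_person = 20
train_fraction = 0.75
batch_size = 10

train_per_person = round(images_per_person * train_fraction)   # 15
test_per_person = images_per_person - train_per_person         # 5
train_images = people * train_per_person                       # 1695
test_images = people * test_per_person                         # 565

# The last batch of a pass over the data may hold fewer than batch_size images.
train_batches = math.ceil(train_images / batch_size)           # 170
test_batches = math.ceil(test_images / batch_size)             # 57
print(train_images, test_images, train_batches, test_batches)
```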

Results:
1) Class (86): accuracy 80%
2) Class (112): accuracy 100%
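
Per-class figures like the ones above can be computed as in the sketch below. It assumes, which the report does not state explicitly, that accuracy is the share of a class's test images that were predicted correctly. With a 75%/25% split of 20 images, each class has 5 test images, so 80% would correspond to 4 correct predictions out of 5. The function name and the predictions are made up for illustration.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: percentage of that class's test images predicted correctly}."""
    correct, total = defaultdict(int), defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {label: 100 * correct[label] / total[label] for label in total}


# Made-up predictions: 5 test images for each of two classes.
true_labels = [86] * 5 + [112] * 5
predicted_labels = [86, 86, 86, 86, 17] + [112] * 5
result = per_class_accuracy(true_labels, predicted_labels)
print(result)  # prints: {86: 80.0, 112: 100.0}
```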

4.3 Experiment 2
Dataset:
A simple database that contains 113 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 50% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (108): accuracy 80%
2) Class (96): accuracy 100%


4.4 Experiment 3
Dataset:
A simple database that contains 113 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 25% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (60): accuracy 86.67%
2) Class (44): accuracy 100%

4.5 Experiment 4
Dataset:
A simple database that contains 132 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 75% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (43): accuracy 80%
2) Class (10): accuracy 100%

4.6 Experiment 5
Dataset:
A simple database that contains 132 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 50% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (51): accuracy 70%
2) Class (30): accuracy 100%

4.7 Experiment 6
Dataset:
A simple database that contains 132 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 25% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (83): accuracy 80%
2) Class (121): accuracy 100%

4.8 Experiment 7
Dataset:
A simple database that contains 152 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 75% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (34): accuracy 80%
2) Class (23): accuracy 100%

4.9 Experiment 8
Dataset:
A simple database that contains 152 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 50% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (3): accuracy 90%
2) Class (84): accuracy 100%

4.10 Experiment 9
Dataset:
A simple database that contains 152 people is used; each person has
20 images taken in different positions. The images are divided into two
folders: train and test.
A simple function was implemented to create the train and test folders
and, inside each of them, a folder for each person.
Another function was implemented to automatically divide the images of
each person at random, placing 25% of them in the train folder and the
rest in the test folder.

Preprocessing:
We used the image data generator to perform some augmentations.
Each person is treated as a class, with target size = 256 x 256 and
batch size = 10.
We loaded the images into a NumPy array to make it easier to work with
the pixel values when calling the prediction method.
A train/test split is a model validation procedure that lets you
simulate how a model would perform on new, unseen data.

Results:
1) Class (130): accuracy 80%
2) Class (151): accuracy 100%

Chapter 5
5.1 Problems faced throughout the project:
1. Installing the required libraries in the versions needed for our
system.
2. Dead kernel: the kernel died when we used a Python version other
than 3.6.
3. Overfitting: we solved this problem by adding strides to our
model.
4. GPU: Anaconda did not detect the GPU on some of our
laptops.
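
Version problems like the first two can be caught early with a small check run before training. The sketch below is an assumption about how such a check could look, not something the project used: `check_environment` is a hypothetical helper, and the package list is only an example based on the libraries named in this report.

```python
import importlib.util
import sys


def check_environment(required_version=(3, 6), packages=("numpy", "sklearn")):
    """Report whether the interpreter and packages match what the project expects."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": tuple(sys.version_info[:2]) == tuple(required_version),
    }
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

The check only reports what it finds, so it can be run safely on any machine before the heavier libraries are imported.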

5.2 Summary
To conclude, our project is face recognition with deep learning. We
constructed a model architecture and trained it on our dataset with
different ‘train’ and ‘test’ percentages until we got the best results
possible.
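
The nine experiments in Chapter 4 form a grid of three dataset sizes and three training percentages. The short listing below enumerates that grid and the resulting number of train and test images per person; it is only a summary aid, and the variable names are hypothetical.

```python
from itertools import product

people_counts = [113, 132, 152]         # dataset sizes used in the experiments
train_fractions = [0.75, 0.50, 0.25]    # share of each person's images used for training
images_per_person = 20

grid = []
for number, (people, fraction) in enumerate(product(people_counts, train_fractions), start=1):
    train_per_person = round(images_per_person * fraction)
    test_per_person = images_per_person - train_per_person
    grid.append((number, people, train_per_person, test_per_person))
    print(f"Experiment {number}: {people} people, "
          f"{train_per_person} train / {test_per_person} test images per person")
```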

From our experience with this project over this academic year, we have
found that deep learning is a technology undergoing continuous
updates and upgrades. While researching papers and articles for
our model, we looked for the architecture that best fits a
given case. No architecture is agreed upon as the “best” for a specific
case. Instead, developers experiment with new techniques and
try several methods to reach the best result with the resources
available to them. This is what we have tried to do.

This leads to our next conclusion: no model can reach
100% accuracy. For that to happen, the dataset would have to cover
every aspect of what the model should understand and know. This is
hard, especially since we worked only with Anaconda on our personal
computers and the lab computers provided by the university. Substantial
resources are needed to handle both the storage and the processing of
the models. However, even if these resources were available and such a
dataset existed, no model would reach perfection in its predictions; it
would only produce results faster and come very close to perfect
predictions.

5.3 FUTURE WORK
There are some ideas that we want to implement in the future. For
example, we want our system to be able to find a match for an
image even if the image is incomplete, and to identify the person in
the image even if one of the features, such as the mouth or eyes, is
hidden. We therefore want to make some adjustments
so that we can get the desired results with the right accuracy. We can
then use this model in real-world applications such as
criminal investigation systems.

