1. INTRODUCTION
Transfer Learning for Recognizing Face in Disguise
In the field of Machine Learning, there are many areas and methods for recognizing things, whether the detected subject is non-living (an object) or living (a human, an animal, or even a plant). For biometric identification, a person's face is often used to identify them. Humans can tell one another apart because they have the memory and the brain to process what they see, but a machine cannot do this by itself. This gave rise to a field that enables machines to learn, namely Machine Learning, pioneered by Arthur Samuel [1].
Over the past decades, approaches to identifying a person's face have ranged from Eigenfaces [2] and Principal Component Analysis (PCA) to Convolutional Neural Networks (CNN), and with each step the accuracy of face recognition has become higher. Transfer learning is a machine learning approach in which a model trained on a first task is reused as the starting point for a second, related task. It differs from traditional machine learning because it uses a pre-trained model as a springboard for the secondary task instead of training from scratch. Because of its many advantages, including its suitability for transfer learning, CNN has been widely applied in various fields of research, such as image classification, pedestrian detection, object detection, video analysis, food detection, and face recognition.
In this paper, we compare several popular pre-trained CNN model architectures provided by Keras [12], an open-source neural network library written in Python. The architectures used are VGG16, VGG19, ResNet50, ResNet152V2, InceptionV3 and InceptionResNetV2. The work is divided into two parts: using the extracted feature vectors to train a classifier model, and evaluating the accuracy and cost (loss) function of that classifier. From this research, we expect to identify the pre-trained architecture with the highest accuracy and the lowest cost function under optimal hyperparameter settings. The author uses the “Recognizing Disguised Faces” dataset, which contains face images of 75 people wearing disguises such as a bandana, a mask, a fake mustache, a fake beard, or glasses. Most people in the dataset have 7–8 pictures in disguise, and the last 2 pictures show his or her real face.
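The feature-vector part of this comparison can be sketched as follows. This is a minimal sketch that assumes TensorFlow 2.x and its `tf.keras.applications` module (the document lists standalone Keras 2.3.1, whose `keras.applications` module should expose the same model names); the `ARCHITECTURES` table, the `extract_features` helper and the use of global average pooling (`pooling="avg"`) are illustrative assumptions, not the author's code.

```python
import numpy as np
import tensorflow as tf

apps = tf.keras.applications

# The six pre-trained architectures compared in this project, each paired with
# its own preprocess_input function and its default input size
# (224x224 for VGG/ResNet, 299x299 for the Inception family).
ARCHITECTURES = {
    "VGG16": (apps.VGG16, apps.vgg16.preprocess_input, 224),
    "VGG19": (apps.VGG19, apps.vgg19.preprocess_input, 224),
    "ResNet50": (apps.ResNet50, apps.resnet50.preprocess_input, 224),
    "ResNet152V2": (apps.ResNet152V2, apps.resnet_v2.preprocess_input, 224),
    "InceptionV3": (apps.InceptionV3, apps.inception_v3.preprocess_input, 299),
    "InceptionResNetV2": (apps.InceptionResNetV2,
                          apps.inception_resnet_v2.preprocess_input, 299),
}


def extract_features(name, images, weights="imagenet"):
    """Turn a batch of RGB images (N, H, W, 3), already resized to the
    architecture's input size, into one feature vector per image."""
    constructor, preprocess, size = ARCHITECTURES[name]
    # include_top=False drops the ImageNet classifier; pooling="avg" averages
    # the last feature maps into a single vector per image.
    base = constructor(include_top=False, weights=weights,
                       input_shape=(size, size, 3), pooling="avg")
    x = preprocess(np.asarray(images, dtype="float32").copy())
    return base.predict(x, verbose=0)
```

The resulting vectors (512 values per image for the VGG models, 2048 for ResNet50, ResNet152V2 and InceptionV3, and 1536 for InceptionResNetV2) could then be used to train a classifier whose accuracy and loss are compared across architectures.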
2. AIM AND OBJECTIVE
Aim
To test several popular Convolutional Neural Network (CNN) model architectures and see which one is best at recognizing the faces of people in disguise. The author uses the “Recognizing Disguised Faces” dataset to distinguish 75 classes of faces, and then trains and tests the models to measure how accurately the machine recognizes them. The results should be useful to anyone who needs to explore and develop a deep learning architecture. [1]
Objectives
● The main objective of this project is to find the pre-trained architecture with the highest accuracy and the lowest cost function under optimal hyperparameter settings.
● To require less computational power and fewer resources such as RAM, CPU, GPU or TPU.
● To achieve higher accuracy with a smaller quantity of data.
● Face recognition is one of the most relevant applications of image analysis.
● Building an automated system that equals the human ability to recognize faces is a true challenge.
● Humans are quite good at identifying known faces, but not very skilled when dealing with a large number of unknown faces.
● Understanding the human ability to recognize faces helps in developing non-human (machine) face recognition systems.
3. LITERATURE SURVEY
Paper 1
Paper Name : Face Recognition with Convolutional Neural Network and Transfer
Learning.
Authors Name: R. Meena Prakash, N. Thenmoezhi, M. Gayathri
Explanation:
An age-invariant face recognition method using a discriminative model with deep feature training is proposed. In this work, AlexNet is used as the transfer learning CNN model to learn high-level deep features. These features are then encoded with a codebook into a higher-dimensional code word for image representation. The encoding framework ensures similar code words for images of the same person's face photographed at different times. A linear regression-based classifier is used for face recognition, and the method is tested on three publicly available datasets, including FGNET. A face recognition system that incorporates a convolutional neural network, an autoencoder and denoising, called Deep Stacked Denoising Sparse Autoencoders (DS-DSA), has also been proposed.
Multiclass Support Vector Machine (SVM) and Softmax classifiers are used for the classification process. The method is tested on four publicly available datasets, including ORL, Yale and PubFig. Cross-modality face recognition using a deep local descriptor learning framework has been proposed, in which both compact local information and discriminant features are learnt directly from raw facial patches. Deep local descriptors are extracted with a Convolutional Neural Network (CNN). The method is tested on six widely used face recognition datasets of different modalities.
Paper 2
Paper Name : Face Recognition Based on Two-stage CNN Combined with Transfer
Learning
Authors Name: Anqi Zhou, Jianxin Chen, Jie Ding, Zhaolai Pan
Explanation:
This paper proposes a method for face recognition in an unrestricted environment. The first stage searches for face windows and their bounding-box regression vectors. The second stage merges highly overlapping face candidates. The network then completes face detection and alignment, and its outputs are the face regions and the positions of five facial landmarks. Experiments show that the authors' face detection and alignment method outperforms other methods: it performs well across different face angles and obtains an accuracy of 96.1% on AFW and 98.9% on FDDB. Next, the faces are sent to a convolutional neural network for training. With very limited training data, retraining the entire network is impractical, so transfer learning is used. The results show that the method exceeds traditional algorithms and a general CNN on a small dataset.
Paper 3
Paper Name : Face Key Point Location Method based on Parallel Convolutional
Neural Network
Authors Name: Zhou Pu, Kai Wang, Kai Yan
Explanation:
Based on a convolutional neural network and a face detection algorithm, this paper proposes a training sample expansion strategy and a parallel convolutional network face detection algorithm that handles face features, occlusion and illumination, combined with the ReLU activation function and Dropout regularization. This training approach not only speeds up the convergence of the network but also improves its generalization ability. On this basis, software for face detection and feature point location is designed that automatically loads images, performs face recognition and accurately locates facial key points; experiments are conducted on the LFW face database. The results show that the method greatly improves accuracy and reliability and achieves robust and accurate estimation of key points.
Comparative Analysis
Table 3.1 Comparative Analysis of Existing System
Sr. No. | Author | Project Title | Publication | Technology | Purpose
1. | R. Meena Prakash, N. Thenmoezhi, M. Gayathri | Face Recognition with Convolutional Neural Network and Transfer Learning | IEEE, 2019 | AlexNet | An age-invariant face recognition method using a discriminative model with deep feature training is proposed.
2. | Anqi Zhou, Jianxin Chen, Jie Ding, Zhaolai Pan | Face Recognition Based on Two-stage CNN Combined with Transfer Learning | IEEE, 2019 | ImageNet | It exceeds the traditional algorithms and general CNN on a small dataset.
3. | Zhou Pu, Kai Wang, Kai Yan | Face Key Point Location Method based on Parallel Convolutional Neural Network | IICSPI, 2019 | ImageNet | This paper proposes a training sample expansion strategy and a parallel convolutional network face detection algorithm for face features.
4. EXISTING SYSTEM
There are perhaps a dozen or more top-performing models for image recognition
that can be downloaded and used as the basis for image recognition and related computer
vision tasks.
Perhaps three of the more popular models are as follows:
• VGG (e.g. VGG16 or VGG19).
• GoogLeNet (e.g. InceptionV3).
• Residual Network (e.g. ResNet50).
These models are widely used for transfer learning, both because of their performance and because each introduced a specific architectural innovation: consistent and repeating structures (VGG), inception modules (GoogLeNet), and residual modules (ResNet).
Keras provides access to a number of top-performing pre-trained models that
were developed for image recognition tasks.
They are available via the Applications API, which includes functions to load a model with or without the pre-trained weights and to prepare data in the way a given model expects (e.g. scaling of size and pixel values).
The first time a pre-trained model is loaded, Keras downloads the required model weights, which may take some time depending on the speed of your internet connection. Weights are stored in the .keras/models/ directory under your home directory and are loaded from this location the next time they are used.
When loading a model, the “include_top” argument can be set to False, in which case the fully connected output layers used to make predictions are not loaded, allowing a new output layer to be added and trained.
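To illustrate what `include_top` changes, the sketch below (assuming TensorFlow 2.x's `tf.keras.applications.VGG16`) builds the model with and without its top layers. It passes `weights=None` so nothing is downloaded; in practice `weights="imagenet"` would be used.

```python
import tensorflow as tf

# weights=None builds the layers with random initial weights, which is enough
# to inspect the architecture; weights="imagenet" would download the
# pre-trained weights on first use and cache them under ~/.keras/models/.
full = tf.keras.applications.VGG16(include_top=True, weights=None)
base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                   input_shape=(224, 224, 3))

# include_top=True keeps the fully connected layers and the 1000-way
# ImageNet softmax; include_top=False stops at the last pooling layer.
print(tuple(full.output.shape))            # (None, 1000)
print(tuple(base.output.shape))            # (None, 7, 7, 512)
print(len(full.layers) - len(base.layers)) # 4: Flatten, fc1, fc2, predictions
```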
5. PROBLEM STATEMENT
To create a project that uses transfer learning to solve problems such as face recognition and image classification. CNNs, which lend themselves to transfer learning, have been widely applied in various fields of research, including image classification, pedestrian detection, object detection, video analysis, food detection and face recognition.
Human emotions and intentions are expressed through facial expressions, and extracting efficient and effective features is the primary component of a facial expression system. Facial expressions convey non-verbal cues, which play an important role in interpersonal relations. Automatic recognition of facial expressions can be an important part of natural human-machine interfaces; it may also be used in behavioral science and in clinical practice. An automatic Facial Expression Recognition system needs to solve the following problems: detecting and locating faces in a cluttered scene, extracting facial features, and classifying facial expressions.
6. SCOPE
In this project, a facial expression recognition system is implemented using a convolutional neural network. Facial images are classified into seven facial expression categories, namely Anger, Disgust, Fear, Happy, Sad, Surprise, and Neutral. A Kaggle dataset is used to train and test the classifier.
• The method is to be composed of three parts: face detection, face extraction and face recognition.
• To propose a method that is robust to variations in appearance, expression and lighting, and to the presence or absence of accessories.
• Extensive experiments are yet to be conducted to demonstrate the effectiveness of the method.
7. PROPOSED SYSTEM
In the proposed method, the VGG16 CNN architecture is used for transfer learning and is trained on the input images from the faces dataset. The input images to the CNN are of size 224x224x3. They are passed through a stack of 13 convolutional layers with 3x3 filters and depths of 64, 128, 256 and 512. The convolutional layers are followed by three fully connected (FC) layers. The first two FC layers have 4096 units each, and in the original ImageNet model the last FC layer performs classification into 1000 classes. All hidden layers use the rectified linear unit (ReLU) non-linearity. A softmax layer at the end of the network converts the outputs of the last FC layer into class probabilities. Counting the input and output layers, and each ReLU, pooling, dropout and softmax layer separately, there are 41 layers in total. The architecture of the VGG16 CNN model is shown in Fig. 7.1.
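The layer sizes above can be checked with a little arithmetic. The sketch below walks the standard VGG16 configuration (3x3 convolutions with "same" padding, 2x2 max pooling with stride 2) and counts weights plus biases; the function name `vgg16_summary` is hypothetical.

```python
# VGG16 configuration: numbers are 3x3 conv layers (output channels),
# "M" is a 2x2 max-pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(input_size=224, in_channels=3, fc_units=(4096, 4096, 1000)):
    """Return (final spatial size, conv parameters, fully connected parameters)."""
    size, channels, conv_params = input_size, in_channels, 0
    for layer in VGG16_CFG:
        if layer == "M":
            size //= 2  # pooling halves height and width
        else:
            # 3x3 kernel weights for every (input, output) channel pair + biases;
            # "same" padding keeps the spatial size unchanged.
            conv_params += 3 * 3 * channels * layer + layer
            channels = layer
    fc_params, fan_in = 0, size * size * channels  # flattened 7x7x512 = 25088
    for units in fc_units:
        fc_params += fan_in * units + units
        fan_in = units
    return size, conv_params, fc_params


size, conv, fc = vgg16_summary()
print(size, conv, fc, conv + fc)  # 7 14714688 123642856 138357544
```

The total matches the roughly 138 million parameters usually quoted for VGG16; only about 14.7 million of them lie in the convolutional layers, and most of the rest sit in the first FC layer.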
VGG16 has already been trained on the large ImageNet database, and its learnt weights are used to initialize the model, which is then trained on the training set of face images. The extracted features are fed into an FC layer followed by softmax activation, which classifies them into the different faces.
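The transfer step described above (ImageNet weights as initialization, then a new softmax output layer for the face classes) might look like this in Keras. It is a sketch under assumptions: TensorFlow 2.x, the Adam optimizer, and a single Dense softmax layer with 75 outputs (one per person); `build_face_classifier` is a hypothetical name. The sketch also freezes the convolutional base, a common first stage that the document does not mention; setting `base.trainable = True` afterwards, with a low learning rate, would fine-tune all layers as the text describes.

```python
import tensorflow as tf

NUM_CLASSES = 75  # one class per person in the dataset


def build_face_classifier(num_classes=NUM_CLASSES, weights="imagenet"):
    """VGG16 convolutional base plus a new softmax output layer for face classes."""
    base = tf.keras.applications.VGG16(include_top=False, weights=weights,
                                       input_shape=(224, 224, 3))
    base.trainable = False  # keep the transferred ImageNet features fixed

    inputs = tf.keras.Input(shape=(224, 224, 3))
    features = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(features)  # 7*7*512 = 25088 values
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # expects one-hot labels
                  metrics=["accuracy"])
    return model
```

Before calling `model.fit`, the images would be passed through `tf.keras.applications.vgg16.preprocess_input` and the labels one-hot encoded, for example with `tf.keras.utils.to_categorical`.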
SYSTEM ARCHITECTURE
Fig. 7.1. System Architecture for Recognizing Disguised Faces
ALGORITHM
Step 1: Start
Step 2: Collect the dataset
Step 3: Train the model using VGG16
Step 4: Test and run the model
Step 5: Display and store the results
Step 6: End
MATHEMATICAL MODEL
Artificial Neural Network (ANN)
An Artificial Neural Network (ANN) is an information processing system whose performance characteristics resemble those of biological neural networks in the brain. McCulloch and Pitts designed the first ANN in 1943. ANNs have been developed as generalizations of mathematical models of human cognition or neural biology, based on the following assumptions:
● Information processing occurs in simple elements called neurons.
● Each connection link has an associated weight, which multiplies the transmitted signal.
● Signals are passed between neurons over connection links.
● Each neuron applies an activation function, usually non-linear, to its net input to determine its output.
Fig. 7.3.1 Single Neuron Network
An ANN consists of many processing elements, commonly called neurons, units, cells, or nodes. Each neuron is connected to others via communication links, and each link has an associated weight. The weights represent the information the network uses to solve a problem; one of the most widely used applications of neural networks is pattern classification.
Consider the simple neural network in Fig. 7.3.1. Let the output of the network be ŷ, which is the activation a of the output neuron. This neuron receives input from three input neurons, named X1, X2 and X3, whose activations are x1, x2 and x3. The weights on the connections from X1, X2 and X3 to this neuron are w1, w2 and w3. The output is therefore calculated as in (1):
$$\hat{y} = a = w_1 x_1 + w_2 x_2 + w_3 x_3 \qquad (1)$$
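As a worked example of equation (1), take illustrative (made-up) values for the inputs and weights. The last lines show how a non-linear activation, as assumed for ANNs, would then be applied to the net input; ReLU and sigmoid are common choices, not ones fixed by the text.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # activations x1, x2, x3 of input neurons X1, X2, X3
w = np.array([0.5, -1.0, 2.0])  # weights w1, w2, w3 (illustrative values)

y_hat = float(np.dot(w, x))     # w1*x1 + w2*x2 + w3*x3 = 0.5 - 2.0 + 6.0
print(y_hat)                    # 4.5

# A non-linear activation applied to the net input, e.g. ReLU or sigmoid.
relu = max(0.0, y_hat)
sigmoid = 1.0 / (1.0 + np.exp(-y_hat))
```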
After that, the loss function of the network is calculated. The loss function measures the difference between the prediction ŷ and the true value (ground truth); in other words, it is the error for one stage of training.
In this research, we use categorical cross-entropy as the loss function because we want to classify each person by his or her face. This function compares the predicted distribution over faces with the true distribution, in which the true class is set to 1 and every other class to 0. The true class of a person's face is represented as a one-hot encoded vector, so the loss is lower when the model's output vector is closer to the true class. The loss function is given in (2):
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\,\log p_{ij} \qquad (2)$$
where C is the number of classes, N is the number of samples, X_i is the i-th input vector with one-hot encoded target vector Y_i, y_ij is the j-th element of Y_i, and p_ij is the predicted probability that the i-th sample belongs to class j.
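The categorical cross-entropy described above can be computed with a few lines of NumPy. This minimal sketch averages over the N samples, as Keras does for its `categorical_crossentropy` loss; the helper name and the probability values are illustrative. A prediction that puts high probability on the true class gives a lower loss.

```python
import numpy as np


def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean categorical cross-entropy over N samples.

    y_true: (N, C) one-hot targets Y_i; p_pred: (N, C) predicted probabilities p_ij.
    """
    p = np.clip(p_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(p), axis=1)))


y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
good = np.array([[0.1, 0.8, 0.1],   # most probability on the true class
                 [0.7, 0.2, 0.1]])
bad = np.array([[0.6, 0.2, 0.2],    # most probability on a wrong class
                [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(y_true, good))  # lower loss
print(categorical_cross_entropy(y_true, bad))   # higher loss
```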
Convolutional Neural Network (CNN)
Convolutional Neural Networks (CNN) are a special case of Artificial Neural Networks (ANN) and are currently considered among the best techniques for solving object recognition and digit recognition problems. Many deep CNN models have been developed to date; this paper focuses on four architectures: AlexNet, VGG16, GoogLeNet/Inception and ResNet.
AlexNet
Created in 2012, AlexNet was the first deep network to classify objects in the ImageNet dataset with significantly higher accuracy than the traditional methods that preceded it. The network consists of 5 convolutional layers followed by 3 fully connected layers, as illustrated in Fig. 7.3.2.1.
Fig.7.3.2.1. Network architecture of AlexNet.
VGG16
Created in 2014 by the Visual Geometry Group (VGG) at Oxford, this architecture improves on AlexNet by replacing the large kernel filters (11x11 and 5x5 in the first and second convolutional layers) with multiple stacked 3x3 kernel filters. For a given receptive field, stacked small kernels are better than one large kernel, because the multiple non-linear layers increase the depth of the network, which enables it to learn more complex features. A comparison is shown in Fig. 7.3.2.2.
Fig.7.3.2.2. Network architecture of VGG16
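The advantage of stacking small kernels can be checked numerically. Assuming stride-1 convolutions that keep C channels in and out (C = 256 here is an arbitrary illustrative choice, and biases are ignored), two stacked 3x3 layers cover the same 5x5 receptive field as one 5x5 layer, and three cover 7x7, with fewer weights:

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weights (biases ignored) of `num_layers` kernel x kernel convolutions,
    each mapping `channels` input channels to `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


C = 256
print(stacked_receptive_field(2), conv_weights(5, C), conv_weights(3, C, 2))  # 5 1638400 1179648
print(stacked_receptive_field(3), conv_weights(7, C), conv_weights(3, C, 3))  # 7 3211264 1769472
```

Three 3x3 layers need 27C² weights versus 49C² for one 7x7 layer, and they also apply three ReLU non-linearities instead of one.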
GoogLeNet / Inception
Created in 2014. Although VGG achieved phenomenal accuracy on the ImageNet dataset, it requires heavy computation, even on a GPU (Graphics Processing Unit). This is inefficient because of the large width of the convolutional layers used. GoogLeNet builds on the idea that most activations in a deep network are either unnecessary (zero-valued) or redundant because of correlations between them. Therefore, the most efficient deep network architecture would have sparse connections between activations, meaning that not all 512 output channels need to be connected to all 512 input channels. GoogLeNet introduced a module called the Inception module, which approximates a sparse CNN with a normal dense construction (shown in Fig. 7.3.2.3). Because only a small fraction of the neurons are effective, as mentioned previously, the width (number of convolutional filters) for each kernel size is kept small. The module also uses convolutions of different sizes (5x5, 3x3, 1x1) to capture details at various scales.
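An Inception-style module can be sketched with the Keras functional API, assuming TensorFlow 2.x. The parallel 1x1, 3x3 and 5x5 convolution branches plus a pooling branch are concatenated along the channel axis, and 1x1 "reduce" convolutions keep the number of filters small before the larger kernels. The function name, input size and filter counts are illustrative.

```python
import tensorflow as tf

layers = tf.keras.layers


def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """Four parallel branches concatenated along the channel axis."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    # 1x1 reductions shrink the channel count before the costly 3x3 and 5x5 convs.
    b3 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)
    b5 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])


inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
model = tf.keras.Model(inputs, outputs)
print(tuple(model.output.shape))  # (None, 28, 28, 256): 64 + 128 + 32 + 32 channels
```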
ResNet
As discussed so far, improving accuracy requires increasing the depth of the network, provided over-fitting is kept under control. However, a deeper network cannot be obtained simply by stacking more layers. Deep networks are difficult to train because of the vanishing gradient problem: as the gradient is back-propagated to earlier layers, repeated multiplication can make it very small. As a result, as the network grows deeper, its performance saturates or even starts to degrade rapidly. ResNet (Residual Network), created in 2015, addresses this by introducing an "identity shortcut connection" that skips one or more layers, as shown in Fig. 7.3.2.4.
Fig.7.3.2.4. Network architecture of ResNet
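The identity shortcut can be illustrated with a small NumPy sketch of a fully connected residual block, y = ReLU(F(x) + x), where F is two weight layers with a ReLU between them. Real ResNets use convolutions and batch normalization; this simplification is an assumption for clarity. When the weights of F are zero, the block passes non-negative inputs through unchanged, so an extra block can at worst learn the identity rather than degrade the signal.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x); the identity shortcut adds the input back unchanged."""
    f = relu(x @ w1) @ w2  # residual branch F(x): two weight layers with a ReLU
    return relu(f + x)


rng = np.random.default_rng(0)
x = rng.random((4, 8))        # non-negative inputs, e.g. outputs of a previous ReLU
w1 = rng.normal(size=(8, 8))
w2 = np.zeros((8, 8))         # residual branch switched off
print(np.allclose(residual_block(x, w1, w2), x))  # True
```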
8. METHODOLOGY
DFD Diagram
Fig. 8.1.1 DFD Level 0 for Recognizing Disguised Faces
Fig. 8.1.2 DFD Level 1 for Recognizing Disguised Faces
Fig. 8.1.3 DFD Level 2 for Recognizing Disguised Faces
UML Diagram
Activity Diagram
Fig. 8.2.1 Activity Diagram for Recognizing Disguised Faces
Sequence Diagram
Fig. 8.2.2 Sequence Diagram for Recognizing Disguised Faces
Use Case Diagram
Fig. 8.2.3 Use Case Diagram for Recognizing Disguised Faces
Class Diagram
Fig. 8.2.4 Class Diagram for Recognizing Disguised Faces
Flow Chart
Fig. 8.2.5 Flowchart of Recognizing Disguised Faces
9. PLANNING
Fig. 9.1 Gantt chart for planning
10. HARDWARE AND SOFTWARE DETAILS
Hardware Requirements
System : Intel Core i3 2.00 GHz.
Hard Disk : 1 TB.
Monitor : 14" Color Monitor.
Mouse : Optical Mouse.
RAM : 2 GB.
Keyboard : 101-key keyboard.
Software Requirements
Operating system : Windows 10.
Coding Language : Python
Software used : Keras Version 2.3.1
11. ADVANTAGES
● Helps solve complex real-world problems with several constraints.
● Tackles problems where little or almost no labeled data is available.
● Makes it easy to transfer knowledge from one model to another across domains and tasks.
● Provides a path towards achieving Artificial General Intelligence someday in the future.
● Automatically detects the important features without any human supervision.
● Requires less computational power and fewer resources such as RAM, CPU, GPU or TPU.
● Achieves higher accuracy with a smaller quantity of data.
12. IMPLEMENTATION FOR NEXT SEMESTER
The techniques used for the whole face recognition process are based on machine learning because of their high accuracy compared with other techniques. Face detection, the step that precedes face recognition, is performed using Haar-like features; this method has a reported detection rate of 98% using 3099 features. Face recognition is achieved using a sub-field of deep learning, the Convolutional Neural Network (CNN), a multi-layer network trained to perform a specific classification task. Transfer learning of a trained CNN model, AlexNet, is used for face recognition, with a reported accuracy of 98.5% using 2500 variant images per class. Such a system can serve the security domain in the authentication process.
REFERENCES
[1] A. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, vol. 3, no. 3, pp. 210–229, 1959.
[2] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 586–591, 1991.
[3] J. Yang, D. Zhang, A. F. Frangi and J. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
[4] Y. Taigman, M. Yang, M. Ranzato and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," in CVPR, pp. 1701–1708, 2014.
[5] J. West, D. Ventura and S. Warnick, "A Theoretical Foundation for Inductive Transfer," Spring Research Presentation, 2007.
[6] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in ACM MM, 2014.
[7] D. Tomé, F. Monti, L. Baroffio, L. Bondi, M. Tagliasacchi and S. Tubaro, "Deep Convolutional Neural Networks for pedestrian detection," Signal Processing: Image Communication, vol. 47, pp. 482–489, 2016.
[8] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei and H. Zou, "Multi-scale object detection in remote sensing imagery with convolutional neural networks," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, Part A, pp. 3–22, 2018.
[9] M. Perez, S. Avila, D. Moreira, D. Moraes, V. Testoni, E. Valle, S. Goldenstein and A. Rocha, "Video pornography detection through deep learning techniques and motion information," Neurocomputing, vol. 230, pp. 279–293, 2017.
[10] R. D. Yogaswara and A. D. Wibawa, "Comparison of Supervised Learning Image Classification Algorithms for Food and Non-Food Objects," in CENIM, Surabaya, 2018.
[11] Z. Yang and R. Nevatia, "A multi-scale cascade fully convolutional network face detector," in International Conference on Pattern Recognition (ICPR), Mexico, 2016.
[12] F. Chollet, Keras, GitHub repository, 2015.
[13] T. I. Dhamecha, A. Nigam, R. Singh and M. Vatsa, "Disguise detection and face recognition in visible and thermal spectrums," in IEEE International Conference on Biometrics, pp. 1–8, 2013.