Data Classification Using Convolutional Neural Network
Bachelor’s thesis
Aleksandr Bukatoi
ABSTRACT
The author’s aim in this paper was to understand how deep learning can
be connected to automation engineering and which solution would be best
suited for data classification.
By the end of this work the author had achieved the desired result: a
program that classifies images into two categories, with the possibility of
improving the system for future projects.
1 INTRODUCTION.............................................................................................................1
5 CONCLUSIONS.............................................................................................................29
REFERENCES.....................................................................................................................30
Appendices
Appendix 1 Contents of the file “CNN.py”
LIST OF ABBREVIATIONS
AI          Artificial intelligence
ANN         Artificial neural network
ANSI        American National Standards Institute
CNN         Convolutional neural network
CNTK        Microsoft Cognitive Toolkit
CPU         Central processing unit
DL          Deep learning
GUI         Graphical user interface
GPU         Graphics processing unit
ISO         International Organization for Standardization
JIT         Just-in-time compilation
JVM         Java virtual machine
LLVM        The LLVM compiler infrastructure project
Matplotlib  A plotting library for the Python programming language
MIT         Massachusetts Institute of Technology
ML          Machine learning
MSIL        Microsoft intermediate language
NumPy       The fundamental package for scientific computing with Python
PyPI        The Python Package Index
PyPy        An alternative implementation of the Python programming language to CPython
PyQt        One of the most popular Python bindings for the Qt cross-platform C++ framework
QtPy        A small abstraction layer that lets the user write applications using a single API to either PyQt or PySide
SciPy       A free and open-source Python library used for scientific and technical computing
1 INTRODUCTION
Since ancient times, humankind has been striving to lessen people's
everyday burden. Progress has come a long way from the invention of the
wheel to self-driving cars. As automation rises, fewer tasks require human
labour, opening new possibilities to engineers and developers.
Artificial intelligence has been a fascinating topic for science fiction writers
and directors for decades, yet we have still to observe its birth. The results
of various experiments with AI are just simple imitations of how science
perceives the work of the human brain. Some of these solutions are called
Neural Networks, and the field responsible for building them is called deep
learning.
The aim of this thesis work was to study the Convolutional Neural Network
as a part of deep learning and to build a sample program representing the
possibilities of this branch of artificial intelligence.
2 DEEP LEARNING
The history of Deep Learning starts in 1943, when a computer model based
on a neural network was created by Walter Pitts and Warren McCulloch.
A combination of algorithms and mathematics called "threshold logic"
was used to mimic the thought process. Since that time, Deep Learning has
been evolving steadily, interrupted by two significant setbacks in its
development known as the Artificial Intelligence winters (Foote 2017).
The first AI winter started during the 1970s as the result of promises that
could not be kept. This led to a lack of funding for both Artificial
Intelligence and Deep Learning research. Fortunately, there were
individuals who continued the research without funding (Foote 2017).
The use of errors in training Deep Learning models, called backpropagation,
advanced significantly in 1970, when Seppo Linnainmaa wrote his master's
thesis, including FORTRAN code for backpropagation. Unfortunately, the
concept was not applied to neural networks until 1985, when Rumelhart,
Williams, and Hinton demonstrated that backpropagation in a neural network
could provide "interesting" distributed representations. This discovery
brought to light the question within cognitive psychology of whether
human understanding relies on symbolic logic (computationalism) or
distributed representations (connectionism). Yann LeCun provided the
first practical demonstration of backpropagation at Bell Labs in 1989. He
combined convolutional neural networks with backpropagation so that the
system could read "handwritten" digits. This system was eventually used
to read the numbers of handwritten checks (Foote 2017).
The next significant step for Deep Learning occurred in 1999, when
computers started becoming faster at processing data and GPUs (graphics
processing units) were developed. Faster processing, with GPUs handling
pictures, increased computational speeds by a factor of 1,000 over a decade.
During this time, neural networks began to compete with support vector
machines. While a neural network could be slow compared to a support
vector machine, neural networks offered better results using the same
data. Neural networks also have the advantage of continuing to improve
as more training data is added (Foote 2017).
Deep Learning consists of the following methods and their variations:
• Unsupervised learning systems, such as Boltzmann Machines for
preliminary training, Auto-Encoders and Generative Adversarial
Networks.
• Supervised learning, such as Convolutional Neural Networks, which
brought the technology of pattern recognition to a new level.
• Recurrent Neural Networks, which allow training on processes in time.
• Recursive neural networks, which allow feedback between circuit
elements and chains.
3.1 Neuron
From the mathematical point of view, the artificial neuron is an adder of
all incoming signals, which applies to the resulting weighted sum some
simple, generally nonlinear, function that is continuous throughout its
domain of definition. The current state of the neuron (1) is defined as the
weighted sum of its inputs:
s = \sum_{i=1}^{n} x_i w_i + w_0    (1)

y = f(s)    (2)
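A minimal NumPy sketch of equations (1) and (2); the tanh activation is an
assumed choice purely for illustration:

import numpy as np

def neuron_output(x, w, w0, f=np.tanh):
    # Equation (1): weighted sum of the inputs plus the bias term w0
    s = np.dot(x, w) + w0
    # Equation (2): apply the activation function f
    return f(s)

# Example: three inputs with equal weights and a small bias
y = neuron_output(np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.5, 0.5]), 0.1)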
Input nodes accept information from input values, which could, for
example, be a binary 1 or 0. The purpose of input nodes is to feed the data
into the network (Albright 2016).
The role of output layer nodes is the same as that of hidden layer nodes:
output nodes sum the input from the hidden layer, and if it reaches a
required value, the output nodes fire and send specific signals. At the end
of the process, the output layer transmits a set of signals that represents
the result of the input (Albright 2016).
The network shown above is simple; deep neural networks can have many
hidden layers and hundreds or even thousands of nodes (Albright 2016).
Each convolution operation has a kernel, which can be any matrix smaller
than the original image in height and width, as illustrated in figure 5. Each
kernel is useful for a specific task, such as sharpening, blurring or edge
detection.
To calculate the convolution, the kernel is swept over the image and the
output is calculated at every single location. The following equations give
the exact size of the convolution output for an input of size (height = H,
width = W) and a filter of size (height = F_h, width = F_w):
\text{Output height} = \frac{H - F_h + 2P}{S_h} + 1    (4)

\text{Output width} = \frac{W - F_w + 2P}{S_w} + 1    (5)
where S_h and S_w are the vertical and horizontal strides of the convolution
and P is the amount of zero-padding added to the border of the image
(Karpathy 2016).
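Equations (4) and (5) translate directly into a small helper; this sketch
assumes the hyperparameters divide evenly, so integer division is exact:

def conv_output_size(H, W, Fh, Fw, Sh=1, Sw=1, P=0):
    # Equations (4) and (5): spatial size of the convolution output
    out_h = (H - Fh + 2 * P) // Sh + 1
    out_w = (W - Fw + 2 * P) // Sw + 1
    return out_h, out_w

# Example: a 28 x 28 image with a 3 x 3 filter, stride 1, no padding -> 26 x 26
print(conv_output_size(28, 28, 3, 3))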
The depth of the output corresponds to the number of filters the user would
like to implement, each learning to search for something different in the
input (Karpathy 2016).
Stride is a parameter that specifies how the filter slides over the image.
When the stride is 1, the filters move one pixel at a time. Increasing the
stride produces spatially smaller output volumes (Karpathy 2016).
If all neurons in a single depth slice use the same weight vector, then the
forward pass of the convolutional layer can be computed in each depth
slice as a convolution of the neuron's weights with the input volume
(hence the name: convolutional layer). This is why it is common to refer
to the sets of weights as a filter (or a kernel) that is convolved with the
input (Karpathy 2016).
The Rectified Linear Unit (6) (also Rectifier) is used to increase non-linearity
in images. It computes the function:

f(x) = \max(0, x)    (6)

Leaky ReLU (7) is one attempt to solve the "dying ReLU" problem. Instead
of the function being zero when x < 0, a leaky ReLU has a small negative
slope \alpha:

f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise} \end{cases}    (7)
3.7.2 Maxout
Maxout is another solution to the "dying ReLU" problem. The Maxout
neuron (8) computes the function:

f(x) = \max(w_1^T x + b_1, w_2^T x + b_2)    (8)
Both ReLU and leaky ReLU are special cases of this form. The Maxout
neuron has all the advantages of a ReLU without its drawbacks, such as
dying units. However, the Maxout doubles the number of parameters for
every single neuron (Karpathy, 2016).
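A NumPy sketch of the three activations; the leaky slope of 0.01 is an
assumed, commonly used value:

import numpy as np

def relu(x):
    # Equation (6): f(x) = max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Equation (7): small negative slope alpha for x < 0
    return np.where(x > 0, x, alpha * x)

def maxout(x, W1, b1, W2, b2):
    # Equation (8): elementwise maximum of two linear functions
    return np.maximum(W1 @ x + b1, W2 @ x + b2)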
3.8 Pooling
In addition to max pooling, pooling layers can also perform other
functions. Average pooling was often used historically but has recently
fallen out of favour compared to the max pooling operation, which has
been shown to work better in practice (Karpathy, 2016).
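A minimal sketch of 2 x 2 max pooling with stride 2 on a single feature map,
assuming NumPy and even height and width:

import numpy as np

def max_pool_2x2(feature_map):
    # Group the map into 2 x 2 windows and keep the largest value in each
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Example: a 4 x 4 map is reduced to 2 x 2
print(max_pool_2x2(np.arange(16.0).reshape(4, 4)))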
3.9 Flattening
Figure 7. Flattening
The flattening layer is a simple layer used to prepare data to be the input
of the final and most important layer, the fully-connected layer. Generally,
neural networks receive data in one dimension, in the form of an array of
values; this layer takes the data received from the pooling or convolutional
layer and squashes the matrices into arrays, as illustrated in figure 7. The
obtained values are used as input to the neural network (Karpathy, 2016).
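With NumPy the operation is a one-liner; the 4 x 4 shape is only an example:

import numpy as np

pooled = np.arange(16.0).reshape(4, 4)   # example pooled feature map
flattened = pooled.reshape(-1)           # 1-D array of 16 values for the dense layer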
3.10 Full-connection
After numerous convolution and pooling layers have been used in the
neural network, a fully connected layer is used in order to access all
activations in the previous layer (Karpathy 2016).
3.11.1 Softmax
p_i = \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}}    (9)
To make the softmax function numerically stable, the values in the vector
are simply normalized by multiplying the numerator and denominator by a
constant C (Dahal November 15).
p_i = \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}} = \frac{C \, e^{a_i}}{C \sum_{k=1}^{N} e^{a_k}} = \frac{e^{a_i + \log(C)}}{\sum_{k=1}^{N} e^{a_k + \log(C)}}    (10)
The user can choose an arbitrary value for the log(C) term, but usually
log(C) = -\max(a) is chosen, as it shifts all elements of the vector to at
most zero; large exponents then saturate to zero rather than overflowing
to infinity (Dahal November 15).
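A minimal sketch of this numerically stable form of equation (10):

import numpy as np

def stable_softmax(a):
    # Shift by log(C) = -max(a) so the largest exponent is exp(0) = 1
    exps = np.exp(a - np.max(a))
    return exps / np.sum(exps)

print(stable_softmax(np.array([1000.0, 1000.0])))   # [0.5, 0.5] with no overflow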
The derivative of the softmax output p_i with respect to an arbitrary input
a_j is:

\frac{\partial p_i}{\partial a_j} = \frac{\partial}{\partial a_j} \left( \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}} \right)    (11)

From the quotient rule we know that for f(x) = \frac{g(x)}{h(x)} we have
f'(x) = \frac{g'(x)h(x) - h'(x)g(x)}{h(x)^2}.

In this situation g(x) = e^{a_i} and h(x) = \sum_{k=1}^{N} e^{a_k}. In h(x),
the derivative \frac{\partial h}{\partial a_j} will always be e^{a_j}, but it
is important to note that in g(x) the derivative \frac{\partial g}{\partial a_j}
will be e^{a_j} only if i = j, and otherwise 0 (Dahal November 15).
If i = j:

\frac{\partial}{\partial a_j} \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}} = \frac{e^{a_i} \sum_{k=1}^{N} e^{a_k} - e^{a_j} e^{a_i}}{\left( \sum_{k=1}^{N} e^{a_k} \right)^2} = \frac{e^{a_i} \left( \sum_{k=1}^{N} e^{a_k} - e^{a_j} \right)}{\left( \sum_{k=1}^{N} e^{a_k} \right)^2} = \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}} \cdot \frac{\sum_{k=1}^{N} e^{a_k} - e^{a_j}}{\sum_{k=1}^{N} e^{a_k}} = p_i (1 - p_j)    (12)
For i \neq j:

\frac{\partial}{\partial a_j} \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}} = \frac{0 - e^{a_j} e^{a_i}}{\left( \sum_{k=1}^{N} e^{a_k} \right)^2} = -\frac{e^{a_j}}{\sum_{k=1}^{N} e^{a_k}} \cdot \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}} = -p_j p_i    (13)
\frac{\partial p_i}{\partial a_j} = \begin{cases} p_i (1 - p_j) & \text{if } i = j \\ -p_j p_i & \text{if } i \neq j \end{cases}    (14)

Or, using the Kronecker delta \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}:

\frac{\partial p_i}{\partial a_j} = p_i (\delta_{ij} - p_j)    (15)
Cross entropy indicates the distance between what the model believes the
output distribution should be and what the original distribution really is.
It is defined as H(y, p) = -\sum_i y_i \log(p_i). The cross-entropy measure
is a widely used alternative to squared error. It is used when node
activations can be understood as representing the probability that each
hypothesis might be true, i.e. when the output is a probability distribution.
Thus, it is used as a loss function in neural networks which have softmax
activations in the output layer (Dahal November 15).
Cross-entropy loss with a softmax output layer is used extensively. The
derivative of softmax (15) derived earlier is now used to derive the
derivative of the cross-entropy loss function (16)(17)(18) (Dahal
November 15).
L = -\sum_i y_i \log(p_i)    (16)

\frac{\partial L}{\partial o_i} = -\sum_k y_k \frac{\partial \log(p_k)}{\partial o_i} = -\sum_k y_k \frac{\partial \log(p_k)}{\partial p_k} \cdot \frac{\partial p_k}{\partial o_i} = -\sum_k y_k \frac{1}{p_k} \cdot \frac{\partial p_k}{\partial o_i}    (17)

\frac{\partial L}{\partial o_i} = -y_i (1 - p_i) - \sum_{k \neq i} y_k \frac{1}{p_k} (-p_k p_i) = -y_i (1 - p_i) + \sum_{k \neq i} y_k p_i = -y_i + y_i p_i + \sum_{k \neq i} y_k p_i = p_i \left( y_i + \sum_{k \neq i} y_k \right) - y_i    (18)

y is a one-hot encoded vector for the labels, so \sum_k y_k = 1 and
y_i + \sum_{k \neq i} y_k = 1. So the formula is:

\frac{\partial L}{\partial o_i} = p_i - y_i    (19)
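The result in equation (19) can be verified numerically; the following
sketch uses assumed example logits and a one-hot label:

import numpy as np

def softmax(a):
    exps = np.exp(a - np.max(a))
    return exps / np.sum(exps)

def cross_entropy(a, y):
    # Equation (16)
    return -np.sum(y * np.log(softmax(a)))

a = np.array([2.0, 1.0, 0.1])    # example logits (assumed values)
y = np.array([1.0, 0.0, 0.0])    # one-hot label
analytic = softmax(a) - y        # equation (19)

# Central finite-difference approximation of the gradient
eps = 1e-6
numeric = np.zeros_like(a)
for i in range(a.size):
    d = np.zeros_like(a)
    d[i] = eps
    numeric[i] = (cross_entropy(a + d, y) - cross_entropy(a - d, y)) / (2 * eps)
print(np.allclose(analytic, numeric))   # True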
The Anaconda distribution comes with more than 1,000 data packages, the
conda package and virtual environment manager, and the Anaconda
Navigator GUI, so it eliminates the need to learn to install each library
independently (Conda November 22).
The open source data packages can be individually installed from the
Anaconda repository with the conda install command or using the pip
install command that is installed with Anaconda. Pip packages provide
many of the features of conda packages and in most cases they can work
together (Conda November 22).
You can also make your own custom packages using the conda build
command, and you can share them with others by uploading them to
Anaconda Cloud, PyPI or other repositories (Conda November 22).
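For example, individual packages can be installed with either manager; the
package names here are only illustrations:

conda install numpy
pip install matplotlib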
4.3 Spyder
Spyder is extensible with first- and third-party plugins, includes support for
interactive tools for data inspection and embeds Python-specific code
quality assurance and introspection instruments, such as Pyflakes, Pylint
and Rope. It is available cross-platform through Anaconda, including on
Windows.
Spyder uses Qt for its GUI and is designed to use either the PyQt or
PySide Python bindings. QtPy, a thin abstraction layer developed by the
Spyder project and later adopted by multiple other packages, provides the
flexibility to use either backend (Spyder November 22).
4.4 Keras
It was planned that Google would support Keras in the main TensorFlow
library; however, Chollet kept Keras as a separate add-on since, according
to its concept, Keras is more of an interface than an end-to-end machine
learning system. Keras provides a higher-level, more intuitive set of
abstractions that makes it simple to build neural networks, regardless of
the scientific computing library used at the bottom level. Microsoft has
been working on adding a CNTK backend to Keras (Keras November 22).
activate tensorflow
The prompt will be prefixed with the name of the environment.
3. To install Spyder:
conda install spyder
Spyder can be used now.
5. To open Spyder:
spyder
By default, TensorFlow and Keras use the CPU, but the GPU can be used
instead, increasing neural network speed and performance significantly.
4.7.2 CUDA
4.7.3 cuDNN
4.8.1 Prerequisites
The Flatten function is responsible for preparing data to be the input for
the fully connected layer.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))  # Step 1 - Convolution (filter count and input shape are assumed values)
classifier.add(MaxPooling2D(pool_size = (2, 2)))  # Step 2 - Pooling
The code above is the neural network itself. The first objective is to create
an object of the Sequential class. Since the neural network is going to
classify images, the object is named classifier.
The next layer in the sequence is the max pooling layer. It reduces the size
of the feature maps created by the convolution operation, leaving only the
important features for the system to detect. The pool size is 2 by 2 in this
model.
As mentioned above, the Dense function is used to add hidden layers; in
this case a fully-connected layer is added. The parameter "units" represents
the number of nodes in the layer. There are no strict rules on how many
nodes should be used, but 128 is common in general practice. The ReLU
function is used to activate this layer.
The last layer in the CNN is the output layer. The number of output nodes
is 1, which is the predicted probability of one class. The sigmoid activation
function is required since the outcome of the predictions is binary. For
multiple outcomes, softmax activation is needed.
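Based on the description above, these layers can be sketched in the same
Keras style (a sketch consistent with the text, not necessarily the author's
exact code):

from keras.layers import Flatten, Dense

# Step 3 - Flattening: squash the pooled feature maps into a 1-D array
classifier.add(Flatten())
# Step 4 - Full connection: 128 hidden nodes with ReLU, one sigmoid output node
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))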
When the core of the CNN is created, the next step is to compile it. The
compile method takes the following parameters: an optimizer for the
stochastic gradient descent algorithm, a loss function, and a metrics
parameter to choose the performance metric. In this model the "adam"
algorithm is used, and the "binary cross-entropy" loss function is
implemented for the binary outcome classifier. The last argument is the
metric; accuracy is what is required for this CNN.
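In Keras this corresponds to a single call with the parameters named above
(a sketch):

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])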
After the model is compiled, the next task is to fit the CNN to the images.
The following code can be found on the Keras documentation website in
the preprocessing section.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255, shear_range = 0.2,  # rescale per the Keras preprocessing example
                                   zoom_range = 0.2, horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),  # assumed to match the model input
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)
The last code section is responsible for fitting the images into the CNN as
well as testing its performance on the test set. The first arguments are the
number of images that are expected to be used and how many times the
system has to run through these images while "learning". The last
arguments are for testing the efficiency of training; the number of images
for the test has to be specified.
At this point the user can receive the first results by running the code and
waiting. The training process can take from several minutes to several
hours depending on the PC's capabilities and whether the neural network
runs on the CPU or GPU. The author's first results came with a 75%
accuracy after 15 hours of training.
When the program starts working, the user will be able to see information
on the training process, such as the estimated time to finish an epoch and
the loss, which represents errors during training. The most important
figure is validation accuracy: it shows how efficiently the program
recognizes images that were not present in the training set.
import numpy as np
from keras.preprocessing import image

# Path and target size are example values; the path points to the image to classify
test_image = image.load_img('dataset/single_prediction/image_1.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)  # add the batch dimension predict expects
result = classifier.predict(test_image)
training_set.class_indices
if result[0][0] == 1:
    prediction = 'dog'
else:
    prediction = 'cat'
Finally, the last part of the code is responsible for providing results based
on images that were not used during the training process. The first part
sets a path to an image that is going to be tested by the CNN. Next come
commands that transform the test image into a 3D array and add another
dimension to that array, since the predict method expects a batch of
images; the result for the test image can be either 0 or 1 in binary
classification.
To specify which value belongs to which class, additional lines of code are
added. The user can determine the names of the result variables; for
example, cats equal 0 and dogs equal 1.
When the code is finished and the neural network is trained, the user has
to specify a path to the image to be tested and run the last part of the code
again. The result will be shown in the variable explorer.
5 CONCLUSIONS
The goal of this thesis project was to create a classifier based on a neural
network using knowledge of deep learning and Python programming. A
Convolutional Neural Network based on 2-dimensional input data was a
suitable example of such a system. It is important to note that other types
of neural networks, as well as other programming languages, could be
used to achieve the same results.
During the working process the author learnt a new programming language
and came to understand the structure of CNN architectures.
In the end, the set targets were reached and a suitable Convolutional
Neural Network program was created to classify data by a binary outcome.
Further development of this project is possible in the future, with the
possibility of creating a more complex program for face or voice
recognition, as well as a multi-purpose application.
REFERENCES
Albright, D. (2016, December 13). What Are Neural Networks and How Do They Work? Retrieved from MakeUseOf: https://www.makeuseof.com/tag/what-are-neural-networks/
Neurofantastic (2017, May 1). Brain computation is a lot more analog than we thought. Retrieved from: https://neurofantastic.com/brain/2017/4/13/brain-computation-is-a-lot-more-analog-than-we-thought
Artificial neuron networks basics: introduction to neural networks. Retrieved from: https://becominghuman.ai/artificial-neuron-networks-basics-introduction-to-neural-networks-3082f1dcca8c
Singh, A. (2017, October 18). How does the ReLU function work for z < 0? Retrieved from: https://stats.stackexchange.com/questions/308689/how-does-the-relu-function-work-for-z-0
Appendix 1
Contents of the file “CNN.py”
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))  # Step 1 - Convolution (assumed values, consistent with the text)
classifier.add(MaxPooling2D(pool_size = (2, 2)))  # Step 2 - Pooling
classifier.add(Flatten())  # Step 3 - Flattening
classifier.add(Dense(units = 128, activation = 'relu'))  # Step 4 - Full connection
classifier.add(Dense(units = 1, activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255, shear_range = 0.2,
                                   zoom_range = 0.2, horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)
import numpy as np
from keras.preprocessing import image

test_image = image.load_img('dataset/single_prediction/image_1.jpg', target_size = (64, 64))  # example path
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
training_set.class_indices
if result[0][0] == 1:
    prediction = 'dog'
else:
    prediction = 'cat'