
CAPSTONE Project Report Model

Uploaded by

Garvit Arora

Breast Cancer and its Recurrence Identification using Image Processing and Deep Learning Techniques

Submitted in partial fulfillment of the requirements for the degree of

Bachelor of Technology
in
Electronics and Instrumentation Engineering
by

ALAN GEORGE (19BEI0001)
ARSHAQUE ABDUSALAM (19BEI0036)
THOMAS TOM (19BEI0051)

Under the guidance of
Dr. G.K.Rajini
School of Electrical Engineering
Vellore Institute of Technology, Vellore.

May, 2023
DECLARATION

I hereby declare that the thesis entitled “Breast Cancer and its Recurrence
Identification using Image Processing and Deep Learning Techniques” submitted
by Alan George (19BEI0001), Arshaque Abdusalam (19BEI0036) and Thomas Tom
(19BEI0051), for the award of the degree of Bachelor of Technology in Electronics
and Instrumentation Engineering to Vellore Institute of Technology, Vellore is a
record of bonafide work carried out by me under the supervision of Dr. G.K.Rajini,
Professor, School of Electrical Engineering, VIT University, Vellore.

I further declare that the work reported in this thesis has not been submitted
and will not be submitted, either in part or in full, for the award of any other degree or
diploma in this institute or any other institute or university.

Place: Vellore

Date: 20/04/2023 Signature of the Candidate


CERTIFICATE

This is to certify that the thesis entitled “Breast Cancer and its Recurrence
Identification using Image Processing and Deep Learning Techniques” submitted
by Alan George (19BEI0001), Arshaque Abdusalam (19BEI0036) and Thomas
Tom (19BEI0051), School of Electrical Engineering, Vellore Institute of Technology,
Vellore, for the award of the degree of Bachelor of Technology in Electronics and
Instrumentation Engineering, is a record of bonafide work carried out by them under my
supervision during the period 01.09.2022 to 30.04.2023, as per the Vellore Institute
of Technology code of academic and research ethics.

The contents of this report have not been submitted and will not be submitted
either in part or in full, for the award of any other degree or diploma in this institute or
any other institute or university. The thesis fulfills the requirements and regulations of
the University and in my opinion meets the necessary standards for submission.

Place: Vellore

Date: 20/04/2023 Signature of the Guide (Dr. G.K.Rajini)

Head of the Department/ EIE Dean/ SELECT


Acknowledgement

With immense pleasure and a deep sense of gratitude, I wish to express my sincere
thanks to my supervisor Dr. G.K.Rajini, Professor, School of Electrical Engineering,
Vellore Institute of Technology, Vellore. Without her motivation and continuous
encouragement, this project work would not have been successfully completed.
I am grateful to the Chancellor of Vellore Institute of Technology, Vellore, Dr.
G.Viswanathan, the Vice Presidents and the Vice Chancellor for motivating me to carry
out research in the Vellore Institute of Technology, Vellore and also for providing me
with infrastructural facilities and many other resources needed for my project.
I express my sincere thanks to Dr. Mathew M Noel, Dean, School of Electrical
Engineering, Vellore Institute of Technology, Vellore for his kind words of support
and encouragement. I would like to acknowledge the support rendered by my colleagues in
several ways throughout my project work.
I wish to extend my profound sense of gratitude to my parents for all the sacrifices
they made during my research and also for providing me with moral support and
encouragement whenever required.

Place: Vellore
Date: 20/04/2023

Alan George
Arshaque Abdusalam
Thomas Tom
Student Names

Executive Summary

Breast cancer is a major global public health issue that affects millions of women. Early detection and timely treatment are essential in lowering mortality rates. Mammography and biopsy, the two most common procedures for finding breast cancer, can be invasive and are subject to false positive and false negative results. The goal of this project is to create a system that detects breast cancer using image processing and predicts its recurrence using deep learning.
The suggested system is divided into two modules. The first module uses digital mammography images to identify indications of breast cancer, applying image processing techniques and edge detection techniques such as the Laplacian of Gaussian (LoG), Canny, Sobel and Prewitt algorithms. The mammogram image is processed to improve image quality and retrieve relevant information. In the second module, machine learning algorithms such as Decision Tree (DT), K Nearest Neighbour (KNN), Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM) and an attention awareness mechanism analyse the acquired data to predict the likelihood of recurrence. The goal is to provide a non-invasive and effective tool for early diagnosis and prediction of breast cancer. The suggested system has the potential to improve the accuracy and efficiency of breast cancer detection. The system compares various image processing and machine learning methods to determine which is most accurate at detecting a tumor and its recurrence. The performance metrics are the Closest Distance Metric, Pixel Correspondence Metric, Grey Scale Figure of Merit (FOM) and Peak Signal to Noise Ratio (PSNR) for image processing, and accuracy, sensitivity, precision, F1 score and Cohen’s kappa score for machine learning.

TABLE OF CONTENTS

Acknowledgement
Executive Summary
List of Figures
List of Tables
List of Terms and Abbreviations

1 INTRODUCTION
1.1 Objective
1.2 Motivation
1.3 Background
1.4 Digital Image Processing
1.4.1 Image Acquisition and Pre-processing
1.4.2 Morphological Operation
1.4.3 Image Segmentation
1.4.4 Edge Detection
1.5 Edge Detection Techniques
1.5.1 Laplacian of Gaussian (LoG)
1.5.2 Canny Algorithm
1.5.3 Sobel Operator
1.5.4 Prewitt Operator
1.6 Mammogram Image
1.7 Wavelet Transform
1.7.1 2D Wavelet Analyzer
1.8 Machine Learning Models
1.8.1 Decision Tree
1.8.2 K Nearest Neighbor
1.8.3 Support Vector Machine
1.8.4 Gaussian Naïve Bayes
1.8.5 Attention Awareness Mechanism

2 PROJECT DESCRIPTION AND GOALS
2.1 Review of Literature
2.2 Project Description
2.2.1 Software
2.3 Goals

3 TECHNICAL SPECIFICATION
3.1 Software Specification
3.1.1 MATLAB
3.1.2 Wavelet Analyzer
3.1.3 Jupyter Notebook
3.2 References

4 DESIGN APPROACH AND DETAILS
4.1 Design Approach / Materials and Methods
4.1.1 Processing the Mammogram image using different Edge Detection Methods
4.1.2 Image Processing using 2D Wavelet Analyzer
4.1.3 Recurrence Prediction Using Machine Learning
4.2 Codes and Standards
4.3 Constraints, Alternatives and Tradeoffs
4.3.1 Constraints
4.3.2 Alternatives
4.3.3 Tradeoffs

5 SCHEDULE, TASKS AND MILESTONES
5.1 Schedule
5.2 Tasks and Milestones

6 PROJECT DEMONSTRATION
6.1 Image Processing Using different Edge Detection Methods on Mammogram
6.1.1 LoG Algorithm
6.1.2 Canny Algorithm
6.1.3 Sobel Algorithm
6.1.4 Prewitt Algorithm
6.2 Image Processing Using 2D Wavelet Analyzer
6.2.1 Haar Wavelet
6.2.2 Daubechies Wavelet
6.2.3 Symmetric Wavelet
6.2.4 Coiflets Wavelet
6.2.5 Biorthogonal Wavelet
6.2.6 Reverse Biorthogonal Wavelet
6.2.7 Daubechies-Meyer Wavelet
6.2.8 Farid-kim Wavelet
6.3 Recurrence Prediction Using Deep Learning

7 COST ANALYSIS / RESULT AND DISCUSSION
7.1 Cost Analysis
7.1.1 MATLAB
7.1.2 Jupyter Notebook
7.2 Results and Discussion
7.2.1 Image Processing Using different Edge Detection Methods
7.2.2 Image Processing Using 2D Wavelet Analyzer
7.2.3 Recurrence Prediction Using Deep learning

8 SUMMARY
8.1 Summary
REFERENCES
LIST OF PUBLICATIONS
CURRICULUM VITAE

LIST OF FIGURES

1.1 Mammogram image with cancerous tissue
1.2 Decision tree hierarchy
1.3 KNN finding nearest neighbor
1.4 SVM classification
4.1 Flow diagram of using different Edge Detection Techniques
4.2 Flow diagram of using different Wavelet Transform
4.3 Flow diagram of using Deep Learning
5.1 Gantt Chart
6.1 Output of LoG Operator
6.2 Output of Canny algorithm
6.3 Output of Sobel Operator
6.4 Output of Prewitt Operator
6.5 Synthesised image of Haar Wavelet
6.6 Synthesised image of Daubechies Wavelet
6.7 Synthesised image of Symmetric Wavelet
6.8 Synthesised image of Coiflets Wavelet
6.9 Synthesised image of Biorthogonal Wavelet
6.10 Synthesised image of Reverse Biorthogonal Wavelet
6.11 Synthesised image of Daubechies-Meyer Wavelet
6.12 Synthesised image of Farid-kim Wavelet
6.13 Different Machine Learning Algorithms
6.14 Attention Layer
6.15 Model Prediction

LIST OF TABLES

5.1 Timeline of Project Tasks
7.1 Performance Metrics of Edge Detection Methods
7.2 Performance Metrics of Wavelet Transform
7.3 Performance Metrics of Machine Learning models

LIST OF TERMS AND ABBREVIATIONS

2D Two Dimensional
ATW-CNN Attention-aware Temporal Weighted Convolutional Neural Network
Bior Biorthogonal
CAD Computer-Aided Diagnosis
CLAHE Contrast-limited Adaptive Histogram Equalisation
CNN Convolutional Neural Network
coif Coiflets
db Daubechies
DDSM Digital Database for Screening Mammography
Dmey Daubechies-Meyer
DT Decision Tree
DWT Discrete Wavelet Transform
EJCRF Equilibrium-Jaya Controlled Regula Falsi
fk Farid-kim
FOM Figure of Merit
GLCM Gray Level Co-occurrence Matrix
GLRLM Gray Level Run Length Matrix
GNB Gaussian Naïve Bayes
HIN Heterogeneous Information Network
KNN K Nearest Neighbour
LoG Laplacian of Gaussian
MATLAB Matrix Laboratory
MIAS Mammography Image Analysis Society
MRI Magnetic Resonance Imaging
PSNR Peak Signal to Noise Ratio
PSO Particle Swarm Optimisation
Rbio Reverse Biorthogonal
RF Random Forest
RNN Recurrent Neural Network
ROC Receiver Operating Characteristic Curve
ROI Region of Interest
SVM Support Vector Machine
sym Symmetric
TCN Temporal Convolutional Network
UCI University of California Irvine
VGG16 Visual Geometry Group 16

CHAPTER 1

INTRODUCTION

1.1 Objective

Breast cancer is a critical public health concern that affects millions of women globally. Because it is a leading cause of cancer death among women, early detection and prompt treatment are crucial in reducing mortality rates. However, the manual interpretation of mammography images, widely used as a screening tool for breast cancer, is subjective and prone to errors. This study aims to address these limitations by developing an image processing and machine learning system for the early detection of breast cancer and the prediction of its recurrence. The system will use digital mammography images to identify signs of breast cancer and apply machine learning algorithms to predict the likelihood of recurrence. The ultimate goal of this project is to provide a non-invasive and efficient tool for early detection and prediction, contributing to the advancement of breast cancer diagnosis and treatment.

1.2 Motivation

Mammography, ultrasound, and Magnetic Resonance Imaging (MRI) are among the medical imaging procedures that have been widely used to help diagnose breast cancer early, but each has limitations. Image processing and deep learning methods would allow for more precise and reliable breast cancer detection and recurrence prediction. By creating a breast cancer detection system based on image processing and deep learning, we can increase the accuracy of breast cancer diagnosis, lower the number of unnecessary biopsies, and ultimately save more lives.

1.3 Background

Breast cancer is the most common cancer among women. According to the World Health Organization, an estimated 2.3 million new cases are diagnosed each year, with over 685,000 fatalities. Early detection and prompt treatment are crucial in reducing fatality rates. This project uses image processing and deep learning techniques to identify the signs of breast cancer and predict its recurrence.

1.4 Digital Image Processing

A digital image is described as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates. The amplitude of f at any pair of coordinates is the intensity of the image at that point. The image is made up of a finite number of elements, each with a distinct position and value; these elements are called pixels. A Two Dimensional (2D) digital image with M rows and N columns can be represented as in Eqn. 1.1.

$$
f(x, y) =
\begin{bmatrix}
f(0, 0) & f(0, 1) & \cdots & f(0, N-1) \\
f(1, 0) & f(1, 1) & \cdots & f(1, N-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(M-1, 0) & f(M-1, 1) & \cdots & f(M-1, N-1)
\end{bmatrix}
\tag{1.1}
$$

Digital image processing is the processing and analysis of digital images using algorithms and mathematical models. It is done to improve image quality, to extract useful information from images for identification and recognition, and to automate operations that involve images, such as manipulation by adding or removing objects. The steps involved in digital image processing are image acquisition and pre-processing, morphological operation and restoration, image segmentation, edge detection and image analysis.

1.4.1 Image Acquisition and Pre-processing

Image acquisition is the process of capturing an image and loading it into a digital system. Pre-processing, a fundamental step in image analysis, uses a variety of approaches to enhance the quality of the input image and retrieve important data for further processing. This might involve resizing or scaling the image to a size suitable for processing, using enhancement techniques like contrast stretching or histogram equalisation to improve the visibility of features, filtering to remove noise or emphasise specific features, normalisation to compensate for non-uniform lighting conditions, and colour space conversion to speed up image analysis.
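Two of the enhancement techniques named above, contrast stretching and histogram equalisation, can be sketched in a few lines. This is a minimal illustration in Python with NumPy, assuming an 8-bit grayscale input; the function names and the random low-contrast test image are hypothetical and are not taken from the project's own code.

```python
import numpy as np

def contrast_stretch(image):
    """Linearly rescale intensities so they span the full 0..255 range."""
    img = image.astype(float)
    lo, hi = img.min(), img.max()
    if hi == lo:                       # flat image: nothing to stretch
        return np.zeros_like(image, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)

def equalize_histogram(image):
    """Global histogram equalisation for an 8-bit grayscale image."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied level
    total = image.size
    if total == cdf_min:               # only one grey level is present
        return image.copy()
    # Look-up table that maps each old grey level to its equalised level
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]

# Hypothetical low-contrast test image: values confined to 100..139
rng = np.random.default_rng(0)
image = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
stretched = contrast_stretch(image)
equalized = equalize_histogram(image)
print(image.min(), image.max(), "->", stretched.min(), stretched.max())
```

Contrast stretching only rescales the intensity range, whereas equalisation also redistributes grey levels according to how often they occur.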
1.4.2 Morphological Operation

Morphological operations in image processing are a set of mathematical operations that assess the structure and organisation of objects in an image. The two most popular operations are dilation and erosion. With dilation, the edges of objects in an image are expanded by the addition of pixels. This can be used to fill in minute holes in objects or close small gaps between objects. Erosion, on the other hand, shrinks objects by removing pixels from their edges. Other operations, such as opening and closing, can be used to remove small objects, fill in holes or gaps, and smooth edges.
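The behaviour of dilation, erosion, opening and closing can be illustrated on a small binary mask. The sketch below uses Python with NumPy and assumes a square structuring element and a background of zeros outside the image; the helper names and the test mask are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(binary, size):
    """Every size x size neighbourhood of the image, padded with background."""
    pad = size // 2
    padded = np.pad(binary.astype(bool), pad, mode="constant", constant_values=False)
    return sliding_window_view(padded, (size, size))

def dilate(binary, size=3):
    """A pixel is set if ANY pixel under the structuring element is set."""
    return _windows(binary, size).any(axis=(2, 3))

def erode(binary, size=3):
    """A pixel is kept only if ALL pixels under the structuring element are set."""
    return _windows(binary, size).all(axis=(2, 3))

def opening(binary, size=3):
    """Erosion followed by dilation: removes objects smaller than the element."""
    return dilate(erode(binary, size), size)

def closing(binary, size=3):
    """Dilation followed by erosion: fills holes smaller than the element."""
    return erode(dilate(binary, size), size)

# Hypothetical mask: a 7x7 square with a one-pixel hole, plus an isolated speck
mask = np.zeros((14, 14), dtype=bool)
mask[2:9, 2:9] = True
mask[5, 5] = False        # hole inside the object
mask[12, 12] = True       # isolated noise pixel

opened = opening(mask)    # the speck disappears, the hole remains
closed = closing(mask)    # the hole is filled, the speck remains
print(mask.sum(), opened.sum(), closed.sum())
```

Opening and closing are therefore complementary: one cleans up small foreground noise, the other small background gaps.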

1.4.3 Image Segmentation

Segmentation is a crucial step in image processing. It involves breaking a picture into a number of segments or regions, each of which corresponds to a particular object or area of interest. The aim is to create uniform, relevant sections that can be evaluated separately in later processing. Image segmentation can be done using a variety of methods, including thresholding, edge-based segmentation, region-based segmentation, and clustering. These techniques distinguish objects from the background by locating their edges or boundaries, putting similar regions together, or grouping pixels based on similarity.
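Thresholding is the simplest of the segmentation methods listed above. As an illustration, the following Python sketch picks a global threshold with Otsu's method, a standard automatic choice that maximises the between-class variance. Otsu's method is used here only as an example and is not stated to be the method used in this project; the test image is hypothetical.

```python
import numpy as np

def otsu_threshold(image):
    """Return the grey level that maximises the between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                  # weight of the class "<= t"
    w1 = 1.0 - w0                         # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    # Thresholds that leave one class empty produce 0/0; score them as zero
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))

# Hypothetical image: dark background (50) with one bright square object (200)
image = np.full((32, 32), 50, dtype=np.uint8)
image[8:20, 8:20] = 200

t = otsu_threshold(image)
segmented = image > t                     # True marks object pixels
print("threshold:", t, "object pixels:", int(segmented.sum()))
```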

1.4.4 Edge Detection

Edge detection in image processing is the process of identifying the edges or boundaries between different objects or regions within an image. Edges are sharp changes in colour or intensity that occur where two objects or regions meet. Edge detection algorithms are designed to recognise these changes and display them as a collection of connected lines or curves that mark the boundaries of the objects in the image. The edge detection algorithms used in this project are the gradient-based Sobel and Prewitt operators and the Gaussian-based Laplacian of Gaussian and Canny algorithms.

1.5 Edge Detection Techniques

1.5.1 Laplacian of Gaussian(LoG)

Laplacian of Gaussian (LoG) is a Gaussian-based operator that computes the second derivative (the Laplacian) of a smoothed image. It uses the zero-crossing method: a point where the second-order derivative crosses zero corresponds to a maximum of the first derivative and is taken as an edge location. The Gaussian operator reduces noise, while the Laplacian operator seeks out sharp edges. LoG can be applied by smoothing the image with a Gaussian kernel and then calculating its Laplacian or, equivalently, by computing the Laplacian of the Gaussian kernel and convolving it with the image. The smoothing step matters because the Laplacian on its own is sensitive to noise. The Laplacian operator is given in Eqn. 1.2.

$$
\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \tag{1.2}
$$
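The LoG procedure can be sketched as follows: build a sampled Laplacian-of-Gaussian kernel, filter the image with it, and mark the zero crossings of the response. This is a minimal Python/NumPy illustration under stated assumptions: the value of sigma, the kernel size rule, the `eps` noise floor and the synthetic step-edge image are all hypothetical choices, not parameters reported in this project.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def log_kernel(sigma, size=None):
    """Sampled Laplacian-of-Gaussian kernel, shifted so its entries sum to zero."""
    if size is None:
        size = 2 * int(np.ceil(3 * sigma)) + 1      # cover about +/- 3 sigma
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    gauss = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    kernel = (r2 - 2 * sigma ** 2) / sigma ** 4 * gauss
    return kernel - kernel.mean()                    # flat regions then give ~0

def filter2d(image, kernel):
    """Same-size filtering with edge padding (the kernel is symmetric here)."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2,) * 2, (kw // 2,) * 2), mode="edge")
    windows = sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def zero_crossings(response, eps=1e-6):
    """Mark pixels whose response changes sign against the right or lower neighbour."""
    def sign_change(a, b):
        # Ignore sign flips between values that are numerically zero
        return (a * b < 0) & (np.minimum(np.abs(a), np.abs(b)) > eps)
    edges = np.zeros(response.shape, dtype=bool)
    edges[:, :-1] |= sign_change(response[:, :-1], response[:, 1:])
    edges[:-1, :] |= sign_change(response[:-1, :], response[1:, :])
    return edges

# Synthetic test image: dark left half, bright right half (step after column 15)
image = np.zeros((32, 32))
image[:, 16:] = 1.0

response = filter2d(image, log_kernel(sigma=1.5))
edges = zero_crossings(response)
print("edge columns:", np.unique(np.nonzero(edges)[1]))
```

On this step image the response is positive on the dark side and negative on the bright side, so the only zero crossing lies on the boundary between the two halves.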

1.5.2 Canny Algorithm

Canny is a Gaussian-based edge detection operator. It extracts edge features without altering the features of the image, and it is robust to noise. The Canny edge detector is an advanced approach that builds on the Laplacian of Gaussian operator. First, the noise in the input image is removed using a Gaussian filter. Next, the derivative of the filtered image is computed to obtain the gradient magnitude along the x and y dimensions. Non-maximum suppression then discards pixels that are not a local maximum of the gradient magnitude along the direction perpendicular to the edge, which thins clusters of neighbouring responses along a curve. Finally, hysteresis thresholding keeps pixels whose gradient magnitude is above the upper threshold, discards pixels whose magnitude is below the lower threshold, and keeps the pixels in between only if they are connected to a kept pixel.
The Gaussian filter in Eqn. 1.3 is used as the smoothing filter.

$$
G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{1.3}
$$

For the gradient calculation, magnitude and direction are computed using the Sobel operator. Gx and Gy are the horizontal and vertical derivative kernels, as described in Eqn. 1.4.

$$
G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
\qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
\tag{1.4}
$$

Gradient magnitude M(x, y) and direction θ(x, y) are calculated as described in Eqn. 1.5 and Eqn. 1.6.

$$
M(x, y) = \sqrt{G_x^2(x, y) + G_y^2(x, y)} \tag{1.5}
$$

$$
\theta(x, y) = \operatorname{atan}\left(\frac{G_y(x, y)}{G_x(x, y)}\right) \tag{1.6}
$$
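The final stage of the Canny detector, hysteresis thresholding, can be sketched on its own. The Python/NumPy example below assumes a gradient magnitude map is already available; the thresholds and the small magnitude array are hypothetical values chosen only to show the behaviour, and 8-connectivity is an assumption of this sketch.

```python
import numpy as np

def hysteresis(magnitude, low, high):
    """Keep strong pixels plus weak pixels that are 8-connected to a kept pixel."""
    strong = magnitude >= high
    candidate = magnitude >= low           # strong and weak pixels together
    edges = strong.copy()
    rows, cols = edges.shape
    while True:
        padded = np.pad(edges, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(edges)
        # OR together the 9 shifted copies: a pixel or any of its 8 neighbours
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + rows, dx:dx + cols]
        grown &= candidate                 # only candidates may join an edge
        if np.array_equal(grown, edges):   # nothing new was added: finished
            return edges
        edges = grown

# Hypothetical gradient magnitudes: one strong pixel, a weak chain, a weak stray
magnitude = np.zeros((5, 7))
magnitude[2] = [0, 9, 5, 5, 0, 5, 0]

edges = hysteresis(magnitude, low=4, high=8)
print(edges[2].astype(int))
```

The weak pixels next to the strong one are kept because they are connected to it, while the isolated weak pixel is rejected even though it exceeds the lower threshold.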

1.5.3 Sobel Operator

The Sobel operator is a discrete differentiation operator. It computes an approximation of the gradient of the image intensity function for edge detection. For each pixel in an image, it produces either the corresponding gradient vector or the norm of this vector. The vertical and horizontal derivative approximations are computed using two 3 x 3 kernels or masks that are convolved with the input image, one kernel for each of the two perpendicular orientations. The Sobel operator is simple and time efficient and finds smooth edges easily, but it is sensitive to noise.
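The vertical-edge mask Gx and the horizontal-edge mask Gy of the Sobel operator are shown in Eqn. 1.7.

$$
G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
\qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
\tag{1.7}
$$

Gradient magnitude M(x, y) and direction θ(x, y) are calculated as described in Eqn. 1.8 and Eqn. 1.9.

$$
M(x, y) = \sqrt{G_x^2(x, y) + G_y^2(x, y)} \tag{1.8}
$$

$$
\theta(x, y) = \operatorname{atan}\left(\frac{G_y(x, y)}{G_x(x, y)}\right) \tag{1.9}
$$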
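The Sobel masks and the magnitude and direction formulas above can be applied as in the following Python/NumPy sketch. Two details are assumptions of this sketch rather than statements from the report: the masks are applied by correlation with edge padding, and `np.arctan2` is used in place of a plain arctangent of the ratio so that pixels with Gx = 0 do not cause a division by zero. The step-edge test image is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SOBEL_GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_GY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def filter2d(image, kernel):
    """Correlate image with a 3x3 kernel, using edge padding (same-size output)."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def sobel(image):
    """Return gradient magnitude and direction (radians) for a grayscale image."""
    gx = filter2d(image, SOBEL_GX)
    gy = filter2d(image, SOBEL_GY)
    magnitude = np.hypot(gx, gy)       # sqrt(gx**2 + gy**2)
    direction = np.arctan2(gy, gx)     # quadrant-aware, safe when gx == 0
    return magnitude, direction

# Hypothetical test image: dark left half, bright right half
image = np.zeros((8, 8))
image[:, 4:] = 1.0

magnitude, direction = sobel(image)
print(magnitude[4])
```

For this vertical step the response is concentrated in the two columns on either side of the boundary, and the gradient points along the x-axis (direction 0).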

1.5.4 Prewitt Operator

The Prewitt operator closely resembles the Sobel operator. It detects the horizontal and vertical edges of an image, calculating edges from the difference between the intensities of corresponding pixels. Two types of edges are detected by the Prewitt operator: edges that run vertically, along the y-axis, and edges that run horizontally, along the x-axis. The masks respond wherever there is a sharp change in pixel intensity. Because an edge is defined as a change in pixel intensities, it can be computed by differentiation. Prewitt masks are first-order derivative masks.
The Prewitt operator uses two masks, Gx for detecting horizontal edges and Gy for detecting vertical edges, as shown in Eqn. 1.10.

$$
G_x = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\qquad
G_y = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}
\tag{1.10}
$$

Gradient magnitude M(x, y) and direction θ(x, y) are calculated as described in Eqn. 1.11 and Eqn. 1.12.

$$
M(x, y) = \sqrt{G_x^2(x, y) + G_y^2(x, y)} \tag{1.11}
$$

$$
\theta(x, y) = \operatorname{atan}\left(\frac{G_y(x, y)}{G_x(x, y)}\right) \tag{1.12}
$$
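One way to see how closely the Prewitt and Sobel masks are related is to note that each is the outer product of a one-dimensional smoothing filter and a one-dimensional central difference. The short Python/NumPy sketch below rebuilds the masks of Eqn. 1.10 and the Sobel Gx mask of Eqn. 1.7 this way; the variable names are illustrative only.

```python
import numpy as np

smooth_box = np.array([1, 1, 1])    # uniform averaging, used by Prewitt
smooth_tri = np.array([1, 2, 1])    # centre-weighted averaging, used by Sobel
diff = np.array([-1, 0, 1])         # central difference (first derivative)

# Prewitt masks as written in Eqn. 1.10
prewitt_gx = np.outer(diff, smooth_box)   # differentiate down the rows
prewitt_gy = np.outer(smooth_box, diff)   # differentiate along the columns

# Sobel Gx mask as written in Eqn. 1.7
sobel_gx = np.outer(smooth_tri, diff)

print(prewitt_gx)
print(prewitt_gy)
print(sobel_gx)
```

The only difference between the two operators is the smoothing filter: Sobel gives extra weight to the centre row or column, while Prewitt weights all three equally.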

1.6 Mammogram Image

A mammogram is a biomedical image: an X-ray image of the breast used for screening and detecting cancerous tissue. It is widely used for early detection and helps to find small tumors or any abnormality in the tissue.

Fig. 1.1 Mammogram image with cancerous tissue.

Fig. 1.1 shows a mammogram image with cancerous tissue, of dimension 2251 x 5341 pixels.

1.7 Wavelet Transform

Wavelet transform is a mathematical method used to enhance the quality of digital images. The fundamental benefit of employing wavelets is that they can separate a signal into time and frequency components, allowing for both spatial and frequency representation of the signal. Unlike other transforms, the wavelet transform can divide the image into multiple resolution levels, and the image can be reconstructed, de-noised and compressed using the wavelet coefficients. It can process images from lower to higher resolutions.

1.7.1 2D Wavelet Analyzer

A 2D wavelet analyzer applies wavelet transforms to a 2D image to decompose it into multiple levels of detail. This tool extracts the high and low-frequency components of the image at different resolutions, with each level representing a different frequency band. The 2D wavelet analyzer can be used to analyze an image in both the spatial and frequency domains. Using the wavelet analyzer, the image can be reconstructed, de-noised or compressed. In this project, different types of wavelets are used: Haar, Daubechies (db), Symmetric (sym), Coiflets (coif), Biorthogonal (Bior), Reverse Biorthogonal (Rbio), Daubechies-Meyer (Dmey) and Farid-kim (fk).

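The continuous wavelet transform T(a, b) of a signal x(t), at scale a and translation b
with wavelet function ψ, is given in Eqn. 1.13.

\[
T(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi\!\left(\frac{t - b}{a}\right) dt \tag{1.13}
\]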
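
To make the idea of decomposing an image into low- and high-frequency components
concrete, the following sketch performs one level of a discrete 2D Haar transform and
then inverts it. It is an illustration only, not the MATLAB tool used in this project:
it assumes NumPy, an image with even side lengths, and a hypothetical 4x4 input; the
function names `haar_dwt2` and `haar_idwt2` are chosen here for illustration.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(image):
    """One level of the 2D Haar transform; both image sides must be even."""
    x = np.asarray(image, dtype=float)
    # Along each row: scaled sums (low-pass) and differences (high-pass) of column pairs.
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2
    # Repeat the same step on pairs of rows.
    ll = (lo[0::2, :] + lo[1::2, :]) / SQRT2  # approximation (low-frequency) band
    lh = (lo[0::2, :] - lo[1::2, :]) / SQRT2  # differences between adjacent rows
    hl = (hi[0::2, :] + hi[1::2, :]) / SQRT2  # differences between adjacent columns
    hh = (hi[0::2, :] - hi[1::2, :]) / SQRT2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2 and recover the original image from its four sub-bands."""
    rows, cols = ll.shape
    lo = np.empty((2 * rows, cols))
    hi = np.empty((2 * rows, cols))
    lo[0::2, :] = (ll + lh) / SQRT2
    lo[1::2, :] = (ll - lh) / SQRT2
    hi[0::2, :] = (hl + hh) / SQRT2
    hi[1::2, :] = (hl - hh) / SQRT2
    x = np.empty((2 * rows, 2 * cols))
    x[:, 0::2] = (lo + hi) / SQRT2
    x[:, 1::2] = (lo - hi) / SQRT2
    return x


# Hypothetical 4x4 image used only to exercise the functions.
toy = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(toy)
reconstructed = haar_idwt2(ll, lh, hl, hh)
```

Applying `haar_dwt2` again to the `ll` band would give the next, coarser resolution
level. De-noising and compression work by shrinking or discarding small detail
coefficients before the inverse transform.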

1.8 Machine Learning Models

Machine learning is a component of artificial intelligence that performs a particular
task by learning from a specific set of data. To generate an output as a prediction or
decision, it analyses the data using various statistical methods to look for
relationships or patterns. Instead of programming each task explicitly, machine
learning aims to create algorithms and models that can learn from data and make precise
predictions or decisions.
The machine learning process consists of several stages: data collection, data
preparation and pre-processing, model selection, model training, model evaluation,
model optimization, and model deployment. The initial step is data collection, which
involves gathering the data set from sources such as open websites or a company's
databases. Data preparation readies the collected data for the machine learning process
by cleaning it, removing duplicates, and converting it to a suitable format;
pre-processing includes data normalisation, scaling, and feature extraction. Model
selection chooses an algorithm suited to the data set and the task, such as
classification, regression, or prediction. In model training, the pre-processed data is
fed into the model and its parameters are adjusted to obtain the desired output. Model
evaluation assesses the trained model using performance metrics such as accuracy,
sensitivity, precision, and F1 score. Based on these metrics, the model is optimised
with appropriate parameter adjustments and then deployed for prediction on new data. In
this project, several machine learning models and a neural network are used: DT, KNN,
SVM, GNB, and the Attention Awareness Mechanism.
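
The evaluation metrics named above can be computed directly from the counts of true
and false positives and negatives. The sketch below uses only the Python standard
library; the label lists are hypothetical and the function names are chosen here for
illustration, not taken from the project code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall), precision and F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


# Hypothetical labels: 1 = recurrence, 0 = no recurrence.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = evaluate(y_true, y_pred)
```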

1.8.1 Decision Tree

Decision Tree (DT) is a non-parametric supervised learning algorithm that can be used
for classification as well as regression tasks. It has a hierarchical tree structure
made up of a root node, branches, internal nodes, and leaf nodes. The root node
represents the entire data set, branches are decision rules, internal nodes are
features of the data set, and leaf nodes are outcomes. Decision tree learning uses a
divide-and-conquer strategy, searching for the best split points within the tree. This
splitting is repeated top-down and recursively until most or all of the records have
been assigned to particular class labels. Pure leaf nodes, in which all data points
belong to one class, are more easily achieved in smaller trees. As a tree grows larger,
it becomes harder to keep the leaves pure, and too few data points fall into any given
subtree, which typically results in overfitting. For this reason, small trees are
preferred.
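
The search for a best split point can be sketched for a single numeric feature. The
example below scores candidate thresholds with Gini impurity, which is one common
purity measure and is an assumption here, since the text does not name the criterion
used. It relies only on the Python standard library, and the feature values and labels
are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # no split at all is the baseline
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature (e.g. a tumour size) and class labels.
sizes = [10, 12, 15, 30, 35, 40]
classes = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(sizes, classes)
```

A full decision tree applies this search to every feature, keeps the best one, and
then recurses on the two resulting subsets.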

Fig. 1.2 shows the hierarchy of a DT: the root node represents the entire data set,
branches represent decision rules, internal nodes represent features of the data set,
and leaf nodes represent outcomes.

Fig. 1.2 Decision tree hierarchy.

1.8.2 K Nearest Neighbor

The K-nearest neighbors (KNN) algorithm is a supervised machine learning algorithm that
can be used for both classification and regression tasks. KNN has no separate training
phase; it stores the whole training set and uses it at classification time, which makes
it a lazy learning algorithm. KNN is also a non-parametric learning algorithm because
it makes no assumptions about the underlying data. It classifies a new data point
according to its similarity to existing data: it computes the Euclidean distance
between the new point and the other points, as described in Eqn. 1.14, and assigns the
class that is most common among the K nearest ones.

Fig. 1.3 illustrates classification with KNN. There are two classes of training data,
shown in pink and red, and the black dot is the test point. The distance between the
test point and each training point is computed using the Euclidean method. With K = 3,
two of the three nearest neighbours are red, so the black dot is assigned to the red
class.

\[
\text{Euclidean distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{1.14}
\]
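
A minimal KNN classifier following this description can be written with the Python
standard library alone. The coordinates below are hypothetical and were chosen so that,
with K = 3, two of the three nearest neighbours of the query point are red; the function
names are illustrative.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data (two classes) and a test point.
train_points = [(2.0, 3.0), (3.0, 4.5), (0.0, 9.0),
                (4.2, 3.0), (8.0, 8.0), (7.0, 9.0)]
train_labels = ["red", "red", "red", "pink", "pink", "pink"]
query = (3.0, 3.0)
predicted = knn_predict(train_points, train_labels, query, k=3)
```

Because every prediction scans the stored training set, the cost of KNN is paid at
classification time rather than during training, which is what makes it a lazy learner.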

Fig. 1.3 KNN finding nearest neighbor.

1.8.3 Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression. It divides data into classes by establishing a decision
boundary that separates them; this decision boundary is called a hyperplane. The aim of
the algorithm is to determine the hyperplane in N-dimensional space that best
classifies the data points. The number of features determines the dimension of the
hyperplane: if the input has just two features, the hyperplane is simply a line. SVM
also identifies the extreme data points closest to the boundary, which are known as
support vectors. There are linear SVMs for linearly separable data and non-linear SVMs
for non-linearly separable data.
Fig. 1.4 illustrates classification with SVM for two classes, shown in green and blue.
In 2D space a straight line can separate the two classes, and many such lines are
possible. SVM finds the best line, or decision boundary, to separate the classes; this
boundary is called the hyperplane. To do so, SVM finds the points of each class closest
to the boundary (the support vectors) and uses them to classify the data.
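
The role of the hyperplane and the support vectors can be illustrated without training
a model. The sketch below does not implement SVM training; it assumes a hyperplane
w·x + b = 0 is already known (here the hypothetical line x + y = 5 in 2D) and shows how
points are classified by the sign of w·x + b and how the closest points to the boundary
are identified. All data and names are illustrative.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


# Hypothetical hyperplane (the line x + y = 5) and four 2D points.
w, b = (1.0, 1.0), -5.0
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

labels = [classify(w, b, p) for p in points]
distances = [distance_to_hyperplane(w, b, p) for p in points]

# The points closest to the boundary play the role of support vectors.
closest = min(distances)
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, closest)]
```

For these four points the chosen line is the perpendicular bisector of the two closest
points of opposite classes, so it is also the boundary with the widest margin.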

Fig. 1.4 SVM classification.

1.8.4 Gaussian Naïve Bayes

Gaussian Naïve Bayes (GNB) is a supervised machine learning algorithm based on Bayes'
theorem. It is among the fastest machine learning algorithms for making immediate
predictions. If the feature independence assumption holds, it can outperform other
models while using much less training data. It handles multi-class prediction problems
and is suited to real-time prediction. GNB is a probabilistic classifier that models
each feature with a Gaussian distribution.

\[
P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \tag{1.15}
\]

Eqn. 1.15 is the formula of Bayes' theorem. A and B are two events, and P(A) and P(B)
are the probabilities of events A and B occurring. The equation gives P(A | B), the
probability of event A occurring given that event B has occurred.
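
As an illustration of how Bayes' theorem becomes a classifier, the sketch below fits a
small Gaussian Naïve Bayes model with the Python standard library. It assumes each
feature is independent and Gaussian within a class, works with log-probabilities for
numerical stability, and drops the shared denominator P(B) because it is the same for
every class. The data, the small variance offset, and the function names are
hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def fit_gnb(X, y):
    """Estimate the prior, feature means and feature variances of each class."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny offset keeps the variance non-zero (an assumption of this sketch).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gnb(model, row):
    """Return the class with the largest log prior plus log likelihood."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
    return max(scores, key=scores.get)


# Hypothetical two-feature data set with two classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
     [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["no-recurrence"] * 3 + ["recurrence"] * 3
model = fit_gnb(X, y)
```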

1.8.5 Attention Awareness Mechanism

The attention awareness mechanism is a machine learning technique implemented as a
neural network. It gives a machine learning model the capacity to focus selectively on
particular parts of the input sequence or data set rather than analyzing the entire
input at once. The goal is to let the model continuously evaluate the relative weights
of the various input elements so that it can concentrate on the data that is most
relevant and improve its overall performance. The attention mechanism learns to assign
weights to the components of the input according to their relevance. The next layer of
the network receives the weighted sum of the input data computed with these weights.
The weights are learned during training and can be adjusted to enhance the model's
performance on the task. The attention layer has three sub-layers: a feed forward
layer, a softmax layer, and a context vector layer. The feed forward layer helps to
identify important states by finding high attention scores. The context vector layer
gives information about how much attention should be given to each input.

\[
E_{0i} = \frac{\exp(e_{0i})}{\sum_{j=1}^{n} \exp(e_{0j})} \tag{1.16}
\]

Eqn. 1.16 describes the calculation of the attention weights E0i by applying the
softmax function to the scores e0i.
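
The softmax step and the weighted sum that follows it can be sketched in a few lines of
standard-library Python. The scores and input vectors below are hypothetical, and the
learned feed forward layer that would normally produce the scores is not modelled.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights, inputs):
    """Weighted sum of the input vectors, one weight per input."""
    dim = len(inputs[0])
    return [sum(w * vec[d] for w, vec in zip(weights, inputs))
            for d in range(dim)]


# Hypothetical attention scores for three inputs and the inputs themselves.
scores = [2.0, 1.0, 0.1]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = softmax(scores)
context = context_vector(weights, inputs)
```

The input with the highest score receives the largest weight, so it contributes most
to the context vector passed to the next layer.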

CHAPTER 2

PROJECT DESCRIPTION AND GOALS

2.1 Review of Literature

The paper "Particle Swarm Optimisation Feature Selection for Breast Cancer Recurrence
Prediction" (Sakri et al. 2018) investigates the application of Particle Swarm
Optimisation (PSO) to the prediction of breast cancer recurrence. The authors compared
their PSO-based feature selection method with other commonly used feature selection
methods and found that it outperformed them in terms of accuracy and number of features
selected. They also tested different machine learning algorithms and found that
combining PSO-selected features with SVM produced the best accuracy in predicting
breast cancer recurrence. The authors believe that their PSO-based feature selection
will lead to more accurate and efficient prediction models in the future.

"Attention-aware Heterogeneous Graph Neural Network" (Zhang and Xu 2021) proposes an
attention-based graph neural network (AHGNN) that handles heterogeneous graphs for node
classification tasks. It employs an attention mechanism to assess the significance of
various node types and features, and to adaptively aggregate information from
neighbouring nodes. AHGNN is made up of a graph convolutional network and an attention
mechanism, and it captures the rich relationships between different types of nodes
using a novel heterogeneous adjacency matrix. According to the authors, AHGNN
outperformed other models in terms of accuracy and F1-score when tested on four
real-world datasets. The results also showed that AHGNN is robust to hyperparameter
settings and can effectively handle imbalanced datasets. The authors concluded that
AHGNN is an alternative method for embedding a Heterogeneous Information Network (HIN),
and that its performance depends on the characteristics of the dataset.

"Early Diagnosis of Breast Cancer Using Image Processing Techniques" (Younis et al.
2022) is a research article that proposes a computer-aided diagnosis (CAD) system for
early detection of breast cancer using image processing techniques. The methodology
involves preprocessing the mammogram images using techniques such as contrast
enhancement, normalization, and noise reduction. The authors then extract texture
features from the preprocessed images using a variety of feature extraction methods,
such as the Gray Level Co-occurrence Matrix (GLCM), the Gray Level Run Length Matrix
(GLRLM), and the Discrete Wavelet Transform (DWT). The extracted features are then used
to train a variety of machine learning classifiers, including KNN, Decision Tree, and
SVM. Each classifier's performance is assessed using metrics such as accuracy,
sensitivity, specificity, and F1-score. The study's findings show that the proposed CAD
system detects breast cancer with high accuracy and sensitivity, outperforming previous
studies that used similar techniques. The authors also compare the performance of the
feature extraction methods and find that DWT gives the highest accuracy and
sensitivity. The best performing classifier is KNN, with an accuracy of 96.25% and a
sensitivity of 96.08%.

"A novel deep learning model for breast cancer detection using mammography images"
(Oyelade and Ezugwu 2022) proposes a novel architecture that combines a deep
Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The
methodology involved pre-processing mammography images with a variety of image
enhancement techniques such as contrast enhancement and noise reduction. The
pre-processed images were then fed into the proposed deep learning model. The CNN
extracted features from the mammography images, while the RNN captured temporal
dependencies between the extracted features. The proposed model was tested on two
publicly available datasets, the Digital Database for Screening Mammography (DDSM) and
the Mammographic Image Analysis Society (MIAS) dataset, and achieved an accuracy of
95.98%, a sensitivity of 97.46%, and a specificity of 92.47%.

"On the scalability of machine-learning algorithms for breast cancer prediction in big
data context" (Alghunaim and Al-Baity 2019) uses a dataset of over 1.5 million breast
cancer cases drawn from the Surveillance, Epidemiology, and End Results programme. On
both datasets, the authors assessed the performance of three machine learning
algorithms: SVM, KNN, and ANN. The algorithms were trained on different subsets of the
data and tested using 10-fold cross-validation. The study's findings revealed that SVM
had the highest classification accuracy on both datasets, followed by KNN and ANN. The
study also found that increasing the amount of training data did not necessarily
improve classification accuracy, indicating that the size of the datasets may limit the
performance of the algorithms.

"Automated Breast Cancer Detection Models Based on Transfer Learning" (Alruwaili and
Gouda 2022) proposes using transfer learning to improve image-based cancer diagnosis by
fine-tuning pre-trained deep learning models on smaller medical image datasets. The
authors test this approach on two mammography datasets using three pre-trained models:
Visual Geometry Group 16 (VGG16), ResNet50, and InceptionV3. The models are trained
with the Keras deep learning framework using binary cross-entropy loss and the Adam
optimizer, with data augmentation techniques used to increase the training dataset
size. The results show that the pre-trained models detect breast cancer accurately,
with ResNet50 outperforming the others. The authors compare their approach with other
state-of-the-art methods and report that it outperforms them.

"A CNN-SVM based computer aided diagnosis of breast Cancer using histogram K-means
segmentation technique" (Sahu et al. 2023) proposes a computer-aided diagnosis system
for detecting breast cancer that uses histogram K-means segmentation, a CNN, and SVM
classification. The system uses histogram equalisation to improve contrast and remove
noise from mammogram images before segmenting them based on intensity values using the
K-means histogram technique. A CNN extracts features from the segmented images, and an
SVM uses the extracted features to classify the images as cancerous or non-cancerous.
On a publicly available dataset, the proposed system demonstrated high accuracy,
sensitivity, and specificity in detecting breast cancer.

"Breast Cancer Classification from Mammogram Images Using Enhanced Deep Learning
Features and Equilibrium-Jaya Controlled Regula Falsi-Based Features Selection" (Jabeen
et al. 2023) proposes a method for breast cancer classification from mammogram images,
based on enhanced deep learning features and an Equilibrium-Jaya Controlled Regula
Falsi (EJCRF) feature selection technique. The method has four stages. Data collection:
the study's mammogram images were obtained from the DDSM. Preprocessing: to improve
image quality, the images were preprocessed using contrast enhancement and noise
removal techniques. Feature extraction: deep learning features were extracted from the
mammogram images using a pre-trained deep CNN. Feature selection and classification:
the EJCRF algorithm was used to select the most important features, which were then
used to train an SVM for breast cancer classification. The study's findings revealed
that the proposed method classified breast cancer from mammogram images with an
accuracy of 97.52% and a sensitivity of 96.53%.

"A Bottom-Up Review of Image Analysis Methods for Suspicious Region Detection in
Mammograms" (Oza et al. 2021) discusses various image analysis methods for detecting
suspicious regions in mammograms, with a focus on feature, clustering, segmentation,
and classification-based methods. Across the 96 studies reviewed, deep learning
methods, particularly CNNs, were found to be the most commonly used for suspicious
region detection. The size and diversity of the training and testing datasets were
found to be important, with larger and more diverse datasets producing better results.
Texture-based methods were found to be the most effective, particularly those based on
Gabor filters and wavelet transforms. K-means and fuzzy C-means clustering methods were
also effective for segmenting suspicious regions, but due to the variability of
mammogram images, thresholding-based segmentation methods were less effective.

"Breast Cancer Detection, Segmentation and Classification on Histopathology Images
Analysis: A Systematic Review" (Krithiga and Geetha 2021) presents a comprehensive
review of the literature on the approaches used for detecting, segmenting, and
classifying breast cancer from histopathology images. The study included 114 research
papers published between 2014 and 2019. The authors grouped the techniques used in
these papers into six categories: image enhancement, segmentation, feature extraction,
classification, deep learning, and Computer-Aided Diagnosis (CAD) systems. According to
the review, the most commonly used software tools in the studies were MATLAB and Image.
The methodology typically included image pre-processing, segmentation, and feature
extraction, whose outputs were then used for classification. Deep learning approaches
such as CNNs, and CAD systems, were also used in some studies. According to the
findings, deep learning-based approaches had the highest accuracy in breast cancer
detection, followed by machine learning techniques. The use of advanced feature
extraction and segmentation methods such as the multi-scale wavelet transform,
multi-level thresholding, and morphological operations also contributed to the high
detection accuracy.

"Classification of Breast Cancer in Mammograms with Deep Learning Adding a Fifth Class"
(Castro-Tapia et al. 2021) proposes a deep learning approach to classify mammograms
into five categories: normal, benign, malignant, carcinoma in situ, and atypical
hyperplasia. The authors begin by pre-processing the data, which includes cropping the
mammogram images and performing normalization, resizing, and histogram equalisation.
The dataset is then divided into training, validation, and testing sets, and data
augmentation is performed on the training set. The CNN model is trained using the
augmented data, and its performance is assessed on the testing set. The evaluation
metrics include accuracy, precision, recall, F1 score, and the area under the Receiver
Operating Characteristic (ROC) curve. The results show that the proposed approach
classifies mammograms into the five classes with an accuracy of 85.22%. The research
also shows that adding a fifth class improves overall classification performance and
provides more detailed information about breast tissue abnormalities.

"Automated breast cancer detection by reconstruction independent component analysis
(RICA) based hybrid features using machine learning paradigms" (Hussain et al. 2022)
proposes a methodology for extracting hybrid features that combines texture,
morphology, Scale-Invariant Feature Transform (SIFT), GLCM, entropy, elliptic Fourier
descriptors, RICA, and sparse filtering techniques. Machine learning models such as
SVM, DT, KNN, and Naïve Bayes classifiers are then used to identify breast cancer. With
the single feature extraction approach, the best results were produced by SVM on
textural features: sensitivity (92.23%), specificity (94.60%), PPV (93.16%), NPV
(93.85%), TA (93.55%), and ROC AUC (0.9803). On the basis of textural features, the
performance of the other classifiers includes SVM Gaussian with TA (92.55%), Decision
Tree with TA (87.65%), and Naïve Bayes with TA (84.65%).

2.2 Project Description

"Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural
Network" (Wang et al. 2018) proposes a new method for action recognition in videos
called the Attention-aware Temporal Weighted Convolutional Neural Network (ATW-CNN). It
is made up of three parts: a Temporal Convolutional Network (TCN), an attention
mechanism, and a temporal weighted pooling layer. The TCN extracts the video's temporal
features, while the attention mechanism prioritises important frames by multiplying the
TCN output features by attention weights. The temporal weighted pooling layer uses the
weighted mean of the attention-enhanced features to compute the significance of each
frame. The authors tested their method on three datasets and compared it with other
state-of-the-art methods, demonstrating that ATW-CNN outperforms them in accuracy and
F1-score. The authors also performed ablation studies to evaluate the impact of each
module on ATW-CNN performance, and found that both the attention mechanism and the
temporal weighted pooling layer contribute to improved performance. ATW-CNN achieved an
accuracy of 95.4% on the UCF101 dataset, outperforming the previous best method by
2.4%.

"HWDCNN: Multi-class recognition in breast histopathology with Haar wavelet decomposed
image based convolution neural network" (Kausar et al. 2019) proposes a two-stage
system for mammography image classification, consisting of preprocessing and
classification. In the preprocessing stage, the wavelet transform is applied to the
input mammography images to extract features. The authors used the Haar wavelet
transform to decompose the image into low-frequency and high-frequency components, and
only the low-frequency component was fed into the deep learning model. In the
classification stage, a deep CNN, Inception V3, is fine-tuned to classify mammography
images into benign and malignant classes. Accuracy was used as the evaluation metric,
and the proposed system achieved an accuracy of 96.87%, higher than the other methods
evaluated. In addition, the authors compared the performance of various wavelet types
and found that the Haar wavelet performed best.

"Comparative analysis of breast cancer detection using machine learning and biosensors"
(Intelligent Medicine; Amethiya et al. 2022) presents a study comparing the performance
of various machine learning algorithms for breast cancer detection, including SVM, KNN,
and Random Forest (RF). The authors also compared the performance of machine learning
with that of biosensors based on Surface Plasmon Resonance and electrochemical
impedance spectroscopy. MATLAB software was used for data analysis and machine learning
model development. The models' performance was assessed using accuracy, sensitivity,
specificity, and F1-score as evaluation metrics. In terms of classification accuracy,
the study's findings revealed that SVM outperformed the other machine learning
algorithms. The SVM algorithm achieved 95% accuracy, 91% sensitivity, and 98%
specificity. In comparison, the biosensors had a lower accuracy of 82%, a sensitivity
of 75%, and a specificity of 89%.

"Deep Learning for Breast Cancer Diagnosis from Mammograms—A Comparative Study"
(Tsochatzidis et al. 2019) compared the accuracy of six deep learning models for
mammogram-based breast cancer detection. The models were trained and tested using
pre-processed mammograms; the authors fine-tuned the last fully connected layer and
added a new output layer with five classes representing different stages of cancer and
benign cases. With an accuracy of 89.73% and an AUC-ROC of 0.926, ResNet50 outperformed
all other models. The authors also found that adding a fifth class for benign cases
improved the models' performance by allowing them to differentiate more accurately
between benign and malignant cases.

"Tuberculosis disease: Diagnosis by image processing" (Perez C et al. 2019) addresses
health-related image analysis with deep learning for clinical diagnosis, alongside
other recent attempts to use deep learning as a diagnostic tool. Chest X-rays are one
way to check for tuberculosis, since irregularities can be found by examining the
X-ray. The paper suggests a technique for detecting tuberculosis in medical X-ray
imaging. Features are extracted with the ResNet50 deep neural network, and three
alternative classification methods (nearest neighbours, logistic regression, and
support vector machines) were used to evaluate the system. Two classification scenarios
were used: cross-validation, and separate training and test sets. The scenario with
separate training and test sets gave the best results, with an accuracy of more than
85%. SVM was the best performing classifier in both scenarios.

2.2.1 Software

"Wavelet based feature extraction method for breast cancer diagnosis" (Benmazou and
Merouani 2018) proposed using a combination of median filtering, adaptive histogram
equalization, and Contrast-limited Adaptive Histogram Equalisation (CLAHE) to
pre-process the mammogram images. These image processing techniques can be implemented
using different software tools, including the MATLAB Image Processing Toolbox, OpenCV,
and Python libraries such as scikit-image and Pillow.

"Prediction of breast cancer, comparative review of machine learning techniques, and
their analysis" (Fatima et al. 2020) implements and evaluates machine learning
algorithms using Python and several popular libraries, including Scikit-learn, Pandas,
NumPy, and Matplotlib.

2.3 Goals

Breast cancer is a major global public health issue that affects millions of women.
Early detection and timely treatment are essential in lowering mortality rates because
this condition is a leading cause of death among women. However, the manual
interpretation of mammography scans that is frequently used for breast cancer screening
is subjective and error-prone. By creating an image processing and machine learning
system for the early identification of breast cancer and the prediction of its
recurrence, this study seeks to address these shortcomings. The system will employ
machine learning algorithms to analyse digital mammography images in order to detect
indications of breast cancer and forecast the possibility of a recurrence. The
project's ultimate goal is to develop a non-invasive, effective tool for early
detection and prediction, advancing breast cancer diagnosis and therapy.
The mammogram image of a cancerous breast was processed through a series of steps:
pre-processing, segmentation, morphological operations, and edge detection. Various
edge detection approaches were then used to assess the resulting images, as described
in Fig. 4.1. Several wavelet transforms were applied to the mammography image with a 2D
wavelet analyzer tool. The different edge detection methods and wavelet transforms were
evaluated for their efficacy using performance metrics, as shown in Fig. 4.2.
Additionally, a data set from the University of California Irvine (UCI) machine
learning repository was used to train different machine learning models and neural
networks to predict recurrence. Performance metrics were used to assess how well the
trained models predicted recurrence, as shown in Fig. 4.3.

CHAPTER 3

TECHNICAL SPECIFICATION

3.1 Software Specification

3.1.1 MATLAB

MATLAB (Matrix Laboratory) is a high-level programming language and numerical computing
environment that is widely used in research, engineering, and finance. It is developed
and maintained by MathWorks.
Important strengths of MATLAB include its capability to handle complex mathematical
computations, its support for matrix operations, and its substantial library of
built-in functions for tasks like signal processing, image analysis, and optimisation.
Applications of MATLAB are numerous and include:
Data analysis and visualisation: researchers, engineers, and scientists use MATLAB
because it has robust capabilities for data analysis and visualisation.
Algorithm development: MATLAB is a well-suited platform for building and testing
algorithms for a variety of applications, from image processing to machine learning.
Modelling and simulation: MATLAB supports both, including models of physical,
biological, and financial systems.
Image and signal processing: a large range of built-in functions for filtering, feature
extraction, and pattern recognition is available in MATLAB, which is frequently used
for processing and analysing images and signals.
In conclusion, MATLAB is a powerful tool with a wide range of uses, from scientific
research to industrial design and development.

3.1.2 Wavelet Analyzer

Wavelet analysis is a mathematical technique for multi-resolution analysis of signals
and images. The Wavelet Analyzer tool is one of the many features offered by MATLAB.
The Wavelet Analyzer app in MATLAB provides a graphical user interface for exploring
and analysing signals using wavelets. It allows the user to visualise wavelet
coefficients, perform wavelet transforms, and analyse signals in terms of their
time-frequency content.
The Wavelet Analyzer tool can load data from different sources, including MATLAB
variables, workspaces, files, and URLs. The type of wavelet and the level of
decomposition used for a wavelet transform can be adjusted. The wavelet coefficients in
the time-frequency plane are visualised using graphs, including the scalogram and
spectrogram. Besides wavelet transforms, the Wavelet Analyzer software offers tools for
signal denoising with wavelet thresholding methods, feature extraction, and
classification of signals using wavelet-based feature vectors.
In conclusion, the MATLAB Wavelet Analyzer is a capable tool for investigating and
examining signals in a multi-resolution framework using wavelets.

3.1.3 Jupyter Notebook

Jupyter Notebook is an open-source web application for creating and sharing documents
that contain live code, equations, visualisations, and narrative text. It supports many
programming languages, including Python, R, and Julia. In Jupyter Notebook, code is
written in cells and run in an interactive environment. Each cell may contain raw text,
Markdown text, or code. The output of a code cell is shown just below the cell when it
is run, which enables flexible and iterative testing and experimentation with code.
One of the key advantages of Jupyter Notebook is its capability to produce
visualisations and plots right inside the notebook. Visualisation libraries such as
Matplotlib and Seaborn can be used to generate charts and graphs, and multimedia
content like images and videos can also be included. Platforms such as GitHub, Google
Colab, and nbviewer make it simple to share and collaborate on notebooks. This makes it
an effective tool for scientific study, education, and data analysis. Overall, Jupyter
Notebook is a flexible and powerful tool for creating and sharing interactive documents
with live code, visualisations, and narrative text. It is extensively used in many
fields, including data science, machine learning, and academic research.

3.2 References

1) Particle Swarm Optimization Feature Selection for Breast Cancer Recurrence Prediction (Sakri et al., 2018).

2) Attention-aware Heterogeneous Graph Neural Network (Zhang and Xu, 2021).

3) Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network (Wang et al., 2018).

4) HWDCNN: Multi-class recognition in breast histopathology with Haar wavelet decomposed image based convolution neural network (Kausar et al., 2019).

5) Wavelet based feature extraction method for breast cancer diagnosis (Benmazou and Merouani, 2018).

6) Prediction of breast cancer, comparative review of machine learning techniques, and their analysis (Fatima et al., 2020).

7) Early diagnosis of breast cancer using image processing techniques (Younis et al., 2022).

8) Tuberculosis disease: Diagnosis by image processing (Perez C et al., 2019).

9) Comparative analysis of breast cancer detection using machine learning and biosensors (Amethiya et al., 2022).

10) On the scalability of machine-learning algorithms for breast cancer prediction in big data context (Alghunaim and Al-Baity, 2019).

11) A novel wavelet decomposition and transformation convolutional neural network with data augmentation for breast cancer detection using digital mammogram (Oyelade and Ezugwu, 2022).

12) Automated Breast Cancer Detection Models Based on Transfer Learning (Alruwaili and Gouda, 2022).

13) A CNN-SVM based computer aided diagnosis of breast Cancer using histogram K-means segmentation technique (Sahu et al., 2023).

14) Breast Cancer Classification from Mammogram Images Using Enhanced Deep Learning Features and Equilibrium-Jaya Controlled Regula Falsi-Based Features Selection (Jabeen et al., 2023).

15) A Bottom-Up Review of Image Analysis Methods for Suspicious Region Detection in Mammograms (Oza et al., 2021).

16) Breast Cancer Detection, Segmentation and Classification on Histopathology Images Analysis: A Systematic Review (Krithiga and Geetha, 2021).

17) Classification of Breast Cancer in Mammograms with Deep Learning Adding a Fifth Class (Castro-Tapia et al., 2021).

18) Deep Learning for Breast Cancer Diagnosis from Mammograms—A Comparative Study (Tsochatzidis et al., 2019).

19) Automated breast cancer detection by reconstruction independent component analysis (RICA) based hybrid features using machine learning paradigms (Hussain et al., 2022).

28
CHAPTER 4

DESIGN APPROACH AND DETAILS

4.1 Design Approach / Materials and Methods

4.1.1 Processing the Mammogram image using different Edge Detection Methods

A mammogram image of a cancerous breast is taken and pre-processed to remove
noise using filters such as the Gaussian filter. The image is then segmented; this is
an important step because it identifies the Region of Interest (ROI) for further
investigation. After segmentation, the image undergoes morphological operations
such as erosion and dilation, which change the shape and size of objects in the image.
In mammography image processing, erosion is used to remove small regions from the
image, while dilation is used to fill in any gaps in the ROI. Edge detection, which
determines the boundaries of the objects in the image, is then performed with four
operators: LoG, Canny, Sobel and Prewitt. The resulting edge-detected image
highlights the tumor region, and the quality of the image changes. To evaluate the
four operators, performance metrics such as the closest distance metric, pixel
correspondence metric, grey scale FOM and PSNR are used. The operators are then
compared using these metrics, as shown in Fig. 4.1.
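
The gradient-based part of this pipeline can be illustrated with a minimal Python sketch. It is not the MATLAB code used in the project: the function names, the toy image and the threshold value are assumptions made for illustration, and only the Sobel and Prewitt operators are shown (LoG and Canny need additional smoothing and thresholding stages).

```python
import math

# Horizontal-gradient kernels; the vertical kernels are their transposes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]


def transpose(kernel):
    return [list(row) for row in zip(*kernel)]


def correlate3x3(image, kernel):
    """Apply a 3x3 kernel to the interior pixels; border pixels stay at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3)
                for j in range(3)
            )
    return out


def gradient_magnitude(image, kernel_x):
    """Combine horizontal and vertical responses into an edge strength map."""
    gx = correlate3x3(image, kernel_x)
    gy = correlate3x3(image, transpose(kernel_x))
    return [
        [math.hypot(a, b) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


def edge_map(image, kernel_x, threshold):
    """Mark pixels whose gradient magnitude reaches the (assumed) threshold."""
    magnitude = gradient_magnitude(image, kernel_x)
    return [[1 if m >= threshold else 0 for m in row] for row in magnitude]


# Toy 5x6 image: dark left half, bright right half (one vertical step edge).
toy = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
sobel_edges = edge_map(toy, SOBEL_X, threshold=255)
prewitt_edges = edge_map(toy, PREWITT_X, threshold=255)
```

On this toy image both operators mark the two columns on either side of the step. The Sobel kernel weights the centre row more heavily, so its response is larger than the Prewitt response for the same edge.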

Fig. 4.1 Flow diagram of using different Edge Detection Techniques

4.1.2 Image Processing using 2D Wavelet Analyzer

Using the 2D Wavelet Analyzer toolbox in MATLAB, different types of 2D wavelets,
including the Haar, Daubechies (db), Symlets (sym), Coiflets (coif), Biorthogonal
(bior), Reverse Biorthogonal (rbio), Discrete Meyer (dmey) and Fejér-Korovkin (fk)
wavelets, are applied individually, and a synthesized image is generated for each.
The results are then evaluated with performance metrics such as the closest distance
metric, pixel correspondence metric, grey scale FOM and PSNR, as shown in Fig. 4.2.
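
As an illustration of what one level of 2D wavelet analysis and synthesis does, the following Python sketch implements a single-level Haar transform and its inverse. It is an assumption-laden stand-in for the MATLAB tool, not the project's code: it uses the simple averaging form of the Haar transform (pairwise averages and half-differences), which may differ in scaling from the filters used by the toolbox, and it requires even image dimensions.

```python
def haar_rows(matrix):
    """One Haar step along each row: pairwise averages, then half-differences."""
    out = []
    for row in matrix:
        averages = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
        details = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
        out.append(averages + details)
    return out


def inverse_haar_rows(matrix):
    """Undo haar_rows: rebuild each pixel pair from its average and detail."""
    out = []
    for row in matrix:
        half = len(row) // 2
        rebuilt = []
        for average, detail in zip(row[:half], row[half:]):
            rebuilt.extend([average + detail, average - detail])
        out.append(rebuilt)
    return out


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def haar2d(image):
    """Single-level 2D decomposition: transform rows, then columns.

    The top-left quadrant of the result is the approximation; the other
    three quadrants hold the detail coefficients.
    """
    return transpose(haar_rows(transpose(haar_rows(image))))


def inverse_haar2d(coefficients):
    """Synthesis step: invert the column transform, then the row transform."""
    return inverse_haar_rows(transpose(inverse_haar_rows(transpose(coefficients))))


# Toy 4x4 "image" (hypothetical values, even dimensions required).
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coefficients = haar2d(image)
synthesised = inverse_haar2d(coefficients)
```

When no coefficients are modified, the synthesised image equals the input. Differences between the original and synthesised images arise only when coefficients are thresholded, quantised or discarded before synthesis.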

Fig. 4.2 Flow diagram of using different Wavelet Transform

4.1.3 Recurrence Prediction Using Machine Learning

A breast cancer data set is taken from the UCI Machine Learning Repository. Its
source is the Institute of Oncology, University Medical Centre, Ljubljana, Yugoslavia.
The data set includes 201 instances of the no-recurrence class and 85 instances of the
recurrence class. Each instance is described by 9 features: age, menopause, tumor
size, invasive nodes, node capsules, degree of malignancy, breast (right or left),
breast quadrant and irradiation. The class label takes one of two values,
non-recurrence or recurrence.
The data set is processed and fed into four machine learning classifiers: Decision
Tree, K Nearest Neighbor, Gaussian Naive Bayes and Support Vector Machine. An
Attention Awareness model, which is a neural network implementation, is also used.
The models are trained with the processed data and then deployed to predict the
recurrence event. After prediction, the models are evaluated by determining the
accuracy, sensitivity, precision, F1 score and Cohen’s Kappa score. Finally, the
classifiers are compared according to these performance metrics, as shown in Fig. 4.3.
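
The five evaluation metrics named above can all be derived from a binary confusion matrix. The sketch below shows the standard definitions in Python; the function name and the example counts are illustrative assumptions, not values taken from the project notebook, and the function does not guard against empty classes (division by zero).

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute evaluation metrics from a binary confusion matrix.

    tp/fn/fp/tn are counts with "recurrence" treated as the positive class.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance_positive = ((tp + fn) / total) * ((tp + fp) / total)
    chance_negative = ((fp + tn) / total) * ((fn + tn) / total)
    expected = chance_positive + chance_negative
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Illustrative counts for a 94-sample test split (assumed, not from the report).
metrics = classification_metrics(tp=10, fn=21, fp=14, tn=49)
```

Because the data set is imbalanced (201 versus 85 instances), accuracy alone can look acceptable even when few recurrence cases are found, which is why sensitivity, F1 score and Cohen’s kappa are reported alongside it.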

Fig. 4.3 Flow diagram of using Deep Learning

4.2 Codes and Standards

IEEE 1471-2000, which provides a framework for software architecture.
IEEE 1012-2016, which provides guidelines for software verification and validation.
MATLAB: MISRA, ISO/IEC 9126 and IEEE 1063, which provide guidelines for software
quality metrics.
IEEE 829, which provides guidelines for software test documentation.
DICOM (Digital Imaging and Communications in Medicine), which is the international
standard for medical images and related information. It specifies the formats for
exchanging medical images with the information and quality required for clinical
usage.

4.3 Constraints, Alternatives and Tradeoffs

4.3.1 Constraints

Due to a lack of resources and privacy issues, high quality data sets for image
processing are not easily accessible.
For the same reason, training data sets with appropriate characteristics for
recurrence prediction were unavailable, and some of the available information was
corrupted.
While using the online MATLAB software, the processing time was high.

4.3.2 Alternatives

An image of a breast with cancerous tissue from Kaggle was used for image processing
because access to mammography scans was restricted.
Due to a lack of sufficient data, a data set from UCI with fewer features was used to
predict recurrence.

4.3.3 Tradeoffs

As a result of using a single data set for training and testing the machine learning
models, the accuracy of the predictive models was inconsistent.

CHAPTER 5

SCHEDULE, TASKS AND MILESTONES

5.1 Schedule

Table 5.1 Timeline of Project Tasks

Task: Problem Statement, Literature Survey, Data Set Collection
Description: Chose the project title, “Breast Cancer and its Recurrence
Identification using Image Processing and Deep Learning Techniques”, and defined the
problem statement. Reviewed about 25 to 30 articles and journals for the literature
survey. Acquired the data set for machine learning.
Duration: September to October

Task: Decided the algorithms and performance metrics for the models
Description: Decided which algorithms and operators would be used for image
processing, machine learning and the wavelet analyzer, and which performance metrics
would be used to evaluate and compare the algorithms.
Duration: October to November

Task: Image Processing Model
Description: Started with pre-processing the mammogram image, followed by
segmentation and morphological operations. Performed edge detection with the Canny
algorithm.
Duration: November to December

Task: Image processing model continued using 2D wavelet analyzer
Description: Finished the other algorithms (LoG, Sobel and Prewitt) and evaluated
the model with the different performance metrics. Also started with the 2D wavelet
analyzer.
Duration: December to January

Task: 2D wavelet analyzer, Machine Learning
Description: Implemented 8 types of wavelets and synthesized the images, then
evaluated them using performance metrics. Started the machine learning process with
data cleaning and data splitting, and implemented 2 algorithms, namely DT and GNB.
Duration: January to February

Task: Machine Learning Cont., Attention Awareness
Description: Finished the rest of the algorithms, evaluated the models using
performance metrics and compared the algorithms with each other to find out which one
is the best. Also implemented attention awareness in this process.
Duration: February to March

Task: Thesis
Description: The final step was to compile all the work and prepare the project
thesis and research paper.
Duration: March to April

Table 5.1 describes the project tasks done from September to April.

5.2 Tasks and Milestones

Fig. 5.1 Gantt Chart

Fig. 5.1 shows the Gantt chart of the tasks and milestones of the project from
September to April.

CHAPTER 6

PROJECT DEMONSTRATION

6.1 Image Processing Using different Edge Detection Methods on Mammogram

To identify the tumor tissue, the mammogram image in Fig. 1.1 is subjected to
different edge detection methods, namely the LoG, Canny, Sobel and Prewitt
operators, in MATLAB. The resulting processed images are then compared with the
original image and evaluated using performance metrics.

6.1.1 LoG Algorithm

Fig. 6.1 Output of LoG Operator

Fig. 6.1 shows the edge-detected image after applying the LoG operator.

6.1.2 Canny Algorithm

Fig. 6.2 Output of Canny algorithm

Fig. 6.2 shows the edge-detected image after applying the Canny operator.

6.1.3 Sobel Algorithm

Fig. 6.3 Output of Sobel Operator

Fig. 6.3 shows the edge-detected image after applying the Sobel operator.

6.1.4 Prewitt Algorithm

Fig. 6.4 Output of Prewitt Operator

Fig. 6.4 shows the edge-detected image after applying the Prewitt operator.

6.2 Image Processing Using 2D Wavelet Analyzer

The mammogram image in Fig. 1.1 is subjected to the 2D wavelet transform using a
variety of wavelets, namely the Haar, Daubechies (db), Symlets (sym), Coiflets (coif),
Biorthogonal (bior), Reverse Biorthogonal (rbio), Discrete Meyer (dmey) and
Fejér-Korovkin (fk) wavelets, in the MATLAB Wavelet Analyzer tool. As a result,
synthesised images are obtained. These images are compared with the original image
and evaluated using performance metrics.

6.2.1 Haar Wavelet

The Haar wavelet is a simple orthogonal wavelet. The 2D Haar wavelet transform is
performed on the mammogram image in Fig. 1.1 using the Wavelet Analyzer in MATLAB.
Fig. 6.5 shows the synthesised image after applying the Haar wavelet.

Fig. 6.5 Synthesised image of Haar Wavelet

6.2.2 Daubechies Wavelet

The Daubechies wavelet is an extension of the Haar wavelet; it uses longer filters
and smoother scaling functions, and is applied here to the mammogram image in Fig. 1.1.
Fig. 6.6 shows the synthesised image after applying the Daubechies wavelet.

Fig. 6.6 Synthesised image of Daubechies Wavelet

6.2.3 Symlet Wavelet

The Symlet (sym) wavelet is similar to the Daubechies wavelet but has different
filter coefficients. It is helpful for pictures with sharp edges and colour.
Fig. 6.7 shows the synthesised image after applying the Symlet wavelet.

Fig. 6.7 Synthesised image of Symlet Wavelet

6.2.4 Coiflets Wavelet

Coiflets have a more compact support, which results in fewer coefficients and a more
effective representation of image data.

Fig. 6.8 Synthesised image of Coiflets Wavelet

Fig. 6.8 shows the synthesised image after applying the Coiflets wavelet.

6.2.5 Biorthogonal Wavelet

Biorthogonal wavelets are not necessarily orthogonal; they use two sets of filters
to divide the signal or image into various frequency bands.
Fig. 6.9 shows the synthesised image after applying the Biorthogonal wavelet.

Fig. 6.9 Synthesised image of Biorthogonal Wavelet

6.2.6 Reverse Biorthogonal Wavelet

Reverse biorthogonal wavelets are used to capture an image or signal very precisely;
they eliminate noise without distorting the original image.
Fig. 6.10 shows the synthesised image after applying the Reverse Biorthogonal wavelet.

Fig. 6.10 Synthesised image of Reverse Biorthogonal Wavelet

6.2.7 Discrete Meyer Wavelet

The Discrete Meyer (dmey) wavelet is smooth and has compact support; it helps to
obtain synthesised images where higher precision in image or signal reading is needed.
Fig. 6.11 shows the synthesised image after applying the Discrete Meyer wavelet.

Fig. 6.11 Synthesised image of Discrete Meyer Wavelet

6.2.8 Fejér-Korovkin Wavelet

The Fejér-Korovkin (fk) wavelet has high levels of directional selectivity and
spatial localization. It also has compact support, which allows signal or image
details to be recorded very precisely.
Fig. 6.12 shows the synthesised image after applying the Fejér-Korovkin wavelet.

Fig. 6.12 Synthesised image of Fejér-Korovkin Wavelet

6.3 Recurrence Prediction Using Deep Learning

The data set acquired from the UCI Machine Learning Repository is processed and fed
into the machine learning models, namely Decision Tree, K Nearest Neighbor, Gaussian
Naive Bayes, Support Vector Machine and the Attention Awareness model. The models
are deployed to predict the outcomes and are then evaluated using performance
metrics.

Fig. 6.13 Different Machine Learning Algorithms

Fig. 6.13 shows the different machine learning algorithms used for recurrence
prediction.

Fig. 6.14 Attention Layer

Fig.6.14 shows the attention awareness mechanism used for recurrence prediction.

Fig. 6.15 Model Prediction

Fig.6.15 shows the model prediction of recurrence using the data.

CHAPTER 7

COST ANALYSIS / RESULT AND DISCUSSION

7.1 Cost Analysis

7.1.1 MATLAB

The image processing part of the project was executed in MATLAB/Simulink. This
component is free of cost to the project because it comes with a university license;
otherwise, the cost of such a license is $45.

7.1.2 Jupyter Notebook

The recurrence prediction part of the project is done using Jupyter Notebook, which
is open-source software.

7.2 Results and Discussion

7.2.1 Image Processing Using different Edge Detection Methods

After tumor identification in the mammogram using the different edge detection
methods, the output images are compared with the input image and evaluated using
performance metrics.

Table 7.1 Performance Metrics of Edge Detection Methods

Methods                        LoG        Canny      Sobel      Prewitt
Closest Distance Metric        16721576   24122940   27098932   3056300
Pixel Correspondence Metric    0.7390     0.0180     0.0213     0.0189
Grey scale FOM                 0.03361    1.0009     1.0149     1.0188
PSNR                           8.1112     20.6352    15.621     21.074

From the performance metrics of the different edge detection methods in Table 7.1,
the Canny algorithm is taken to perform best overall. Its PSNR (20.6352) is close to
the highest value in the table (21.074, for Prewitt) and well above that of Sobel
(15.621), indicating good image quality after edge detection. Its grey scale FOM
(1.0009) is comparable to those of Sobel (1.0149) and Prewitt (1.0188), and its pixel
correspondence metric (0.0180) is similar to theirs (0.0213 and 0.0189). The lowest
closest distance metric in the table belongs to Prewitt (3056300). The LoG algorithm
performs the worst overall, with the lowest PSNR (8.1112) and grey scale FOM
(0.03361) values, although it has the highest pixel correspondence metric (0.7390).
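
For reference, PSNR can be computed as sketched below in Python. This uses the conventional definition for 8-bit images, 10·log10(peak² / MSE) with a peak value of 255; the report does not state the exact formula used in MATLAB, so the definition, the function name and the toy images are assumptions.

```python
import math


def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of pixel intensities; `peak` is the maximum
    possible intensity (255 assumed for 8-bit data).
    """
    squared_error = 0.0
    pixel_count = 0
    for ref_row, proc_row in zip(reference, processed):
        for ref_pixel, proc_pixel in zip(ref_row, proc_row):
            squared_error += (ref_pixel - proc_pixel) ** 2
            pixel_count += 1
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images: no noise
    return 10 * math.log10(peak ** 2 / mse)


# Toy 2x2 images (hypothetical): every pixel differs by exactly 1 grey level.
original = [[10, 20], [30, 40]]
shifted = [[11, 21], [31, 41]]
value = psnr(original, shifted)
```

A higher PSNR means the processed image is closer to the reference. Since an edge map is deliberately different from the input image, PSNR here is best read as a relative comparison between operators rather than as an absolute quality score.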

7.2.2 Image Processing Using 2D Wavelet Analyzer

After image processing with the 2D Wavelet Analyzer tool in MATLAB, the synthesised
images are compared with the input image and the wavelets are evaluated using
performance metrics.

Table 7.2 Performance Metrics of Wavelet Transform

Wavelets   Closest Distance Metric   Pixel Correspondence Metric   Grey scale FOM   PSNR
Haar       27354068                  0.0595                        1.2976           14.5267
Db         28952916                  0.0558                        1.2783           14.9116
Sym        27354068                  0.0595                        1.2976           14.5267
Coif       28952916                  0.0558                        1.2783           14.9116
Bior       28952916                  0.0558                        1.2783           14.9116
Rbio       27354068                  0.0595                        1.2976           14.5267
Dmey       27354068                  0.0595                        1.2976           14.5267
Fk         27354068                  0.0595                        1.2976           14.5267

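From the performance metrics of the different wavelet transforms in Table 7.2, all
of them generated comparable outcomes. The wavelet families fall into two groups
with identical values: Haar, Sym, Rbio, Dmey and Fk share one set of values (PSNR
14.5267), while Db, Coif and Bior share another (PSNR 14.9116). The two groups differ
only slightly in closest distance metric, pixel correspondence metric, grey scale
FOM and PSNR.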

7.2.3 Recurrence Prediction Using Deep learning

The data set is fed into the machine learning models and the Attention Awareness
model. The models are trained and deployed to predict the outcomes, and are then
evaluated using performance metrics.

Table 7.3 Performance Metrics of Machine Learning models

Models                Accuracy   Sensitivity (recall)   F1 score   Precision   Cohen’s kappa score
DT                    62.765     32.258                 36.363     41.666      0.1064
GNB                   71.276     51.612                 54.237     57.142      0.3338
KNN                   58.510     32.258                 33.898     35.714      0.0377
SVM                   63.829     54.838                 50.0       45.945      0.2201
Attention awareness   82.89      40.0                   53.33      80.0        0.337

From the performance metrics of the different machine learning models in Table 7.3,
the attention mechanism appears to be the best performing algorithm, with the highest
accuracy (82.89%) and precision (80.0%), as well as a strong F1 score (53.33%). Its
sensitivity/recall (40.0%), however, is lower than that of GNB and SVM. Its Cohen’s
kappa score (0.337) is the highest of all the models, marginally above that of GNB,
showing better agreement between predicted and actual labels.
GNB also performs well, with the second highest accuracy (71.276%) and precision
(57.142%) and the highest F1 score (54.237%). Its sensitivity/recall (51.612%) is the
second highest, showing that it is effective at correctly identifying positive
instances, and its Cohen’s kappa score (0.3338) is greater than those of DT, KNN and
SVM, indicating improved agreement between predicted and actual labels.
SVM has a moderate accuracy (63.829%), the highest sensitivity/recall (54.838%) and a
respectable F1 score (50.0%). However, its precision (45.945%) is lower than that of
both GNB and the attention mechanism. KNN performs the worst on every metric, tying
with the Decision Tree on sensitivity/recall (32.258%), and the Decision Tree is only
slightly better, suggesting that neither is well suited to recurrence prediction. As
a result, the attention mechanism is suggested for predicting cancer recurrence.
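
As a consistency check on Table 7.3, the F1 score of each model can be recomputed from its sensitivity/recall and precision using F1 = 2PR / (P + R). The short Python sketch below does this with the values copied from the table; the variable and function names are illustrative.

```python
# (sensitivity/recall %, precision %, reported F1 %) copied from Table 7.3.
TABLE_7_3 = {
    "DT": (32.258, 41.666, 36.363),
    "GNB": (51.612, 57.142, 54.237),
    "KNN": (32.258, 35.714, 33.898),
    "SVM": (54.838, 45.945, 50.0),
    "Attention awareness": (40.0, 80.0, 53.33),
}


def f1_from(recall, precision):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Recomputed F1 per model, and the largest gap to the reported value.
recomputed = {
    model: f1_from(recall, precision)
    for model, (recall, precision, _reported) in TABLE_7_3.items()
}
largest_gap = max(
    abs(recomputed[model] - reported)
    for model, (_recall, _precision, reported) in TABLE_7_3.items()
)
```

The recomputed values agree with the reported F1 column to within 0.01 percentage points, which confirms the column order of the table. It also makes the trade-off explicit: the attention model reaches its F1 score through high precision and low recall, whereas GNB and SVM are more balanced.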

CHAPTER 8

SUMMARY

8.1 Summary

The specific data set and evaluation parameters can affect the performance of the
various algorithms. Based on the measures considered, the Canny algorithm appears to
be the best overall for the image processing task. The evaluation of the various
wavelet families in image compression leads to the conclusion that, at least for this
particular data set and these evaluation metrics, the choice of wavelet family may
not significantly affect the overall quality of image compression. Similarly, the
attention awareness mechanism appears to be the highest performing algorithm overall
for the classification problem of recurrence prediction, based on the metrics
evaluated. The use of an attention mechanism with RNNs helps the model to concentrate
on the most relevant parts of the input sequence at each time step in order to make
accurate predictions.
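
The idea of concentrating on the most relevant parts of an input sequence can be sketched in a few lines of Python. This is a generic dot-product attention step, not the model built in the project: the score function, the function names and the toy vectors are all assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def attend(query, states):
    """Weight each sequence state by its relevance to the query.

    Relevance is the dot product of the query with each state (assumed
    score function). Returns the attention weights and the context vector,
    which is the weighted sum of the states.
    """
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dimension = len(states[0])
    context = [
        sum(weight * state[k] for weight, state in zip(weights, states))
        for k in range(dimension)
    ]
    return weights, context


# Toy sequence of three 2-dimensional hidden states (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(query=[1.0, 0.0], states=states)
```

States that align with the query receive larger weights, so they contribute more to the context vector that is passed on to the prediction layer.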

REFERENCES

Alghunaim, S. and Al-Baity, H. H. (2019), ‘On the scalability of machine-learning
algorithms for breast cancer prediction in big data context’, IEEE Access 7,
91535–91546.

Alruwaili, M. and Gouda, W. (2022), ‘Automated breast cancer detection models based
on transfer learning’, Sensors 22(3), 876.

Amethiya, Y., Pipariya, P., Patel, S. and Shah, M. (2022), ‘Comparative analysis of
breast cancer detection using machine learning and biosensors’, Intelligent Medicine
2(2), 69–81.

Benmazou, S. and Merouani, H. F. (2018), Wavelet based feature extraction method for
breast cancer diagnosis, in ‘2018 4th International Conference on Advanced
Technologies for Signal and Image Processing (ATSIP)’, IEEE, pp. 1–5.

Castro-Tapia, S., Castañeda-Miranda, C. L., Olvera-Olvera, C. A., Guerrero-Osuna,
H. A., Ortiz-Rodriguez, J. M., Martínez-Blanco, M., Díaz-Florez, G.,
Mendiola-Santibañez, J. D., Solís-Sánchez, L. O. et al. (2021), ‘Classification of
breast cancer in mammograms with deep learning adding a fifth class’, Applied
Sciences 11(23), 11398.

Fatima, N., Liu, L., Hong, S. and Ahmed, H. (2020), ‘Prediction of breast cancer,
comparative review of machine learning techniques, and their analysis’, IEEE Access
8, 150360–150376.

Hussain, L., Qureshi, S. A., Aldweesh, A., Pirzada, J. u. R., Butt, F. M., Eldin, E. T.,
Ali, M., Algarni, A. and Nadim, M. A. (2022), ‘Automated breast cancer detection
by reconstruction independent component analysis (RICA) based hybrid features using
machine learning paradigms’, Connection Science 34(1), 2784–2806.

Jabeen, K., Khan, M. A., Balili, J., Alhaisoni, M., Almujally, N. A., Alrashidi, H., Tariq,
U. and Cha, J.-H. (2023), ‘BC2NetRF: Breast cancer classification from mammogram
images using enhanced deep learning features and equilibrium-Jaya controlled regula
falsi-based features selection’, Diagnostics 13(7), 1238.

Kausar, T., Wang, M., Idrees, M. and Lu, Y. (2019), ‘HWDCNN: Multi-class
recognition in breast histopathology with Haar wavelet decomposed image based
convolution neural network’, Biocybernetics and Biomedical Engineering 39(4),
967–982.

Krithiga, R. and Geetha, P. (2021), ‘Breast cancer detection, segmentation and
classification on histopathology images analysis: a systematic review’, Archives of
Computational Methods in Engineering 28, 2607–2619.

Oyelade, O. N. and Ezugwu, A. E. (2022), ‘A novel wavelet decomposition and
transformation convolutional neural network with data augmentation for breast cancer
detection using digital mammogram’, Scientific Reports 12(1), 5913.

Oza, P., Sharma, P., Patel, S. and Bruno, A. (2021), ‘A bottom-up review of image
analysis methods for suspicious region detection in mammograms’, Journal of
Imaging 7(9), 190.

Perez C, E., Morales C, W., Guzman C, R. and Cordova F, T. (2019), ‘Tuberculosis
disease: Diagnosis by image processing’.

Sahu, Y., Tripathi, A., Gupta, R. K., Gautam, P., Pateriya, R. and Gupta, A. (2023),
‘A CNN-SVM based computer aided diagnosis of breast cancer using histogram
K-means segmentation technique’, Multimedia Tools and Applications 82(9),
14055–14075.

Sakri, S. B., Rashid, N. B. A. and Zain, Z. M. (2018), ‘Particle swarm optimization
feature selection for breast cancer recurrence prediction’, IEEE Access 6,
29637–29647.

Tsochatzidis, L., Costaridou, L. and Pratikakis, I. (2019), ‘Deep learning for breast
cancer diagnosis from mammograms—a comparative study’, Journal of Imaging
5(3), 37.

Wang, L., Zang, J., Zhang, Q., Niu, Z., Hua, G. and Zheng, N. (2018), ‘Action
recognition by an attention-aware temporal weighted convolutional neural network’,
Sensors 18(7), 1979.

Younis, Y. S., Ali, A. H., Alhafidhb, O. K. S., Yahia, W. B., Alazzam, M. B., Hamad,
A. A. and Meraf, Z. (2022), ‘Early diagnosis of breast cancer using image processing
techniques’, Journal of Nanomaterials 2022, 1–6.

Zhang, J. and Xu, Q. (2021), ‘Attention-aware heterogeneous graph neural network’,
Big Data Mining and Analytics 4(4), 233–241.

LIST OF PUBLICATION

1. G.K. Rajini, Alan George, Thomas Tom, Arshaque Abdusalam. “Detection and
Recurrence of Breast Cancer through Image Processing and Attention Awareness:
A Comparative Analysis of Algorithms.” Communicated to Asian Journal
of Pharmaceutical and Clinical Research (2023).

CURRICULUM VITAE
