CAPSTONE Project Report Model
Bachelor of Technology
in
May, 2023
DECLARATION
I hereby declare that the thesis entitled “Breast Cancer and its Recurrence
Identification using Image Processing and Deep Learning Techniques” submitted
by Alan George (19BEI0001), Arshaque Abdusalam (19BEI0036) and Thomas Tom
(19BEI0051), for the award of the degree of Bachelor of Technology in Electronics
and Instrumentation Engineering to Vellore Institute of Technology, Vellore is a
record of bonafide work carried out by me under the supervision of Dr. G.K.Rajini,
Professor, School of Electrical Engineering, VIT University, Vellore.
I further declare that the work reported in this thesis has not been submitted
and will not be submitted, either in part or in full, for the award of any other degree or
diploma in this institute or any other institute or university.
Place: Vellore
This is to certify that the thesis entitled “Breast Cancer and its Recurrence
Identification using Image Processing and Deep Learning Techniques” submitted
by Alan George (19BEI0001), Arshaque Abdusalam (19BEI0036) and Thomas
Tom (19BEI0051), School of Electrical Engineering, Vellore Institute of Technology,
Vellore, for the award of the degree of Bachelor of Technology in Electronics and
Instrumentation Engineering, is a record of bonafide work carried out by him under my
supervision during the period 01.09.2022 to 30.04.2023, as per the Vellore Institute
of Technology code of academic and research ethics.
The contents of this report have not been submitted and will not be submitted
either in part or in full, for the award of any other degree or diploma in this institute or
any other institute or university. The thesis fulfills the requirements and regulations of
the University and in my opinion meets the necessary standards for submission.
Place: Vellore
With immense pleasure and deep sense of gratitude, I wish to express my sincere
thanks to my supervisor Dr. G.K.Rajini, Professor, School of Electrical Engineering,
Vellore Institute of Technology, Vellore. Without her motivation and continuous
encouragement, this project work would not have been successfully completed.
I am grateful to the Chancellor of Vellore Institute of Technology, Vellore, Dr.
G.Viswanathan, the Vice Presidents, the Vice Chancellor for motivating me to carry
out research in the Vellore Institute of Technology, Vellore and also for providing me
with infrastructural facilities and many other resources needed for my project.
I express my sincere thanks to Dr. Mathew M Noel, Dean, School of Electrical
Engineering, Vellore Institute of Technology, Vellore for his kind words of support
and encouragement. I would like to acknowledge the support rendered by my colleagues
in several ways throughout my project work.
I wish to extend my profound sense of gratitude to my parents for all the
sacrifices they made during my research and for providing me with moral support and
encouragement whenever required.
Place: Vellore
Date: 20/04/2023
Student Names: Alan George, Arshaque Abdusalam, Thomas Tom
Executive Summary
Breast cancer is a major global public health issue that affects millions of women.
Early detection and timely treatment are essential in lowering mortality rates.
Mammography and biopsy, the two most common procedures for finding breast cancer,
can be invasive and are subject to false positive and false negative results. The goal
of this project is to create a system that detects breast cancer using image processing
and predicts its recurrence using deep learning.
The proposed system is divided into two modules. The first module uses digital
mammography images to identify indications of breast cancer, applying several image
processing techniques and edge detection techniques such as the Laplacian of
Gaussian (LoG), Canny, Sobel and Prewitt algorithms. The mammogram image is
processed to improve image quality and retrieve relevant information. In the second
module, machine learning algorithms such as Decision Tree (DT), K Nearest Neighbour
(KNN), Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM) and Attention
Awareness analyse the acquired data to predict the likelihood of recurrence. The goal
is to provide a non-invasive and effective tool for early diagnosis and prediction of
breast cancer, with the potential to improve the accuracy and efficiency of breast
cancer detection. The system compares the image processing methods using the Closest
Distance Metric, Pixel Correspondence Metric, Grey Scale Figure of Merit (FOM) and
Peak Signal to Noise Ratio (PSNR), and compares the machine learning methods using
accuracy, sensitivity, precision, F1 score and Cohen’s kappa score, to determine which
method is most accurate at detecting a tumor and its recurrence.
TABLE OF CONTENTS
Acknowledgement
Executive Summary
List of Figures
List of Tables
List of Terms and Abbreviations
1 INTRODUCTION
  1.1 Objective
  1.2 Motivation
  1.3 Background
  1.4 Digital Image Processing
    1.4.1 Image Acquisition and Pre-processing
    1.4.2 Morphological Operation
    1.4.3 Image Segmentation
    1.4.4 Edge Detection
  1.5 Edge Detection Techniques
    1.5.1 Laplacian of Gaussian (LoG)
    1.5.2 Canny Algorithm
    1.5.3 Sobel Operator
    1.5.4 Prewitt Operator
  1.6 Mammogram Image
  1.7 Wavelet Transform
    1.7.1 2D Wavelet Analyzer
  1.8 Machine Learning Models
    1.8.1 Decision Tree
    1.8.2 K Nearest Neighbor
    1.8.3 Support Vector Machine
    1.8.4 Gaussian Naïve Bayes
    1.8.5 Attention Awareness Mechanism
3 TECHNICAL SPECIFICATION
  3.1 Software Specification
    3.1.1 MATLAB
    3.1.2 Wavelet Analyzer
    3.1.3 Jupyter Notebook
  3.2 References
6 PROJECT DEMONSTRATION
  6.1 Image Processing Using different Edge Detection Methods on Mammogram
    6.1.1 LoG Algorithm
    6.1.2 Canny Algorithm
    6.1.3 Sobel Algorithm
    6.1.4 Prewitt Algorithm
  6.2 Image Processing Using 2D Wavelet Analyzer
    6.2.1 Haar Wavelet
    6.2.2 Daubechies Wavelet
    6.2.3 Symmetric Wavelet
    6.2.4 Coiflets Wavelet
    6.2.5 Biorthogonal Wavelet
    6.2.6 Reverse Biorthogonal Wavelet
    6.2.7 Daubechies-Meyer Wavelet
    6.2.8 Farid-kim Wavelet
  6.3 Recurrence Prediction Using Deep Learning
8 SUMMARY
  8.1 Summary
REFERENCES
LIST OF PUBLICATIONS
CURRICULUM VITAE
LIST OF FIGURES
LIST OF TABLES
LIST OF TERMS AND ABBREVIATIONS
2D Two Dimensional
Bior Biorthogonal
coif Coiflets
db Daubechies
Dmey Daubechies-Meyer
DT Decision Tree
fk Farid-kim
PSNR Peak Signal to Noise Ratio
RF Random Forest
sym Symmetric
CHAPTER 1
INTRODUCTION
1.1 Objective
Breast cancer is a critical public health concern that affects millions of women
globally. Because it is a leading cause of cancer death among women, early detection
and prompt treatment are crucial in reducing mortality rates. However, the manual
interpretation of mammography images, widely used as a screening tool for breast
cancer, is subjective and prone to errors. This study aims to address these limitations
by developing an image processing and machine learning system for the early detection
of breast cancer and the prediction of its recurrence. The system will use digital
mammography images and apply machine learning algorithms to identify signs of breast
cancer and predict the likelihood of recurrence. The ultimate goal of this project is
to provide a non-invasive and efficient tool for early detection and prediction,
contributing to the advancement of breast cancer diagnosis and treatment.
1.2 Motivation
Mammography, ultrasound, and Magnetic Resonance Imaging (MRI) are just a few of
the medical imaging procedures that have been widely utilized to help diagnose breast
cancer early, but each has limitations. Image processing and deep learning methods
could allow for more precise and reliable breast cancer detection and recurrence
prediction. By creating a breast cancer detection system based on image processing
and deep learning, we can increase the accuracy of breast cancer diagnosis, lower the
number of unnecessary biopsies, and ultimately save more lives.
1.3 Background
Breast cancer is the most common cancer among women, according to the World Health
Organization, with an estimated 2.3 million new cases diagnosed each year and over
685,000 fatalities. Early detection and prompt treatment are crucial in reducing
fatality rates. This project uses image processing and deep learning techniques to
identify the signs of breast cancer and predict its recurrence.
A digital image is described as a two-dimensional function f(x, y), where x and y are
spatial (plane) coordinates. The intensity of the image at a point is the amplitude of
f at that pair of coordinates. The image is made up of a finite number of components,
each with a distinct position and value, and these components are called pixels. A
Two Dimensional (2D) digital image with M rows and N columns can be represented as
Eqn. 1.1.
\[
f(x, y) =
\begin{bmatrix}
f(0, 0) & f(0, 1) & \cdots & f(0, N-1) \\
f(1, 0) & f(1, 1) & \cdots & f(1, N-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(M-1, 0) & f(M-1, 1) & \cdots & f(M-1, N-1)
\end{bmatrix}
\tag{1.1}
\]
Digital image processing is the processing and analysis of digital images using
algorithms and mathematical models. It is done to improve image quality, to extract
useful information from images for identification and recognition, and to automate
operations that involve images, including manipulation by adding and removing
objects. The steps involved in digital image processing are image acquisition and
pre-processing, morphological operation and restoration, image segmentation, edge
detection and image analysis.
Image acquisition is the process of capturing an image and loading it into a digital
system. Pre-processing, a fundamental step in image analysis, uses a variety of
approaches to enhance the quality of the input image and retrieve important data for
further processing. This might involve resizing or scaling the image to a size
suitable for processing, using enhancement techniques like contrast stretching or
histogram equalisation to improve visibility of features, filtering to remove noise
or emphasise specific features, normalisation to ensure uniform lighting conditions,
and colour space conversion to speed up image analysis.
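As an illustration of one of the enhancement techniques named above, the following minimal Python sketch performs linear contrast stretching on a small grey-scale patch. The function name `contrast_stretch` and the sample patch are hypothetical and are not taken from the project's MATLAB code; the sketch assumes 8-bit intensities stored as nested lists.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly rescale pixel intensities so they span [out_min, out_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


# Hypothetical low-contrast 3x3 patch (intensities between 100 and 140).
patch = [
    [100, 110, 120],
    [110, 120, 130],
    [120, 130, 140],
]
stretched = contrast_stretch(patch)
```

After stretching, the darkest pixel maps to 0 and the brightest to 255, so the narrow 100 to 140 range is spread over the full 8-bit scale.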
1.4.2 Morphological Operation
Edge detection in image processing is the process of identifying the edges or
boundaries between different objects or regions within an image. Edges are sharp
changes in colour or intensity that occur where two objects or regions meet. Edge
detection algorithms are designed to recognise these changes and display them as a
collection of connected lines or curves that mark the edges of the objects in the
image. The edge detection algorithms used in this project are the gradient-based
Sobel and Prewitt operators and the Gaussian-based Laplacian of Gaussian and Canny
algorithms.
1.5 Edge Detection Techniques
\[
\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}
\tag{1.2}
\]
Canny is an edge detection operator that is based on Gaussians. It extracts features
without altering them and is relatively robust to noise. The Laplacian of Gaussian
operator served as the basis for the more advanced approach used by the Canny edge
detector. The noise in the input image is first removed using a Gaussian filter, and
the derivative of the Gaussian filter is then computed to obtain the gradient
magnitude along the x and y dimensions. Non-maximum suppression follows: where a
cluster of neighbouring pixels lies along the direction perpendicular to an edge,
the pixels that are not the local maximum are suppressed. Finally, hysteresis
thresholding with an upper and a lower threshold keeps pixels whose gradient
magnitude is above the upper threshold and discards pixels whose gradient magnitude
is below the lower threshold.
The Gaussian filter in Eqn. 1.3 is used as the smoothing filter.
\[
G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}
\tag{1.3}
\]
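To make Eqn. 1.3 concrete, the sketch below samples the Gaussian on a small square grid to build a discrete smoothing kernel. The function name `gaussian_kernel`, the 5x5 size and sigma = 1.0 are illustrative assumptions, not values reported in this project.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample G(x, y) from Eqn. 1.3 on a size x size grid and normalise it."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = [
        [
            math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    # Normalising so the weights sum to 1 keeps overall image brightness unchanged.
    return [[v / total for v in row] for row in kernel]


# Assumed example parameters: a 5x5 kernel with sigma = 1.0.
kernel = gaussian_kernel(size=5, sigma=1.0)
```

The largest weight sits at the centre of the kernel and the weights fall off symmetrically with distance, which is what makes the filter smooth noise while favouring nearby pixels.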
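For the gradient calculation, magnitude and direction are computed using the Sobel
operator. Gx and Gy are the horizontal and vertical operators, as described in
Eqn. 1.4.
\[
G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
\qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
\tag{1.4}
\]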
Gradient magnitude M(x, y) and direction θ(x, y) are calculated as described in
Eqn. 1.5 and Eqn. 1.6.
\[
M(x, y) = \sqrt{G_x^2(x, y) + G_y^2(x, y)}
\tag{1.5}
\]
\[
\theta(x, y) = \operatorname{atan}\left( \frac{G_y(x, y)}{G_x(x, y)} \right)
\tag{1.6}
\]
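The Sobel masks and the magnitude and direction formulas can be sketched in a few lines of Python. This is a minimal illustration rather than the project's MATLAB implementation: the helper names and the 4x4 test image are hypothetical, border pixels are simply left at zero, and `math.atan2` is used in place of a plain arctangent so that a zero horizontal gradient does not cause a division by zero.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # Gx from Eqn. 1.4
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # Gy from Eqn. 1.4


def apply_mask(image, mask, r, c):
    """Correlate a 3x3 mask with the neighbourhood centred on pixel (r, c)."""
    return sum(
        mask[i][j] * image[r + i - 1][c + j - 1]
        for i in range(3)
        for j in range(3)
    )


def sobel(image):
    """Return (magnitude, direction) maps; border pixels are left at zero."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_mask(image, SOBEL_X, r, c)
            gy = apply_mask(image, SOBEL_Y, r, c)
            magnitude[r][c] = math.hypot(gx, gy)   # Eqn. 1.5
            direction[r][c] = math.atan2(gy, gx)   # Eqn. 1.6, quadrant-aware
    return magnitude, direction


# Hypothetical 4x4 image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(4)]
mag, ang = sobel(step)
```

For this step image the interior pixels next to the edge respond only through Gx, so the direction is zero radians and the magnitude equals the horizontal gradient.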
1.5.4 Prewitt Operator
The Prewitt operator closely resembles the Sobel operator. It detects the image’s
horizontal and vertical edges. Edges are calculated from the difference between
corresponding pixel intensities of the image. Two types of edges are detected by the
Prewitt operator: edges that run vertically, along the y-axis, and edges that run
horizontally, along the x-axis. Anywhere there is a sharp change in pixel intensity,
the mask will detect an edge. Because an edge is defined as a change in pixel
intensities, it can be computed by differentiation. Prewitt masks are first-order
derivative masks.
The Prewitt operator deploys two masks, one for detecting horizontal edges, Gx, and
the other for detecting vertical edges, Gy, as shown in Eqn. 1.10.
\[
G_x = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\qquad
G_y = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}
\tag{1.10}
\]
Gradient magnitude M(x, y) and direction θ(x, y) are calculated as described in
Eqn. 1.11 and Eqn. 1.12.
\[
M(x, y) = \sqrt{G_x^2(x, y) + G_y^2(x, y)}
\tag{1.11}
\]
\[
\theta(x, y) = \operatorname{atan}\left( \frac{G_y(x, y)}{G_x(x, y)} \right)
\tag{1.12}
\]
1.6 Mammogram Image
Fig.1.1 represents the mammogram image with cancerous tissue of dimension 2251x5341.
1.7 Wavelet Transform
\[
T(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t) \, \psi\left( \frac{t - b}{a} \right) dt
\tag{1.13}
\]
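Eqn. 1.13 is the continuous transform; tools that analyse images work with a discrete counterpart. As a rough illustration only, the sketch below applies one level of the orthonormal Haar transform (the simplest of the wavelet families used later in this report) to a one-dimensional row of pixel values. The function names and the sample row are hypothetical, and a 2D analysis would apply the same step along rows and then columns.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    # Pairwise scaled sums give the coarse approximation, differences the detail.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original sequence from approximation and detail coefficients."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# Hypothetical row of four pixel intensities.
row = [4, 4, 8, 2]
approx, detail = haar_step(row)
restored = haar_inverse(approx, detail)
```

The detail coefficient is zero where neighbouring pixels are equal and non-zero where they differ, which is why detail coefficients highlight edges, and the inverse step recovers the original row.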
normalisation, scaling and feature extraction. Model selection and training involves
selecting an algorithm suitable for the data set and the task, such as classification,
regression or prediction. The model is then trained by feeding it the pre-processed
data and adjusting its parameters to obtain the desired output. Model evaluation is
the process of evaluating the trained model using performance metrics like accuracy,
sensitivity, precision and F1 score. In model optimization and deployment, the model
is optimised with appropriate parameter adjustments based on the evaluated
performance metrics and is then deployed for prediction on new data. In this
project, different machine learning models and a neural network are used: DT, KNN,
SVM, GNB and the Attention Awareness Mechanism.
Fig. 1.2 describes the hierarchy of a DT: the root node represents the entire data
set, branches represent decision rules, internal nodes represent features of the data
set and leaf nodes represent outcomes.
Fig. 1.2 Decision tree hierarchy.
Fig. 1.3 explains the classification of data using KNN. There are two classes of data
points, coloured pink and red, and the black dot is the test point. The distance
between the test point and each training point is computed using the Euclidean
method (Eqn. 1.14). With K=3, two of the nearest neighbours of the black dot are red,
so the black dot is assigned to the red class.
\[
\text{Euclidean distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
\tag{1.14}
\]
Fig. 1.3 KNN finding nearest neighbor.
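The voting procedure described for Fig. 1.3 can be sketched as follows. The coordinates are invented for illustration and do not come from the figure or from the project's data set; `knn_predict` is a hypothetical helper, and `math.dist` (Python 3.8+) computes the Euclidean distance of Eqn. 1.14.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training points for the two classes.
train = [
    ((1.0, 1.0), "pink"), ((1.5, 2.0), "pink"), ((2.0, 1.0), "pink"),
    ((5.0, 5.0), "red"), ((6.0, 5.5), "red"), ((5.5, 4.0), "red"),
]
test_point = (3.8, 3.5)   # plays the role of the black dot
label = knn_predict(train, test_point, k=3)
```

With these made-up points, the three nearest neighbours of the test point are two red points and one pink point, so the majority vote assigns it to the red class.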
1.8.4 Gaussian Naïve Bayes
\[
P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}
\tag{1.15}
\]
Eqn. 1.15 represents Bayes’ theorem. A and B are two events, and P(A) and P(B) are
the probabilities of occurrence of events A and B. The equation gives the probability
of event A occurring given that event B has occurred.
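A short worked example may help show how Eqn. 1.15 becomes a classifier when the likelihood of a feature value is modelled with a Gaussian per class. The priors, means and standard deviations below are made up purely for illustration and are not estimates from the project's data set; the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Gaussian likelihood of a feature value under a per-class normal model."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))


def posterior(x, priors, params):
    """Apply Bayes' theorem: P(class | x) = P(x | class) * P(class) / P(x)."""
    joint = {c: gaussian_pdf(x, *params[c]) * priors[c] for c in priors}
    evidence = sum(joint.values())   # P(x), the denominator of Eqn. 1.15
    return {c: joint[c] / evidence for c in joint}


# Hypothetical single feature with invented class statistics (mean, std).
priors = {"no_recurrence": 0.7, "recurrence": 0.3}
params = {"no_recurrence": (20.0, 5.0), "recurrence": (35.0, 5.0)}
post = posterior(30.0, priors, params)
```

Even though the prior favours the no-recurrence class, a feature value of 30 lies much closer to the assumed recurrence mean, so the posterior for recurrence comes out higher (about 0.66 under these assumptions).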
\[
E_{0i} = \frac{\exp(e_{0i})}{\sum_{j=1}^{n} \exp(e_{0j})}
\tag{1.16}
\]
Eqn. 1.16 describes the calculation of the attention weights E0i by applying the
softmax to the scores e0i.
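The softmax in Eqn. 1.16 can be sketched in a few lines. The score values are arbitrary examples, and subtracting the maximum score before exponentiating is a common numerical-stability step that leaves the result unchanged; it is an addition of this sketch, not something stated in the report.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that are positive and sum to 1."""
    m = max(scores)
    # Shifting by the maximum avoids overflow and does not change the ratios.
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary example scores e_01, e_02, e_03.
scores = [2.0, 1.0, 0.1]
weights = softmax(scores)
```

Higher scores receive larger weights, so the attention mechanism emphasises the inputs whose scores are largest.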
CHAPTER 2
The paper “Particle Swarm Optimisation Feature Selection for Breast Cancer
Recurrence Prediction” (Sakri et al. 2018) investigates the application of Particle
Swarm Optimisation (PSO) to the prediction of breast cancer recurrence. The authors
compared their PSO-based feature selection method to other commonly used feature
selection methods and found that the PSO-based method outperformed the others
in terms of accuracy and number of features selected. They also tested different
machine learning algorithms and found that combining PSO-selected features
with SVM produced the best accuracy in predicting breast cancer recurrence. The
authors believe that their proposed method for selecting features using PSO will lead
to more accurate and efficient prediction models in the future.
contrast enhancement, normalization, and noise reduction. The authors then extract
texture features from the preprocessed images using a variety of feature extraction
methods, such as the Gray Level Co-occurrence Matrix (GLCM), Gray Level Run
Length Matrix (GLRLM), and the Discrete Wavelet Transform (DWT). The extracted
features are then used to train a variety of machine learning classifiers, including
KNN, Decision Tree and SVM. Each classifier’s performance is assessed using
performance metrics such as accuracy, sensitivity, specificity, and F1-score. The
study’s findings show that the proposed CAD system detects breast cancer with high
accuracy and sensitivity, outperforming previous studies that used similar
techniques. The authors also compare the performance of various feature extraction
methods and find that DWT has the highest accuracy and sensitivity. The best
performing classifier is found to be KNN, with an accuracy of 96.25%
and a sensitivity of 96.08%.
“A novel deep learning model for breast cancer detection using mammography
images” (Oyelade and Ezugwu 2022) proposes a novel architecture that combines a deep
Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The
methodology involved pre-processing mammography images with a variety of image
enhancement techniques such as contrast enhancement and noise reduction. After that,
the pre-processed images were fed into the proposed deep learning model, which
included a CNN and an RNN. The CNN extracted features from mammography images, while
the RNN captured temporal dependencies between the extracted features. The proposed
model was tested using two publicly available datasets, the Digital Database for
Screening Mammography (DDSM) and the Mammographic Image Analysis Society (MIAS)
dataset. Evaluated on the publicly available breast dataset, the proposed model
achieved a high accuracy of 95.98%, sensitivity of 97.46%, and specificity of 92.47%.
“On the scalability of machine-learning algorithms for breast cancer prediction in big
data context” (Alghunaim and Al-Baity 2019) considers a dataset of over 1.5 million
breast cancer cases using data from the Surveillance, Epidemiology, and End Results
programme. On both datasets, the authors assessed the performance of three machine
learning algorithms: SVM, KNN, and ANN. The algorithms were trained on different
subsets of the data and tested using 10-fold cross-validation. The study’s findings
revealed that SVM had the highest classification accuracy on both datasets, followed
by KNN and ANN. The study also found that increasing the amount of data used for
training did not necessarily improve classification accuracy, indicating that the
size of the datasets may limit the performance of the algorithms.
datasets. They use three pre-trained models to test this approach on two mammogra-
phy datasets: Visual Geometry Group 16 (VGG16), ResNet50, and InceptionV3. The
models are trained with the Keras deep learning framework using binary cross-
entropy loss and the Adam optimizer, with data augmentation techniques used to
increase the training dataset size. The results show that the pre-trained models detect
breast cancer accurately, with ResNet50 outperforming the others. The authors
compare their approach to other cutting-edge methods and show that it outperforms
them.
methods were found to be the most effective, particularly those based on Gabor filters
and wavelet transforms. K-means and fuzzy C-means clustering methods were also
effective for segmenting suspicious regions, but due to the variability of mammogram
images, thresholding-based segmentation methods were less effective.
2022) proposes a methodology for extracting hybrid features that combines texture,
morphology, Scale Invariant Feature Transform, GLCM, entropy, elliptic Fourier
descriptors, RICA, and sparse filtering techniques, and uses machine learning
models such as SVM, DT, KNN, and Naïve Bayes classifiers to identify breast cancer.
With the single feature extraction approach, the best results were produced by SVM
on textural features: sensitivity (92.23%), specificity (94.60%), PPV
(93.16%), NPV (93.85%), TA (93.55%), and ROC AUC (0.9803). On the basis of
textural features, the performance of the other classifiers includes SVM Gaussian,
which achieves TA (92.55%), Decision Tree, which achieves TA (87.65%), and Naïve
Bayes, which achieves TA (84.65%).
accuracy of 96.87% when compared to other methods evaluated. In addition, the
authors compared the performance of various wavelet types and found that the
Haar wavelet performed the best.
“Comparative analysis of breast cancer detection using machine learning and
biosensors” (Amethiya et al. 2022, Intelligent Medicine) compares the performance
of various machine learning algorithms for breast cancer detection, including SVM,
KNN and Random Forest (RF). The study also compares the performance of machine
learning to that of biosensors based on Surface Plasmon Resonance and
electrochemical impedance spectroscopy. For data analysis and machine learning
model development, the authors used MATLAB software. The models’ performance
was assessed using accuracy, sensitivity, specificity, and F1-score as evaluation
metrics. In terms of classification accuracy, the study’s findings revealed that SVM
outperformed the other machine learning algorithms. The SVM algorithm achieved
95% accuracy, 91% sensitivity, and 98% specificity. In comparison, the biosensors
had a lower accuracy of 82%, sensitivity of 75%, and specificity of 89%.
2.2.1 Software
“Wavelet based feature extraction method for breast cancer diagnosis” (Benmazou
and Merouani 2018) proposed using a combination of median filtering, adaptive
histogram equalization, and Contrast-limited Adaptive Histogram Equalisation
(CLAHE) to pre-process the mammogram images. These image processing
techniques can be implemented using a variety of software tools, including
the MATLAB Image Processing Toolbox, OpenCV, and Python libraries such as
scikit-image and Pillow.
2.3 Goals
Breast cancer is a major global public health issue that affects millions of women.
Early detection and timely treatment are essential in lowering mortality rates because
this condition is a leading cause of cancer death for women. However, the manual
interpretation of mammography scans, frequently used as a breast cancer screening
method, is subjective and error-prone. By creating an image processing and
machine learning system for the early identification of breast cancer and the
prediction of its recurrence, this study seeks to address these shortcomings. The
system will employ machine learning algorithms to analyse digital mammography images
in order to detect breast cancer indications and forecast the possibility of a
recurrence. This project’s ultimate goal is to develop a non-invasive, effective tool
for early detection and prediction, advancing breast cancer diagnosis and therapy.
A series of processes, including pre-processing, segmentation, morphological
operations, and edge detection, were used to process the mammogram image of the
cancerous breast. Following that, as described in Fig. 4.1, various edge detection
approaches were used to assess the resulting images. Several wavelet transforms were
applied to the mammography image with a 2D wavelet analyzer tool. Using performance
metrics, as shown in Fig. 4.2, the different edge detection methods and wavelet
transforms were evaluated for their efficacy. Additionally, a data set from the
University of California Irvine (UCI) machine learning repository was used to train
different machine learning models and neural networks to predict recurrence.
Performance metrics were used to assess how well the trained models predicted
recurrence, as shown in Fig. 4.3.
CHAPTER 3
TECHNICAL SPECIFICATION
3.1.1 MATLAB
3.1.2 Wavelet Analyzer
3.1.3 Jupyter Notebook
Jupyter Notebook is an open-source web application for creating documents that
contain live code, equations, visualisations, and narrative text. Python, R, and
Julia are just a few of the programming languages that are supported. In Jupyter
Notebook, code is written in cells and run in an interactive environment. Each cell
may include raw text, Markdown text, or code. The output of a code cell is shown just
below the cell when it is run. This enables flexible and iterative testing and
experimentation with code.
The capability of Jupyter Notebook to produce visualisations and plots right inside
the notebook is one of its key advantages. Visualisation tools like Matplotlib and
Seaborn can be used to generate charts and graphs, and multimedia content like
images and videos can also be included. GitHub, Google Colab, and nbviewer are just a
few of the platforms that make it simple to share and collaborate on notebooks. This
makes Jupyter Notebook an effective tool for scientific study, education, and data
analysis. It is extensively used in many different fields, including data science,
machine learning, and academic study.
3.2 References
8) “Tuberculosis disease: Diagnosis by image processing” (Parez et al., 2019).
9) “Comparative analysis of breast cancer detection using machine learning and
biosensors” (Y. Amethiya et al., 2022).
12) “Automated Breast Cancer Detection Models Based on Transfer Learning”
(Madallah Alruwaili et al., 2022).
14) “Breast Cancer Classification from Mammogram Images Using Enhanced Deep
Learning Features and Equilibrium-Jaya Controlled Regula Falsi-Based Features
Selection” (Kiran Jabeen et al., 2023).
18) “Deep Learning for Breast Cancer Diagnosis from Mammograms: A Comparative
Study” (Lazaros Tsochatzidis et al., 2019).
CHAPTER 4
4.1.1 Processing the Mammogram image using different Edge Detection Methods
Fig. 4.1 Flow diagram of using different Edge Detection Techniques
4.1.3 Recurrence Prediction Using Machine Learning
A breast cancer data set is taken from the UCI Machine Learning Repository. Its
source is the Institute of Oncology, University Medical Center, Ljubljana, Yugoslavia.
The data set includes 201 instances of one class and 85 instances of another class.
The instances are described by 9 features: age, menopause, tumor size, invasive nodes,
node capsules, degree of malignancy, breast (right or left), breast quadrant and
irradiation. The class label takes two values, non-recurrence and recurrence.
The data set is processed and fed into four machine learning classifiers: Decision
Tree, K Nearest Neighbor, Gaussian Naive Bayes and Support Vector Machine. It is also
fed into Attention Awareness, which is a neural network implementation. The models
are trained on the processed data and then used to predict the recurrence event.
After prediction, the models are evaluated using accuracy, sensitivity, precision,
F1 score and Cohen’s Kappa score. Finally, the classifiers are compared according to
these performance metrics, as shown in Fig. 4.3.
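The five evaluation measures named above can all be derived from the four counts of
a binary confusion matrix. The following Python sketch is illustrative only: the
function names and the toy labels are assumptions made for this example and are not
the project's notebook code or data.

```python
def confusion_counts(y_true, y_pred, positive="recurrence"):
    """Count true/false positives and negatives for the chosen positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="recurrence"):
    """Return accuracy, sensitivity, precision, F1 and Cohen's kappa as fractions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("no samples to evaluate")
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Toy labels (hypothetical, not taken from the UCI data set).
y_true = ["recurrence"] * 3 + ["no-recurrence"] * 5
y_pred = ["recurrence", "recurrence", "no-recurrence",
          "recurrence", "no-recurrence", "no-recurrence",
          "no-recurrence", "no-recurrence"]
scores = evaluate(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Multiplying accuracy, sensitivity, precision and F1 by 100 gives percentages of the
kind reported in the results chapter, while Cohen’s kappa is left as a plain score.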
4.2 Codes and Standards
4.3.1 Constraints
Due to a lack of resources and privacy issues, high-quality data sets for image
processing are not easily accessible.
Due to a lack of resources, training data sets with appropriate characteristics for
recurrence prediction are also unavailable, and some of the available information was
corrupted.
While using online MATLAB software, the processing time was high.
4.3.2 Alternatives
An image of a breast with cancerous tissue from Kaggle was used for image processing
because access to mammography scans was restricted.
Due to a lack of sufficient data, a data set from UCI with fewer features was used to
predict recurrence.
4.3.3 Tradeoffs
Because a single data set was used for both training and testing the machine learning
models, the accuracy of the predictive models was inconsistent.
CHAPTER 5
5.1 Schedule
Table 5.1 describes the project tasks carried out from September to April.
5.2 Tasks and Milestones
Fig.5.1 shows the Gantt chart of the tasks and milestones of the project from
September to April.
CHAPTER 6
PROJECT DEMONSTRATION
To identify tumor tissue, the mammogram image in Fig.1.1 is subjected to different
edge detection methods, namely the LoG, Canny, Sobel and Prewitt operators, in
MATLAB. The resulting processed images are then compared with the original image
and evaluated using performance metrics.
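As an illustration of how a gradient-based operator such as Sobel marks edges, the
sketch below convolves a small grey scale image with the two standard 3x3 Sobel
kernels and thresholds the gradient magnitude. It is a minimal Python version
written for explanation, not the MATLAB code used in the project. The threshold and
the tiny test image are assumptions, border pixels are simply left at zero, and
MATLAB's built-in edge detection may apply further steps (such as edge thinning)
that are omitted here.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude of a 2-D list; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_map(image, threshold):
    """Binary edge map: 1 where the gradient magnitude reaches the threshold."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in sobel_magnitude(image)]


# Hypothetical test image: a vertical step from dark (0) to bright (10).
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = edge_map(image, threshold=20)
for row in edges:
    print(row)
```

The Prewitt operator follows the same pattern with kernel weights of 1 in place of
the central 2, which is why the two methods tend to give similar edge maps.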
Fig.6.1 shows the edge detected image after applying LoG operator.
6.1.2 Canny Algorithm
Fig.6.2 shows the edge detected image after applying Canny operator.
Fig.6.3 shows the edge detected image after applying Sobel operator.
6.1.4 Prewitt Algorithm
Fig.6.4 shows the edge detected image after applying Prewitt operator.
6.2 Image Processing Using 2D Wavelet Analyzer
6.2.2 Daubechies Wavelet
The Daubechies wavelet is an extension of Haar that uses longer filters and smoother
scaling functions. It is applied to the mammogram image in Fig.1.1.
Fig.6.6 shows the synthesised image after applying the Daubechies wavelet.
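Because Haar is the simplest member of the Daubechies family (it coincides with
db1), a one-level Haar decomposition is a convenient way to see what the longer
Daubechies filters generalise. The Python sketch below works on a 1-D signal; a 2-D
image transform applies the same step along the rows and then the columns. The
function names and the sample signal are assumptions for illustration, and this is
not the implementation inside the MATLAB Wavelet Analyzer.

```python
import math


def haar_dwt(signal):
    """One-level orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt and rebuild (synthesise) the original samples."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal


# Hypothetical row of pixel intensities.
row = [4.0, 4.0, 2.0, 6.0]
approx, detail = haar_dwt(row)
rebuilt = haar_idwt(approx, detail)
print(approx, detail, rebuilt)
```

The approximation coefficients carry the smoothed content and the detail
coefficients carry local differences, so a flat pair such as (4, 4) yields a zero
detail coefficient.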
The Symmetric wavelet is similar to the Daubechies wavelet but has different filter
coefficients. It is helpful for pictures with sharp edges and colour.
Fig.6.7 shows the synthesised image after applying the Symmetric wavelet.
Fig. 6.7 Synthesised image of Symmetric Wavelet
Coiflets have a more compact support, which results in fewer coefficients and a more
effective representation of image data.
6.2.5 Biorthogonal Wavelet
Biorthogonal wavelets are not necessarily orthogonal. They use two sets of filters,
one for decomposition and one for reconstruction, to divide the signal or image into
various frequency bands.
Fig.6.9 shows the synthesised image after applying the Biorthogonal wavelet.
Reverse biorthogonal wavelets are used to capture the image or signal very precisely;
they eliminate noise without distorting the original image.
Fig.6.10 shows the synthesised image after applying the Reverse Biorthogonal wavelet.
Fig. 6.10 Synthesised image of Reverse Biorthogonal Wavelet
6.2.8 Fejér-Korovkin Wavelet
The Fejér-Korovkin (fk) wavelet has high levels of directional selectivity and
spatial localization. It also has compact support, which lets it record signal or
image details very precisely.
Fig.6.12 shows the synthesised image after applying the Fejér-Korovkin wavelet.
The data set acquired from the UCI Machine Learning Repository is processed and fed
into the machine learning models: Decision Tree, K Nearest Neighbor, Gaussian Naive
Bayes, Support Vector Machine and the Attention Awareness model. The models are
deployed to predict the outcomes and are then evaluated using performance metrics.
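To make the train-then-predict step concrete for one of the listed models, the
sketch below implements a small K Nearest Neighbor classifier in plain Python. It is
only an illustration under stated assumptions: the use of Hamming distance for
categorical attributes, the four attributes shown and all of the records are
hypothetical, and none of it is taken from the project's notebook or from the UCI
file.

```python
from collections import Counter


def hamming(a, b):
    """Number of attribute positions at which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training records closest to the query."""
    ranked = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: (age band, menopause, breast side, irradiation).
train_x = [
    ("40-49", "premeno", "left", "no"),
    ("50-59", "ge40", "right", "no"),
    ("40-49", "premeno", "left", "yes"),
    ("60-69", "ge40", "right", "yes"),
    ("50-59", "ge40", "left", "no"),
]
train_y = ["no-recurrence", "no-recurrence", "recurrence",
           "recurrence", "no-recurrence"]

patient = ("50-59", "ge40", "right", "yes")
print(knn_predict(train_x, train_y, patient, k=3))
```

With an odd k and two classes the vote cannot tie, which is one common reason for
choosing odd values of k in binary problems such as recurrence prediction.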
Fig. 6.13 Different Machine Learning Algorithms
Fig.6.13 shows the different machine learning algorithms used for recurrence
prediction.
Fig.6.14 shows the attention awareness mechanism used for recurrence prediction.
Fig. 6.15 Model Prediction
CHAPTER 7
7.1.1 MATLAB
The image processing part of the project was executed on MATLAB/Simulink. This
component came at no cost to the project because it is covered by a university
license; purchased separately, such a license costs $45.
The recurrence prediction part of the project was done using Jupyter Notebook, which
is open-source software.
7.2 Results and Discussion
After tumor identification in the mammogram using different edge detection methods,
the output images are compared with the input image and evaluated using performance
metrics.
According to the performance metrics of the different edge detection methods in
Table 7.1, the Canny algorithm performs best overall. It has the highest pixel
correspondence metric, which indicates that it precisely identifies the related
pixels on each side of an edge. It also has the highest grey scale FOM, demonstrating
its ability to recognise edges precisely regardless of the brightness of the image,
and the highest PSNR of all the algorithms, which indicates high-quality images after
edge identification. Although the Sobel and Prewitt algorithms have significantly
lower PSNR values than the Canny algorithm, they perform well in terms of pixel
correspondence and grey scale FOM. The LoG algorithm performs worst overall, with the
lowest closest distance metric and PSNR values.
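For reference, PSNR is commonly defined as 10 log10(MAX^2 / MSE), where MSE is the
mean squared difference between the input and output images and MAX is the largest
possible pixel value. The Python sketch below follows that definition. It assumes
8-bit grey scale images (MAX = 255) stored as nested lists; the source does not
state how the metric was computed in MATLAB, so the function names and sample values
are illustrative only.

```python
import math


def mean_squared_error(original, processed):
    """Average squared pixel difference between two equally sized images."""
    if len(original) != len(processed) or len(original[0]) != len(processed[0]):
        raise ValueError("images must have the same dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(original, processed):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(original, processed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    mse = mean_squared_error(original, processed)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images that differ in a single pixel by 10 grey levels.
original = [[0, 255], [255, 0]]
processed = [[0, 255], [255, 10]]
print(f"PSNR = {psnr(original, processed):.2f} dB")
```

A higher PSNR means the processed image deviates less from the input, which is the
sense in which the values above are compared.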
7.2.2 Image Processing Using 2D Wavelet Analyzer
After image processing with the 2D Wavelet Analyzer tool in MATLAB, the synthesised
images are compared with the input image and the wavelets are evaluated using
performance metrics.
Wavelets   Pixel Correspondence Metric   Closest Distance Metric   Grey scale FOM   PSNR
Haar       27354068                      0.0595                    1.2976           14.5267
Db         28952916                      0.0558                    1.2783           14.9116
Sym        27354068                      0.0595                    1.2976           14.5267
Coif       28952916                      0.0558                    1.2783           14.9116
Bior       28952916                      0.0558                    1.2783           14.9116
Rbio       27354068                      0.0595                    1.2976           14.5267
Dmey       27354068                      0.0595                    1.2976           14.5267
Fk         27354068                      0.0595                    1.2976           14.5267
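According to the performance metrics of the different wavelet transforms in Table
7.2, all of them generated comparable outcomes. The families fall into two groups
with identical values: Haar, Sym, Rbio, Dmey and Fk in one, and Db, Coif and Bior in
the other. The two groups differ only slightly in pixel correspondence, closest
distance (0.0595 against 0.0558), grey scale FOM (1.2976 against 1.2783) and PSNR
(14.5267 against 14.9116).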
7.2.3 Recurrence Prediction Using Deep learning
The data set is fed into the machine learning models and the Attention Awareness
model. The models are trained and deployed to predict the outcomes, and are then
evaluated using performance metrics.
Models                Accuracy   Sensitivity (recall)   F1 score   Precision   Cohen’s kappa score
DT                    62.765     32.258                 36.363     41.666      0.1064
GNB                   71.276     51.612                 54.237     57.142      0.3338
KNN                   58.510     32.258                 33.898     35.714      0.0377
SVM                   63.829     54.838                 50.0       45.945      0.2201
Attention awareness   82.89      40.0                   53.33      80.0        0.337
According to the performance metrics of the different machine learning models in
Table 7.3, the attention mechanism appears to be the best performing algorithm, with
the highest accuracy (82.89%) and precision (80.0%), as well as a strong F1 score
(53.33%). Compared to other algorithms like GNB and SVM, it has a lower
sensitivity/recall (40.0%). Its Cohen’s kappa score (0.337) is marginally higher than
that of the other algorithms, showing slightly better agreement between predicted and
actual labels.
GNB also performs well, with the second highest accuracy (71.276%), the highest F1
score (54.237%) and the second highest precision (57.142%). Its sensitivity/recall
(51.612%) is second only to that of SVM, showing that it is effective at correctly
identifying positive instances. Its Cohen’s kappa score (0.3338) is greater than
those of the other classical classifiers and only slightly below that of the
attention mechanism.
SVM has a moderate accuracy (63.829%), the highest sensitivity/recall (54.838%) and a
respectable F1 score (50.0%), but its precision (45.945%) is lower than that of both
GNB and the attention mechanism. KNN performs worst on every metric, tying with the
decision tree on sensitivity/recall, and the decision tree is the next weakest, which
suggests that neither is well suited to recurrence prediction. As a result, the
attention mechanism is suggested for predicting cancer recurrence.
CHAPTER 8
SUMMARY
8.1 Summary
The specific data set and evaluation parameters can affect the performance of various
algorithms. Based on the measures considered, the Canny algorithm appears to be the
best overall for the image processing task. The evaluation of various wavelet
families leads to the conclusion that, at least for this particular data set and
these evaluation metrics, the choice of wavelet family may not significantly affect
the overall quality of image compression. Similarly, the attention awareness
mechanism appears to be the highest performing algorithm overall for the
classification problem and recurrence prediction, based on the metrics evaluated.
The use of an attention mechanism with RNNs helps the model to concentrate on the
most relevant parts of the input sequence at each time step to make accurate
predictions.
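The idea of concentrating on the most relevant parts of an input sequence can be
sketched as a weighted average of the sequence states, with weights obtained from a
softmax over relevance scores. The Python example below uses dot-product scoring,
which is an assumption made for illustration; the source does not specify the
scoring function or network architecture of the Attention Awareness model, and the
vectors shown are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention: return (weights, context vector)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[k] for w, state in zip(weights, states))
               for k in range(dim)]
    return weights, context


# Hypothetical hidden states for three time steps and one query vector.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, states)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

States that align with the query receive larger weights, so the context vector is
dominated by the time steps the model treats as most relevant.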
REFERENCES
Alruwaili, M. and Gouda, W. (2022), ‘Automated breast cancer detection models based
on transfer learning’, Sensors 22(3), 876.
Amethiya, Y., Pipariya, P., Patel, S. and Shah, M. (2022), ‘Comparative analysis of
breast cancer detection using machine learning and biosensors’, Intelligent Medicine
2(2), 69–81.
Benmazou, S. and Merouani, H. F. (2018), Wavelet based feature extraction method for
breast cancer diagnosis, in ‘2018 4th International Conference on Advanced
Technologies for Signal and Image Processing (ATSIP)’, IEEE, pp. 1–5.
Fatima, N., Liu, L., Hong, S. and Ahmed, H. (2020), ‘Prediction of breast cancer,
comparative review of machine learning techniques, and their analysis’, IEEE Access
8, 150360–150376.
Hussain, L., Qureshi, S. A., Aldweesh, A., Pirzada, J. u. R., Butt, F. M., Eldin, E. T.,
Ali, M., Algarni, A. and Nadim, M. A. (2022), ‘Automated breast cancer detection
by reconstruction independent component analysis (rica) based hybrid features using
machine learning paradigms’, Connection Science 34(1), 2784–2806.
Jabeen, K., Khan, M. A., Balili, J., Alhaisoni, M., Almujally, N. A., Alrashidi, H., Tariq,
U. and Cha, J.-H. (2023), ‘Bc2netrf: Breast cancer classification from mammogram
images using enhanced deep learning features and equilibrium-jaya controlled regula
falsi-based features selection’, Diagnostics 13(7), 1238.
Kausar, T., Wang, M., Idrees, M. and Lu, Y. (2019), ‘Hwdcnn: Multi-class
recognition in breast histopathology with haar wavelet decomposed image based
convolution neural network’, Biocybernetics and Biomedical Engineering 39(4),
967–982.
Krithiga, R. and Geetha, P. (2021), ‘Breast cancer detection, segmentation and classifi-
cation on histopathology images analysis: a systematic review’, Archives of Computa-
tional Methods in Engineering 28, 2607–2619.
Oza, P., Sharma, P., Patel, S. and Bruno, A. (2021), ‘A bottom-up review of image
analysis methods for suspicious region detection in mammograms’, Journal of
Imaging 7(9), 190.
Sahu, Y., Tripathi, A., Gupta, R. K., Gautam, P., Pateriya, R. and Gupta, A. (2023),
‘A cnn-svm based computer aided diagnosis of breast cancer using histogram k-
means segmentation technique’, Multimedia Tools and Applications 82(9), 14055–
14075.
Tsochatzidis, L., Costaridou, L. and Pratikakis, I. (2019), ‘Deep learning for breast
cancer diagnosis from mammograms—a comparative study’, Journal of Imaging
5(3), 37.
Wang, L., Zang, J., Zhang, Q., Niu, Z., Hua, G. and Zheng, N. (2018), ‘Action recog-
nition by an attention-aware temporal weighted convolutional neural network’,
Sensors 18(7), 1979.
Younis, Y. S., Ali, A. H., Alhafidhb, O. K. S., Yahia, W. B., Alazzam, M. B., Hamad,
A. A. and Meraf, Z. (2022), ‘Early diagnosis of breast cancer using image processing
techniques’, Journal of Nanomaterials 2022, 1–6.
LIST OF PUBLICATION
1. G.K. Rajini, Alan George, Thomas Tom, Arshaque Abdusalam. “Detection and
Recurrence of Breast Cancer through Image Processing and Attention Awareness:
A Comparative Analysis of Algorithms.” Communicated to Asian Journal of
Pharmaceutical and Clinical Research (2023).
CURRICULUM VITAE
55
56
57