BREAST CANCER DETECTION USING ARTIFICIAL INTELLIGENCE
Bachelor of Technology
in
Electronics and Communication Engineering
by
CERTIFICATE
This is to certify that the project titled BREAST CANCER DETECTION USING
ARTIFICIAL INTELLIGENCE is a bonafide record of the work done by M J V
PRAKASH (19131A04D8), P HARI PRAKASH (19131A04J1), M VINAY
KUMAR (19131A04D5) , M VIDYA SAGAR (19131A04D9) in partial fulfillment
of the requirements for the award of the degree of Bachelor of Technology in
Electronics and Communication Engineering of the Gayatri Vidya Parishad
College of Engineering (Autonomous) affiliated to Jawaharlal Nehru Technological
University, Kakinada during the year 2022-2023.
DECLARATION
This is to certify that the project titled BREAST CANCER DETECTION USING
ARTIFICIAL INTELLIGENCE is a bonafide record of the work done by M J V
PRAKASH (19131A04D8), P HARI PRAKASH (19131A04J1), M VINAY KUMAR
(19131A04D5), M VIDYA SAGAR (19131A04D9) in partial fulfillment of the
requirements for the award of the degree of B. Tech. in Electronics and Communication
Engineering to Gayatri Vidya Parishad College of Engineering (Autonomous), affiliated
to J.N.T. University, Kakinada comprises only our original work and due
acknowledgement has been made in the text to all other material used.
Date:
M J V PRAKASH (19131A04D8) :
ACKNOWLEDGEMENT
We take this opportunity to thank one and all who have helped in making
this possible. We are grateful to Gayatri Vidya Parishad College of Engineering
(Autonomous) for giving us the opportunity to work on this project as a part of
the curriculum.
With a great sense of pleasure and privilege, we extend our gratitude and
sincere thanks to Dr. N. Deepika Rani, Professor and Head of the Electronics and
Communication Engineering Department, for her encouragement.
With a great sense of pleasure and privilege, we extend our gratitude and
sincere thanks to Dr. A. Bala Koteswara Rao, Principal, Gayatri Vidya Parishad
College of Engineering (Autonomous) for his continuous encouragement during
the course of study.
We would like to express our sincere thanks to all the faculty and staff of
the Department of Electronics and Communication Engineering for their advice and
cooperation.
We owe our thanks much more than our words can express to our families
who sacrificed in all respects with love and affection to complete this thesis work
satisfactorily.
Finally, we take this opportunity to thank all the people who helped us in
completion of thesis work, directly or indirectly, and for their timely
encouragement and faithful services.
ABSTRACT
Breast cancer is one of the major causes of death in women. Much research has been done
on the diagnosis and detection of breast cancer using various image processing techniques.
Nonetheless, the disease remains one of the deadliest. Since the cause of breast cancer
stays obscure, prevention becomes impossible; thus, early detection of a tumor in the
breast is the only way to cure breast cancer. We have proposed a Convolutional Neural
Network (CNN) algorithm as well as a Support Vector Machine (SVM) for breast cancer
detection.
Firstly, pre-processing of the mammogram image is carried out, which helps in removing
any noise in the image. Secondly, segmentation techniques are used, which dilate the
tumor region in the breast and erode the remaining parts. Along with the above two image
processing techniques, feature extraction is also done using Python. Finally, the extracted
features are used for the classification of mammograms into benign and malignant. The
image classification process is done in Python using approximately 6,000 images.
CONTENTS
CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
2.6. MAMMOGRAMS
CONCLUSION
FUTURE SCOPE
REFERENCES
LIST OF TABLES
TABLE 4.1. Features
METHODOLOGY
1.1. PROJECT OBJECTIVE
The objective of the project is to detect tumors in their initial phase, in a way that is not
prone to human error, using image processing techniques such as image pre-processing,
image segmentation, feature extraction and selection, and image classification.
Firstly, pre-processing of the mammogram image is carried out, which helps in removing
any noise in the image. Secondly, segmentation techniques are used, which dilate the
tumor region in the breast and erode the remaining parts. Along with the above two image
processing techniques, feature extraction is also done using Python. Finally, the extracted
features are used for the classification of mammograms as either benign or malignant.
The image classification process is done in Python using approximately 6,000 images.
Chapter 8 presents the proposed work used in the detection of breast cancer.
Chapter 9 presents the simulation results of the detection of breast cancer using the
SVM and CNN models.
It’s important to understand that most breast lumps are benign and not cancer. Non-
cancerous breast tumors are abnormal growths, but they do not spread outside of the
breast. They are not life threatening, but some types of benign breast lumps can increase
a woman's risk of getting breast cancer. Any breast lump or change needs to be checked
by a health care professional to determine if it is benign or malignant (cancer) and if it
might affect your future cancer risk.[2]
Most breast cancers begin in the ducts that carry milk to the nipple (ductal
cancers)
Some start in the glands that make breast milk (lobular cancers)
There are also other types of breast cancer that are less common like phyllodes
tumor and angiosarcoma
A small number of cancers start in other tissues in the breast. These cancers are
called sarcomas and lymphomas and are not really thought of as breast cancers.
Figure 2.1. Breast
Although many types of breast cancer can cause a lump in the breast, not all do. Many
breast cancers are also found on screening mammograms, which can detect cancers at
an earlier stage, often before they can be felt, and before symptoms develop.
Once a biopsy is done, breast cancer cells are tested for proteins called estrogen
receptors, progesterone receptors and HER2. The tumor cells are also closely looked at
in the lab to find out what grade it is. The specific proteins found and the tumor grade
can help decide treatment options.
The lymph system is a network of lymph (or lymphatic) vessels found throughout the
body that connects lymph nodes (small bean-shaped collections of immune system cells).
The clear fluid inside the lymph vessels, called lymph, contains tissue byproducts and
waste material, as well as immune system cells. The lymph vessels carry lymph fluid
away from the breast. In the case of breast cancer, cancer cells can enter those lymph
vessels and start to grow in lymph nodes.
Most of the lymph vessels of the breast drain into:
The American Cancer Society's estimates for breast cancer in the United States for 2023
are:
About 297,790 new cases of invasive breast cancer will be diagnosed in women.
About 55,720 new cases of ductal carcinoma in situ (DCIS) will be diagnosed.
About 43,700 women will die from breast cancer.[3]
2.4.2. TRENDS IN BREAST CANCER
In recent years, incidence rates have increased by 0.5% per year. Breast cancer is the
second leading cause of cancer death in women. The chance that a woman will die from
breast cancer is about 1 in 39 (about 2.6%). Since 2007, breast cancer death rates have
been steady in women younger than 50, but have continued to decrease in older women.
From 2013 to 2018, the death rate went down by 1% per year. These decreases are
believed to be the result of finding breast cancer earlier through screening and increased
awareness, as well as better treatments. At this time there are more than 3.8 million breast
cancer survivors in the United States. This includes women still being treated and those
who have completed treatment.
The most common symptom of breast cancer is a new lump or mass. A painless, hard
mass that has irregular edges is more likely to be cancer, but breast cancers can be tender,
soft, or round. They can even be painful. For this reason, it's important to have any new
breast mass, lump, or breast change checked by an experienced health care professional.
Other possible symptoms of breast cancer include:
Although any of these symptoms can be caused by things other than breast cancer, if you
have them, they should be reported to a health care professional so the cause can be
found.
Remember that knowing what to look for does not take the place of having regular
mammograms and other screening tests. Screening tests can help find breast cancer early,
before any symptoms appear. Finding breast cancer early gives you a better chance of
successful treatment.
2.6. MAMMOGRAMS
Mammograms are low-dose x-rays of the breast. Regular mammograms can help find
breast cancer at an early stage, when treatment is most successful. A mammogram can
often find breast changes that could be cancer years before physical symptoms develop.
Results from many decades of research clearly show that women who have regular
mammograms are more likely to have breast cancer found early, are less likely to need
aggressive treatment like surgery to remove the breast (mastectomy) and chemotherapy,
and are more likely to be cured.
Mammograms are not perfect. They miss some cancers. And sometimes a woman will
need more tests to find out if something found on a mammogram is or is not cancer.
There’s also a small possibility of being diagnosed with a cancer that never would have
caused any problems had it not been found during screening. (This is called
overdiagnosis.)
There are two types of mammograms. A screening mammogram is used to look for signs
of breast cancer in women who don’t have any breast symptoms or problems. X-ray
pictures of each breast are taken, typically from 2 different angles. Mammograms can
also be used to look at a woman’s breast if she has breast symptoms or if a change is seen
on a screening mammogram. When used in this way, they are called diagnostic
mammograms. They may include extra views (images) of the breast that aren’t part of
screening mammograms. Sometimes diagnostic mammograms are used to screen women
who were treated for breast cancer in the past.
In the past, mammograms were typically printed on large sheets of film. Today, digital
mammograms are much more common. Digital images are recorded and saved as files in
a computer.
Breast MRI (magnetic resonance imaging) uses radio waves and strong magnets to make
detailed pictures of the inside of the breast. It is used:
To help determine the extent of breast cancer: Breast MRI is sometimes used in
women who already have been diagnosed with breast cancer, to help measure the
size of the cancer, look for other tumors in the breast, and to check for tumors in
the opposite breast. But not every woman who has been diagnosed with breast
cancer needs a breast MRI.
To screen for breast cancer: For certain women at high risk for breast cancer, a
screening MRI is recommended along with a yearly mammogram. MRI is not
recommended as a screening test by itself because it can miss some cancers that a
mammogram would find.
Figure 2.2. Breast MRI
Although MRI can find some cancers not seen on a mammogram, it’s also more likely
to find things that turn out not to be cancer (called a false positive). This can result in a
woman getting tests and/or biopsies that end up not being needed. This is why MRI is
not recommended as a screening test for women at average risk of breast cancer.
Breast ultrasound uses sound waves to make a computer picture of the inside of the
breast. It can show certain breast changes, like fluid-filled cysts, that are harder to
identify on mammograms. It is used when:
When other tests show a change that might be breast cancer, a biopsy is done. Needing a breast
biopsy doesn’t necessarily mean that there is cancer. Most biopsy results are not cancer,
but a biopsy is the only way to find out for sure. During a biopsy, a doctor will remove
small pieces from the suspicious area so they can be looked at in the lab to see if they
contain cancer cells.
There are different kinds of breast biopsies. Some are done using a hollow needle, and
some use an incision (cut in the skin). Each has pros and cons.
In an FNA biopsy, a very thin, hollow needle attached to a syringe is used to
withdraw (aspirate) a small amount of tissue from a suspicious area. The needle
used for an FNA biopsy is thinner than the one used for blood tests.
A core biopsy uses a larger needle to sample breast changes felt by the doctor or
seen on an ultrasound, mammogram, or MRI. This is often the preferred type of
biopsy if breast cancer is suspected.
In rare cases, surgery is needed to remove all or part of the lump for testing. This is called
a surgical or open biopsy. Most often, the surgeon removes the entire mass or abnormal
area as well as a surrounding margin of normal breast tissue.
Regardless of the type of biopsy, the biopsy samples will be sent to a lab where a
specialized doctor called a pathologist will look at them. It typically will take at least a
few days for you to find out the results.[4]
CHAPTER 3
LITERATURE SURVEY
A survey of the related work was made to study the existing methods for the detection of
breast cancer using various image processing techniques. The related work on this subject
is summarized in this literature survey, in which the concentration is mainly on the
various techniques for the detection of breast cancer.
Siddhartha Gupta, Sudha R, Neha Sinha, and Challa Babu: In the proposed work, a
variety of algorithms were applied, and the combination best suited for cancer detection
was found to be K-Means clustering, closing, dilation, and the Canny edge detection
algorithm.[6]
Prannoy Giri and K. Saravana Kumar: This paper mainly studies the multiple image
processing algorithms that can be used extensively for finding cancerous cells. The
techniques in computer-aided mammography include image pre-processing, image
segmentation, feature extraction, feature selection, and classification. Further
developments are required to extract more features and to find patterns in tumors in
order to understand them better. Texture analysis methods can be used to classify
benign and malignant masses by identifying the micro-calcifications in the
mammogram.[7]
Dina A. Ragab, Maha Sharkas, Stephen Marshall, and Jinchang Ren: This paper mainly
works on a DCNN (Deep Convolutional Neural Network) and an SVM, using a
region-based segmentation method, which gives an accuracy of 88% with the SVM and
73.6% with the DCNN.[8]
CHAPTER 4
PROCESSING AND ANALYSIS
A colour image is converted to grayscale using a weighted sum of its channels, typically
Gray = 0.299 × Red + 0.587 × Green + 0.114 × Blue, where Red, Green, and Blue are the
RGB values of the pixel, and the weights are based on the relative brightness of each colour.
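A minimal sketch of this conversion, assuming OpenCV is used; the file name "mammogram.png" is only an illustrative placeholder:

import cv2
import numpy as np

img = cv2.imread("mammogram.png")                      # OpenCV loads the image in BGR order
gray_manual = (0.114 * img[:, :, 0] +                  # Blue channel
               0.587 * img[:, :, 1] +                  # Green channel
               0.299 * img[:, :, 2]).astype(np.uint8)  # Red channel
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # built-in conversion with the same weights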
Edge detection is another important technique in image processing that involves detecting
boundaries or edges in an image. Edges are defined as sudden changes in intensity or
colour within an image, and they can be used for a variety of applications, such as object
detection and segmentation.
There are several methods for performing edge detection, but one of the most commonly
used is the Canny edge detection algorithm. This algorithm involves several steps,
including smoothing the image to remove noise, calculating the gradient of the image to
find regions of rapid change in intensity, and applying thresholding to identify edges.[9]
The Canny algorithm is a multi-stage process that involves the following steps:
1. Smoothing: The image is convolved with a Gaussian filter to reduce noise and
blur the edges.
2. Gradient calculation: The gradient magnitude and direction are calculated for each
pixel in the image.
3. Non-maximum suppression: The gradient magnitude is thinned by keeping only the
pixels that are local maxima along the gradient direction.
4. Double thresholding: Two threshold values are applied to the gradient magnitude,
and pixels above the high threshold are considered as strong edges, while pixels
below the low threshold are considered as non-edges. Pixels between the two
thresholds are considered as weak edges.
5. Edge tracking by hysteresis: The weak edges are connected to the strong edges if
they are adjacent to each other, forming continuous edges.
The output of the Canny algorithm is a binary image that indicates the location of edges
in the original image.
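A short sketch of this pipeline with OpenCV, assuming illustrative threshold values (50 and 150) and a placeholder file name:

import cv2

gray = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # Step 1: Gaussian smoothing
edges = cv2.Canny(blurred, 50, 150)           # Steps 2-5 (gradient, suppression,
                                              # thresholding, hysteresis) run internally
# 'edges' is a binary image: 255 on edge pixels, 0 elsewhere.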
Input image and the corresponding output image of the Canny edge detector.
The various Features extracted from the mammography images are Mean, Variance,
Entropy, Skewness, Kurtosis, Mean Symmetry, Mean Concave Points and Mean
smoothness.[10]
Mean:
The mean value is the ratio of the sum of pixel values and the total number of pixel
values. Mean value gives the contribution of individual pixel intensity for the entire
image.
To calculate the mean intensity of an image, we can use the following formula:
Mean = (I1 + I2 + ... + I16) / N
where I1, I2, ..., I16 are the intensities of the 16 pixels, and N is the total number of
pixels (which is 16 in this case).
Example: Consider the following distribution of pixels. Calculate the average of all
the pixel values and replace the value of the centre pixel with the mean.
[10 20 30 40]
[50 60 70 80]
[90 100 110 120]
[130 140 150 160]
Mean= (10+20+30+40+50+60+70+80+90+100+110+120+130+140+150+160)/16=85
Variance:
The variance of pixel intensities in the image can be calculated using the formula:
Variance = Σ (Ii - mean)² / (N - 1)
where Ii is the intensity of the ith pixel, mean is the mean intensity of the image, and N is
the total number of pixels (dividing by N - 1 gives the sample variance).
[10 20 30 40]
[50 60 70 80]
[90 100 110 120]
[130 140 150 160]
Using the pixel intensities and mean calculated earlier, we can substitute the values into
the formula to get:
So, the variance of pixel intensities for the given image is 2266.667.
Standard Deviation:
Standard deviation is defined as the tendency of the values in a data set to deviate from
the average value. The standard deviation is the average amount of variability in your
data set.
[10 20 30 40]
[50 60 70 80]
[90 100 110 120]
[130 140 150 160]
Using the pixel intensities and Variance calculated earlier, the square root of the Variance
gives the Standard Deviation.
So, the Standard Deviation of pixel intensities for the given image is 47.610.
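The three statistics above can be checked with NumPy; note that the reported variance (2266.667) and standard deviation (47.610) are the sample versions, i.e., they use a divisor of N - 1:

import numpy as np

pixels = np.array([[10, 20, 30, 40],
                   [50, 60, 70, 80],
                   [90, 100, 110, 120],
                   [130, 140, 150, 160]], dtype=float)

print(pixels.mean())         # 85.0
print(pixels.var(ddof=1))    # 2266.67  (sample variance, divisor N - 1)
print(pixels.std(ddof=1))    # 47.61    (square root of the sample variance)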
Skewness:
Skewness measures how “lopsided” the distribution of pixels is. In terms of digital image
processing, Darker and glossier surfaces tend to be more positively skewed than lighter
and matte surfaces. Hence, we can use skewness in making judgments about image
surfaces. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A
distribution, or data set, is symmetric if it looks the same to the left and right of the centre
point. The skewness for a normal distribution is zero, and any symmetric data should
have a skewness near zero. Negative values for the skewness indicate data that are
skewed left and positive values for the skewness indicate data that are skewed right. By
skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed
right means that the right tail is long relative to the left tail. It is shown in the below
figure.
The skewness of pixel intensities in the image can be calculated using the formula:
Skewness = (1/N) Σ ((Ii - mean) / sigma)³
where Ii is the intensity of the ith pixel, mean is the mean intensity of the image, sigma is
the standard deviation of pixel intensities in the image, and N is the total number of
pixels.
Using the pixel intensities and mean calculated earlier, the standard deviation sigma of the
pixel intensities is obtained as described above. Substituting the pixel intensities, the mean,
and sigma into the formula for skewness gives its value; for the 4x4 example used here, the
intensities are symmetric about the mean, so the skewness is 0.
Entropy:
Entropy is a measure of the randomness of the pixel intensities in an image; it can be
computed from the image histogram as Entropy = -Σ pi log2(pi), where pi is the
probability of the ith intensity level.
Example: Consider an image as given below. Observe the number of transitions from
0↔1
000000000 →0
011110111 →3
000111101 →3
000111111 →1
Kurtosis:
Kurtosis is a measure of the combined weight of a distribution's tails relative to the centre
of the distribution. Sometimes it is quite hard to distinguish noise from image content,
especially when handling low-contrast textures. So, if we want to be able to make a
statement about how well an algorithm works in such cases, we have to establish a
numeric quantity, which is called "kurtosis".
Kurtosis is used to measure how heavily the tails of the distribution differ from the tails of
a normal distribution. In other words, it identifies whether the tails of the distribution
contain extreme values. In digital image processing, kurtosis values are interpreted in
combination with noise and resolution measurements. High kurtosis values tend to go
hand in hand with low noise and low resolution. Images with moderate amounts of
salt-and-pepper noise are likely to have a high kurtosis value.
Excess kurtosis is a metric that compares the kurtosis of a distribution against the
kurtosis of a normal distribution. The kurtosis of a normal distribution equals 3.
Therefore, the excess kurtosis is found as: Excess kurtosis = Kurtosis - 3.
Example: Consider the following distribution, whose mean, variance, and standard
deviation are 94, 27, and 5, respectively. The calculation involves the following steps:
Subtract the mean of the distribution from each value of the pixel. This results in the
following distribution.
6 6 6
6 -44 6
6 6 6
Divide each value of the pixel by the standard deviation and raise the result to the
fourth power.
Add all the resulting values and divide the sum by the product of the number of rows and
columns.
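For completeness, the skewness and kurtosis of the 4x4 example used earlier can be computed with SciPy; the values in the comments are for that symmetric, flat distribution:

import numpy as np
from scipy.stats import skew, kurtosis

pixels = np.array([[10, 20, 30, 40],
                   [50, 60, 70, 80],
                   [90, 100, 110, 120],
                   [130, 140, 150, 160]], dtype=float).ravel()

print(skew(pixels))                    # 0.0   (intensities are symmetric about the mean)
print(kurtosis(pixels, fisher=True))   # about -1.21 (excess kurtosis = kurtosis - 3)
print(kurtosis(pixels, fisher=False))  # about  1.79 (plain kurtosis)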
Mean Symmetry:
To calculate the mean symmetry of an image, you need to first define what you mean by
symmetry. One common definition of symmetry for a 2D image is mirror symmetry,
which means that the image can be divided into two halves that are mirror images of each
other.
To calculate the mean symmetry of an image using this definition, you can follow these
steps for a 4x4 matrix:
1. Define the mirror line: For a 4x4 matrix, there are two possible mirror lines - the
vertical line that divides the matrix into two 2x4 halves, and the horizontal line
that divides the matrix into two 4x2 halves. Choose one of these lines as the
mirror line for your calculation.
2. Reflect the image: Reflect the half of the image on one side of the mirror line
across the line to create a mirror image of that half. For example, if you choose
the vertical line as the mirror line, reflect the left half of the image across the line
to create a mirror image of the right half.
3. Calculate the difference: Calculate the absolute difference between the original
half of the image and its mirror image. For each pixel in the half, subtract the
corresponding pixel value in the mirror image from the original pixel value, take
the absolute value of the difference, and sum up all the differences.
4. Repeat for the other half: Repeat steps 2 and 3 for the other half of the image on
the other side of the mirror line.
5. Calculate the mean: Add up the total differences from both halves and divide by
the total number of pixels in the image to get the mean symmetry value.
Original image:
1 2 3 4
5 6 7 8
9 8 7 6
5 4 3 2
Reflected image:
4 3 2 1
8 7 6 5
6 7 8 9
2 3 4 5
Absolute difference:
3 1 1 3
3 1 1 3
3 1 1 3
3 1 1 3
So, the mean symmetry value for this image using the vertical mirror line is 32 / 16 = 2.
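Reflecting the whole image across the vertical mirror line and averaging the absolute differences gives the same result in one step; a small NumPy sketch:

import numpy as np

img = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 8, 7, 6],
                [5, 4, 3, 2]], dtype=float)

mirrored = np.fliplr(img)                       # reflect across the vertical mirror line
mean_symmetry = np.abs(img - mirrored).mean()   # average absolute difference per pixel
print(mean_symmetry)                            # 2.0 for this matrix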
Mean Smoothness:
To calculate the mean smoothness of an image, you need to first define what you mean
by smoothness. One common definition of smoothness for a 2D image is how rapidly the
pixel intensities change from one pixel to the next.
To calculate the mean smoothness of an image, you can follow these steps for a 4x4
matrix:
1. Calculate the gradient: Calculate the gradient of the image using a Sobel or Scharr
filter, which highlights edges and is commonly used for edge detection. The
gradient is a measure of how rapidly the pixel intensities change from one pixel to
the next in both the x and y directions.
2. Calculate the absolute gradient: Take the absolute value of the gradient at each
pixel to get the magnitude of the gradient.
3. Calculate the smoothness: Calculate the smoothness of the image as the average
of the magnitudes of the gradient at each pixel. This gives a measure of how
rapidly the pixel intensities change on average across the image.
Original image:
1 2 3 4
5 6 7 8
9 8 7 6
5 4 3 2
Gradient:
-2 -2 -2 -2
-2 -2 -2 -2
-2 2 2 2
2 2 2 2
Absolute gradient:
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
Smoothness: (2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2) / 16 = 2
So, the mean smoothness value for this image is 2. If we normalize this value, we get:
So, the normalized mean smoothness value for this image is 0.0884.
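A sketch of the same idea with NumPy; np.gradient uses central differences, so its numbers differ slightly from the simplified hand-worked gradient above:

import numpy as np

img = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 8, 7, 6],
                [5, 4, 3, 2]], dtype=float)

gy, gx = np.gradient(img)                  # per-pixel intensity change along y and x
magnitude = np.sqrt(gx ** 2 + gy ** 2)     # gradient magnitude at each pixel
mean_smoothness = magnitude.mean()         # average rate of intensity change
print(mean_smoothness)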
All these values of the features are stored and passed through the classifier. A
classification model attempts to draw some conclusion from observed values. Given
one or more inputs a classification model will try to predict the value of one or more
outcomes. Outcomes are labels that can be applied to a dataset. As more data are
entered the better the prediction and accuracy.
Classification is the process of predicting the class of given data points. Classes are
sometimes called targets/ labels or categories. Classification predictive modelling is
the task of approximating a mapping function (f) from input variables (X) to discrete
output variables (y).
Classification belongs to the category of supervised learning where the targets are
also provided with the input data. There are many applications in classification in
many domains such as in credit approval, medical diagnosis, target marketing etc.
Image classification analyses the numerical properties of various image features and
organizes data into categories. Classification typically involves two phases: a training
phase, in which characteristic properties of typical image features are isolated and used to
build a description of each class, and a subsequent testing phase, in which these
feature-space partitions are used to classify image features. The description of training
classes is an extremely important component of the classification process.
The objective of image classification is to identify and portray, as a unique gray level
(or color), the features occurring in an image in terms of the object or type of land
cover these features actually represent on the ground. Image classification is perhaps
the most important part of digital image analysis.
In this project, we used the RandomForestClassifier and the SVC classifier.
SVC Classifier:
SVC (Support Vector Classification) is one of the implementations of SVM for
classification problems. The SVC classifier is used to find the optimal hyperplane
that can classify the data points into different classes.
The SVC classifier works by finding the support vectors, which are the data points
closest to the decision boundary. These support vectors are used to define the
hyperplane and maximize the margin between the classes.
One of the advantages of using SVC is that it can handle non-linear data by using
kernel functions. By transforming the data into a higher-dimensional space, the SVC
classifier can find a hyperplane that can separate the data into different classes. Some
commonly used kernel functions in SVC include the linear kernel, polynomial
kernel, RBF kernel, and sigmoid kernel.
SVC is a powerful algorithm that can be used for a wide range of classification
problems. By selecting the appropriate hyperparameters and kernel function, SVC
can achieve high accuracy and generalization performance.
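A minimal scikit-learn sketch of such a classifier. It uses the library's built-in Wisconsin breast-cancer feature dataset purely for illustration, not the project's mammogram features, and the hyperparameters are default-like assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()                        # SVMs are sensitive to feature scale
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")    # RBF kernel handles non-linear boundaries
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))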
CHAPTER 5
SUPPORT VECTOR MACHINES
In SVM, the decision boundary is chosen in such a way that it maximizes the margin
between the two classes. The margin is the distance between the decision boundary and
the closest points of each class. SVM tries to find the decision boundary that has the
largest margin, as this is likely to generalize well to new, unseen data.
SVM has several advantages over other machine learning algorithms, such as high
accuracy, ability to handle large datasets, and good generalization performance.
However, SVM also has some limitations, such as the need for careful parameter tuning,
sensitivity to the choice of kernel function, and difficulty in handling multi-class
classification problems.
One of the advantages of using SVC is that it can handle non-linear data by using kernel
functions. By transforming the data into a higher-dimensional space, the SVC classifier
can find a hyperplane that can separate the data into different classes. Some commonly
used kernel functions in SVC include the linear kernel, polynomial kernel, RBF kernel,
and sigmoid kernel.
SVC is a powerful algorithm that can be used for a wide range of classification problems.
By selecting the appropriate hyperparameters and kernel function, SVC can achieve high
accuracy and generalization performance.[11]
5.4.1. NumPy
NumPy is a popular Python library for numerical and scientific computing. It provides an
array object that is faster and more efficient than Python's built-in lists, and also provides
a wide range of mathematical functions for working with these arrays.
5.4.2. Pandas
Pandas is a popular open-source Python library that is used for data manipulation and
analysis. It provides highly efficient data structures and tools for working with structured
data such as tabular, time-series, and matrix data. Pandas is built on top of the NumPy
library and is used extensively in data science, machine learning, and finance.
It provides efficient data structures and tools for data manipulation, cleaning, and
analysis, making it an essential tool for anyone working with structured data in Python.
5.4.3. OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and
machine learning software library. It includes a wide range of image and video
processing tools and algorithms, such as object detection, face recognition, feature
detection, and optical flow.
OpenCV has interfaces for many programming languages. In Python, the OpenCV library
can be used by installing the "opencv-python" package using pip. The library is widely
used in research, industry, and academia for a variety of computer vision and machine
learning tasks.
5.4.4. sklearn.preprocessing
It is a module in the popular Python library scikit-learn that provides a set of functions for
preprocessing and scaling data before it is used for modeling. The module includes a
variety of methods that can be used to preprocess the data, including feature scaling,
normalization, and transformation.
By applying these preprocessing methods, data can be made more suitable for machine
learning algorithms, potentially leading to more accurate and reliable models.
Normalization: The MinMaxScaler class can be used to normalize the features of a
dataset to a specified range (e.g., [0, 1]). This is useful for algorithms that are sensitive to
the scale of the input data, such as neural networks.
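A small sketch of MinMaxScaler on toy data (the numbers are illustrative only):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 600.0]])

scaler = MinMaxScaler(feature_range=(0, 1))   # scale each feature (column) to [0, 1]
print(scaler.fit_transform(X))                # each column now ranges from 0.0 to 1.0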
5.4.5. matplotlib.pyplot
The matplotlib.pyplot module is a widely used Python interface for creating static,
animated, and interactive visualizations in Python. It provides a range of functions for
creating plots, histograms, bar charts, scatter plots, and many other types of
visualizations.
pyplot provides a wide range of customization options for creating plots, such as setting
axis labels, adding legends, changing line styles, and controlling colors.
It integrates well with other scientific computing libraries, such as NumPy and SciPy,
making it easy to create plots from data stored in these libraries.
5.4.6. sklearn.decomposition.PCA
PCA (Principal Component Analysis) reduces the dimensionality of a dataset through the
following steps:
1. Standardize the data: PCA assumes that the data is standardized (i.e., has zero
mean and unit variance) so that all variables are on the same scale.
2. Compute the covariance matrix: PCA calculates the covariance matrix of the
standardized data to identify the relationships between the variables.
3. Compute the eigenvectors and eigenvalues: PCA decomposes the covariance
matrix into its eigenvectors and eigenvalues. The eigenvectors represent the
directions of maximum variance in the data, and the eigenvalues represent the
magnitude of the variance in each eigenvector.
4. Choose the number of principal components: PCA selects the number of principal
components based on the proportion of the total variance explained by each
component.
5. Transform the data: PCA transforms the original data into a new set of variables
that are linear combinations of the original variables, called principal components.
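The steps above, sketched with scikit-learn on random stand-in data (eight columns standing in for the extracted features; the data and component count are assumptions):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))               # stand-in for a table of extracted features

X_std = StandardScaler().fit_transform(X)   # step 1: standardize
pca = PCA(n_components=2)                   # steps 2-4: covariance, eigen-decomposition,
X_pca = pca.fit_transform(X_std)            #           component selection; step 5: projection
print(pca.explained_variance_ratio_)        # proportion of variance explained per component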
5.4.8. sklearn.model_selection
The train_test_split function from this module splits a dataset into training and testing
subsets. Its main parameters are:
arrays: the dataset to be split. This can be a single array or a tuple of arrays.
test_size: the fraction of the dataset to be used for testing. This can be a float
between 0 and 1, or an integer specifying the number of samples to use for
testing.
train_size: the fraction of the dataset to be used for training. This can be a float
between 0 and 1, or an integer specifying the number of samples to use for
training. If it is not specified, it is set to the complement of test_size.
random_state: a seed value for the random number generator used to split the
dataset. This is optional, but can be useful for reproducibility.
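A short usage sketch with toy arrays:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)    # 10 samples, 2 features
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,       # 20% of the samples go to the test set
    random_state=42)     # fixed seed for a reproducible split

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)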
5.4.9. sklearn.svm
The SVC class from the sklearn.svm module is used to train a support vector machine
(SVM) classification model. SVMs are a type of machine learning algorithm used for
classification, regression, and outlier detection. They are particularly useful when
working with high-dimensional datasets.
5.4.10. sklearn.metrics
Classification Metrics:
accuracy_score: computes the accuracy of classification predictions
confusion_matrix: computes a confusion matrix from true and predicted labels
precision_score: computes the precision of classification predictions
recall_score: computes the recall of classification predictions
f1_score: computes the F1 score of classification predictions
The confusion matrix is a matrix used to determine the performance of the classification
models for a given set of test data. It can only be determined if the true values for test
data are known. The matrix itself can be easily understood, but the related terminology
may be confusing. Since it shows the errors in the model's performance in the form of a
matrix, it is also known as an error matrix.
True Negative: The model has predicted No, and the real or actual value was also No.
True Positive: The model has predicted Yes, and the actual value was also Yes.
False Negative: The model has predicted No, but the actual value was Yes; it is also
called a Type-II error.
False Positive: The model has predicted Yes, but the actual value was No; it is also
called a Type-I error.
Accuracy: It is the ratio of correct predictions to the total number of predictions, calculated
as Accuracy = (TP + TN) / (TP + TN + FP + FN).
Accuracy = 0.89
Precision: It can be defined as the number of correct positive outputs provided by the
model, i.e., out of all the instances predicted as positive, how many were actually positive.
It can be calculated using the formula Precision = TP / (TP + FP).
Precision = 0.88
Recall: It is defined as, out of all the actual positive instances, how many the model
predicted correctly; it is calculated as Recall = TP / (TP + FN). The recall should be as
high as possible.
Recall = 0.75
F-measure: If two models have low precision and high recall or vice versa, it is difficult
to compare them. For this purpose, we can use the F-score, which evaluates recall and
precision at the same time. The F-score is maximum when the recall equals the precision.
It can be calculated using the formula F1 = 2 × (Precision × Recall) / (Precision + Recall).
F1 Score = 0.80
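These metrics can be reproduced with sklearn.metrics; the labels below are toy values chosen only to show the function calls, not the project's results:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))   # [[TN, FP], [FN, TP]] for labels 0/1
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))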
CHAPTER 6
ARTIFICIAL NEURAL NETWORKS
The basic idea behind ANNs is to create a network of interconnected nodes, or "neurons,"
that can process and transmit information. Each neuron takes input from one or more
other neurons, performs a simple computation on that input, and then passes its output to
other neurons in the network. By adjusting the strength of the connections between
neurons, the network can learn to recognize patterns in data and make predictions based
on that data.
Artificial Neural Networks (ANNs) are a core technique in deep learning, which is a
subfield of machine learning focused on training models with multiple layers of neural
networks. ANNs in deep learning are used for a wide range of tasks, including image and
speech recognition, natural language processing, and predictive modeling.[13]
The technique of using ANNs in deep learning involves several key steps:
1. Data preparation: The first step is to prepare the data by preprocessing and
transforming it into a format suitable for training an ANN. This may involve tasks
such as normalization, feature scaling, and one-hot encoding.
2. Model definition: The network architecture is defined, including the number of
layers, the number of neurons in each layer, and the activation functions.
3. Training: The model is trained on the training data by iteratively adjusting the
weights to minimize a loss function, typically with an optimizer such as stochastic
gradient descent or Adam.
4. Validation: The model's hyperparameters are tuned and its performance is
monitored on a separate validation set during training.
5. Testing: Finally, the model is tested on a separate test set to measure its
performance on new, unseen data.
In deep learning, ANNs are often designed with many layers, which is why they are also
called deep neural networks. These deep neural networks are able to learn increasingly
complex features and patterns in the data as they process it through multiple layers.
There are also several variations of ANNs used in deep learning, including Convolutional
Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative
Adversarial Networks (GANs). These variations are designed to handle specific types of
data or problems and have their own unique architectures and training algorithms.
In a deep neural network, there are typically many layers of neurons, each layer
processing the output of the previous layer. This allows the network to learn increasingly
complex features of the data as it progresses through the layers. For example, in a CNN
designed for image recognition, the early layers might learn to detect simple features like
edges and corners, while later layers might learn to recognize more complex shapes and
objects.
To train an ANN in deep learning, a large amount of labeled data is fed into the network,
and the weights of the connections between neurons are adjusted using an optimization
algorithm (such as stochastic gradient descent) to minimize the difference between the
network's predictions and the true labels. This process is repeated over many iterations
until the network can accurately classify new, unseen data.
The input layer receives input data, which could be a vector, an image, or any other
structured data format. The input layer neurons simply pass on the input data to the first
hidden layer neurons.
The hidden layers process the input data through a series of mathematical operations to
extract relevant features and patterns. Each neuron in a hidden layer receives input from
multiple neurons in the previous layer, performs a computation on that input, and passes
the result on to the next layer.
The output layer produces the final output of the network, which could be a class label, a
probability distribution over multiple classes, a regression value, or any other type of
output depending on the problem being solved.
Each neuron in a neural network has a set of weights and biases associated with it, which
are learned during the training process. These weights and biases control the strength and
direction of the connections between neurons, and determine the output of each neuron
based on its input.[14]
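A minimal Keras sketch of this input / hidden / output structure; the layer sizes and activations are illustrative assumptions, not the project's architecture:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                        # input layer: 8 extracted features
    tf.keras.layers.Dense(16, activation="relu"),      # hidden layer
    tf.keras.layers.Dense(8, activation="relu"),       # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),    # output: probability of malignancy
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()    # lists the weights and biases learned during training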
The primary advantage of CNNs is their ability to automatically learn spatial hierarchies
of features from the raw input data, without the need for manual feature engineering. This
is achieved through the use of convolutional layers, which apply a set of learnable filters
to the input image to produce a set of feature maps. These feature maps capture important
spatial and structural information from the input image, and subsequent layers in the
network use this information to extract higher-level features and make predictions.
One notable example of a CNN architecture is the famous VGGNet, which was
developed by the Visual Geometry Group at the University of Oxford. VGGNet consists
of 16 or 19 layers, depending on the variant, and has achieved excellent performance on
the ImageNet dataset. Other popular CNN architectures include AlexNet, ResNet,
Inception, and MobileNet.
In summary, CNNs are a powerful and widely used type of neural network for analyzing
visual imagery, and have achieved state-of-the-art performance on a range of computer
vision tasks. If you are working on a project related to computer vision, CNNs are
definitely worth exploring further.[14]
A CNN (Convolutional Neural Network) layer is one of the building blocks of a deep
learning model designed for image or video recognition tasks. It is a specialized layer that
applies a mathematical operation called convolution to the input image, which extracts
specific features from the image.
The convolution operation involves sliding a small filter (also known as a kernel) across
the input image, and computing the dot product between the filter and the corresponding
pixels in the image. The result of this operation is a feature map that highlights specific
patterns or edges in the input image.
Figure 7.1. Convolutional Layer
Max pooling works by dividing the feature map into non-overlapping rectangular regions,
called pooling windows or kernels. For each region, the maximum value is selected and
retained, while the other values are discarded. The result is a new feature map with a
reduced spatial dimension and a higher level of abstraction.
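A shape walk-through of one convolution + max-pooling stage in Keras; the 256x256 input size and filter count are assumptions used only for illustration:

import tensorflow as tf

x = tf.random.normal([1, 256, 256, 1])                  # one grayscale image
conv = tf.keras.layers.Conv2D(32, kernel_size=3,
                              padding="same", activation="relu")
pool = tf.keras.layers.MaxPooling2D(pool_size=2)

features = conv(x)         # shape (1, 256, 256, 32): 32 feature maps
reduced = pool(features)   # shape (1, 128, 128, 32): spatial size halved by max pooling
print(features.shape, reduced.shape)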
In deep learning, a fully connected layer (also called a dense layer) is a type of neural
network layer in which each neuron is connected to every neuron in the previous layer,
and each connection is associated with a weight parameter. The output of a fully
connected layer is a linear transformation of the input, followed by an activation function.
One limitation of fully connected layers is that they can be computationally expensive,
particularly for large input sizes and large numbers of neurons. Additionally, they can be
prone to overfitting if the number of parameters is large relative to the amount of training
data.
The core of TensorFlow is its data flow graph, which represents the mathematical
operations and transformations that are applied to the data in the model. The graph
consists of a series of nodes that represent operations and a set of edges that represent the
data flowing between those operations. This graph is designed to be highly flexible and
scalable, allowing for the efficient processing of large datasets and the distribution of
computations across multiple devices.
One of the key features of TensorFlow is its ability to automatically compute gradients
for any function defined in the graph, using the backpropagation algorithm. This enables
efficient training of neural networks and other models, as it allows the optimization
algorithms to iteratively adjust the model's parameters to minimize the error between the
predicted output and the actual output.
7.5.2. "get_dataset_partitions_tf"
This is a user-defined function that returns three datasets: a training dataset ("train_ds"),
a validation dataset ("val_ds"), and a test dataset ("test_ds").
Based on the name of the function, it's likely that this function is used to split a larger
dataset into smaller partitions for the purposes of training, validating, and testing a
machine learning model.
In general, it's common practice to split a dataset into these three partitions in order to
evaluate the performance of a machine learning model. The training dataset is used to
train the model, the validation dataset is used to tune the model's hyperparameters and
evaluate its performance during training, and the test dataset is used to evaluate the final
performance of the model after it has been trained.
Without more information about the "get_dataset_partitions_tf" function, it's difficult to
provide more specific details about what it does or how it partitions the dataset. However,
it's likely that the function uses TensorFlow APIs to load and preprocess the dataset, and
then splits it into the desired partitions using some form of random sampling or stratified
sampling.
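Since the project's exact implementation is not shown here, the following is only a plausible reconstruction using tf.data take/skip; the split ratios, shuffle buffer, and seed are assumptions:

import tensorflow as tf

def get_dataset_partitions_tf(ds, train_split=0.8, val_split=0.1,
                              shuffle=True, shuffle_size=10000):
    # Assumes ds is a tf.data.Dataset whose cardinality (number of batches) is known.
    ds_size = int(ds.cardinality().numpy())
    if shuffle:
        ds = ds.shuffle(shuffle_size, seed=12)
    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)
    train_ds = ds.take(train_size)
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size + val_size)    # remaining batches form the test set
    return train_ds, val_ds, test_ds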
7.5.3. KERAS
Keras is a popular open-source software library for building and training machine
learning models, particularly deep neural networks. It is built on top of TensorFlow and
provides a simple and intuitive interface for defining and training complex models.
Keras provides a high-level API that allows developers to easily build and configure
different types of neural networks, including convolutional neural networks (CNNs),
recurrent neural networks (RNNs), and more. It also supports a wide range of layers,
activations, loss functions, and optimization algorithms that can be combined in different
ways to create custom models.
One of the key advantages of Keras is its ease of use and flexibility. The API is designed
to be intuitive and user-friendly, with a simple and consistent syntax that makes it easy to
define and train complex models. Keras also supports a range of backends, including
TensorFlow, Microsoft Cognitive Toolkit, and Theano, allowing developers to choose the
best option for their needs.[16]
The name "Adam" stands for "Adaptive Moment Estimation," which refers to the
algorithm's use of both first and second-order moments of the gradients to update the
model's parameters. Specifically, the algorithm maintains an exponential moving average
of the past gradients and squared gradients, and uses these estimates to compute adaptive
learning rates for each parameter.
The adaptive learning rates in Adam allow the optimizer to make larger updates to the
parameters when the gradients are small and smaller updates when the gradients are
large. This helps to prevent the optimizer from getting stuck in local minima and to
converge more quickly to the global minimum of the loss function.
Adam also includes several hyperparameters that can be tuned to improve its
performance on a specific problem, including the learning rate, the decay rate for the
moving averages, and the epsilon value used to prevent division by zero.
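For reference, a Keras Adam optimizer with these hyperparameters spelled out; the values shown are the library defaults, not project-specific tuning:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,   # step size for parameter updates
    beta_1=0.9,            # decay rate of the moving average of gradients
    beta_2=0.999,          # decay rate of the moving average of squared gradients
    epsilon=1e-07)         # small constant that prevents division by zero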
In practice, Adam is often the optimizer of choice for deep neural networks due to its
efficiency and effectiveness. However, it may not always be the best choice for every
problem, and other optimizers such as Adagrad, RMSProp, or SGD with momentum may
perform better depending on the specific characteristics of the problem and the model
being trained.[17]
In machine learning, training a model involves iteratively updating the model parameters
to minimize the error on the training data. The concepts of epochs, batch size, and steps
for epochs are related to this process and are explained below:
Epochs: An epoch is one complete iteration over the entire training dataset. During each
epoch, the model goes through the entire training dataset and updates its parameters
based on the average loss across all the data points. The number of epochs is typically a
hyperparameter that needs to be tuned to achieve the best performance on the validation set.
Batch size: In practice, it is not always feasible to feed the entire training dataset to the
model at once due to memory constraints. Therefore, the training dataset is divided into
smaller batches, and the model is trained on each batch in turn. The number of data points
in each batch is called the batch size. The batch size is typically a hyperparameter that
needs to be tuned to achieve the best performance on the validation set.
Steps per epoch: The number of steps per epoch is the number of batches that the model
processes before completing one epoch. For example, if the training dataset has 1000 data
points, and the batch size is 10, then there will be 100 batches in one epoch, and the
number of steps per epoch will be 100.
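The arithmetic from the example above, written out:

import math

dataset_size = 1000       # data points in the training set
batch_size = 10
epochs = 5                # illustrative value

steps_per_epoch = math.ceil(dataset_size / batch_size)   # 100 batches per epoch
total_updates = steps_per_epoch * epochs                 # 500 parameter updates in total
print(steps_per_epoch, total_updates)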
CHAPTER 8
PROPOSED WORK
The mammography image dataset was downloaded from the aiplanet website and
categorized into benign and malignant classes. Paths to the benign and malignant folders
were set, and the images were read using the Python library cv2 (OpenCV).
8.1.3. Image Preprocessing
The main goal of the pre-processing is to improve the image quality and to make it ready
for further processing by removing or reducing the unrelated and surplus parts in the
background of the mammogram images. The noise and high-frequency components are
removed by filters. Some of the methods for image pre-processing are image
enhancement, image smoothing, noise removal, edge detection, etc.
The input for this stage is a mammogram image, which is converted into a grayscale
image.
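A short OpenCV sketch of this pre-processing stage; the file name and kernel sizes are illustrative assumptions:

import cv2

gray = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)   # load as a grayscale image
denoised = cv2.medianBlur(gray, 3)                         # suppresses salt-and-pepper noise
smoothed = cv2.GaussianBlur(denoised, (5, 5), 0)           # suppresses high-frequency noise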
In digital image processing and computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also known as image
objects). The goal of segmentation is to simplify and/or change the representation of an
image into something that is more meaningful and easier to analyze.
There are several methods for performing edge detection, but one of the most commonly
used is the Canny edge detection algorithm. This algorithm involves several steps,
including smoothing the image to remove noise, calculating the gradient of the image to
find regions of rapid change in intensity, and applying thresholding to identify edges.
The Canny algorithm is a multi-stage process that involves the following steps:
1. Smoothing: The image is convolved with a Gaussian filter to reduce noise and
blur the edges.
2. Gradient calculation: The gradient magnitude and direction are calculated for each
pixel in the image.
3. Non-maximum suppression: The gradient magnitude is thinned by keeping only the
pixels that are local maxima along the gradient direction.
4. Double thresholding: Two threshold values are applied to the gradient magnitude,
and pixels above the high threshold are considered as strong edges, while pixels
below the low threshold are considered as non-edges. Pixels between the two
thresholds are considered as weak edges.
5. Edge tracking by hysteresis: The weak edges are connected to the strong edges if
they are adjacent to each other, forming continuous edges.
The output of the Canny algorithm is a binary image that indicates the location of edges
in the original image.
The basic idea behind the Hough Transform is to transform the image space into a
parameter space, where each pixel in the image is represented as a point in the parameter
space. In the case of line detection, for example, the parameters are the slope and y-
intercept of the line. Each pixel in the image is then transformed into a curve in the
parameter space, which corresponds to a line in the image.
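An illustrative OpenCV sketch of Hough line detection on a Canny edge map; the parameter values are assumptions, not the project's settings:

import cv2
import numpy as np

gray = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLines(edges, 1, np.pi / 180, 120)   # rho = 1 px, theta = 1 degree,
                                                     # accumulator threshold = 120
# Each detected line is returned as a (rho, theta) pair in the parameter space.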
The technique of extracting the features is useful when you have a large data set and
need to reduce the number of resources without losing any important or relevant
information. Feature extraction helps to reduce the amount of redundant data from
the data set.
In the end, the reduction of the data helps to build the model with less machine effort and
also increases the speed of the learning and generalization steps in the machine learning
process.
Feature extraction is a very important process for the overall system performance in
the classification of micro-calcifications. The features extracted are distinguished
according to the method of extraction and the image characteristics. The features
implemented here are texture features and statistical measures, namely Mean, Standard
Deviation, Variance, Mean Smoothness, Mean Symmetry, Skewness, Entropy, and
Kurtosis, which are explained in Chapter 4.
After extraction of the features, we created a path to generate a CSV file, then read the
CSV file and converted it into a pandas DataFrame.
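A sketch of this step with pandas; the file name "features.csv" and the "label" column name are assumptions used only for illustration:

import pandas as pd

df = pd.read_csv("features.csv")       # one row of extracted features per image
print(df.head())                       # Mean, Variance, Skewness, ... plus the class label
X = df.drop(columns=["label"])         # feature matrix
y = df["label"]                        # benign / malignant target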
The PCA algorithm works by first computing the covariance matrix of the dataset,
which represents the relationships between the different features. The eigenvectors
and eigenvalues of this matrix are then calculated, which represent the directions of
maximum variation in the data and the amount of variation along each of these
directions, respectively. The eigenvectors with the highest eigenvalues are the
principal components of the dataset.
PCA can be used for feature selection by selecting only the top k principal
components that capture the most variation in the data. This reduces the
dimensionality of the dataset while retaining the most important features. By
reducing the number of features, PCA can also help to reduce overfitting and
improve the performance of machine learning models.
The SVC classifier works by finding the support vectors, which are the data points
closest to the decision boundary. These support vectors are used to define the hyperplane
and maximize the margin between the classes.
One of the advantages of using SVC is that it can handle non-linear data by using kernel
functions. By transforming the data into a higher-dimensional space, the SVC classifier
can find a hyperplane that can separate the data into different classes. Some commonly
used kernel functions in SVC include the linear kernel, polynomial kernel, RBF kernel,
and sigmoid kernel.
SVC is a powerful algorithm that can be used for a wide range of classification problems.
By selecting the appropriate hyperparameters and kernel function, SVC can achieve high
accuracy and generalization performance.
Step 1: Initialize the batch size and the number of epochs to 32 and 50, respectively.
Batch size refers to the number of data points processed in one iteration of a neural
network, while the number of epochs refers to the number of times the entire dataset is
passed through the network during training.
Step 2: Now load the dataset of mammography images, which consists of both benign
and malignant images.
Step 4: Define the resize-and-rescale and data augmentation functions. The purpose
of image resizing and rescaling is to modify the size and scale of an image,
respectively, which can improve the accuracy of the model by ensuring that all
images are of a consistent size and scale. Data augmentation can help improve the
accuracy of a model by exposing it to a wider range of variations in the data, and
can also help prevent overfitting by increasing the diversity of the training data.
Step 5: Define the CNN model discussed in Chapter 7, consisting of the various
layers such as the Convolution layer, Max-Pooling layer, Flatten layer, and Dense
(fully connected) layer.
Step 6: Compile the model using the Adam optimizer, which adaptively adjusts the
learning rate of each parameter during training.
Step 7: Fit the model so that it is trained on a portion of the available data while its
performance is monitored on the validation set.
Step 8: Plot the training and validation accuracy and then save the model. A
condensed code sketch of these steps is given below.
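A condensed sketch of Steps 1 through 8 using TensorFlow/Keras; the directory name, image size, layer sizes, and augmentation settings are assumptions rather than the project's exact configuration, and plotting of the accuracy curves is omitted for brevity:

import tensorflow as tf

BATCH_SIZE, EPOCHS, IMG_SIZE = 32, 50, 256                        # Step 1

dataset = tf.keras.utils.image_dataset_from_directory(            # Step 2
    "mammograms/",                                                # folders: benign/ and malignant/
    image_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE)

resize_rescale = tf.keras.Sequential([                            # Step 4
    tf.keras.layers.Resizing(IMG_SIZE, IMG_SIZE),
    tf.keras.layers.Rescaling(1.0 / 255)])
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1)])

model = tf.keras.Sequential([                                     # Step 5
    tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    resize_rescale,
    augment,
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax")])              # benign vs malignant

model.compile(optimizer="adam",                                   # Step 6
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(dataset, epochs=EPOCHS)                       # Step 7
model.save("breast_cancer_cnn.keras")                             # Step 8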
CHAPTER 9
SIMULATION RESULTS
9.1. OUTPUT OF SVM
The main disadvantage of the SVM model is that, when dealing with a high number of
input instances, the functioning of the model may be affected. However, the SVM still
offers certain advantages, such as its ability to handle nonlinear relationships between
inputs and its suitability for binary classification problems.
CNNs are better suited for image processing: their advantages include the ability to
automatically learn relevant features, scalability to large datasets, and superior accuracy.
Hence, the project helps in detecting a cancerous tumor before it spreads to other parts
of the body and increases the chances of a successful diagnosis.
FUTURE SCOPE
The breast cancer detection CNN training model has an accuracy of 98%, which means
that the prediction for a test image is more accurate and less time-consuming. We
therefore want to build a website using the CNN deep learning algorithm. The tests
currently performed in the medical field for breast cancer detection can take about a
week to determine whether a patient's tumor is benign or malignant, whereas with this
project the prediction is made in a fraction of a second. Early detection of the tumor is a
vital process that benefits the diagnosis of breast cancer, and this can be achieved
through the breast cancer detection website.
REFERENCES
[1] DeSantis CE, Bray F, Ferlay J, Lortet-Tieulent J, Anderson BO, Jemal A. International
Variation in Female Breast Cancer Incidence and Mortality Rates. Cancer Epidemiol
Biomarkers Prev. 2015; 24(10): 1495-506.
[2] Henry NL, Shah PD, Haider I, Freer PE, Jagsi R, Sabel MS. Chapter 88: Cancer of the Breast.
In: Niederhuber JE, Armitage JO, Doroshow JH, Kastan MB, Tepper JE, eds. Abeloff’s Clinical
Oncology. 6th ed. Philadelphia, Pa: Elsevier; 2020.
[3] American Cancer Society. Cancer Facts and Figures 2023. Atlanta, Ga: American Cancer
Society; 2023.
[4] Jagsi R, King TA, Lehman C, Morrow M, Harris JR, Burstein HJ. Chapter 79: Malignant
Tumors of the Breast. In: DeVita VT, Lawrence TS, Lawrence TS, Rosenberg SA, eds. DeVita,
Hellman, and Rosenberg’s Cancer: Principles and Practice of Oncology. 11th ed. Philadelphia,
Pa: Lippincott Williams & Wilkins; 2019.
[5] [Link], [Link], [Link], [Link], Amit Karn “An Accurate Breast Cancer
Detection and Classification using Image Processing” Department of ECE, Sri Eshwar College
of Engineering, Coimbatore, India Volume 9, Issue 3, March 2021.
[6] Siddhartha Gupta School Of Electrical Engineering VIT Vellore, India “Breast Cancer
Detection Using Image Processing Techniques” Innovations in Power and Advanced
Computing Technologies (i-PACT) 2019.
[7] Prannoy Giri* and K Saravanakumar Department of Computer Science, Christ University,
India. “Breast Cancer Detection using Image Processing Techniques” ISSN: 0974-6471 June
2017, Vol. 10, No. (2): Pgs. 391-399 volume 9,2019.
[8] Dina A. Ragab, Maha Sharkas, Stephen Marshall and Jinchang Ren Electronics and
Communications Engineering Department, Arab Academy for Science, Technology, and
Maritime Transport (AASTMT), “Breast cancer detection using deep convolutional neural
networks and support vector machines” 2019.
[9] X. Liu and D. Wang, “Image and Texture Segmentation Using Local Histograms”, IEEE Trans.
Med. Img., vol.15, pp. 3066-3076, 2006.
[11] M. Yang, S. Cui, Y. Zhang, J. Zhang and X. Li, "Data and Image Classification of
Haematococcus pluvialis Based on SVM Algorithm," 2021 China Automation Congress (CAC),
Beijing, China, 2021, pp. 522-525, doi: 10.1109/CAC53003.2021.9727433.
[13] M. C. Irmak, M. B. H. Taş, S. Turan and A. Haşiloğlu, "Comparative Breast Cancer Detection
with Artificial Neural Networks and Machine Learning Methods," 2021 29th Signal Processing
and Communications Applications Conference (SIU), Istanbul, Turkey, 2021, pp. 1-4, doi:
10.1109/SIU53274.2021.9477991.
[14] K. Mridha, "Early Prediction of Breast Cancer by using Artificial Neural Network and Machine
Learning Techniques," 2021 10th IEEE International Conference on Communication Systems
and Network Technologies (CSNT), Bhopal, India, 2021, pp. 582-587, doi:
10.1109/CSNT51715.2021.9509658.
[16] K. Duvvuri, H. Kanisettypalli and S. Jayan, "Detection of Brain Tumor Using CNN and CNN-
SVM," 2022 3rd International Conference for Emerging Technology (INCET), Belgaum, India,
2022, pp. 1-7, doi: 10.1109/INCET54531.2022.9824725.