A CAPSTONE PROJECT REPORT
ON
“Breast Cancer Prediction using Machine Learning”
PROGRAM CODE: CO 5 I
COURSE NAME: Capstone Project Planning
COURSE CODE: 22058
DEPARTMENT OF COMPUTER ENGINEERING
(2024-2025)
BY
ROLL NO. NAME OF STUDENTS SIGN.
2509 Sneha Naik
2510 Khushi Nanekar
2513 Tanvi Newase
UNDER THE GUIDANCE OF
Prof. A. N. Gedam
COMPUTER ENGINEERING DEPARTMENT
VISION AND MISSION OF THE INSTITUTE
❖ VISION:
Achieve excellence in quality technical education by imparting knowledge, skills and abilities
to build a better technocrat.
❖ MISSION:
M1: Empower the students by inculcating various technical and soft skills.
M2: Upgrade teaching-learning process and industry-institute interaction continuously.
VISION AND MISSION OF THE COMPUTER DEPARTMENT
❖ VISION:
“Enhance skills by providing value based technical education for fulfilling global needs in
the field of computer engineering.”
❖ MISSION:
M1: To provide quality education in computer engineering by improving psychomotor
skills.
M2: To develop positive attitude, communication skills, team spirit and entrepreneurship.
M3: To develop awareness about societal and ethical responsibility for professionalism.
PROGRAM OUTCOMES (POs)
PO1: Basic and Discipline specific knowledge: Apply knowledge of basic mathematics, science and
engineering fundamentals and engineering specialization to solve the engineering problems.
PO2: Problem analysis: Identify and analyze well-defined engineering problems using codified standard
methods.
PO3: Design/development of solutions: Design solutions for well-defined technical problems and assist
with the design of systems components or processes to meet specified needs.
PO4: Engineering Tools, Experimentation and Testing: Apply modern engineering tools and
appropriate techniques to conduct standard tests and measurements.
PO5: Engineering practices for society, sustainability and environment: Apply appropriate
technology in context of society, sustainability, environment and ethical practices.
PO6: Project Management: Use engineering management principles individually, as a team member or
a leader to manage projects and effectively communicate about well-defined engineering activities.
PO7: Life-long learning: Ability to analyse individual needs and engage in updating in the context of
technological changes.
PROGRAM SPECIFIC OUTCOMES (PSO)
The Diploma in Computer Engineering will prepare students to attain:
PSO1: Computing Knowledge: Apply computing knowledge with standard practices to
develop software.
PSO2: Computer Engineering maintenance: Maintain computer hardware and software
systems.
CERTIFICATE
This is to certify that the Group Project entitled “Breast Cancer Prediction using
Machine Learning” has been completed under Capstone Project Planning for the
Computer Engineering Department, Batch 2024-25.
The members of the team:
1. Sneha Naik
2. Khushi Nanekar
3. Tanvi Newase
Project Guide
Prof. A. N. Gedam
Computer Engineering Department
Acknowledgements
The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without mention of the people who made it possible, and whose constant guidance and
encouragement aided us in the completion of our project.
It is our privilege to express our gratitude and respect to all those who guided and inspired
us in the completion of this project.
We would like to thank our guide, Prof. A. N. Gedam, for his valuable guidance and attentive care,
which shaped this report.
We would also like to express our heartfelt gratitude to the Capstone Project Co-ordinator, Prof. A. N.
Gedam, and to Prof. V. N. Kukre, HOD of the Computer Department, for their continuous encouragement
and valuable guidance.
And of course, we would like to thank the management of AISSMS Polytechnic for providing us
with the opportunity to learn from this experience.
Roll No Name Signature
2509 Sneha Naik
2510 Khushi Nanekar
2513 Tanvi Newase
Abstract
Breast cancer is a type of tumor that occurs in the tissues of the breast. It is the most common type of
cancer found in women around the world and is among the leading causes of death in women.
This report presents a comparative analysis of the machine learning, deep learning and data mining
techniques used for the prediction of breast cancer. Many researchers have worked on breast cancer
diagnosis and prognosis; each technique has a different accuracy rate, which varies with the
situation, tools and datasets used.
Our main focus is to comparatively analyze existing machine learning and data mining
techniques in order to find the most appropriate method, one that supports large datasets with
good prediction accuracy. The main purpose of this review is to highlight previous studies
of machine learning algorithms used for breast cancer prediction and to provide the
necessary information to beginners who want to analyze machine learning
algorithms as a foundation for deep learning.
CONTENTS
Chapter 1: Introduction
1.1 Idea of Project
1.2 Motivation of project
1.3 Brief description
Chapter 2: Literature Survey
2.1 Literature Survey for problem identification
Chapter 3: Scope of Project
3.1 Problem definition and scope
3.2 Project Role
3.3 Project plan
3.4 Software and hardware specifications
Chapter 4: Methodology
4.1 System Architecture
4.2 Conclusion
Chapter 5: References and Bibliography
5.1 Bibliography
Chapter 1
Introduction
1.1 Idea of Project:
Over the years, cancer research has evolved significantly, with scientists working to improve
early detection and treatment strategies. Early-stage screening has become crucial in
identifying various types of cancer before they progress, leading to new methods for predicting
treatment outcomes. With advancements in medical technology, an extensive amount of
cancer-related data is now available, providing valuable insights for research. However,
accurately predicting cancer outcomes remains a challenge for physicians.
This project focuses on using machine learning techniques to analyze cancer datasets and
predict whether a tumor is benign or malignant. Benign tumors do not spread, while
malignant tumors can invade other parts of the body, making them far more dangerous.
By applying machine learning algorithms, we aim to enhance the prediction process, helping
to classify cancers accurately and potentially guiding early intervention. This approach not
only supports healthcare providers but also demonstrates the potential of machine learning in
medical diagnosis and cancer outcome prediction.
1.2 Motivation of project:
Breast cancer is a leading cause of cancer-related deaths among women globally, underscoring
the critical need for accurate, timely diagnosis and treatment. Traditional diagnostic methods,
such as clinical examinations and imaging, have limitations in accuracy and efficiency, which
can delay the detection and effective management of the disease. With advancements in
machine learning and data mining, there is an opportunity to revolutionize breast cancer
diagnosis and prognosis by leveraging these technologies for improved accuracy, faster
detection, and better patient outcomes. This project is motivated by the potential of machine
learning to address these challenges, enabling early detection and personalized treatment
strategies that could significantly reduce mortality rates and enhance the quality of life for
patients.
1.3 Brief description:
This project focuses on the application of machine learning and data mining techniques for
breast cancer prediction and diagnosis. It provides a comprehensive review of state-of-the-art
methodologies, highlighting major algorithms such as classification, regression, clustering,
and ensemble learning. The research explores the use of these techniques to differentiate
between benign and malignant tumors, stage cancer accurately, and predict outcomes
effectively. Additionally, deep learning approaches and their applications in breast cancer
detection are discussed. The project evaluates the effectiveness and efficiency of various
machine learning models, aiming to identify the most promising methods for practical use in
medical diagnostics. The findings are intended to guide future research and development in
this critical area, paving the way for more reliable, scalable, and accessible diagnostic tools.
Chapter 2
Literature Survey
2.1 Literature Survey for problem identification
• Prof. Ajit N. Gedam, Kajol B. Deshmane, Nishigandha N. Jadhav, Ritul M. Adhav, and
Akanksha N. Ghodake (2021):
This study examines the use of machine learning algorithms such as SVM, Decision Tree, and ANN
for breast cancer detection. SVM achieved the highest accuracy, 95.7%, suggesting that it is
effective for predicting breast cancer recurrence.
• Noreen Fatima, Li Liu, Sha Hong, Haroon Ahmed (2020):
This paper covers a comprehensive review of machine learning, deep learning, and data mining
techniques for breast cancer prediction, comparing various algorithms such as Support Vector
Machine (SVM), Decision Tree, Random Forest, Naive Bayes, and others. It highlights the
predictive accuracy of these algorithms, discusses ensemble and deep learning approaches, and
emphasizes finding the most suitable model for handling large datasets in breast cancer prediction.
• Pedro Henriques Abreu, Miriam Seoane Santos, Miguel Henriques Abreu, Bruno
Andrade, and Daniel Castro Silva (2016):
The paper titled "Predicting Breast Cancer Recurrence Using Machine Learning Techniques: A
Systematic Review," reviews key challenges in predicting breast cancer recurrence with machine
learning. Scarcity of recurrence data, lack of consensus on ideal features, and class imbalance hinder
model accuracy. Common predictors like age and tumor size are frequently used, but model
performance varies across Decision Trees, Naive Bayes, and SVMs, with issues in sensitivity and
interpretability. Improved data quality, feature selection, and handling of imbalanced classes are
needed to enhance predictive reliability.
• Zeyad Qasim Habeeb Al-Zaydi, Branislav Vuksanovic, and Imad Qasim Habeeb (2024):
The paper "Breast Cancer Detection Using Image Processing and Machine Learning" reviews
advancements in breast cancer detection. It discusses early CAD systems, which relied on manual
feature engineering and had limited accuracy due to image variability. The authors highlight deep
learning improvements, particularly with CNNs, which enhance detection accuracy through
automated feature extraction. Due to limited specialized datasets, transfer learning is commonly
used, allowing models trained on larger datasets to be adapted for breast cancer detection. While
promising, deep learning still faces challenges in data availability and advanced processing needs.
• Ataollahi MR, Sharifi J, Paknahad MR, and Paknahad A (2015):
The paper titled "Breast Cancer and Associated Factors" explores breast cancer prevalence, quality
of life impacts, and associated risk factors. The authors note that breast cancer is among the most
prevalent cancers globally, with significant physical, mental, and social effects, particularly among
Iranian women. Key risk factors include age, family history, genetic mutations, hormonal
influences, lifestyle choices like alcohol use and obesity, and environmental exposures. Social and
family support systems are also discussed as vital in reducing the emotional toll of the disease.
Despite progress in understanding risk factors, the precise etiology of breast cancer remains
uncertain.
Chapter 3
Scope of project
3.1 Problem definition & scope
Breast cancer is one of the leading types of cancer in many countries, including India.
Although the survival rate is high (with early diagnosis, 97% of women survive more than
five years), the death toll from this disease has risen significantly over the last few
decades. The critical challenge in improving outcomes is early detection. Therefore, in
addition to medical solutions, integrating data science approaches is essential to address the
mortality associated with this illness. This analysis aims to identify which features are most
helpful in predicting whether cancer is malignant or benign and to examine general trends
that may guide model selection and hyperparameter tuning. The ultimate goal is to classify
breast cancer as either benign or malignant, using machine learning classification methods
to develop a function that can accurately predict the discrete class of new input data.
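A classifier of this kind is usually judged by comparing its predictions with the known diagnoses. The following is a minimal sketch of that comparison, assuming labels are encoded as 1 for malignant and 0 for benign. The function name `evaluate` and the sample labels are illustrative only and are not results from this project.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels.

    Assumed encoding: 1 = malignant, 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Sensitivity matters most in a screening setting, because a missed malignant case (a false negative) is more costly than a false alarm.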
3.2 Project Role
Team Member       Role
Sneha Naik        Project Manager, Documentation
Khushi Nanekar    Quality Assurance Tester, Technical Lead
Tanvi Newase      System Architect, Machine Learning Engineer
3.3 Project plan
Sr. No.   Duration                                  Tasks
1.        15-September-2024 to 25-September-2024    Selection of domain
2.        29-September-2024 to 06-October-2024      Topic selection
3.        07-October-2024 to 23-October-2024        Base paper selection
4.        25-October-2024 to 30-October-2024        Existing system
5.        25-October-2024 to 30-October-2024        Requirement collection from industry
6.        01-November-2024 to 07-November-2024      Project proposal plan
7.        3rd week of November                      Presentation on project plan
8.        14-February-2025 to 21-February-2025      Module design
9.        22-February-2025 to 01-March-2025         GUI
10.       02-March-2025 to 30-March-2025            Implementation
11.       01-April-2025 to 11-April-2025            Testing
12.       12-April-2025 to 15-April-2025            Installation of application
13.       Approx. last week of April                Final project presentation
3.4 Software and Hardware Specifications
Software Specifications:
Sr. No.   Resource                   Configuration
1.        Operating System           Windows 11
2.        Coding Language            Python (algorithms: ANN, LR, DT, SVM)
3.        OpenCV/PIL (optional)      For image processing
4.        Development Environment    VS Code
Hardware Specifications:
Sr. No.   Resource     Configuration
1         Processor    Intel® Pentium® CPU B960
2         Speed        2.20 GHz
3         RAM          4 GB
5         Keyboard     Standard Windows Keyboard
6         Mouse        Two or Three Button Mouse
Chapter 4
Methodology
4.1 System Architecture
Diagrammatic Representation
Fig. Working of the algorithm
• ARTIFICIAL NEURAL NETWORK (ANN)
An artificial neural network is a common algorithm in data mining. It consists of an input
layer, one or more hidden layers, and an output layer. The technique is used to extract patterns
that are too complex for simpler models to capture. The algorithm is based on parallel
processing, distributed memory, collective solution, and network architecture.
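The layer structure described above can be illustrated with a single forward pass. This is a minimal sketch, assuming a network with three inputs, two hidden neurons and one output neuron, all using the sigmoid activation. The weights, biases and feature values are hypothetical and untrained; they are not taken from this project.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> single output neuron."""
    hidden = dense(features, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)[0]


# Hypothetical, untrained parameters: 3 inputs -> 2 hidden -> 1 output.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]

features = [0.6, 0.1, 0.9]  # made-up, already scaled to [0, 1]
probability = forward(features, hidden_w, hidden_b, out_w, out_b)
print(probability)  # read as the estimated probability of "malignant"
```

In practice the weights are learned from labelled data by backpropagation, which this sketch leaves out.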
• LOGISTIC REGRESSION (LR)
Logistic regression is a supervised learning algorithm that models a binary response from one
or more independent variables. The model outputs a continuous probability between 0 and 1,
which is compared with a threshold to give the class label. It is a statistical model for binary
outcome variables.
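The following minimal sketch shows how a logistic regression model can be fitted and used. It assumes batch gradient descent on the log loss, a 0.5 decision threshold, and a made-up single-feature dataset. The function names `train_logistic` and `predict_proba` are illustrative and do not refer to any library API.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability that sample x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(xi, w, b) - yi  # gradient of the loss w.r.t. z
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Made-up data: one scaled feature, larger values labelled malignant (1).
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
labels = [1 if predict_proba(x, w, b) >= 0.5 else 0 for x in X]
print(labels)
```

The threshold of 0.5 is a default choice; a screening application might lower it to reduce missed malignant cases.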
• DECISION TREE (DT)
A decision tree can be used for both classification and regression. The dataset is split
recursively into smaller subsets based on feature values, and each final subset yields the
prediction for the samples that fall into it.
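The splitting step can be sketched for a single feature. The example below assumes Gini impurity as the split criterion, which is one common choice, and uses made-up tumor sizes. The names `gini` and `best_split` are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure subset)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split on one feature.

    Samples with value <= threshold go left, the rest go right.
    """
    best = (None, gini(labels))
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2.0  # midpoint between neighbouring values
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up tumor sizes with labels (0 = benign, 1 = malignant).
sizes = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(sizes, labels)
print(threshold, impurity)
```

A full tree repeats this search over every feature and then recurses into the left and right subsets until they are pure or a depth limit is reached.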
• SUPPORT VECTOR MACHINE (SVM)
SVM is a supervised learning algorithm used for both classification and regression problems.
It finds the boundary that separates the classes with the largest margin; in two dimensions
this boundary is a line, and in three dimensions it is a plane. It is often reported to achieve
high accuracy in breast cancer prediction studies.
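The margin idea can be sketched with a linear SVM trained by sub-gradient descent on the regularised hinge loss. This is a teaching simplification, not the solver that production libraries use. It assumes labels encoded as -1 (benign) and +1 (malignant) and made-up features that are already centred around zero. The names `train_linear_svm` and `predict` are illustrative.

```python
def train_linear_svm(X, y, lr=0.01, reg=0.01, epochs=200):
    """Sub-gradient descent on the regularised hinge loss (labels -1 or +1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: push the boundary away.
                w = [wj + lr * (yi * xj - reg * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Safely classified: only shrink the weights slightly.
                w = [wj - lr * reg * wj for wj in w]
    return w, b


def predict(x, w, b):
    """Return +1 or -1 depending on the side of the boundary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Made-up, centred features; -1 = benign, +1 = malignant.
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
     [1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
predictions = [predict(x, w, b) for x in X]
print(predictions)
```

Data that cannot be separated by a straight boundary would need a kernel function, which this sketch does not cover.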
• RANDOM FOREST (RF)
Random forest is a supervised learning algorithm used to solve classification and regression
problems. It is an ensemble method that trains many decision trees on random samples of the
training data and combines their outputs to predict the class of new data.
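The ensemble idea can be sketched in a heavily simplified form. In the example below, each "tree" is reduced to a single-split stump on one randomly chosen feature, trained on a bootstrap sample, and the forest predicts by majority vote. Real random forests grow deeper trees. The data and the names `fit_stump`, `fit_forest` and `predict` are illustrative assumptions.

```python
import random


def fit_stump(rows, feature):
    """Return the threshold on one feature that misclassifies the fewest rows.

    Each row is (features, label); the stump predicts 1 when value > threshold.
    """
    best_threshold, best_errors = None, None
    for candidate, _ in rows:
        threshold = candidate[feature]
        errors = sum(
            1 for x, label in rows
            if (1 if x[feature] > threshold else 0) != label
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_forest(rows, n_trees=15, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0][0]))   # random feature per stump
        forest.append((feature, fit_stump(sample, feature)))
    return forest


def predict(forest, x):
    """Majority vote over all stumps (1 = malignant, 0 = benign)."""
    votes = sum(1 if x[f] > t else 0 for f, t in forest)
    return 1 if votes * 2 > len(forest) else 0


# Made-up data: two features per sample, 0 = benign, 1 = malignant.
rows = [
    ([1.0, 2.0], 0), ([1.5, 1.0], 0), ([2.0, 1.5], 0), ([2.5, 2.5], 0),
    ([4.0, 4.5], 1), ([4.5, 4.0], 1), ([5.0, 5.0], 1), ([5.5, 4.5], 1),
]
forest = fit_forest(rows)
print(predict(forest, [1.0, 1.0]), predict(forest, [6.0, 6.0]))
```

Averaging many trees trained on different samples reduces the variance of a single tree, which is the main reason the method tends to be more stable.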
4.2 Conclusion
In conclusion, this report explored various machine learning, deep learning, and data mining
algorithms for predicting breast cancer, aiming to identify the most effective approaches.
We began by discussing different types of breast cancer, drawing insights from fourteen
research papers on their symptoms and causes. We then examined the key algorithms,
highlighting how they contribute to prediction. Challenges remain, however, such as
limited datasets and imbalance between positive and negative samples, which can lead to bias.
Additionally, the unequal number of breast cancer images versus affected patches poses a
challenge for accurate diagnosis and prediction. This report serves as a guide for beginners
interested in understanding these algorithms and their potential impact on breast cancer
detection.
Chapter 5
Bibliography
REFERENCES
[1] Ajit N. Gedam, Kajol B. Deshmane, Nishigandha N. Jadhav, Ritul M. Adhav, and Akanksha N.
Ghodake, "Survey on Breast Cancer Detection using Logistic Regression Algorithm," Department
of Computer Science, AISSMS Polytechnic, Pune, MH, India.
[2] Noreen Fatima, Li Liu, Sha Hong, and Haroon Ahmed, "Prediction of Breast Cancer,
Comparative Review of Machine Learning Techniques, and Their Analysis," IEEE Access.
Supported by the National Natural Science Foundation of China and the Chongqing Provincial
Human Resource and Social Security Department.
[3] Bruna Karina Banin Hirata, Julie Massayo Maeda Oda, Roberta Losi Guembarovski, Carolina
Batista Ariza, Carlos Eduardo Coral de Oliveira, and Maria Angelica Ehara Watanabe, "Molecular
Markers for Breast Cancer: Prediction on Tumor Behavior," Laboratory of Polymorphism and
Application Study of DNA, Department of Londrina, Brazil, 2014. Academic Editor: Andreas Pich.
[4] Madhu Kumari and Vijendra Singh, "Breast Cancer Prediction System," Department of
Computer Science and Engineering, The NorthCap University, Gurugram, Haryana, India.
[5] Zeyad Q. Habeeb, Branislav Vuksanovic, and Imad Q. Al-Zaydi, "Breast Cancer Detection
Using Image Processing and Machine Learning," Biomedical Engineering Department, University
of Technology, Iraq, and School of Energy and Electronic Engineering, University of
Portsmouth, UK.
[6] Ataollahi MR, Sharifi J, Paknahad MR, and Paknahad A, "Breast Cancer and Associated
Factors," Noncommunicable Diseases Research Center, Fasa University of Medical Sciences, Iran;
Aja University of Medical Sciences, Tehran, Iran; and Hormozgan University of Medical
Sciences, Bandar Abbas, Iran.
[7] A. M. Khan and R. Afzal, "A Hybrid Approach for Breast Cancer Classification with SVM and
Genetic Algorithms," Computers in Biology and Medicine, May 2020.
[8] S. Mehrotra and P. Agarwal, "Automated Breast Cancer Detection Using Deep Learning:
CNN-based Approaches," Journal of Medical Imaging and Health Informatics, December 2020.
[9] N. Fatima, Li Liu, Sha Hong, and Haroon Ahmed, "Prediction of Breast Cancer, Comparative
Review of Machine Learning Techniques, and Their Analysis," IEEE Access, vol. 8,
pp. 150366–150374, 2020.
[10] A. Reddy, B. Soni, and S. Reddy, "Breast Cancer Detection by Leveraging Machine
Learning," ICT Express, 2020. DOI: 10.1016/j.icte.2020.04.009.
Appendix – B
Name of Program: COMPUTER ENGINEERING    Semester: CO – 5 – I
Course Title: Capstone Project Planning    Course Code: 22058
Title of Capstone Project: Breast Cancer Prediction using Machine Learning
A. POs addressed by the Capstone Project:
A] Discipline knowledge
B] Engineering Tools
C] The engineer and society
D] Ethics
B. COs addressed by the Capstone Project:
C22058.a: Write the problem/task specification in existing system related to the occupation.
C22058.b: Select, Collect and use required information/knowledge to solve the problem/complete the task.
C22058.c: Logically choose relevant possible solution.
C22058.d: Consider the ethical issues related to project.
C22058.e: Assess the impact of the project on society.
C22058.f: Prepare project proposal with action plan and time duration scientifically before beginning of
project.
C22058.g: Communicate effectively and confidently as a member and leader of team.
C. Other Learning Outcomes achieved through this project:
1 Unit Outcomes (Cognitive Domain)
I. Problem identification and specification
II. Industrial survey and literature review
2 Practical Outcomes (in Psychomotor Domain)
I. Collect relevant information from sources
II. Expertise in the technology required for fabrication.
3 Affective Domain Outcomes
a. Demonstrate working as a leader/a team member.
b. Maintain tools and equipment.
c. Follow ethical practices.
Prof. A. N. Gedam    Prof. A. N. Gedam
Name and Sign of Project guide Signature of Project coordinator