What is Predictive Analytics?
▪ Predictive analytics encompasses a variety of statistical techniques from data
mining, predictive modeling, and machine learning that analyze current and
historical facts (data) to make predictions about future or otherwise unknown
events.
▪ Predictive analytics is the process of using statistical techniques, machine
learning, data mining, and predictive modeling on historical and current data to
analyze patterns and make predictions about future or otherwise unknown events.
▪ Predictive analytics is the use of data, statistical algorithms, and machine learning
techniques to identify the likelihood of future outcomes based on historical and
current data.
▪ Predictive analytics involves extracting patterns from historical and current data
to forecast future outcomes or trends. It answers the question, “What is likely to
happen?” by quantifying probabilities and identifying relationships in data.
▪ Historical data refers to recorded information about past events, circumstances,
or phenomena related to a specific subject.
How it works
▪ It uses statistical models and machine learning algorithms to analyze past data,
identify patterns, and make predictions.
Key objective
▪ It aims to predict the likelihood of outcomes and guide decision-making by
uncovering relationships between various factors based on past occurrences.
▪ The goal is to provide the best possible assessment of what will happen in the future.
▪ Enable proactive decision-making by providing actionable insights into future
probabilities, rather than just describing what has happened (descriptive analytics)
or why it happened (diagnostic analytics).
Key Components
• Data: The foundation of predictive analytics. This includes structured data (e.g.,
databases, spreadsheets) and unstructured data (e.g., text, images). High-quality,
relevant, and clean data is essential for accurate predictions.
• Statistical Models: Algorithms like regression, classification, and clustering are
used to identify relationships and patterns in data. These models form the basis
for predictions.
• Machine Learning: Advanced techniques, such as neural networks or ensemble
methods, enhance predictive power by handling complex, non-linear patterns in
large datasets.
• Domain Knowledge: Understanding the context of the problem (e.g., industry-
specific factors) ensures the right variables are selected and models are relevant.
• Validation and Testing: Predictions are tested against real outcomes to ensure
model accuracy and generalizability.
Core Techniques
• Regression Analysis: Used to predict numerical outcomes. For example, linear
regression can predict sales revenue based on advertising spend and historical
sales data.
o Example: Predicting a house’s price based on its size, location, and number
of bedrooms.
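A minimal sketch of this example in Python with scikit-learn (all feature values below are invented for illustration):

```python
# Linear regression predicting house price from size and bedroom count.
# Numbers are made up; a real model would include location and more data.
from sklearn.linear_model import LinearRegression

# Features: [size in sq. ft., number of bedrooms]; target: price in $1000s
X = [[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]]
y = [245, 312, 279, 308, 419]

model = LinearRegression().fit(X, y)
print(model.predict([[2000, 4]]))  # estimated price for a new listing
```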
• Classification: Predicts categorical outcomes, such as whether an event will
occur (e.g., yes/no, true/false). Algorithms like logistic regression, decision trees,
or support vector machines are common.
o Example: Classifying whether a loan applicant is likely to default.
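A hedged sketch of the loan-default example with logistic regression (features and labels are synthetic):

```python
# Logistic regression classifying applicants as likely to default (1) or not (0).
from sklearn.linear_model import LogisticRegression

# Features: [credit score, debt-to-income ratio]; all values are invented
X = [[580, 0.45], [720, 0.20], [650, 0.38], [700, 0.25], [600, 0.50], [750, 0.15]]
y = [1, 0, 1, 0, 1, 0]  # 1 = defaulted

clf = LogisticRegression().fit(X, y)
print(clf.predict([[630, 0.40]]))        # predicted class for a new applicant
print(clf.predict_proba([[630, 0.40]]))  # class probabilities
```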
• Time-Series Forecasting: Analyzes data points collected over time to predict
future trends, such as stock prices or weather patterns. Techniques include
ARIMA, exponential smoothing, or LSTM (a type of neural network).
o Example: Forecasting monthly electricity demand for a utility company.
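As one simple illustration, exponential smoothing can be written in a few lines with no library at all (the demand figures are invented):

```python
# Simple exponential smoothing: the forecast is a weighted blend of the
# latest observation and the previous forecast.
demand = [120, 132, 128, 141, 150, 147, 160]  # past monthly demand (made up)
alpha = 0.4  # smoothing factor: higher values react faster to recent data

forecast = demand[0]
for observed in demand[1:]:
    forecast = alpha * observed + (1 - alpha) * forecast  # update the level

print(f"Next-month forecast: {forecast:.1f}")
```

For seasonality or autocorrelation, library implementations of ARIMA (e.g., in statsmodels) are the usual next step.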
• Clustering: Groups similar data points to identify patterns, often used in customer
segmentation.
o Example: Grouping customers by purchasing behavior for targeted
marketing.
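A minimal k-means sketch for the segmentation example (two invented customer features):

```python
# k-means grouping customers into two segments by spend and frequency.
from sklearn.cluster import KMeans

# Features: [annual spend in $, purchases per month]; values are synthetic
X = [[200, 1], [250, 2], [2200, 8], [2400, 9], [90, 1], [2600, 10]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # segment assigned to each customer
print(kmeans.cluster_centers_)  # the "typical" customer in each segment
```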
• Decision Trees and Ensemble Methods: Decision trees split data into branches
to make predictions, while ensemble methods like random forests or gradient
boosting combine multiple models for better accuracy.
o Example: Predicting equipment failure by analyzing sensor data with a
random forest model.
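A small sketch of the predictive-maintenance example (sensor readings and labels are invented):

```python
# Random forest predicting equipment failure from two sensor readings.
from sklearn.ensemble import RandomForestClassifier

# Features: [temperature, vibration]; label: 1 = failed soon after reading
X = [[70, 0.2], [95, 0.9], [72, 0.3], [98, 1.1], [75, 0.25], [92, 0.8]]
y = [0, 1, 0, 1, 0, 1]

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.predict([[90, 0.7]]))   # failure prediction for a new reading
print(rf.feature_importances_)   # relative importance of each sensor
```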
• Neural Networks and Deep Learning: These are used for complex datasets,
such as image recognition or natural language processing, where non-linear
relationships are prevalent.
o Example: Predicting customer sentiment from social media posts.
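Image and text models are beyond a short snippet, but a tiny neural network can still show the key point: it captures non-linear patterns (here, XOR) that a linear model cannot. A sketch with scikit-learn's MLPClassifier:

```python
# A small multi-layer perceptron learning XOR, a non-linearly-separable pattern.
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR labels

mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=1)
mlp.fit(X, y)
print(mlp.predict(X))  # ideally [0 1 1 0]; may need another seed to converge
```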
Applications Across Industries
• Finance:
o Credit Scoring: Predicting the likelihood of a borrower defaulting on a loan
using credit history and financial data.
o Fraud Detection: Identifying suspicious transactions by analyzing patterns
in spending behavior.
• Healthcare:
o Disease Prediction: Forecasting the likelihood of diseases like diabetes
based on patient data (e.g., age, weight, family history).
o Hospital Readmission: Predicting which patients are at risk of readmission
to optimize care plans.
• Marketing:
o Customer Churn Prediction: Identifying customers likely to stop using a
service to target retention efforts.
o Personalized Recommendations: Predicting what products a customer
might buy based on past purchases (e.g., Netflix or Amazon
recommendations).
• Operations:
o Supply Chain Optimization: Forecasting demand to manage inventory and
reduce waste.
o Predictive Maintenance: Anticipating equipment failures by analyzing
sensor data to schedule timely repairs.
• Retail:
o Sales Forecasting: Predicting future sales to optimize pricing and
promotions.
o Customer Segmentation: Grouping customers by behavior for targeted
campaigns.
• Human Resources:
o Employee Turnover Prediction: Identifying employees likely to leave based
on engagement and performance data.
o Talent Acquisition: Predicting candidate success based on resumes and
assessment scores.
Tools and Technologies
• Programming Languages:
o Python: Widely used due to libraries like scikit-learn, TensorFlow, and
pandas for data manipulation, modeling, and visualization.
o R: Popular for statistical analysis and visualization, with packages like caret
and randomForest.
• Platforms:
o SAS: Offers robust tools for predictive modeling and business intelligence.
o IBM SPSS: User-friendly for statistical analysis and predictive analytics.
o Microsoft Azure Machine Learning: Cloud-based platform for building and
deploying models.
o Google Cloud AI: Provides tools for predictive analytics with integration into
BigQuery and TensorFlow.
• Visualization Tools:
o Tableau: For visualizing predictive insights.
o Power BI: Integrates predictive models with interactive dashboards.
Challenges
• Data Quality: Inaccurate, incomplete, or biased data can lead to poor predictions.
For example, missing customer data can skew churn predictions.
• Overfitting: Models that are too complex may perform well on training data but fail
to generalize to new data.
• Interpretability: Complex models like neural networks can be “black boxes,”
making it hard to explain predictions to stakeholders.
• Scalability: Handling large datasets or real-time predictions requires significant
computational resources.
• Dynamic Environments: Changing conditions (e.g., market shifts) can render
models obsolete, requiring frequent updates.
Ethical Considerations
• Bias and Fairness: Models trained on biased data can perpetuate unfair
outcomes, such as discriminatory loan approvals based on historical biases.
o Example: If a dataset underrepresents certain demographics, predictions
may be skewed against them.
• Privacy: Using sensitive data (e.g., health or financial records) raises concerns
about consent and data security.
• Transparency: Stakeholders need to understand how predictions are made,
especially in high-stakes areas like healthcare or criminal justice.
• Accountability: Organizations must ensure predictions don’t harm individuals or
groups and have mechanisms to address errors.
Getting Started with Predictive Analytics
• Learn the Basics: Study statistics, machine learning, and data manipulation.
Online courses (e.g., Coursera, edX) cover these topics.
• Choose a Tool: Start with Python or R for flexibility, or platforms like SAS for
enterprise use.
• Practice with Datasets: Use public datasets (e.g., Kaggle) to build and test
models.
• Focus on Business Value: Align projects with organizational goals to ensure
impact.
• Stay Updated: Follow advancements in AI and data science to leverage new tools
and techniques.
Data to insights to decisions / Process of Predictive
Analytics
The journey from data to insights to decisions follows a structured process that transforms
raw data into actionable outcomes supporting effective decision-making. The main stages
are:
1. Define the Problem or Objective: Begin with a clear understanding of the
business question or decision that needs to be addressed. This involves defining
specific goals, key performance indicators (KPIs), and what success looks like.
Clear objectives guide the entire analytics process.
2. Data Collection: Gather relevant data from various sources, such as internal
databases, customer feedback, transaction records, sensors, or external industry
data. The data must be aligned with the defined objectives.
3. Data Cleaning and Preparation: Cleanse the data to handle missing values,
errors, and inconsistencies. Prepare it for analysis through filtering, transformation,
and feature selection to ensure quality and reliability.
4. Data Analysis and Modeling: Apply analytical techniques, including descriptive
statistics, machine learning models, or predictive analytics, to uncover patterns,
correlations, or forecasts. Different models and methods may be tested to find the
best fit.
5. Generate Insights: Interpret the analysis results to extract meaningful insights that
answer the original question. Use data visualization and reporting to communicate
these findings clearly to stakeholders.
6. Make Data-Driven Decisions: Use the insights to inform business strategies or
operational decisions. Actions are based on evidence rather than intuition or
assumptions, improving accuracy and reducing bias.
7. Implement and Monitor: Put decisions into practice, monitor outcomes, and feed
new data back into the process for continuous learning and improvement.
This iterative cycle ensures organizations move systematically from raw data through analysis to insights, enabling informed and effective decision-making that drives business success.
1. Define Clear Objectives
Start by precisely defining the business problem or decision question. Set specific,
measurable goals (Key Performance Indicators or KPIs) aligned with the
organization’s strategic priorities. These objectives guide what data to collect and
how to measure success.
Example: A retail company sets an objective to reduce customer churn by 10% in
the next quarter.
2. Collect Relevant Data
Gather data from internal and external sources that directly affect the objectives.
Data could include sales transactions, customer feedback, sensor data, market
reports, etc. Data quality and governance are critical here.
Example: The retailer collects purchase history, customer service interactions, and
loyalty program data.
3. Data Cleaning and Preparation
Clean data by handling missing values, correcting errors, and ensuring
consistency. Transform and select relevant features that will aid effective analysis.
Example: Missing customer contact information is filled or removed; product
categories are standardized.
4. Data Analysis and Modeling
Use appropriate analytical and machine learning techniques to explore the data,
identify patterns, and build predictive models if applicable. Validation ensures
model accuracy and robustness.
Example: The retailer uses classification models (e.g., random forest) to predict
which customers are likely to churn (a short code sketch follows step 7 below).
5. Generate Insights
Interpret analytical results to understand underlying causes or trends.
Visualizations and reports communicate insights clearly to stakeholders.
Example: Analysis reveals that customers who reduce purchase frequency and
contact customer service more than twice a month have a high churn risk.
6. Make Data-Driven Decisions
Use the insights to inform strategic or operational decisions. Implement actions
designed to achieve the defined objectives.
Example: Develop targeted retention campaigns offering personalized discounts
to high-risk customers.
7. Implement and Monitor Results
Roll out decisions and continuously monitor outcomes against KPIs. Feed new
data back to refine models and strategies iteratively, enabling continuous
improvement.
Example: Track churn rates post-campaign and update predictive models with new
customer behaviors.
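To make step 4 concrete, here is a minimal churn-model sketch (scikit-learn; the features and labels are synthetic):

```python
# Random forest estimating churn risk from two behavioral features.
from sklearn.ensemble import RandomForestClassifier

# Features: [monthly purchase frequency, support contacts per month]
X = [[8, 0], [1, 3], [6, 1], [0, 4], [7, 0], [2, 3]]
y = [0, 1, 0, 1, 0, 1]  # 1 = churned

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.predict_proba([[1, 2]])[0, 1])  # churn risk for a new customer
```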
Process of Predictive Analytics
• Define the Problem: Identify the business question or goal (e.g., “Which
customers are likely to churn?”).
• Data Collection: Gather relevant data from internal (e.g., CRM systems) and
external sources (e.g., market trends).
• Data Preparation: Clean data by handling missing values, outliers, and
inconsistencies. Feature engineering creates new variables to improve model
performance.
• Model Selection: Choose appropriate algorithms based on the problem type (e.g.,
regression for continuous outcomes, classification for categorical).
• Training and Testing: Split data into training and testing sets to build and validate
the model. Common splits are 70/30 or 80/20 (illustrated in the sketch after this list).
• Evaluation: Use metrics like accuracy, precision, recall, RMSE (Root Mean
Square Error), or AUC (Area Under the Curve) to assess model performance.
• Deployment: Integrate the model into business processes (e.g., embedding churn
predictions into a CRM system).
• Monitoring and Updating: Continuously monitor model performance and retrain
with new data to maintain accuracy.
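The splitting, training, and evaluation steps above fit together as in this self-contained sketch (synthetic data generated by scikit-learn):

```python
# Train/test split, model fitting, and evaluation on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 70/30 split, as mentioned above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
print("AUC      :", roc_auc_score(y_test, proba))
```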
Machine learning for predictive data analytics
Machine learning plays a foundational role in predictive data analytics by utilizing
algorithms and statistical techniques to analyze historical data and forecast future
outcomes. This approach is widely used across various industries to support decision-
making, identify trends, and drive business value.
How Machine Learning Enables Predictive Data Analytics
• Data Collection & Preparation: The process begins with gathering relevant, high-
quality data, often from multiple sources. This data is then cleansed and
preprocessed to handle missing values, outliers, and inconsistencies, ensuring
models are built on accurate information.
• Model Development: Data scientists use machine learning algorithms to develop
predictive models tailored to the problem at hand. Commonly used algorithms
include decision trees, regression models (linear and logistic), random forests,
neural networks, and ensemble models such as Gradient Boosted Machines
(GBM) and XGBoost.
• Types of Predictions:
o Classification: Predicts discrete outcomes, such as whether an email is
spam or not (using algorithms like decision trees, SVM, or random forests).
o Regression: Predicts continuous values, such as sales forecasts or risk
scores (using linear regression or time series analysis like ARIMA or
Prophet).
• Validation & Deployment: Models are validated for accuracy using techniques
like cross-validation. Once reliable, they are deployed into business systems to
deliver predictions in real time or over specific intervals.
• Continuous Learning: Predictive models improve over time as they ingest new
data, continuously optimizing for higher accuracy and adaptability in changing
environments.
Common Machine Learning Algorithms in Predictive Analytics
| Algorithm/Technique | Purpose | Example Use Case |
|---|---|---|
| Decision Trees | Classification/Regression | Churn prediction |
| Random Forest | Ensemble Classification/Regression | Fraud detection |
| Logistic Regression | Classification | Credit risk assessment |
| Neural Networks | Nonlinear/pattern recognition | Image recognition, demand forecasting |
| Time Series (ARIMA) | Trend/sequence forecasting | Sales or stock price prediction |
| XGBoost/GBM | Advanced ensemble models | Customer segmentation, medical diagnosis |
Classification vs. Regression Models
| Aspect | Classification | Regression |
|---|---|---|
| Goal | Predict discrete/categorical outcomes | Predict continuous/numerical values |
| Examples | Is this email spam or not? | What will next month's sales amount be? |
| Typical Algorithms | Decision trees, random forests, logistic regression, support vector machines (SVM), neural networks | Linear regression, ridge/lasso regression, decision trees, random forests, neural networks, ARIMA (for time series) |
| Outcome Type | Finite set of classes/labels (e.g., Yes/No, categories) | Any value within a range (e.g., 0–100, -∞ to ∞) |
| Performance Metrics | Accuracy, precision, recall, F1-score, ROC-AUC | Mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), R-squared (R²) |
| Use Cases | Fraud detection, customer segmentation, disease diagnosis | Sales forecasting, price prediction, demand forecasting |
Ill-Posed Problems in Machine Learning
An ill-posed problem, in the context of ML, refers to a problem that violates one or more of the
conditions for being well-posed, as defined by Jacques Hadamard:
1. A solution exists.
2. The solution is unique.
3. The solution is stable (small changes in input lead to small changes in output).
In ML, ill-posed problems often arise when the data or model setup leads to ambiguity, non-
uniqueness, or instability in solutions, making it hard to achieve reliable predictions.
Relevance to ML (Cancer Detection Example)
In cancer detection from medical imaging:
• Existence: A solution may exist, but the complexity of medical images (e.g., subtle tumor
patterns) makes it hard to guarantee the model will find it.
• Uniqueness: Multiple models or parameter settings might produce similar results, but it is
unclear which is optimal. For instance, different CNN architectures may detect tumors
with comparable accuracy but focus on different image features.
• Stability: Small changes in input images (e.g., noise, variations in imaging equipment, or
patient demographics) can lead to drastically different predictions if the model isn't
robust.
Causes of Ill-Posed Problems in ML
1. Insufficient Data:
o Small or non-representative datasets fail to capture the full variability of the
problem. For instance, a dataset of 1,000 mammograms may not include enough
cases of rare tumor types, leading to incomplete learning.
o Lack of diversity (e.g., images from a single demographic) can cause the model to
miss generalizable patterns.
2. High Dimensionality:
o ML problems often involve high-dimensional data (e.g., medical images with
millions of pixels). This increases the risk of overfitting or finding multiple
solutions that fit the data equally well.
o In cancer detection, the high number of features (pixels) compared to the number
of samples can make the problem underdetermined; a small numeric illustration
follows this list.
3. Noisy or Ambiguous Data:
o Real-world data often contains noise (e.g., artifacts in medical images from
equipment) or ambiguous patterns (e.g., benign and malignant tissues with similar
appearances), making it hard to define a unique solution.
o For example, subtle differences between cancerous and non-cancerous regions
may be indistinguishable without additional context.
4. Ill-Defined Problem Formulation:
o Poorly defined objectives or labels can exacerbate ill-posedness. For instance, if
"cancer" labels in a dataset are inconsistent due to human error, the model may
learn incorrect patterns.
o In regression tasks (e.g., predicting tumor size), the relationship between inputs
and outputs may be inherently ambiguous due to biological variability.
5. Non-Linear and Complex Relationships:
o Many ML problems involve complex, non-linear relationships that are difficult to
model accurately. In cancer detection, the relationship between pixel patterns and
malignancy is highly non-linear, increasing the risk of instability.
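The non-uniqueness point can be shown numerically: with more features than samples, many different weight vectors fit the training data perfectly. A sketch with NumPy:

```python
# An underdetermined least-squares problem has infinitely many exact solutions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))   # 5 samples, 20 features (p >> n)
y = rng.normal(size=5)

w1, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm solution

# Any null-space direction of X can be added without changing predictions
_, _, Vt = np.linalg.svd(X)
w2 = w1 + 10 * Vt[-1]          # Vt[-1] is (numerically) in the null space of X

print(np.allclose(X @ w1, y), np.allclose(X @ w2, y))  # True True
print(np.linalg.norm(w1 - w2))  # yet the two solutions differ substantially
```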
Overfitting in Machine Learning
Overfitting is a critical challenge in machine learning (ML) where a model learns the training data
too well, including its noise and outliers, resulting in poor performance on new, unseen data. This
leads to a model that is overly complex and fails to generalize to real-world scenarios.
Relevance to ML (Cancer Detection Example)
In cancer detection:
• A CNN might memorize specific pixel patterns in the training mammograms, including
irrelevant details like imaging artifacts or patient-specific noise.
• When tested on new images (e.g., from a different hospital or machine), the model may
fail to correctly classify tumors, leading to low sensitivity or specificity.
Causes of Overfitting
1. Complex Models:
o Models with too many parameters (e.g., deep neural networks with millions of
weights) can fit the training data excessively, capturing noise instead of general
patterns.
o Example: A CNN with 50 layers might overfit a small dataset of 1,000
mammograms by learning irrelevant pixel patterns.
2. Insufficient Data:
o Small or non-diverse datasets limit the model's ability to learn generalizable
patterns.
o Example: If a cancer detection dataset has few malignant cases or is sourced from
one hospital, the model may overfit to specific imaging conditions.
3. Noisy Data:
o Real-world data often contains noise (e.g., imaging artifacts in medical scans or
mislabeled data), which the model may mistakenly treat as meaningful.
o Example: A CNN might learn to associate scanner-specific noise with cancer,
leading to false positives on cleaner images.
4. Lack of Regularization:
o Without constraints, models can become overly flexible, fitting the training data
too closely.
o Example: A CNN without dropout or weight penalties may prioritize irrelevant
features in mammograms.
5. Imbalanced Data:
o If the dataset is skewed (e.g., 90% benign and 10% malignant cases), the model
may overfit to the majority class, neglecting minority patterns.
o Example: A model might achieve high accuracy by predicting most cases as benign,
missing critical malignant cases.
6. Overtraining:
o Training a model for too many epochs can cause it to memorize the training data
rather than learn general patterns.
o Example: After 50 epochs, a CNN might start fitting noise in mammograms,
reducing its ability to generalize.
Implications of Overfitting
• Poor Generalization: The model performs well on training data but poorly on new data,
limiting its practical utility.
o Example: A cancer detection model with 95% training accuracy but 70% test
accuracy fails to reliably detect tumors in clinical settings.
• Reduced Trust: In applications like healthcare, overfitting can lead to false positives or
negatives, eroding confidence in the model.
• Resource Waste: Overfitted models require more computational resources and time to
train, with diminishing returns on performance.
Solutions to Mitigate Overfitting
1. Regularization:
o Adds constraints to reduce model complexity and prevent overfitting to noise.
o Techniques:
▪ L1/L2 Regularization: Penalizes large weights in the model (e.g., adding a
term to the loss function to discourage overly complex CNNs).
▪ Dropout: Randomly deactivates a fraction of neurons during training,
forcing the model to learn robust features.
o Example: Applying dropout (e.g., a 50% neuron dropout rate) to a CNN for cancer
detection prevents reliance on specific pixel patterns. (A short code sketch of
regularization and cross-validation follows this list.)
2. Data Augmentation:
o Increases dataset size and diversity by generating synthetic variations of the
training data.
o Example: For mammograms, augmentations like rotation, flipping, zooming, or
adjusting brightness help the model generalize to varied imaging conditions.
3. More Data:
o Collecting larger, more diverse datasets ensures the model learns representative
patterns.
o Example: Including mammograms from multiple hospitals, demographics, and
imaging devices reduces the risk of overfitting to specific conditions.
4. Cross-Validation:
o Splits data into k folds (e.g., 5-fold cross-validation) to evaluate model
performance on multiple subsets, ensuring robustness.
o Example: A cancer detection model tested across five folds of diverse data is less
likely to overfit than one trained on a single split.
5. Early Stopping:
o Monitors validation loss during training and stops when it no longer improves,
preventing the model from memorizing training data.
o Example: Halting training after 10 epochs if validation loss increases while training
loss continues to drop.
6. Simpler Models:
o Using less complex architectures reduces the risk of overfitting, especially with
small datasets.
o Example: Switching from a 50-layer CNN to a 10-layer CNN, or using transfer
learning with a pre-trained model like ResNet.
7. Transfer Learning:
o Fine-tunes pre-trained models (e.g., trained on ImageNet) on specific tasks,
leveraging general features to reduce overfitting.
o Example: Fine-tuning a pre-trained ResNet on a small mammogram dataset
improves generalization compared to training from scratch.
8. Data Cleaning:
o Removing or correcting noisy or mislabeled data improves the quality of the
training set.
o Example: Verifying labels in a cancer dataset to ensure malignant/benign cases are
accurately annotated.
9. Balanced Datasets:
o Addressing class imbalance (e.g., using oversampling, undersampling, or synthetic
data generation like SMOTE) ensures the model learns patterns from all classes.
o Example: Generating synthetic malignant mammograms to balance a dataset with
few positive cases.
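A minimal sketch of two of these remedies, L2 regularization and cross-validation, using scikit-learn on synthetic data (the parameter values are illustrative, not tuned):

```python
# Comparing regularization strengths with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           random_state=0)

# C is the inverse regularization strength: smaller C = stronger L2 penalty
for C in (100.0, 1.0, 0.01):
    clf = LogisticRegression(C=C, max_iter=1000)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"C={C:>6}: mean CV accuracy = {scores.mean():.3f}")
```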
Step-by-Step Model with Finance Example
1. Define Clear Objectives
Clearly specify the financial problem or goal.
Example: A bank aims to reduce loan default rates by 20% within the next year.
2. Collect Relevant Data
Gather data related to customers’ financial histories, credit scores, transaction
behaviors, and demographic information.
Example: Collect loan application data, repayment histories, credit bureau scores,
income details, and spending patterns.
3. Data Cleaning and Preparation
Handle missing or inconsistent entries, remove duplicates, and prepare features
relevant for modeling.
Example: Fill missing income entries, standardize loan types and payment
frequencies, and encode categorical variables such as employment status.
4. Data Analysis and Modeling
Develop machine learning models to predict the likelihood of loan default based
on historical patterns.
Example: Use logistic regression, random forests, or gradient boosting methods
to classify loan applicants by default risk (a brief code sketch follows step 7 below).
5. Generate Insights
Analyze model outcomes to understand key predictors of default.
Example: Discover that applicants with unstable employment or high debt-to-
income ratios have significantly higher default probabilities.
6. Make Data-Driven Decisions
Use insights to adjust lending policies or design risk-based pricing models.
Example: Implement stricter credit criteria for high-risk profiles or offer tailored
loan terms with higher interest rates to mitigate risk.
7. Implement and Monitor Results
Track default rates post-implementation and continuously update models with new
loan performance data.
Example: Monitor monthly default trends and refine predictive models to adapt to
changing economic conditions.
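As referenced in step 4, here is a brief sketch of a default-risk model (gradient boosting on synthetic applicant features; all values are invented):

```python
# Gradient boosting estimating default risk for loan applicants.
from sklearn.ensemble import GradientBoostingClassifier

# Features: [credit score, debt-to-income ratio, years employed]
X = [[580, 0.45, 1], [720, 0.20, 8], [650, 0.38, 3],
     [700, 0.25, 6], [600, 0.50, 2], [760, 0.10, 10]]
y = [1, 0, 1, 0, 1, 0]  # 1 = defaulted

model = GradientBoostingClassifier(random_state=0).fit(X, y)
print(model.predict_proba([[640, 0.42, 2]])[0, 1])  # estimated default risk
```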
Here is a detailed step-by-step data-to-insights-to-decisions model focused on disease
prediction in healthcare using machine learning:
Step-by-Step Model for Disease Prediction
1. Define Clear Objectives
Set a specific healthcare goal.
Example: Predict the likelihood of patients developing diabetes within the next year to
enable early intervention.
2. Collect Relevant Data
Gather comprehensive patient data including medical history, symptoms, lab test results,
demographic information, and lifestyle factors.
Example: Electronic Health Records (EHR), blood glucose levels, BMI, age, family history
of diabetes, physical activity data.
3. Data Cleaning and Preparation
Handle missing values, remove erroneous data, encode categorical variables, and
normalize features for consistency.
Example: Fill missing lab values using statistical methods; encode gender and race
categories numerically.
4. Feature Selection and Engineering
Identify and transform the most relevant features that correlate strongly with the disease
outcome.
Example: Create new features such as average blood sugar over time or change in BMI.
5. Data Splitting
Split the dataset into training and testing subsets to validate model performance
accurately.
6. Model Selection and Training
Choose appropriate machine learning algorithms (e.g., logistic regression, random forest,
support vector machines) and train them on the prepared dataset.
Example: Train a random forest classifier to predict diabetes onset.
7. Model Evaluation
Evaluate using metrics like accuracy, precision, recall, F1-score, and ROC-AUC to ensure
reliable predictions.
8. Generate Insights
Analyze which features contribute most to predictions and interpret the model outputs to
understand risk factors.
9. Make Data-Driven Decisions
Use predictions to inform clinical decisions such as prioritizing preventive care or patient
education for high-risk individuals.
10. Implement and Monitor
Deploy the model in clinical workflows and continuously monitor prediction accuracy and
patient outcomes for ongoing improvement.
Specific Example: Diabetes Prediction
• Objective: Reduce new diabetes cases by early detection.
• Data: Age, BMI, blood glucose, family history, activity levels, diet, prior diagnoses.
• Algorithm: Random Forest classifier trained on historical patient data.
• Outcome: Identify patients with a high risk score for developing diabetes, enabling
proactive lifestyle interventions and medical monitoring.
This model has been demonstrated in multiple studies to achieve high accuracy (often above
80%), supporting early and effective disease management. Additionally, combining predictions
from models like Random Forest, Support Vector Machines, and Naive Bayes through ensemble
approaches can improve overall accuracy. A small code sketch of this setup follows.
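A minimal sketch of this setup (synthetic patient records with invented values; a real study would use EHR data and proper validation):

```python
# Random forest estimating diabetes risk from a few patient features.
from sklearn.ensemble import RandomForestClassifier

# Features: [age, BMI, fasting glucose (mg/dL), family history (0/1)]
X = [[50, 31.0, 130, 1], [28, 22.5, 88, 0], [61, 29.4, 142, 1],
     [35, 24.0, 95, 0], [47, 33.2, 150, 1], [30, 21.8, 85, 0]]
y = [1, 0, 1, 0, 1, 0]  # 1 = developed diabetes within a year

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
risk = model.predict_proba([[45, 30.0, 125, 1]])[0, 1]
print(f"Estimated diabetes risk: {risk:.2f}")
```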
How to Write Beautiful Python Code with PEP 8
https://peps.python.org/pep-0008/#introduction
https://realpython.com/python-pep8/#why-we-need-pep-8