Script

The document discusses data preprocessing, feature selection, and model evaluation in a dataset of 41 medical conditions. It highlights the use of Recursive Feature Elimination and Mutual Information for feature selection, along with visualizations like correlation heatmaps and feature importance rankings. The comparison of algorithms shows that Random Forest achieved the highest accuracy among the tested models.

Slide 15: Data Preprocessing

This bar graph illustrates the distribution of diseases in our dataset. Each bar represents a
distinct disease, with equal frequencies across 41 medical conditions. The balanced dataset
minimizes bias, ensuring fair and accurate predictions across all classes.
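A balanced class distribution like the one on this slide can be verified directly in code. The sketch below uses a hypothetical label list standing in for the dataset's disease column (the real data is not shown here):

```python
from collections import Counter

# Hypothetical disease labels standing in for the dataset's target column;
# a balanced dataset has (near-)equal counts per class.
labels = ["Fungal infection", "Allergy", "GERD"] * 120  # 3 classes, 120 rows each

counts = Counter(labels)
print(counts)

# Equal max and min counts confirm the distribution is perfectly balanced.
assert max(counts.values()) == min(counts.values())
```

With the real dataset, the same check (or the bar graph itself) confirms that no class dominates training.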

Slide 16: Feature Selection

We employed two methods for feature selection:

● Recursive Feature Elimination (RFE): identifies the most impactful features by iteratively removing less significant ones.
● Mutual Information: measures the dependency between each feature and the target variable, selecting the most predictive attributes.
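Both methods are available in scikit-learn. The sketch below runs them on a synthetic stand-in for the symptom matrix (the estimator, feature counts, and data are illustrative assumptions, not the project's actual configuration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the symptom matrix.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# RFE: iteratively drops the weakest features until 10 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X, y)
rfe_selected = np.where(rfe.support_)[0]

# Mutual information: scores each feature's dependency on the target.
mi_scores = mutual_info_classif(X, y, random_state=0)
mi_selected = np.argsort(mi_scores)[::-1][:10]

print("RFE picked:", sorted(rfe_selected))
print("MI picked: ", sorted(mi_selected))
```

Comparing the two selections is a useful sanity check: features chosen by both methods are strong candidates to keep.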

Slide 16 (continued): Correlation Heatmap

The heatmap visualizes feature relationships:

● Diagonal (Red): perfect self-correlation (value = 1).
● Color Scale:
○ Red: strong positive correlation.
○ Blue: strong negative correlation.
○ White: little to no correlation.
Highly correlated feature pairs signal redundancy; identifying them guides feature selection, since dropping redundant features can improve model performance.
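A correlation matrix like the one on this slide can be computed with pandas. The column names below are hypothetical symptom features chosen for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical binary symptom columns; the real feature names differ.
df = pd.DataFrame(rng.integers(0, 2, size=(200, 4)),
                  columns=["fatigue", "headache", "high_fever", "joint_pain"])

corr = df.corr()          # Pearson correlation matrix
print(corr.round(2))      # diagonal is each feature vs. itself, always 1.0

# Heatmap sketch (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
# plt.colorbar(); plt.show()
```

The `coolwarm` colormap reproduces the red/white/blue scale described above.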

Slide 17: Feature Importance Visualization

This bar chart ranks features by their importance scores:

● Top Feature: fatigue has the highest influence on predictions.
● Other impactful features include joint_pain, headache, and high_fever.
● Importance scores decrease down the list, highlighting diminishing contributions.
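A ranking like this typically comes from a tree-based model's importance scores. The sketch below uses synthetic data and hypothetical feature names mirroring the slide (the real dataset has many more symptoms):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
# Hypothetical names mirroring the slide.
names = ["fatigue", "joint_pain", "headache", "high_fever",
         "itching", "dark_urine", "nausea", "chills"]

model = RandomForestClassifier(random_state=0).fit(X, y)
# Sort features by impurity-based importance, highest first.
ranked = sorted(zip(names, model.feature_importances_),
                key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name:12s} {score:.3f}")
```

The scores sum to 1, so each value reads as a relative share of the model's total predictive signal.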

Slide 19: Top 50 Features

This bar chart displays the top 50 important features based on importance scores:

● Top Feature: muscle_pain is the most influential predictor.
● Other key features: itching, altered_sensorium, dark_urine, and high_fever.
● Features were ranked using importance metrics derived from tree-based models, ensuring a balance between performance and complexity.
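Keeping only the top 50 features reduces dimensionality while preserving most of the signal. One minimal way to do this, assuming importances from a tree-based model as described above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic wide dataset standing in for the real symptom matrix.
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           random_state=0)

importances = RandomForestClassifier(random_state=0).fit(X, y).feature_importances_
top50 = np.argsort(importances)[::-1][:50]  # indices of the 50 strongest features
X_reduced = X[:, top50]                     # smaller matrix, most signal kept
print(X_reduced.shape)
```

Downstream models are then trained on `X_reduced` instead of the full matrix.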

Slide 20: Confusion Matrices

Confusion matrices compare Random Forest (left) and Gradient Boosting (right) models:

● Axes: predicted labels (x-axis) vs. actual labels (y-axis).
● Diagonal Values: correct predictions dominate, indicating high accuracy.
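A confusion matrix is straightforward to produce with scikit-learn; the sketch below uses synthetic three-class data rather than the 41-condition dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for the 41-condition dataset.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pred = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
cm = confusion_matrix(y_te, pred)  # rows = actual labels, cols = predicted
print(cm)                          # diagonal entries are correct predictions
```

Off-diagonal cells show which classes the model confuses with each other, which is often more informative than the accuracy number alone.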

Slide 22: Algorithm Comparison

We tested four algorithms:

● Random Forest, Gradient Boosting, Support Vector Classifier, and K-Nearest Neighbors.
● Result: Random Forest delivered the highest accuracy, which will be detailed in the results section.
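A comparison like this can be run uniformly with cross-validation. The sketch below evaluates the four named algorithms on synthetic data with default hyperparameters (the project's actual settings and scores are not shown here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data; real accuracies will differ.
X, y = make_classification(n_samples=300, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Support Vector Classifier": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}
# 5-fold cross-validated mean accuracy for each model.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda p: p[1], reverse=True):
    print(f"{name:26s} {acc:.3f}")
```

Cross-validation gives each model the same folds, making the accuracy comparison fair.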
