Task 2

The document outlines a task to build a diabetes prediction model using logistic regression, focusing on data preprocessing, feature engineering, and model evaluation. It details the required software, dataset features, and a step-by-step procedure for data handling, model training, and performance evaluation. The provided Python code demonstrates the implementation of these steps using libraries such as Pandas and Scikit-learn.

Uploaded by

Subramanian R

TASK : 02

Diabetes Prediction Model using Logistic Regression

Objective:

• To understand and implement data preprocessing techniques for a medical dataset.

• To perform feature engineering for improved model performance.

• To build and train a Logistic Regression model for diabetes prediction.

• To evaluate the performance of the trained model using appropriate metrics.

Tools/Software Required:

• Python 3.x

• Pandas

• Scikit-learn (sklearn)

• NumPy (for numerical operations)

• Anaconda (recommended for environment management) or Google Colab

Dataset:

A sample diabetes dataset is provided in the code. This dataset includes the following features:

• Pregnancies: Number of pregnancies.

• Glucose: Glucose level.

• BloodPressure: Blood pressure.

• SkinThickness: Skin thickness.

• Insulin: Insulin level.

• BMI: Body mass index.

• DiabetesPedigreeFunction: Diabetes pedigree function.

• Age: Age.

• Outcome: Diabetes status (1: Diabetes, 0: No Diabetes).

Procedure:

1. Data Preprocessing:

1. Load the Data:

o Create a Pandas DataFrame from the provided sample diabetes dataset.

o Print the original DataFrame to inspect the raw data.

2. Handle Missing Values:

o Identify and handle missing values in the 'BMI' column using SimpleImputer with the mean strategy.

o Print the DataFrame after imputation to verify the changes.

3. Scale Numerical Features:

o Scale the numerical features ('Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age') using StandardScaler.

o Print the DataFrame after scaling to observe the transformed data.
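
As a rough illustration of what the two preprocessing steps compute, the sketch below redoes mean imputation and standardization by hand with NumPy and compares the results with SimpleImputer and StandardScaler. It uses only the BMI and Glucose values from the sample dataset; the variable names (bmi, glucose, z_manual and so on) are chosen here for illustration and do not appear in the task's program.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# BMI and Glucose values copied from the sample dataset.
bmi = np.array([33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, np.nan, 30.5, 0.0])
glucose = np.array([148, 85, 183, 89, 137, 116, 78, 115, 197, 125], dtype=float)

# Identify missing entries: only NaN counts as missing here.
n_missing = int(np.isnan(bmi).sum())

# Mean imputation by hand: np.nanmean averages the nine non-NaN values
# (the 0.0 entry is a number, so it is included in the mean).
bmi_mean = np.nanmean(bmi)
bmi_manual = np.where(np.isnan(bmi), bmi_mean, bmi)

# The same step with scikit-learn (expects a 2-D input).
bmi_sklearn = SimpleImputer(strategy="mean").fit_transform(bmi.reshape(-1, 1)).ravel()

# Standardization by hand: z = (x - mean) / std, using the population
# standard deviation (ddof=0), which is what StandardScaler uses.
z_manual = (glucose - glucose.mean()) / glucose.std(ddof=0)
z_sklearn = StandardScaler().fit_transform(glucose.reshape(-1, 1)).ravel()

print("Missing BMI values:", n_missing)
print("Imputed BMI mean:", round(float(bmi_mean), 6))
print("First Glucose z-score:", round(float(z_manual[0]), 6))
```

The imputed mean and the first Glucose z-score should agree with the 26.866667 and 0.544471 entries in the Output section. Because the BMI of 0.0 is a number rather than NaN, the imputer leaves it in place and it pulls the column mean down.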

2. Feature Engineering:

1. Create Interaction Feature:

o Create a new feature 'Glucose_BMI' by multiplying the 'Glucose' and 'BMI' columns.

o Print the DataFrame after adding the new feature.
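
Because this step runs after scaling, 'Glucose_BMI' is a product of two standardized values rather than of the raw glucose and BMI readings. The sketch below illustrates this with three rows copied from the "DataFrame after Scaling" output; the frame name scaled is an illustrative choice, not part of the task's program.

```python
import pandas as pd

# Scaled Glucose and BMI for rows 0, 1 and 9, copied from the Output section.
scaled = pd.DataFrame(
    {
        "Glucose": [0.544471, -1.112615, -0.060497],
        "BMI": [0.648853, -0.025697, -2.588989],
    },
    index=[0, 1, 9],
)

# Element-wise product of the two columns gives the interaction feature.
scaled["Glucose_BMI"] = scaled["Glucose"] * scaled["BMI"]

print(scaled.round(6))
```

Row 9 shows one consequence of multiplying z-scores: a below-average glucose value and a far-below-average BMI give a positive interaction value, just as two above-average values do in row 0.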

3. Machine Learning Model Building and Evaluation:

1. Prepare Data for Model:

o Define the features (X) by dropping the 'Outcome' column.

o Define the target variable (y) as the 'Outcome' column.

2. Split Data:

o Use train_test_split to split the dataset into training and testing sets (80% training, 20% testing) with random_state=42 for reproducibility.

o Print the shapes and contents of the training and testing sets.

3. Train the Model:

o Create a LogisticRegression model.

o Train the model using the training data (X_train, y_train).

4. Make Predictions:

o Use the trained model to make predictions on the testing data (X_test).

o Print the predictions (y_pred).

5. Evaluate the Model:

o Calculate the accuracy of the model using accuracy_score.

o Generate a classification report using classification_report to assess precision, recall, and F1-score.

o Print the accuracy and the classification report.
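
The split and evaluation steps can be sketched in isolation to show how small the test set is. The example below is a minimal sketch, not the task's program: it splits ten row positions the same way, then scores the test labels and predictions copied from the Output section. The names rows, train_rows, test_rows and possible are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Ten row positions stand in for the ten rows of the sample dataset.
rows = np.arange(10)
train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=42)
print("Train size:", len(train_rows), "Test size:", len(test_rows))

# Test labels and predictions copied from the Output section.
y_test = [1, 0]
y_pred = [1, 0]

accuracy = accuracy_score(y_test, y_pred)
# output_dict=True returns the report as a nested dict keyed by class label.
report = classification_report(y_test, y_pred, output_dict=True)

print("Accuracy:", accuracy)
print("Recall for class 1:", report["1"]["recall"])

# With only two test rows, accuracy can take just three values.
possible = sorted({float(accuracy_score(y_test, [a, b])) for a in (0, 1) for b in (0, 1)})
print("Possible accuracies:", possible)
```

With two test rows the accuracy can only be 0.0, 0.5 or 1.0, so the reported accuracy of 1.0 shows that the steps run end to end but says little about how the model would perform on a larger dataset.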

Program :

Python

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# 1. Sample Diabetes Dataset
data = {
    'Pregnancies': [6, 1, 8, 1, 0, 5, 3, 10, 2, 4],
    'Glucose': [148, 85, 183, 89, 137, 116, 78, 115, 197, 125],
    'BloodPressure': [72, 66, 64, 66, 40, 74, 50, 0, 70, 96],
    'SkinThickness': [35, 29, 0, 23, 35, 0, 32, 0, 45, 0],
    'Insulin': [0, 0, 0, 94, 168, 0, 88, 0, 543, 0],
    'BMI': [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, np.nan, 30.5, 0.0],
    'DiabetesPedigreeFunction': [0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.248, 0.134, 0.158,
                                 0.177],
    'Age': [50, 31, 32, 21, 33, 30, 26, 29, 53, 41],
    'Outcome': [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# 2. Data Preprocessing
# Replace the missing (NaN) BMI value with the column mean.
imputer = SimpleImputer(strategy='mean')
# fit_transform returns a 2-D array; ravel() flattens it to one column.
df['BMI'] = imputer.fit_transform(df[['BMI']]).ravel()
print("\nDataFrame after Imputing BMI:\n", df)

# Standardize every feature column to zero mean and unit variance.
scaler = StandardScaler()
numerical_cols = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI',
                  'DiabetesPedigreeFunction', 'Age']
df[numerical_cols] = scaler.fit_transform(df[numerical_cols])
print("\nDataFrame after Scaling:\n", df)

# 3. Feature Engineering
# Interaction feature: product of the (already scaled) Glucose and BMI columns.
df['Glucose_BMI'] = df['Glucose'] * df['BMI']
print("\nDataFrame after Feature Engineering:\n", df)

# 4. Model Building and Evaluation
X = df.drop('Outcome', axis=1)
y = df['Outcome']

# Hold out 20% of the rows (2 of 10) for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("\nTraining Data (X_train):\n", X_train)
print("\nTesting Data (X_test):\n", X_test)
print("\nTraining Labels (y_train):\n", y_train)
print("\nTesting Labels (y_test):\n", y_test)

model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("\nPredictions (y_pred):\n", y_pred)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("\nAccuracy:", accuracy)
print("\nClassification Report:\n", report)

Output :

Original DataFrame:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

0 6 148 72 35 0 33.6

1 1 85 66 29 0 26.6

2 8 183 64 0 0 23.3

3 1 89 66 23 94 28.1

4 0 137 40 35 168 43.1

5 5 116 74 0 0 25.6

6 3 78 50 32 88 31.0

7 10 115 0 0 0 NaN

8 2 197 70 45 543 30.5

9 4 125 96 0 0 0.0
DiabetesPedigreeFunction Age Outcome

0 0.627 50 1

1 0.351 31 0

2 0.672 32 1

3 0.167 21 0

4 2.288 33 1

5 0.201 30 0

6 0.248 26 0

7 0.134 29 1

8 0.158 53 1

9 0.177 41 0

DataFrame after Imputing BMI:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

0 6 148 72 35 0 33.600000

1 1 85 66 29 0 26.600000

2 8 183 64 0 0 23.300000

3 1 89 66 23 94 28.100000

4 0 137 40 35 168 43.100000

5 5 116 74 0 0 25.600000

6 3 78 50 32 88 31.000000

7 10 115 0 0 0 26.866667

8 2 197 70 45 543 30.500000

9 4 125 96 0 0 0.000000
DiabetesPedigreeFunction Age Outcome

0 0.627 50 1

1 0.351 31 0

2 0.672 32 1

3 0.167 21 0

4 2.288 33 1

5 0.201 30 0

6 0.248 26 0

7 0.134 29 1

8 0.158 53 1

9 0.177 41 0

DataFrame after Scaling:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

0 0.645497 0.544471 0.501265 0.885345 -0.553913 0.648853

1 -0.968246 -1.112615 0.254741 0.533552 -0.553913 -0.025697

2 1.290994 1.465074 0.172566 -1.166779 -0.553913 -0.343699

3 -0.968246 -1.007403 0.254741 0.181760 0.029153 0.118849

4 -1.290994 0.255139 -0.813528 0.885345 0.488163 1.564314

5 0.322749 -0.297223 0.583439 -1.166779 -0.553913 -0.122061

6 -0.322749 -1.296735 -0.402655 0.709449 -0.008064 0.398306

7 1.936492 -0.323526 -2.457018 -1.166779 -0.553913 0.000000

8 -0.645497 1.833316 0.419090 1.471666 2.814225 0.350124

9 0.000000 -0.060497 1.487359 -1.166779 -0.553913 -2.588989

DiabetesPedigreeFunction Age Outcome

0 0.200095 1.579674 1

1 -0.242777 -0.369274 0

2 0.272302 -0.266698 1

3 -0.538025 -1.395037 0

4 2.865348 -0.164122 1

5 -0.483468 -0.471851 0

6 -0.408052 -0.882156 0

7 -0.590977 -0.574427 1

8 -0.552466 1.887403 1

9 -0.521979 0.656488 0

DataFrame after Feature Engineering:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

0 0.645497 0.544471 0.501265 0.885345 -0.553913 0.648853

1 -0.968246 -1.112615 0.254741 0.533552 -0.553913 -0.025697

2 1.290994 1.465074 0.172566 -1.166779 -0.553913 -0.343699

3 -0.968246 -1.007403 0.254741 0.181760 0.029153 0.118849

4 -1.290994 0.255139 -0.813528 0.885345 0.488163 1.564314

5 0.322749 -0.297223 0.583439 -1.166779 -0.553913 -0.122061

6 -0.322749 -1.296735 -0.402655 0.709449 -0.008064 0.398306

7 1.936492 -0.323526 -2.457018 -1.166779 -0.553913 0.000000

8 -0.645497 1.833316 0.419090 1.471666 2.814225 0.350124

9 0.000000 -0.060497 1.487359 -1.166779 -0.553913 -2.588989

DiabetesPedigreeFunction Age Outcome Glucose_BMI

0 0.200095 1.579674 1 0.353282

1 -0.242777 -0.369274 0 0.028591

2 0.272302 -0.266698 1 -0.503545

3 -0.538025 -1.395037 0 -0.119729

4 2.865348 -0.164122 1 0.399117

5 -0.483468 -0.471851 0 0.036280

6 -0.408052 -0.882156 0 -0.516497

7 -0.590977 -0.574427 1 -0.000000

8 -0.552466 1.887403 1 0.641887

9 -0.521979 0.656488 0 0.156625

Training Data (X_train):

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

5 0.322749 -0.297223 0.583439 -1.166779 -0.553913 -0.122061

0 0.645497 0.544471 0.501265 0.885345 -0.553913 0.648853

7 1.936492 -0.323526 -2.457018 -1.166779 -0.553913 0.000000

2 1.290994 1.465074 0.172566 -1.166779 -0.553913 -0.343699

9 0.000000 -0.060497 1.487359 -1.166779 -0.553913 -2.588989

4 -1.290994 0.255139 -0.813528 0.885345 0.488163 1.564314

3 -0.968246 -1.007403 0.254741 0.181760 0.029153 0.118849

6 -0.322749 -1.296735 -0.402655 0.709449 -0.008064 0.398306

DiabetesPedigreeFunction Age Glucose_BMI

5 -0.483468 -0.471851 0.036280

0 0.200095 1.579674 0.353282

7 -0.590977 -0.574427 -0.000000

2 0.272302 -0.266698 -0.503545

9 -0.521979 0.656488 0.156625

4 2.865348 -0.164122 0.399117

3 -0.538025 -1.395037 -0.119729

6 -0.408052 -0.882156 -0.516497

Testing Data (X_test):

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

8 -0.645497 1.833316 0.419090 1.471666 2.814225 0.350124

1 -0.968246 -1.112615 0.254741 0.533552 -0.553913 -0.025697

DiabetesPedigreeFunction Age Glucose_BMI

8 -0.552466 1.887403 0.641887

1 -0.242777 -0.369274 0.028591

Training Labels (y_train):

5 0

0 1

7 1

2 1

9 0

4 1

3 0

6 0

Name: Outcome, dtype: int64

Testing Labels (y_test):

8 1

1 0

Name: Outcome, dtype: int64

Predictions (y_pred):

[1 0]

Accuracy: 1.0

Results:

• Record the original DataFrame.

• Record the DataFrame after imputation and scaling.

• Record the DataFrame after feature engineering.

• Record the shapes and content of the training and testing sets.

• Record the predictions made by the model.

• Record the accuracy and classification report of the model.
