Practical 5 Missing Values

Practical_5_Missing_Values : DSV

Uploaded by

vhoratanvir1610

DATA SCIENCE AND VISUALIZATION
12202080501060
202046707

Practical 5:
Implement a method to handle missing values for gender and marks. (Using Practical 3
dataset)

Introduction:
In real-world datasets, missing values are very common and need to be handled properly
before performing any analysis or machine learning tasks. Missing values in categorical
variables such as Gender can be handled using the Mode, while missing values in
numerical variables such as Marks can be imputed using measures like Mean, Median, or
Mode. In this practical, we use the mean for handling missing marks and the mode for
handling missing gender values.
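
As a small illustration of the measures named above, the sketch below fills a gap in a numeric column with the mean and with the median, and a gap in a categorical column with the mode. The `marks` and `gender` values are hypothetical toy data, not the Practical 3 dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the practical's dataset.
marks = pd.Series([40.0, 50.0, np.nan, 60.0, 90.0])
gender = pd.Series(["F", "M", None, "F"])

# Mean of the observed marks (40, 50, 60, 90) is 60.0.
mean_filled = marks.fillna(marks.mean())

# Median of the observed marks is 55.0, which is less affected by the 90.
median_filled = marks.fillna(marks.median())

# The most frequent observed gender value is "F".
gender_filled = gender.fillna(gender.mode()[0])

print(mean_filled.tolist())
print(median_filled.tolist())
print(gender_filled.tolist())
```

The median is often the safer choice when a column contains extreme values, since a single outlier can pull the mean noticeably.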

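# Import required libraries
import numpy as np
import pandas as pd
import sklearn

# Load dataset (the file extension is truncated in the source)
df = pd.read_csv('/content/drive/MyDrive/DSV /Dataset_(12202080501060)/student_dataset_with_missing_values.c')

# Show column dtypes and non-null counts to spot columns with missing values
df.info()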
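
Besides `df.info()`, a direct per-column count of missing entries can be obtained with `isna().sum()`. The sketch below uses a hypothetical two-column frame named `toy` as a stand-in for the student dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the student dataset.
toy = pd.DataFrame({
    "Gender": ["F", None, "M", "F"],
    "Sem1_Math": [70.0, np.nan, 85.0, np.nan],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_counts = toy.isna().sum()
print(missing_counts)
```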

GCET

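# All columns except the last, as a NumPy array
x = df.iloc[:, :-1].values

# Column at index 3 (the fourth column), as a NumPy array
y = df.iloc[:,3].values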

# Handle missing values in marks using Mean
df['Sem1_Math'] = df['Sem1_Math'].fillna(df['Sem1_Math'].mean())
df['Sem1_Science'] = df['Sem1_Science'].fillna(df['Sem1_Science'].mean())
df['Sem1_English'] = df['Sem1_English'].fillna(df['Sem1_English'].mean())
df['Sem1_History'] = df['Sem1_History'].fillna(df['Sem1_History'].mean())
df['Sem1_CS'] = df['Sem1_CS'].fillna(df['Sem1_CS'].mean())
df['Sem2_Math'] = df['Sem2_Math'].fillna(df['Sem2_Math'].mean())
df['Sem2_Science'] = df['Sem2_Science'].fillna(df['Sem2_Science'].mean())
df['Sem2_English'] = df['Sem2_English'].fillna(df['Sem2_English'].mean())
df['Sem2_History'] = df['Sem2_History'].fillna(df['Sem2_History'].mean())
df['Sem2_CS'] = df['Sem2_CS'].fillna(df['Sem2_CS'].mean())
df['Sem3_Math'] = df['Sem3_Math'].fillna(df['Sem3_Math'].mean())
df['Sem3_Science'] = df['Sem3_Science'].fillna(df['Sem3_Science'].mean())
df['Sem3_English'] = df['Sem3_English'].fillna(df['Sem3_English'].mean())
df['Sem3_History'] = df['Sem3_History'].fillna(df['Sem3_History'].mean())
df['Sem3_CS'] = df['Sem3_CS'].fillna(df['Sem3_CS'].mean())
df['Sem4_Math'] = df['Sem4_Math'].fillna(df['Sem4_Math'].mean())
df['Sem4_Science'] = df['Sem4_Science'].fillna(df['Sem4_Science'].mean())
df['Sem4_English'] = df['Sem4_English'].fillna(df['Sem4_English'].mean())
df['Sem4_History'] = df['Sem4_History'].fillna(df['Sem4_History'].mean())
df['Sem4_CS'] = df['Sem4_CS'].fillna(df['Sem4_CS'].mean())
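
The same per-column mean imputation can be written more compactly by selecting every numeric column at once. This is only a sketch on a hypothetical frame named `toy`; it assumes that every numeric column is a marks column, which may not hold for the real dataset (for example if it contains a numeric ID column).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names mimic the practical's naming.
toy = pd.DataFrame({
    "Gender": ["F", "M", None],
    "Sem1_Math": [60.0, np.nan, 80.0],
    "Sem1_CS": [np.nan, 50.0, 70.0],
})

# Pick all numeric columns (assumed here to be marks columns).
numeric_columns = toy.select_dtypes(include="number").columns

# DataFrame.mean() returns one mean per column; fillna() with that
# Series fills each column's gaps with its own mean.
toy[numeric_columns] = toy[numeric_columns].fillna(toy[numeric_columns].mean())

print(toy)
```

The categorical `Gender` column is left untouched by this step, so it can be handled separately with the mode.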

# Handle missing values in Gender using Mode


df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])

df
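
The `[0]` after `mode()` is needed because pandas returns the mode as a Series: when two or more categories tie for most frequent, all of them are returned. The sketch below, on hypothetical values, shows a tie and how `[0]` simply takes the first entry.

```python
import pandas as pd

# Hypothetical values in which "M" and "F" each appear twice.
gender = pd.Series(["M", "F", None, "F", "M"])

# mode() ignores missing values and returns every tied category.
modes = gender.mode()
print(modes.tolist())

# Indexing with [0] selects the first of the tied modes as the fill value.
filled = gender.fillna(modes[0])
print(filled.tolist())
```

When a tie is possible, it may be worth checking `len(modes)` before imputing, since the choice between tied categories is otherwise arbitrary.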

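# Alternative: impute a single column with scikit-learn's SimpleImputer
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

# Column at index 3 as a NumPy array; SimpleImputer expects 2-D input,
# so reshape it into a single column
y = df.iloc[:,3].values
y_reshaped = y.reshape(-1, 1)

# Learn the column mean, then replace NaN entries with it
imputer.fit(y_reshaped)
y_imputed = imputer.transform(y_reshaped)

y_imputed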
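
`SimpleImputer` can also cover the categorical case through `strategy='most_frequent'`. The sketch below applies one imputer per column type to a hypothetical frame named `toy`; it is an illustration under those assumptions, not the method used on the practical's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame with one categorical and one numeric column.
toy = pd.DataFrame({
    "Gender": pd.Series(["F", np.nan, "M", "F"], dtype=object),
    "Sem1_Math": [70.0, np.nan, 80.0, 90.0],
})

mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
mode_imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")

# Double brackets keep the input 2-D, as SimpleImputer requires;
# ravel() flattens the result back into a single column.
toy["Sem1_Math"] = mean_imputer.fit_transform(toy[["Sem1_Math"]]).ravel()
toy["Gender"] = mode_imputer.fit_transform(toy[["Gender"]]).ravel()

print(toy)
```

Passing `toy[["Sem1_Math"]]` instead of a 1-D array avoids the manual `reshape(-1, 1)` step.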


Important Points:
- Missing values can introduce bias or reduce the quality of analysis.
- For numerical data (marks), mean imputation keeps the column mean unchanged, although it
reduces the variance of the column.
- For categorical data (gender), mode imputation is preferred because it fills gaps with the
most frequent category, so every imputed value is a valid category.
- scikit-learn and pandas provide multiple imputation techniques.
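
One caveat of mean imputation is easy to check numerically: the column mean stays the same, but the spread shrinks because every filled value sits exactly at the mean. The sketch below uses hypothetical marks to compare the mean and standard deviation before and after filling.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with two missing entries.
marks = pd.Series([40.0, np.nan, 60.0, np.nan, 80.0])
filled = marks.fillna(marks.mean())

# pandas skips NaN by default, so both statistics use observed values only.
print(marks.mean(), filled.mean())  # the mean is preserved
print(marks.std(), filled.std())    # the standard deviation decreases
```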

Conclusion:
In this practical, missing values in the student dataset were successfully handled. We used
mean imputation for numerical marks and mode imputation for categorical gender data.
Handling missing values is an essential preprocessing step to ensure reliable and accurate
analysis in data science and machine learning.