Untitled Document 5

Uploaded by

saurabhsin6294


Assignment 23

NIELIT

1. Handle missing values in a dataset by filling with mean, median, and dropping rows.
2. Encode categorical data using one-hot encoding and label encoding.
3. Scale features using standardization and normalization.
4. Split the dataset into training and testing sets using an 80-20 split.
5. Remove duplicate rows from a dataset.
6. Rename columns in a dataset.

Take a dataset for the above questions and write the code for the questions mentioned.
OR
Use the dataset below and write the code for the questions mentioned above.

data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red', 'Green', 'Red', np.nan, 'Blue', 'Green'],
    'Size': ['S', 'M', 'L', 'XL', 'M', 'S', 'XL', 'L', 'M', 'S'],
    'Height': [150, 160, 170, np.nan, 190, 180, 175, 165, np.nan, 155],
    'Weight': [60, 65, 70, 75, 80, 85, np.nan, 95, 90, np.nan],
    'Age': [25, 30, 35, 40, 45, np.nan, 55, 60, 65, 70],
    'Feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Feature2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
}

# Convert to DataFrame
df = pd.DataFrame(data)
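Before filling or dropping anything, it can help to see which columns actually contain missing values. The following is a minimal sketch using the dataset given above; the variable name missing_counts is introduced here for illustration and is not part of the assignment.

```python
import numpy as np
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red', 'Green', 'Red', np.nan, 'Blue', 'Green'],
    'Size': ['S', 'M', 'L', 'XL', 'M', 'S', 'XL', 'L', 'M', 'S'],
    'Height': [150, 160, 170, np.nan, 190, 180, 175, 165, np.nan, 155],
    'Weight': [60, 65, 70, 75, 80, 85, np.nan, 95, 90, np.nan],
    'Age': [25, 30, 35, 40, 45, np.nan, 55, 60, 65, 70],
    'Feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Feature2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
}
df = pd.DataFrame(data)

# Count missing values per column (missing_counts is an illustrative name)
missing_counts = df.isna().sum()

# Show only the columns that have at least one missing value
print(missing_counts[missing_counts > 0])
```

Only 'Color', 'Height', 'Weight', and 'Age' have gaps, so those are the columns the missing-value steps need to address.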

Solution

import pandas as pd
import numpy as np
from sklearn.preprocessing import (
    OneHotEncoder, LabelEncoder, StandardScaler, MinMaxScaler
)
from sklearn.model_selection import train_test_split

# Sample dataset
data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red', 'Green', 'Red', np.nan, 'Blue', 'Green'],
    'Size': ['S', 'M', 'L', 'XL', 'M', 'S', 'XL', 'L', 'M', 'S'],
    'Height': [150, 160, 170, np.nan, 190, 180, 175, 165, np.nan, 155],
    'Weight': [60, 65, 70, 75, 80, 85, np.nan, 95, 90, np.nan],
    'Age': [25, 30, 35, 40, 45, np.nan, 55, 60, 65, 70],
    'Feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Feature2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
}

df = pd.DataFrame(data)

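# 1. Handling missing values

# Filling with mean for numerical columns
# (numeric_only=True skips the text columns 'Color' and 'Size',
# which would otherwise raise a TypeError in pandas 2.x)
df_mean_filled = df.fillna(df.mean(numeric_only=True))

# Filling with median for numerical columns
df_median_filled = df.fillna(df.median(numeric_only=True))

# Dropping rows with any missing values
df_dropna = df.dropna()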

# 2. Encoding categorical data

# One-hot encoding (missing colors get their own 'Missing' category)
# sparse_output=False needs scikit-learn 1.2+; older versions use sparse=False
ohe = OneHotEncoder(sparse_output=False)
color_encoded = ohe.fit_transform(df[['Color']].fillna('Missing'))
color_encoded_df = pd.DataFrame(
    color_encoded,
    columns=ohe.get_feature_names_out(['Color'])
)

# Label encoding
le = LabelEncoder()
size_encoded = le.fit_transform(df['Size'])
df['Size_LabelEncoded'] = size_encoded

# Merging one-hot encoded columns back to the dataframe
df = df.drop(columns=['Color']).join(color_encoded_df)

# 3. Scaling features

# Standardization
# Missing values are filled with the column mean first; filling with 0
# would distort the mean and variance that StandardScaler computes.
num_cols = ['Height', 'Weight', 'Age']
scaler_standard = StandardScaler()
df[num_cols] = scaler_standard.fit_transform(df[num_cols].fillna(df[num_cols].mean()))

# Normalization
scaler_minmax = MinMaxScaler()
df[['Feature1', 'Feature2']] = scaler_minmax.fit_transform(df[['Feature1', 'Feature2']])

# 4. Splitting the dataset

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# 5. Removing duplicate rows

df_no_duplicates = df.drop_duplicates()

# 6. Renaming columns
df.rename(columns={'Size_LabelEncoded': 'Size_Encoded'}, inplace=True)
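
Because every row in the sample dataset has a unique 'ID', drop_duplicates() has nothing to remove there. The sketch below uses a small made-up frame (not the assignment data) with one deliberately repeated row to show the step taking effect; the subset argument is shown as one possible way to ignore an identifier column.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this example
df_small = pd.DataFrame({
    'ID': [1, 2, 3],
    'Color': ['Red', 'Blue', 'Red'],
    'Size': ['S', 'M', 'S'],
})

# Append an exact copy of the first row to create a full duplicate
df_with_dupe = pd.concat([df_small, df_small.iloc[[0]]], ignore_index=True)

# Number of rows that repeat an earlier row in every column
n_full_duplicates = int(df_with_dupe.duplicated().sum())

# Remove rows that are identical in every column
full_dupes_removed = df_with_dupe.drop_duplicates()

# Remove rows that match on 'Color' and 'Size' only, ignoring 'ID'
subset_dupes_removed = df_with_dupe.drop_duplicates(subset=['Color', 'Size'])

print(n_full_duplicates)
print(full_dupes_removed)
print(subset_dupes_removed)
```

By default the first occurrence of each duplicated row is kept; passing keep='last' would keep the last one instead.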

Explanation:

Handling Missing Values: The code demonstrates three approaches to handling missing data: filling with the mean, filling with the median, and dropping rows.

Encoding Categorical Data: Categorical variables are encoded using both one-hot encoding and label encoding. One-hot encoding is used for the 'Color' column, while label encoding is applied to the 'Size' column.

Scaling Features: Numerical features are scaled using standardization and normalization. StandardScaler is used to standardize 'Height', 'Weight', and 'Age'. MinMaxScaler is used for normalizing 'Feature1' and 'Feature2'.

Splitting the Dataset: The dataset is split into training and testing sets using an 80-20 split.

Removing Duplicate Rows: Any duplicate rows in the dataset are removed.

Renaming Columns: The code renames a column to reflect its encoded nature.
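
One detail about label encoding the 'Size' column is worth checking: LabelEncoder assigns codes in sorted (alphabetical) order of the labels, which does not follow the natural order of clothing sizes. The sketch below compares it with an explicit mapping. The ordering S < M < L < XL and the name size_order are assumptions made for illustration, not part of the original solution.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

sizes = pd.Series(['S', 'M', 'L', 'XL', 'M', 'S', 'XL', 'L', 'M', 'S'], name='Size')

# LabelEncoder sorts the distinct labels, so codes follow alphabetical order
le = LabelEncoder()
label_codes = le.fit_transform(sizes)
print(list(le.classes_))      # labels in the order they were assigned codes 0, 1, 2, ...
print(label_codes.tolist())

# Explicit mapping that preserves an assumed size order S < M < L < XL
size_order = {'S': 0, 'M': 1, 'L': 2, 'XL': 3}
ordinal_codes = sizes.map(size_order)
print(ordinal_codes.tolist())
```

If a downstream model treats the codes as magnitudes, the explicit mapping is usually the safer choice; for purely nominal labels either encoding carries the same information.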

# Display the final DataFrame

print(df)
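
The solution scales the features on the full dataset and splits afterwards, which means the scaler's statistics include the test rows. A common alternative is to split first and fit the imputation values and the scaler on the training rows only. The following is a minimal sketch of that ordering on the three numeric columns; it is one possible arrangement under these assumptions, not the assignment's required answer, and numeric_cols and train_means are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Height': [150, 160, 170, np.nan, 190, 180, 175, 165, np.nan, 155],
    'Weight': [60, 65, 70, 75, 80, 85, np.nan, 95, 90, np.nan],
    'Age': [25, 30, 35, 40, 45, np.nan, 55, 60, 65, 70],
})
numeric_cols = ['Height', 'Weight', 'Age']

# Split first so that no statistic is computed from the test rows
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df = train_df.copy()
test_df = test_df.copy()

# Fill missing values with means computed on the training rows only
train_means = train_df[numeric_cols].mean()
train_df[numeric_cols] = train_df[numeric_cols].fillna(train_means)
test_df[numeric_cols] = test_df[numeric_cols].fillna(train_means)

# Fit the scaler on the training rows, then apply it to both sets
scaler = StandardScaler()
train_df[numeric_cols] = scaler.fit_transform(train_df[numeric_cols])
test_df[numeric_cols] = scaler.transform(test_df[numeric_cols])

print(train_df.shape, test_df.shape)
```

With 10 rows and test_size=0.2, the split yields 8 training rows and 2 test rows.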
