LAB EXERCISE 2 - Data Preprocessing

Uploaded by

shreya halaswamy

LAB EXERCISE – 2

Data Preprocessing

Aim of the Experiment.

The main aim of this experiment is to preprocess the given dataset. The dataset has already been
created and is available in the file [Link].

Sample Dataset

id  first        last       gender  Marks  selected
1   Leone        Debrick    Female  50     TRUE
2   Romola       Phinnessy  Female  60     FALSE
3   Geri         Prium      Male    65     FALSE
4   Sandy        Doveston   Female  95     FALSE
5   Jacenta      Jansik     Female  31     TRUE
6   Diane-marie  Medhurst   Female  45     TRUE
7   Austen       Pool       Male    45     TRUE
8   Vanya        Teffrey    Male    70     FALSE
9   Giordano     Elloy      Male    36     FALSE
10  Rozele       Fawcett    Female  50     FALSE

The objectives of this experiment are:

1. Explore LabelEncoder

2. Explore scikit-learn preprocessing routines such as scaling

3. Explore scikit-learn preprocessing routines such as Binarizer

Reference to the Textbook and Explanation

All the fundamentals are given in Chapter 2 and Appendix 2.

The values Female and Male in the gender column can be changed to 0 and 1 using LabelEncoder, which
assigns integer codes to the labels in sorted order. It is done as given below:

df_gender_encode = LabelEncoder()

df.gender = df_gender_encode.fit_transform(df.gender)
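The encoding step can be sketched on its own as follows. The small DataFrame is a hand-built stand-in for the gender column (an assumption for illustration, not the lab file), and the extra column name gender_code is hypothetical. The sketch also reads the label-to-code mapping back from the fitted encoder rather than assuming it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative stand-in for the gender column of the sample dataset
df = pd.DataFrame({"gender": ["Female", "Female", "Male", "Female", "Male"]})

encoder = LabelEncoder()
df["gender_code"] = encoder.fit_transform(df["gender"])

# classes_ lists the labels in the order of their integer codes
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(df)

# inverse_transform maps the integer codes back to the original labels
decoded = encoder.inverse_transform(df["gender_code"])
```

Because the labels are sorted before codes are assigned, Female receives 0 and Male receives 1 regardless of the order in which they appear in the column.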

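Scaling can be done as follows:

df.Marks = preprocessing.scale(df.Marks)

scaled_df = preprocessing.scale(df.Marks)

Scaling removes the mean and divides by the standard deviation, so the scaled values have zero mean
and unit variance.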
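As a rough check of what scaling does, the sketch below applies preprocessing.scale to the Marks values from the sample table and compares the result with the same calculation done by hand. Typing the marks in directly instead of reading the lab file is an assumption made to keep the example self-contained.

```python
import numpy as np
from sklearn import preprocessing

# Marks column from the sample dataset, typed in by hand
marks = np.array([50, 60, 65, 95, 31, 45, 45, 70, 36, 50], dtype=float)

scaled = preprocessing.scale(marks)

# The same result by hand: subtract the mean, then divide by the
# population standard deviation (NumPy's default, ddof=0)
manual = (marks - marks.mean()) / marks.std()

print(scaled.round(3))
print("mean close to 0:", bool(np.isclose(scaled.mean(), 0.0)))
print("std close to 1:", bool(np.isclose(scaled.std(), 1.0)))
```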

Copyright @ Oxford University Press, India 2021

Binarization applies a threshold and converts values to binary: values greater than the threshold
become 1 and all other values become 0, as shown below:

scaled_df_bin = preprocessing.Binarizer(threshold=0.5).transform(newarr)
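A small sketch of the thresholding behaviour, using made-up values chosen to sit on either side of 0.5 (the values are an assumption for illustration). fit_transform is used here for convenience; Binarizer is stateless, so fitting learns nothing from the data.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Illustrative values; Binarizer expects a 2-D array, hence reshape(-1, 1)
values = np.array([-1.2, 0.0, 0.5, 0.51, 2.3]).reshape(-1, 1)

# Values strictly greater than the threshold map to 1, the rest to 0
binary = Binarizer(threshold=0.5).fit_transform(values)
print(binary.ravel())
```

Note that a value exactly equal to the threshold (0.5 here) maps to 0, because the comparison is strict.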

Duplicate rows can be removed as follows:

df_duplicates_removed = df_duplicated.drop_duplicates()
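The sketch below shows duplicate removal on a tiny hand-built frame (an assumption for illustration, not the lab file). It also counts the repeated rows with duplicated() and restricts the comparison to one column with subset=; the names n_repeats, df_unique and df_by_id are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "Marks": [50, 60, 65]})

# Stack the frame on itself so that every row appears twice
df_duplicated = pd.concat([df] * 2, ignore_index=True)

# duplicated() flags rows that repeat an earlier row
n_repeats = int(df_duplicated.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default
df_unique = df_duplicated.drop_duplicates()

# subset= compares only the listed columns; keep="last" retains the later copy
df_by_id = df_duplicated.drop_duplicates(subset=["id"], keep="last")

print(n_repeats)
print(df_unique)
print(df_by_id)
```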

The NaN values of a column can be replaced as shown below:

df['m5'] = df['m5'].fillna(0)

This replaces every NaN in column m5 with zero.

The command

df = df.dropna(axis=1)

removes all the columns that contain NaN.
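The choices for handling NaN can be compared side by side on a small frame. The data and the variable names below are assumptions made for illustration; only fillna and dropna come from the text above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "m1": [50.0, np.nan, 60.0],
    "m2": [np.nan, np.nan, 80.0],
    "m3": [70.0, 75.0, 90.0],
})

# Count the missing values per column before deciding how to handle them
print(df.isna().sum())

filled_zero = df.fillna(0)          # every NaN becomes 0
filled_mean = df.fillna(df.mean())  # each NaN becomes the mean of its column
complete_cols = df.dropna(axis=1)   # keep only the columns without NaN
complete_rows = df.dropna(axis=0)   # keep only the rows without NaN

print(filled_mean)
print(complete_cols)
print(complete_rows)
```

Filling keeps the shape of the data but invents values, while dropping keeps only observed values at the cost of losing rows or columns; which is preferable depends on how much data is missing.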

Listing 1

import pandas as pd
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder

col_list = ["id", "first", "last", "gender", "Marks", "selected"]
df = pd.read_csv("[Link]", usecols=col_list)
print(df)
print("End of Listing\n\n\n")

# Convert the gender column so that Female becomes 0 and Male becomes 1,
# using LabelEncoder from scikit-learn
df_gender_encode = LabelEncoder()
df.gender = df_gender_encode.fit_transform(df.gender)

# One can observe that Female is coded as 0 and Male as 1
print(df)
print("End of Listing\n\n\n")

# Now scale the marks to zero mean and unit variance
df.Marks = preprocessing.scale(df.Marks)
scaled_df = preprocessing.scale(df.Marks)
print(df)
print("Scaling of marks is completed\n\n\n\n")

# Binarizer expects a 2-D array, so reshape the marks into a single column
newarr = scaled_df.reshape(-1, 1)
scaled_df_bin = preprocessing.Binarizer(threshold=0.5).transform(newarr)
df['Marks'] = scaled_df_bin
print(df)
print("Binarization of marks is completed\n\n\n\n")

Output

Listing 2

import pandas as pd

col_list = ["id", "first", "last", "gender", "Marks", "selected"]
df = pd.read_csv("[Link]", usecols=col_list)
print(df)
print("End of Listing\n\n\n")

# Create duplicate rows in the given dataset by concatenating
# the DataFrame with itself
df_duplicated = pd.concat([df] * 2, ignore_index=True)
print(df_duplicated)
print("Display after duplication\n\n\n\n")

# Remove the duplicate rows, keeping the first occurrence of each
df_duplicates_removed = df_duplicated.drop_duplicates()
print(df_duplicates_removed)
print("Display after removing duplicates\n\n\n\n")

Output

Listing 3

import pandas as pd

df = pd.DataFrame({
    'm1': [50, 'A', 60, 'A', 80],
    'm2': [60, 'A', '60', 'A', 80],
    'm3': [50, 70, 'A', 'A', 60],
    'm4': [60, 'A', 'A', 'A', 60],
    'm5': ['A', 'A', 'A', 10, 20]
})

# Convert every column to numeric; entries such as 'A' become NaN
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
print('Dataframe with NaN\n\n\n')

# Replace all the NaN in m5 with zero
df['m5'] = df['m5'].fillna(0)
print(df)
print('Making m5 NaN as 0 using fillna() function\n\n\n\n')

# Replace the NaN in m2 with the mean of m2
df1 = df.copy()
df1['m2'] = df1['m2'].fillna(df1['m2'].mean())
print(df1)
print('Making m2 NaN as mean using fillna() function\n\n\n\n')

# Replace the NaN in m3 with the median of m3
df2 = df.copy()
df2['m3'] = df2['m3'].fillna(df2['m3'].median())
print(df2)
print('Making m3 NaN as median using fillna() function\n\n\n\n')

# Drop all columns that still contain NaN
df = df.dropna(axis=1)
print(df)
print('Dropping all columns having NaN\n\n\n\n')

Output

Listing 4

This listing illustrates the use of MinMax scaling and Standard scaling for finding Z-scores.

from numpy import asarray
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

data = asarray([[1,3],[8,5],[6,7],[8,9]])
print("\n Original Data")
print(data)

scaler1 = MinMaxScaler()
scaler2 = StandardScaler()

scaled1 = scaler1.fit_transform(data)
scaled2 = scaler2.fit_transform(data)

print("\n\nThe output of MinMax Scaling")
print(scaled1)

print("\n\nThe output of Standard scaling as z-score")
print(scaled2)

Output
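The results of Listing 4 can be cross-checked by hand. The sketch below recomputes both transformations with plain NumPy, column by column, and compares them with the scikit-learn scalers; the variable names are hypothetical, and the formulas assume the default settings of both scalers (feature range 0 to 1, population standard deviation).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.asarray([[1, 3], [8, 5], [6, 7], [8, 9]], dtype=float)

# Min-max scaling: (x - min) / (max - min), computed per column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
minmax_manual = (data - col_min) / (col_max - col_min)

# z-score: (x - mean) / std, using the population standard deviation (ddof=0)
zscore_manual = (data - data.mean(axis=0)) / data.std(axis=0)

minmax_sklearn = MinMaxScaler().fit_transform(data)
zscore_sklearn = StandardScaler().fit_transform(data)

print(np.allclose(minmax_manual, minmax_sklearn))
print(np.allclose(zscore_manual, zscore_sklearn))
```

Min-max scaling maps each column onto the interval from 0 to 1, whereas the z-score centres each column at 0 with unit variance, so the two outputs are on different scales even though both are computed column-wise.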
