Practical 6 Encoding

Practical_6_Encoding : DSV

Uploaded by

vhoratanvir1610
Practical 6: Perform Encoding of Categorical Variables

In data preprocessing, categorical variables need to be transformed into numerical
representations so that machine learning algorithms can process them effectively. This
practical demonstrates how to apply One-Hot Encoding, Label Encoding, and
preprocessing techniques such as scaling, normalization, and handling missing values. The
dataset used contains student details, including gender, city, mobile, semester marks, and
more.
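
As a minimal illustration of the two encodings named above, the sketch below applies them to a small made-up table. The column names and values are hypothetical and are not taken from the student dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data, not the student dataset used in this practical
toy = pd.DataFrame({
    "City": ["Surat", "Vadodara", "Surat", "Rajkot"],
    "Result": ["Pass", "Fail", "Pass", "Pass"],
})

# One-Hot Encoding: one binary column per distinct category
ohe = OneHotEncoder(handle_unknown="ignore")
city_encoded = ohe.fit_transform(toy[["City"]]).toarray()
city_categories = list(ohe.categories_[0])

# Label Encoding: one integer per class, assigned in sorted class order
le = LabelEncoder()
result_encoded = le.fit_transform(toy["Result"])

print(city_categories)
print(city_encoded)
print(list(le.classes_), result_encoded)
```

One-Hot Encoding avoids implying an order between cities, which is why it suits input features. Label Encoding produces a single integer column, which is what scikit-learn expects for a class target.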

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
import numpy as np
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/DSV /Dataset_(12202080501060)/student_dataset_with_missing_

# Drop 'Name' and 'Enrollment' as they are likely unique identifiers and not useful for encoding
df = df.drop(['Name', 'Enrollment'], axis=1)

# Separate features (X) and target (y - assuming the last column is the target)
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Look up the positions of the 'Gender', 'City', and 'Mobile' columns in the modified dataframe
gender_col_index = df.columns.get_loc('Gender')
city_col_index = df.columns.get_loc('City')
mobile_col_index = df.columns.get_loc('Mobile')

# Use a column transformer to one-hot encode 'Gender' and 'City' and to mean-impute 'Mobile'
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer

numeric_transformer = SimpleImputer(strategy='mean')
# handle_unknown='ignore' encodes categories not seen during fit as all zeros instead of raising an error
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

ct = make_column_transformer(
    (categorical_transformer, [gender_col_index, city_col_index]),
    (numeric_transformer, [mobile_col_index]),
    remainder='passthrough'  # keep all other columns unchanged
)

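X = ct.fit_transform(X)
# The combined output may be a sparse matrix; convert it to a dense array if so
X = X.toarray() if hasattr(X, 'toarray') else X

# Label-encode the target variable
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)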

# Split the data into 70% training and 30% test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

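from sklearn.preprocessing import StandardScaler, Normalizer

# Columns from index 8 onward are treated as the numeric features. The offset 8 is
# hardcoded and depends on how many columns the encoding step produced for this dataset.
numerical_cols_indices = slice(8, None)

# Impute remaining missing numeric values with the training-set mean before scaling,
# so that the scaled arrays contain no NaN values
imputer_numerical = SimpleImputer(missing_values=np.nan, strategy='mean')
X_train[:, numerical_cols_indices] = imputer_numerical.fit_transform(X_train[:, numerical_cols_indices])
X_test[:, numerical_cols_indices] = imputer_numerical.transform(X_test[:, numerical_cols_indices])

# Standardize each numeric column, fitting on the training set only
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train[:, numerical_cols_indices])
X_test_scaled = sc.transform(X_test[:, numerical_cols_indices])

# Rescale each row of numeric features to unit norm (Normalizer works per sample, not per column)
nm = Normalizer()
X_train[:, numerical_cols_indices] = nm.fit_transform(X_train[:, numerical_cols_indices])
X_test[:, numerical_cols_indices] = nm.transform(X_test[:, numerical_cols_indices])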
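
The steps above select numeric columns with a fixed offset. One possible alternative, sketched below, selects columns by name and chains imputation and scaling in a Pipeline, so no position needs to be hardcoded. The dataframe `demo` and its column names (`Sem1`, `Sem2`, `Result`) are hypothetical stand-ins, not the actual student dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the student dataset (column names are assumed)
demo = pd.DataFrame({
    "Gender": ["M", "F", "F", "M", "F", "M"],
    "City": ["Surat", "Rajkot", "Surat", "Vadodara", "Rajkot", "Surat"],
    "Sem1": [7.5, 8.0, np.nan, 6.5, 9.0, 7.0],
    "Sem2": [7.0, np.nan, 8.5, 6.0, 9.5, 7.5],
    "Result": ["Pass", "Pass", "Pass", "Fail", "Pass", "Fail"],
})
categorical_cols = ["Gender", "City"]
numeric_cols = ["Sem1", "Sem2"]

X_demo = demo[categorical_cols + numeric_cols]
y_demo = demo["Result"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=7
)

# Numeric columns: mean imputation first, then standardization
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# sparse_threshold=0 forces a dense array as output
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("num", numeric_steps, numeric_cols),
    ],
    sparse_threshold=0,
)

# Fit on the training split only, then reuse the fitted statistics on the test split
X_tr_ready = preprocess.fit_transform(X_tr)
X_te_ready = preprocess.transform(X_te)
print(X_tr_ready.shape, X_te_ready.shape)
```

Splitting before fitting keeps the test rows out of the imputation means and scaling statistics, which is the same train/test discipline the practical follows.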

Important Points:
1. One-Hot Encoding is used for categorical variables like Gender and City.
2. Label Encoding is applied to the target variable.
3. Missing values in numerical columns are handled using mean imputation.
4. StandardScaler standardizes each numerical column to zero mean and unit variance.
5. Normalizer rescales each sample (row) so that its feature vector has unit norm.
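
The difference between StandardScaler and Normalizer is easy to miss, so here is a small sketch on made-up numbers (the `marks` array is hypothetical). StandardScaler works column by column, while Normalizer works row by row.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Hypothetical marks for three students across two semesters
marks = np.array([
    [3.0, 4.0],
    [6.0, 8.0],
    [9.0, 12.0],
])

# StandardScaler: each column ends up with mean 0 and standard deviation 1
standardized = StandardScaler().fit_transform(marks)

# Normalizer: each row is divided by its own L2 norm, so every row has length 1
normalized = Normalizer().fit_transform(marks)

print(standardized.mean(axis=0), standardized.std(axis=0))
print(np.linalg.norm(normalized, axis=1))
print(normalized)
```

Here every row is a multiple of (3, 4), so Normalizer maps all three rows to (0.6, 0.8). This shows that row normalization discards differences in overall magnitude between samples, which column-wise standardization keeps.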

Conclusion:
Encoding categorical variables is a crucial step in data preprocessing. It allows machine
learning models to interpret categorical data effectively. In this practical, we successfully
encoded categorical features, handled missing values, and applied scaling and
normalization to numerical data, preparing the dataset for model building.
