
SEETHIS2.ipynb - Colab

The document provides a practice dataset for analyzing player engagement in gaming, including variables such as PlayerID, Age, Gender, and EngagementLevel. It emphasizes the importance of understanding concepts and problem-solving rather than memorizing specific questions or code solutions. Additionally, it outlines preprocessing steps and questions related to data analysis, including handling missing values and dataset splitting.

Uploaded by

ramras0509
Copyright © All Rights Reserved

8/16/25, 10:38 AM SEETHIS2.ipynb - Colab

⚠️ Important Notice:
These questions are meant for practice purposes only. They do not represent the actual OPPE
paper pattern, question style, or any official material. Please avoid memorizing the question
patterns or the exact code solutions.

Instead, focus on developing the skill to tackle unfamiliar problems — explore library
documentation, understand concepts, and learn how to apply them in new contexts.

Regards
MLT TAs

Metadata
Feature Variables
PlayerID: Unique identifier for each player.
Age: Age of the player.
Gender: Gender of the player.
Location: Geographic location of the player.
GameGenre: Genre of the game the player is engaged in.
PlayTimeHours: Average hours spent playing per session.
InGamePurchases: Indicates whether the player makes in-game purchases (0 = No, 1 = Yes).
GameDifficulty: Difficulty level of the game.
SessionsPerWeek: Number of gaming sessions per week.
AvgSessionDurationMinutes: Average duration of each gaming session in minutes.
PlayerLevel: Current level of the player in the game.
AchievementsUnlocked: Number of achievements unlocked by the player.

Target Variable:
EngagementLevel: Indicates the level of player engagement, categorized as 'High', 'Medium', or 'Low'.

Use the link below to access the dataset:

Dataset V1: https://drive.google.com/file/d/1Z_Ns6fUzUVfGpM8Rqi4QIFzBVx19ZukK/view?usp=sharing

https://colab.research.google.com/drive/1vMyIheuwOUe9ECmxpbI0ZV2lIghtXhLZ#printMode=true 1/20

Preprocessing
import numpy as np
import pandas as pd

from google.colab import files


uploaded = files.upload()

Saving preprocessing V1 (1).csv to preprocessing V1 (1) (1).csv

df = pd.read_csv('preprocessing V1 (1).csv')

Q1. [marks : 0][MCQ] Which dataset are you using for this exam?

Options:

(A) MLP OPPE2 preprocessing V1.csv

(B) MLP OPPE2 preprocessing V2.csv

(C) MLP OPPE2 preprocessing V3.csv

Answer: V1: A, V2: B, V3: C

Q2 [Marks: 4][MCQ] Which of the following columns have object datatype?

(A) Age

(B) Gender

(C) Location

(D) GameGenre

(E) PlayTimeHours

(F) PlayTimeHours

Ans: B,C,D

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PlayerID 10000 non-null int64
1 Age 9200 non-null float64
2 Gender 10000 non-null object
3 Location 9202 non-null object
4 GameGenre 10000 non-null object
5 PlayTimeHours 10000 non-null float64
6 InGamePurchases 9107 non-null float64
7 GameDifficulty 9154 non-null object
8 SessionsPerWeek 10000 non-null int64
9 AvgSessionDurationMinutes 10000 non-null int64
10 PlayerLevel 10000 non-null int64
11 AchievementsUnlocked 10000 non-null int64
12 EngagementLevel 10000 non-null object
dtypes: float64(3), int64(5), object(5)
memory usage: 1015.8+ KB
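The object-dtype columns asked about in Q2 can also be listed programmatically instead of being read off `df.info()`. A minimal sketch on a small hypothetical frame: only the column names come from the metadata above, the values are made up, and `dtype=object` is set explicitly so the example behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the metadata above
toy = pd.DataFrame({
    "Age": pd.Series([25.0, 37.0]),
    "Gender": pd.Series(["Male", "Female"], dtype=object),
    "Location": pd.Series(["Europe", "Asia"], dtype=object),
    "GameGenre": pd.Series(["Action", "RPG"], dtype=object),
    "PlayTimeHours": pd.Series([22.7, 23.9]),
})

# select_dtypes keeps only the columns whose dtype is object
object_cols = list(toy.select_dtypes(include="object").columns)
print(object_cols)  # ['Gender', 'Location', 'GameGenre']
```

On the real data the same call would be `df.select_dtypes(include="object").columns`.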

df.head(2)

PlayerID Age Gender Location GameGenre PlayTimeHours InGamePurchases GameDi

0 35900 37.0 Male Other Strategy 23.929404 NaN

1 27085 25.0 Male NaN Action 22.755168 1.0

df.InGamePurchases

InGamePurchases

0 NaN

1 1.0

2 0.0

3 NaN

4 1.0

... ...

9995 0.0

9996 0.0

9997 0.0

9998 0.0

9999 1.0

10000 rows × 1 columns

dtype: float64


Q3 [Marks: 4][NAT] In this dataset, how many "Males" from "Europe" have made "InGamePurchases"?

Ans: 299

df[(df.Gender=='Male')&(df.Location=='Europe')&(df.InGamePurchases==1.0)].shape[0]

299
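The Q3 filter combines three boolean masks with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `==`, and summing the final mask counts the matching rows (equivalent to `.shape[0]` of the filtered frame). A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows (not taken from the dataset)
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male"],
    "Location": ["Europe", "Europe", "Europe", "Asia"],
    "InGamePurchases": [1.0, 0.0, 1.0, 1.0],
})

# Parentheses are required: & has higher precedence than ==
mask = (
    (toy["Gender"] == "Male")
    & (toy["Location"] == "Europe")
    & (toy["InGamePurchases"] == 1.0)
)

count = int(mask.sum())  # each True counts as 1
print(count)  # 1
```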

df.columns

Index(['PlayerID', 'Age', 'Gender', 'Location', 'GameGenre', 'PlayTimeHours',


'InGamePurchases', 'GameDifficulty', 'SessionsPerWeek',
'AvgSessionDurationMinutes', 'PlayerLevel', 'AchievementsUnlocked',
'EngagementLevel'],
dtype='object')

Q4 [Marks: 4][NAT] In your dataset, how many players under the "Age" of 18 have strictly greater than 10 "PlayTimeHours"?

Ans: 453

df[(df.Age<18)&(df.PlayTimeHours>10)].shape[0]

453
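Rows with a missing Age are never counted in Q4: any comparison with NaN evaluates to False, so `df.Age < 18` silently excludes the 800 rows whose Age is NaN (10000 minus the 9200 non-null values in `df.info()`). A toy check with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the third one has a missing Age
toy = pd.DataFrame({
    "Age": [15.0, 17.0, np.nan, 30.0],
    "PlayTimeHours": [12.0, 8.0, 20.0, 15.0],
})

mask = (toy["Age"] < 18) & (toy["PlayTimeHours"] > 10)
print(mask.tolist())  # [True, False, False, False]

# NaN < 18 is False, so the NaN row is dropped even though PlayTimeHours > 10
print(int(mask.sum()))  # 1
```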

Q5 [Marks: 4][MCQ] Which of the following options lists all the unique categories present in the "GameGenre" feature?

options

(A) [Action, Adventure, Simulation, Sports, Strategy]

(B) [Action, RPG, Racing, Sports, Strategy]

(C) [Action, RPG, Simulation, Sports, Strategy]

(D) [Adventure, Puzzle, RPG, Simulation, Sports]

Ans: C

df.GameGenre.unique()

array(['Strategy', 'Action', 'Simulation', 'RPG', 'Sports'], dtype=object)
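`unique()` returns categories in order of first appearance, while the options list them alphabetically, so sorting makes the comparison direct. A sketch using the array printed above:

```python
import numpy as np

# The array printed above, in order of first appearance
genres = np.array(['Strategy', 'Action', 'Simulation', 'RPG', 'Sports'], dtype=object)

# Sort alphabetically so it can be matched against the option lists
option_c = ['Action', 'RPG', 'Simulation', 'Sports', 'Strategy']
print(sorted(genres))  # ['Action', 'RPG', 'Simulation', 'Sports', 'Strategy']
print(sorted(genres) == option_c)  # True
```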


Q6. [Marks: 4][NAT] Create the feature matrix (X) and label vector (y) using the following instructions:

"EngagementLevel" is the target column (y).

All the columns except the target column are in the feature matrix (X).

Compute the correlation value (r) for every pair of distinct numeric (int, float) features in the feature matrix (X) and write the highest positive correlation value.

Ans: 0.025 (0.023, 0.027)

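X = df.drop(columns='EngagementLevel')  # feature matrix
y = df.EngagementLevel                  # target

# Pairwise Pearson correlations between the numeric columns only
corr = X.corr(numeric_only=True)
corr_v = corr.unstack()
# Drop the diagonal self-correlations (r = 1) before taking the maximum
corr_v = corr_v[corr_v < 1]

corr_v.max()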

0.025499335732023704
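Filtering with `corr_v < 1` removes the diagonal, but it keeps every pair twice and would also discard a genuine off-diagonal correlation of exactly 1. A slightly more robust sketch reads only the entries strictly above the diagonal with `np.triu_indices`. The data here is random toy data, not the exam dataset, with column `d` deliberately built to correlate with `a`.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric features (random data, not the exam dataset)
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
toy["d"] = 2 * toy["a"] + rng.normal(scale=0.1, size=200)  # built to correlate with "a"

corr = toy.corr(numeric_only=True)

# Indices strictly above the diagonal: each distinct pair once, no self-correlations
rows, cols = np.triu_indices(len(corr), k=1)
values = corr.to_numpy()[rows, cols]

best = values.argmax()
best_pair = (corr.index[rows[best]], corr.columns[cols[best]])
print(best_pair, round(float(values[best]), 3))
```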

Q7 [4 Marks][NAT] How many total null values were present in the whole dataset?

Ans: 3337

df.isnull().sum().sum()

np.int64(3337)
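The total can be cross-checked against the `df.info()` output above: each column contributes 10000 minus its non-null count. A quick arithmetic check using only those printed numbers:

```python
# Non-null counts copied from the df.info() output above (only columns with gaps)
non_null = {"Age": 9200, "Location": 9202, "InGamePurchases": 9107, "GameDifficulty": 9154}
n_rows = 10000

missing = {col: n_rows - count for col, count in non_null.items()}
print(missing)  # {'Age': 800, 'Location': 798, 'InGamePurchases': 893, 'GameDifficulty': 846}
print(sum(missing.values()))  # 3337
```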

Q8 [Marks 5][MCQ] Split the dataset into a train dataset and a test dataset as follows.

Use sklearn's train_test_split function to split the data.

Use 20% of the data as the test set and keep random_state = 42.

Which category has the lowest value count in y_train?

(A) Low

(B) Medium


(C) High

Ans: C

from sklearn.model_selection import train_test_split


X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

y_train.value_counts()

count

EngagementLevel

Medium 3983

Low 2021

High 1996

dtype: int64
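Instead of scanning the counts by eye, `idxmin()` returns the label with the smallest count. A sketch that rebuilds the `y_train` counts printed above as a Series (on the live data this would be `y_train.value_counts().idxmin()`):

```python
import pandas as pd

# The y_train counts printed above, rebuilt as a Series
counts = pd.Series({"Medium": 3983, "Low": 2021, "High": 1996})

# idxmin returns the label of the smallest count
print(counts.idxmin())  # High
```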

Common Instructions for Questions 9 and 10

Rules for imputing missing (NaN) or unknown values:

Calculate the statistical values (such as mean, median, or mode) for each column from the training dataset only.

Apply these calculated values to replace missing (NaN) and unknown values in both the training and test datasets.

Ensure that the calculation of the statistical values excludes any missing or unknown values.

Replace unknown values in the "Age" feature with its mean value.

Replace unknown values in the "Location" feature with the constant value "Other".

Replace unknown values in the "GameDifficulty" feature with its most frequent value.

Replace unknown values in the "InGamePurchases" feature with the constant value 0.

Answer the questions below using the data imputed according to these rules.

df.Location.unique()

array(['Other', nan, 'Europe', 'USA', 'Asia'], dtype=object)


df.InGamePurchases.unique()

array([nan, 1., 0.])

# Learn the imputation statistics from the training set only
age_mean = X_train['Age'].mean()                       # NaN values are skipped
difficulty_mode = X_train['GameDifficulty'].mode()[0]  # most frequent category

# Assign the results back instead of calling fillna(..., inplace=True) on a single
# column: that chained-assignment pattern raises a FutureWarning and stops working in pandas 3.0
for data in (X_train, X_test):
    data['Age'] = data['Age'].fillna(age_mean)
    data['Location'] = data['Location'].fillna('Other')
    data['GameDifficulty'] = data['GameDifficulty'].fillna(difficulty_mode)
    data['InGamePurchases'] = data['InGamePurchases'].fillna(0)


Q9 [Marks: 5][NAT] Write the sum of the transformed (imputed) "Age" column of the test dataset (up to 2 digits after the decimal point).

Answer: 63585.24 (Range: 63584, 63586)

X_test.Age.sum()

np.float64(63585.239130434784)
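Because the test NaNs are filled with the training mean, the imputed test sum equals the sum of the observed test ages plus (number of missing test ages) × (training mean). A toy sketch of that identity with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical train/test Age columns
train_age = pd.Series([20.0, 30.0, np.nan, 40.0])
test_age = pd.Series([25.0, np.nan, 35.0, np.nan])

age_mean = train_age.mean()  # NaN skipped: (20 + 30 + 40) / 3 = 30.0
test_imputed = test_age.fillna(age_mean)

observed_sum = test_age.sum()       # NaN skipped: 25 + 35 = 60.0
n_missing = test_age.isna().sum()   # 2
print(test_imputed.sum())                   # 120.0
print(observed_sum + n_missing * age_mean)  # 120.0
```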

Q10 [Marks: 5][NAT] Suppose the most frequent category in the "GameDifficulty" column of the train dataset is "XXX". What is the value count of "XXX" in the "GameDifficulty" column of the test dataset after imputation?

Answer: 1102

difficulty_mode
X_test.GameDifficulty.value_counts()


count

GameDifficulty

Easy 1102

Medium 569

Hard 329

dtype: int64

Apply preprocessing to the features of the train and test datasets.

Drop the "PlayerID" column before the preprocessing steps.

Before applying any preprocessing, make sure no missing or unknown values remain in the train and test datasets.

Learn the transformers' parameters from the training set only, then use them to transform both the train and test sets.

For Categorical Features

Ordinal Features

Ordinally encode "GameDifficulty":

GameDifficulty    Order
Easy              0
Medium            1
Hard              2

Nominal Features

One-hot encode the 'Gender', 'Location', and 'GameGenre' features with drop_first = True.

Scaling Features

Scale all the features (transformed categorical and numerical) of the feature matrix using StandardScaler.

y_test.isnull().sum()

np.int64(0)


X_train.drop(columns=['PlayerID'], inplace=True)
X_test.drop(columns=['PlayerID'], inplace=True)

from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# 1️⃣ Numeric columns (int64/float64) are passed through unchanged before scaling
num_cols = X_train.select_dtypes(include=['int64', 'float64']).columns

# 2️⃣ Define categorical feature groups
ordinal_features = ['GameDifficulty']
ordinal_mapping = [['Easy', 'Medium', 'Hard']]  # Easy=0, Medium=1, Hard=2

nominal_features = ['Gender', 'Location', 'GameGenre']

# 3️⃣ Create transformers
ordinal_transformer = OrdinalEncoder(categories=ordinal_mapping)
nominal_transformer = OneHotEncoder(drop='first', handle_unknown='ignore')

# 4️⃣ Build ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('ord', ordinal_transformer, ordinal_features),
        ('nom', nominal_transformer, nominal_features),
        ('num', 'passthrough', num_cols)  # numeric features go through directly
    ]
)

# 5️⃣ Create pipeline that scales every encoded and numeric column
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('scaler', StandardScaler())
])

# 6️⃣ Fit on train only, then transform both sets
X_train_transformed = pipeline.fit_transform(X_train)
X_test_transformed = pipeline.transform(X_test)

# Check shapes
print("Train shape:", X_train_transformed.shape)
print("Test shape:", X_test_transformed.shape)

# ✅ At this stage: X_train_transformed & X_test_transformed are fully processed & scaled

Train shape: (8000, 16)
Test shape: (2000, 16)
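The 16 output columns can be accounted for by hand. The Location and GameGenre category counts come from the `unique()` outputs above and the seven numeric columns from `df.info()` minus PlayerID; that "Gender" has exactly two categories is an assumption, since the notebook never prints them, although 16 columns is only consistent with two.

```python
# Category counts: Location and GameGenre from the unique() outputs above;
# Gender = 2 is an assumption (implied by the 16-column result, not printed here)
n_categories = {"Gender": 2, "Location": 4, "GameGenre": 5}

ordinal_cols = 1  # GameDifficulty becomes a single ordinal column
one_hot_cols = sum(k - 1 for k in n_categories.values())  # drop='first' keeps k-1 per feature
numeric_cols = 7  # Age, PlayTimeHours, InGamePurchases, SessionsPerWeek,
                  # AvgSessionDurationMinutes, PlayerLevel, AchievementsUnlocked

print(ordinal_cols + one_hot_cols + numeric_cols)  # 16
```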


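# ColumnTransformer output alone (encoded but not scaled). Refitting the same
# preprocessor on X_train learns identical parameters; these arrays are not used below.
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)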

Q11 [Marks: 10][NAT] Calculate the sum of all the values in the first five rows of the transformed test feature matrix (up to 2 digits after the decimal).

Answer: -7.1728 (Range: -7.19, -7.15)

first_five = X_test_transformed[:5]

# Sum all values
total_sum = first_five.sum()

# Round to 2 decimal places
total_sum_2dec = round(total_sum, 2)

print(total_sum_2dec)

-7.17

Model Building

Set or change only the parameters specified in each question, keeping all other parameters at their default values.

Use the link below to access the dataset:

Dataset V1: https://drive.google.com/file/d/1IybNLBUwfvsiyEBjDt4iDxa4WOKnyONh/view?usp=sharing

from google.colab import files


uploaded = files.upload()

Saving ModelBuilding V1 (1).csv to ModelBuilding V1 (1).csv

df = pd.read_csv('ModelBuilding V1 (1).csv')

Q1. [marks : 0] Which dataset are you using for this Model Building section?


Options:

A) MLP OPPE2 ModelBuilding V1.csv

B) MLP OPPE2 ModelBuilding V2.csv

C) MLP OPPE2 ModelBuilding V3.csv

Answer: V1: A, V2: B, V3: C

X = df.drop(columns=['EngagementLevel'])
y = df.EngagementLevel

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

Split the dataset into a train dataset and a test dataset as follows:

"EngagementLevel" is the target column (y).
All the columns except the target column are in the feature matrix (X).
Use sklearn's train_test_split function to split the data.
Use 20% of the data as the test set and keep random_state = 42.

Q2 [5 marks][NAT] Train a LogisticRegression estimator with the following parameters:

Use sag as the solver.
Set random_state to 42.
Set the tolerance for the stopping criterion to 1e-3.
Set the maximum number of iterations for the solver to converge to 100.

Enter the recall score for class 1 of y_test for this model, using the test set (X_test, y_test).

No Changes

Answer: 0.7044 (0.690, 0.715)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

model = LogisticRegression(solver='sag', random_state=42, tol=1e-3, max_iter=100)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# labels=[1] restricts the score to class 1; averaging over that single label returns its recall
print(recall_score(y_test, y_pred, labels=[1], average='macro'))


Q3 [6 marks] [NAT] Use SGDClassifier to train a model on the training dataset (X_train and y_train) with the following parameters:

1. log_loss is the loss function to be used,
2. apply ridge (L2) regularization,
3. the maximum number of passes over the training data is 10,
4. use a constant learning rate of 0.01,
5. the regularization rate is 0.001,
6. take random_state=42,
7. set warm_start to False.

Note: Please ignore the convergence warning.

Using the above model, calculate the f1_score for label class 2 on the test set.

Version   rs=64                   rs=42
V1        0.8167 (0.800, 0.825)   0.817 (0.80, 0.825)

from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

# Step 1: Train the model
sgd = SGDClassifier(
    loss='log_loss',           # logistic regression loss
    penalty='l2',              # ridge regularization
    max_iter=10,               # number of passes over the data
    learning_rate='constant',
    eta0=0.01,                 # learning rate
    alpha=0.001,               # regularization strength
    random_state=42,
    warm_start=False
)

sgd.fit(X_train, y_train)

# Step 2: Predictions
y_pred = sgd.predict(X_test)

# Step 3: f1-score for class label = 2
f1_class2 = f1_score(y_test, y_pred, labels=[2], average='macro')
print(f1_class2)

0.8170854271356784
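Passing `labels=[2]` with `average='macro'` averages over a single label, so it returns exactly that class's F1; `average=None` returns one score per class in sorted label order, which is a handy cross-check. A sketch on hypothetical labels (for class 2 here, precision and recall are both 2/3, so F1 is 2/3):

```python
from sklearn.metrics import f1_score

# Hypothetical labels, not from the dataset
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1]

# Single-class F1 via labels=[2]
f1_single = f1_score(y_true, y_pred, labels=[2], average="macro")

# Same value read off the per-class array (classes in sorted order 0, 1, 2)
f1_per_class = f1_score(y_true, y_pred, average=None)
print(f1_single, f1_per_class[2])
```

The same `labels=[...]` pattern works for `recall_score`, as in Q2.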


Q4. [6 marks] [MCQ] Tune the parameters using GridSearchCV with the settings below:

estimator = KNeighborsClassifier

scoring =accuracy

cv= 5

Consider following parameters for KNeighborsClassifier:

n_neighbors = [19,23,27,31]
metric = "minkowski"
Set p value for minkowski = 2

Keep other parameter values as default values.

What is the best value of K you obtained using the above instructions?

Options

A) 19

B) 23

C) 27

D) 31

Ans: B

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Step 1: Model
knn = KNeighborsClassifier(metric='minkowski', p=2)

# Step 2: Parameter grid
param_grid = {
    'n_neighbors': [19, 23, 27, 31]
}

# Step 3: GridSearchCV
grid = GridSearchCV(
    estimator=knn,
    param_grid=param_grid,
    scoring='accuracy',
    cv=5
)

grid.fit(X_train, y_train)

# Step 4: Best K
print(grid.best_params_)  # {'n_neighbors': XX}
print(grid.best_score_)   # mean cross-validated accuracy of the best K

{'n_neighbors': 23}
0.6859999999999999
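`best_score_` is the mean cross-validated accuracy of the winning K, not a test-set score. To see how close the candidates were, `cv_results_` can be tabulated. A sketch on scikit-learn's bundled iris data, used purely as a stand-in for the exam dataset; only the grid mirrors Q4.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Iris is a stand-in dataset here; only the grid mirrors Q4
X_demo, y_demo = load_iris(return_X_y=True)

grid_demo = GridSearchCV(
    KNeighborsClassifier(metric="minkowski", p=2),
    param_grid={"n_neighbors": [19, 23, 27, 31]},
    scoring="accuracy",
    cv=5,
)
grid_demo.fit(X_demo, y_demo)

# One row per candidate K: mean and std of the 5 fold accuracies, plus the rank
results = pd.DataFrame(grid_demo.cv_results_)[
    ["param_n_neighbors", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score"))
```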

Q.5 [3 Marks] [MCQ] Fit an SVM classifier with the following parameters:

kernel='rbf'
decision_function_shape='ovr'
random_state=42
C=1

Train the model on the training data and choose the option with the correct confusion matrix on the test data.

(A) $\begin{bmatrix} 438 & 17 & 64 \\ 11 & 407 & 87 \\ 36 & 50 & 890 \end{bmatrix}$

(B) $\begin{bmatrix} 434 & 18 & 60 \\ 13 & 409 & 87 \\ 38 & 50 & 891 \end{bmatrix}$

(C) $\begin{bmatrix} 449 & 16 & 75 \\ 11 & 382 & 101 \\ 38 & 45 & 883 \end{bmatrix}$

(D) $\begin{bmatrix} 465 & 11 & 57 \\ 13 & 424 & 97 \\ 23 & 46 & 864 \end{bmatrix}$

Answer: C


from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Step 1: Model
svm_model = SVC(
    kernel='rbf',
    decision_function_shape='ovr',
    random_state=42,
    C=1
)

# Step 2: Train
svm_model.fit(X_train, y_train)

# Step 3: Predict
y_pred = svm_model.predict(X_test)

# Step 4: Confusion matrix (rows = true labels, columns = predicted labels)
cm = confusion_matrix(y_test, y_pred)
print(cm)
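When matching the printed matrix against the options, keep scikit-learn's orientation in mind: rows are true labels and columns are predicted labels, both in sorted label order. The diagonal holds the correct predictions, so the off-diagonal total is the number of misclassified samples. A toy check with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 0, 2]

# Rows = true class, columns = predicted class, both in sorted label order
cm = confusion_matrix(y_true, y_pred)
print(cm.tolist())  # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]

# Diagonal = correct predictions; everything else is a misclassification
print(cm.trace(), cm.sum() - cm.trace())  # 4 2
```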

Common Instructions for Questions 6 to 8

Train a Decision Tree Classifier with the following properties:

criterion = 'entropy'
splitter = 'random'
min_samples_split = 4
min_impurity_decrease = 0.0001
random_state = 42

Q.6 [MCQ][3 Marks] What is the depth of the trained tree?

Answer: 20

from sklearn.tree import DecisionTreeClassifier

# 1️⃣ Train the Decision Tree
dt = DecisionTreeClassifier(
    criterion='entropy',
    splitter='random',
    min_samples_split=4,
    min_impurity_decrease=0.0001,
    random_state=42
)
dt.fit(X_train, y_train)

DecisionTreeClassifier(criterion='entropy', min_impurity_decrease=0.0001,
min_samples_split=4, random_state=42, splitter='random')

dt.get_depth()

20

Q.7 [NAT][3 Marks] How many nodes are there in the tree?

Answer: 2367 (Range: 2365, 2369)

dt.tree_.node_count

Q.8 [NAT][5 Marks] What is the value of entropy at the left child of the root node? (correct up to 2 digits after the decimal)

Answer: 1.1607 (1.15,1.17)

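# children_left[0] is the node id of the root's left child (node 1 in scikit-learn's depth-first numbering)
dt.tree_.impurity[dt.tree_.children_left[0]]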
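scikit-learn stores a fitted tree as parallel arrays on `tree_`: the root is node 0, `children_left[0]` is its left child, and `impurity` holds the criterion value at each node. The sketch below fits a tree with the same hyperparameters on the bundled iris data (a stand-in, not the exam dataset) and recomputes the left child's entropy (log base 2) from the class distribution stored in `tree_.value`, as a check on what `impurity` reports. `tree_.value` is normalized before use because some scikit-learn versions store counts and others store fractions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris is a stand-in dataset; the hyperparameters mirror the common instructions
X_demo, y_demo = load_iris(return_X_y=True)
tree_demo = DecisionTreeClassifier(
    criterion="entropy", splitter="random", min_samples_split=4,
    min_impurity_decrease=0.0001, random_state=42,
).fit(X_demo, y_demo)

t = tree_demo.tree_
left = t.children_left[0]  # node id of the root's left child

# Recompute the entropy from the class distribution stored at that node
dist = t.value[left][0]
p = dist / dist.sum()
p = p[p > 0]  # treat 0 * log(0) as 0
manual_entropy = -(p * np.log2(p)).sum()

print(t.node_count, tree_demo.get_depth(), left)
print(t.impurity[left], manual_entropy)
```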

(Common Instructions for Q9, Q10)

Take an AdaBoost model with the following hyperparameter values and tune it using GridSearchCV.
Use n_estimators as [10, 20, 30]
random_state = 42
Use learning_rate as [0.5, 1, 2]
Take cv value = 5

Q9 [Marks: 5] [NAT] Train the 'model' using the above instructions and use the best estimator to calculate the total number of misclassified samples in the test data. Submit the value.

Answer: 436 (Range: 434,438 )


from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Define model and parameters
ada = AdaBoostClassifier(random_state=42)
param_grid = {'n_estimators': [10, 20, 30], 'learning_rate': [0.5, 1, 2]}

# GridSearchCV (refit=True by default, so grid.predict uses grid.best_estimator_)
grid = GridSearchCV(ada, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Predict and count misclassified samples
y_pred = grid.predict(X_test)
misclassified = (y_test != y_pred).sum()
print(misclassified)
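The misclassified count can be cross-checked against `accuracy_score`: with `normalize=False` it returns the number of correct predictions, so misclassified = total - correct. A sketch on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 1, 1, 0])

misclassified = int((y_true != y_pred).sum())

# normalize=False makes accuracy_score return the count of correct predictions
correct = accuracy_score(y_true, y_pred, normalize=False)
print(misclassified, len(y_true) - correct)
```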

Q10 [Marks: 5][MCQ] Choose the value of n_estimators of the best model after training with GridSearchCV.

Options:

A) 10

B) 20

C) 30

D) None of these

Answer: C

grid.best_estimator_

Q11. [Marks: 5] [NAT] Train with RandomizedSearchCV.

Keep the settings below for the RandomForestClassifier(random_state=42) estimator:

n_estimators = range(2,100)
max_depth = range(1,11)
min_impurity_decrease = uniform(loc=0,scale=5)

Keep the settings below for RandomizedSearchCV:

estimator = RandomForestClassifier(random_state=42)
random_state = 42
n_iter=5
cv=3
n_jobs= -1
verbose=2


hint: from scipy.stats import uniform

Submit the best parameter value for n_estimators found by this RandomizedSearchCV on the training data.

Answer: 16

Q12 [Marks 4] [NAT] Write the best parameter value for min_impurity_decrease found by this RandomizedSearchCV on the training data (up to 2 digits after the decimal).

Answer: 3.9827 (Range: 3.95, 4.00)

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

# Step 1: Define the model
rf = RandomForestClassifier(random_state=42)

# Step 2: Define the parameter distributions
param_dist = {
    'n_estimators': range(2, 100),
    'max_depth': range(1, 11),
    'min_impurity_decrease': uniform(loc=0, scale=5)  # continuous on [0, 5]
}

# Step 3: RandomizedSearchCV
rand_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
    n_jobs=-1,
    verbose=2
)

# Step 4: Fit on training data
rand_search.fit(X_train, y_train)

# Step 5: Best parameters
best_n_estimators = rand_search.best_params_['n_estimators']
best_min_impurity_decrease = rand_search.best_params_['min_impurity_decrease']

print("Best n_estimators:", best_n_estimators)
print("Best min_impurity_decrease:", round(best_min_impurity_decrease, 2))
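RandomizedSearchCV draws its candidates with `sklearn.model_selection.ParameterSampler`: ranges are sampled as discrete choices, and `uniform(loc=0, scale=5)` yields floats on [0, 5] (scipy's `uniform` spans `[loc, loc + scale]`). Listing the draws directly shows which settings were tried; with the same `random_state=42` and `n_iter=5` this should list the five candidates the search above evaluates, though that equivalence is stated here as an expectation rather than verified against the exam output.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Same distributions as Q11; uniform(loc=0, scale=5) covers [0, 5]
param_dist = {
    "n_estimators": range(2, 100),
    "max_depth": range(1, 11),
    "min_impurity_decrease": uniform(loc=0, scale=5),
}

# Draw the candidate settings a randomized search would evaluate
candidates = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
for params in candidates:
    print(params)
```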
