SEETHIS2.ipynb - Colab

The document provides a practice dataset for analyzing player engagement in gaming, including variables such as PlayerID, Age, Gender, and EngagementLevel. It emphasizes the importance of understanding concepts and problem-solving rather than memorizing specific questions or code solutions. Additionally, it outlines preprocessing steps and questions related to data analysis, including handling missing values and dataset splitting.

Uploaded by

ramras0509


8/16/25, 10:38 AM SEETHIS2.ipynb - Colab

⚠️ Important Notice:
These questions are meant for practice purposes only. They do not represent the actual OPPE
paper pattern, question style, or any official material. Please avoid memorizing the question
patterns or the exact code solutions.

Instead, focus on developing the skill to tackle unfamiliar problems — explore library
documentation, understand concepts, and learn how to apply them in new contexts.

Regards
MLT TAs

MetaData
Feature Variables
PlayerID: Unique identifier for each player.
Age: Age of the player.
Gender: Gender of the player.
Location: Geographic location of the player.
GameGenre: Genre of the game the player is engaged in.
PlayTimeHours: Average hours spent playing per session.
InGamePurchases: Indicates whether the player makes in-game purchases (0 = No, 1 = Yes).
GameDifficulty: Difficulty level of the game.
SessionsPerWeek: Number of gaming sessions per week.
AvgSessionDurationMinutes: Average duration of each gaming session in minutes.
PlayerLevel: Current level of the player in the game.
AchievementsUnlocked: Number of achievements unlocked by the player.

Target Variable:
EngagementLevel: Indicates the level of player engagement categorized as 'High', 'Medium', or
'Low'.

Use the below link to access the dataset

Dataset V1: https://drive.google.com/file/d/1Z_Ns6fUzUVfGpM8Rqi4QIFzBVx19ZukK/view?usp=sharing


Preprocessing
import numpy as np
import pandas as pd

from google.colab import files

uploaded = files.upload()

Saving preprocessing V1 (1).csv to preprocessing V1 (1) (1).csv

df = pd.read_csv('preprocessing V1 (1).csv')

Q1. [marks : 0][MCQ] Which dataset are you using for this exam?

Options:

(A) MLP OPPE2 preprocessing V1.csv

(B) MLP OPPE2 preprocessing V2.csv

(C) MLP OPPE2 preprocessing V3.csv

Answer: V1: A, V2:B, V3: C

Q2 [Marks: 4][MCQ] Which of the following columns have object datatype?

(A) Age

(B) Gender

(D) GameGenre

(E) PlayTimeHours

(F) PlayTimeHours

Ans: B,C,D

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PlayerID 10000 non-null int64
1 Age 9200 non-null float64
2 Gender 10000 non-null object
3 Location 9202 non-null object
4 GameGenre 10000 non-null object
5 PlayTimeHours 10000 non-null float64
6 InGamePurchases 9107 non-null float64
7 GameDifficulty 9154 non-null object
8 SessionsPerWeek 10000 non-null int64
9 AvgSessionDurationMinutes 10000 non-null int64
10 PlayerLevel 10000 non-null int64
11 AchievementsUnlocked 10000 non-null int64
12 EngagementLevel 10000 non-null object
dtypes: float64(3), int64(5), object(5)
memory usage: 1015.8+ KB

df.head(2)

PlayerID Age Gender Location GameGenre PlayTimeHours InGamePurchases GameDi

0 35900 37.0 Male Other Strategy 23.929404 NaN

1 27085 25.0 Male NaN Action 22.755168 1.0

df.InGamePurchases

InGamePurchases

0 NaN

1 1.0

2 0.0

3 NaN

4 1.0

... ...

9995 0.0

9996 0.0

9997 0.0

9998 0.0

9999 1.0

10000 rows × 1 columns

dtype: float64


Q3 [Marks: 4][NAT] In this dataset, how many "Males" from "Europe" have made "InGamePurchases"?

Ans: 299

df[(df.Gender=='Male')&(df.Location=='Europe')&(df.InGamePurchases==1.0)].shape[0]

299

df.columns

Index(['PlayerID', 'Age', 'Gender', 'Location', 'GameGenre', 'PlayTimeHours',

'InGamePurchases', 'GameDifficulty', 'SessionsPerWeek',
'AvgSessionDurationMinutes', 'PlayerLevel', 'AchievementsUnlocked',
'EngagementLevel'],
dtype='object')

Q4 [Marks: 4][NAT] In your dataset, how many players under the "Age" of 18 have strictly more than 10 "PlayTimeHours"?

Ans: 453

df[(df.Age<18)&(df.PlayTimeHours>10)].shape[0]

453
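The filters in Q3 and Q4 combine boolean masks with `&`. The sketch below uses a made-up four-row frame (the values are illustrative, not taken from the dataset) to show why rows with a missing Location or InGamePurchases drop out of the count: a comparison against a missing value evaluates to False.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one missing Location and one missing purchase flag
toy = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Male'],
    'Location': ['Europe', None, 'Europe', 'Europe'],
    'InGamePurchases': [1.0, 1.0, 1.0, np.nan],
})

# Each condition yields a boolean Series; & keeps rows where all three hold
mask = (
    (toy['Gender'] == 'Male')
    & (toy['Location'] == 'Europe')
    & (toy['InGamePurchases'] == 1)
)

# Summing a boolean mask counts the True entries
count = int(mask.sum())
print(count)
```

Only the first row satisfies all three conditions here, so counting before or after imputation can give different answers on real data.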

Q5 [Marks: 4][MCQ] Which of the following options represents all the unique categories present in the "GameGenre" feature?

Options

(A) [Action, Adventure, Simulation, Sports, Strategy]

(B) [Action, RPG, Racing, Sports, Strategy]

(C) [Action, RPG, Simulation, Sports, Strategy]

(D) [Adventure, Puzzle, RPG, Simulation, Sports]

Ans : C

df.GameGenre.unique()

array(['Strategy', 'Action', 'Simulation', 'RPG', 'Sports'], dtype=object)


Q6 [Marks: 4][NAT] Create the feature matrix (X) and label vector (y) using the following instructions:

"EngagementLevel" is the target column (y).

All the columns except the target column are in the feature matrix (X).

Compare the correlation value (r) for every pair of numeric (int, float) dtype features in the feature matrix (X) and write the highest positive correlation value.

Ans: 0.025 (0.023, 0.027)

X = df.drop(columns='EngagementLevel')
y = df.EngagementLevel

corr = X.corr(numeric_only=True)
corr_v = corr.unstack()
corr_v = corr_v[corr_v<1]

corr_v.max()

0.025499335732023704
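Filtering with `corr_v < 1` removes the diagonal, but it would also drop any two distinct features that happen to be perfectly correlated. A possible alternative, sketched here on a made-up three-column frame (column names `a`, `b`, `c` are hypothetical), keeps only the upper triangle of the correlation matrix so each pair appears once and the diagonal is excluded by position.

```python
import numpy as np
import pandas as pd

# Toy numeric features (hypothetical): b tracks a closely, c moves against both
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 11],
    'c': [5, 3, 4, 1, 2],
})

corr = toy.corr(numeric_only=True)

# True strictly above the diagonal: one entry per unordered feature pair
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Keep the upper triangle, flatten to a Series indexed by (feature, feature)
pairs = corr.where(upper).stack().dropna()

best_pair = pairs.idxmax()
best_value = pairs.max()
print(best_pair, round(best_value, 4))
```

This also reports which pair of features produced the maximum, which the unstacked version does not show directly.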

Q7 [Marks: 4][NAT] How many total null values are present in the whole dataset?

Ans: 3337

df.isnull().sum().sum()

np.int64(3337)
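The double `.sum()` first counts nulls per column and then adds those counts together. A small sketch on a made-up frame (values are illustrative only) makes the intermediate per-column step visible.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with nulls in two of three columns
toy = pd.DataFrame({
    'Age': [37.0, np.nan, 25.0],
    'Location': ['Other', None, None],
    'PlayerLevel': [1, 2, 3],
})

# First sum: null count per column
per_column = toy.isnull().sum()

# Second sum: grand total across all columns
total = int(per_column.sum())

print(per_column)
print(total)
```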

Q8 [Marks: 5][MCQ] Split the dataset into a train dataset and a test dataset in the following manner.

Use sklearn train_test_split function to split the data.

Use only 20% data as test_set and keep random_state = 42

Which category has the least value counts in y_train?

(A) Low

(B) Medium

(C) High

Ans: C

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

y_train.value_counts()

count

EngagementLevel

Medium 3983

Low 2021

High 1996

dtype: int64
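As a rough illustration of the split settings, the sketch below runs `train_test_split` on ten made-up rows (the feature `f` and the labels are hypothetical). It shows the 80/20 sizes and that a fixed `random_state` reproduces the same split; since `stratify` is not set, class proportions in the two parts are not guaranteed to match.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): ten rows with three label categories
X_toy = pd.DataFrame({'f': range(10)})
y_toy = pd.Series(['Low', 'Medium', 'High', 'Medium', 'Medium',
                   'Low', 'High', 'Medium', 'Low', 'Medium'])

# 20% of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# Repeating the call with the same random_state gives the same rows
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

print(len(X_tr), len(X_te))
print(y_tr.value_counts())
```

On the real data, `y_train.value_counts().idxmin()` would name the rarest category directly.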

Common Instructions for Question 9 and 10

Rules for imputing the missing (NaN) or unknown values:

Calculate the statistical values (such as mean, median, mode) for each column on the training dataset.

Apply these calculated statistical values to replace missing (NaN) and unknown values in both the training and test datasets.

Ensure that the calculation of statistical values excludes any rows containing missing or unknown values.

Replace unknown values in the "Age" feature with the mean value of that feature.

Replace unknown values in the "Location" feature with the constant value "Other".

Replace unknown values in the "GameDifficulty" feature with the most frequent value of that feature.

Replace unknown values in the "InGamePurchases" feature with the constant value 0.

Write the answers related to the above imputation in below questions respectively.
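One way to follow these rules is scikit-learn's `SimpleImputer`, which learns its fill value in `fit` and reuses it in `transform`. This is only a sketch on made-up arrays (all values below are hypothetical), not the approach the notebook itself takes; it illustrates that the statistics come from the training data and are then applied to the test data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy training and test columns (hypothetical values)
age_train = np.array([[20.0], [30.0], [np.nan], [40.0]])
age_test = np.array([[np.nan], [50.0]])
diff_train = np.array([['Easy'], ['Easy'], [np.nan], ['Hard']], dtype=object)
diff_test = np.array([[np.nan], ['Medium']], dtype=object)
loc_train = np.array([['Asia'], [np.nan]], dtype=object)
loc_test = np.array([[np.nan], ['USA']], dtype=object)

# fit() learns the fill value from the training column only (NaNs are skipped)
age_imputer = SimpleImputer(strategy='mean').fit(age_train)
diff_imputer = SimpleImputer(strategy='most_frequent').fit(diff_train)
loc_imputer = SimpleImputer(strategy='constant', fill_value='Other').fit(loc_train)

# transform() applies the training statistics to the test column
age_test_filled = age_imputer.transform(age_test)
diff_test_filled = diff_imputer.transform(diff_test)
loc_test_filled = loc_imputer.transform(loc_test)

print(age_imputer.statistics_, age_test_filled.ravel())
print(diff_test_filled.ravel(), loc_test_filled.ravel())
```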

df.Location.unique()

array(['Other', nan, 'Europe', 'USA', 'Asia'], dtype=object)


df.InGamePurchases.unique()

array([nan, 1., 0.])

# Learn the fill values from the training split only (mean and mode skip NaN)
age_mean = X_train.Age.mean()
difficulty_mode = X_train.GameDifficulty.mode()[0]

# One fill value per column, applied identically to train and test
fill_values = {
    'Age': age_mean,
    'Location': 'Other',
    'GameDifficulty': difficulty_mode,
    'InGamePurchases': 0,
}

# Reassign instead of calling fillna(inplace=True) on a single column,
# which raises a FutureWarning and will stop working in pandas 3.0
X_train = X_train.fillna(fill_values)
X_test = X_test.fillna(fill_values)

Q9 [Marks: 5][NAT] Write the sum of the transformed (imputed) "Age" column of the test dataset (up to 2 digits after the decimal point).

Answer: 63585.23 (Range: 63584,63586)

X_test.Age.sum()

np.float64(63585.239130434784)

Q10 [Marks: 5][NAT] Let's say the most frequent category in the "GameDifficulty" column of the train dataset is "XXX". What is the value count of "XXX" in the "GameDifficulty" column of the test dataset after imputation?

Answer: 1102

difficulty_mode
X_test.GameDifficulty.value_counts()


count

GameDifficulty

Easy 1102

Medium 569

Hard 329

dtype: int64

Apply preprocessing on features of train and test datasets.

Drop the "PlayerID" column before the preprocessing steps.

Before applying any preprocessing, there should not be any missing or unknown values present in the train and test datasets.

Learn the transformers' parameters using the training set only, and then transform the train and test sets using them.

For Categorical Features

Ordinal Features

Ordinally Encode "GameDifficulty"

GameDifficulty Order

Easy 0

Medium 1

Hard 2

Nominal Features

One-Hot Encode the 'Gender', 'Location', 'GameGenre' features and keep drop_first = True (drop='first' in sklearn's OneHotEncoder).

Scaling Features

Scale all the features (transformed categorical and numerical) of the feature matrix using
the StandardScaler
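The three transformations listed above can be tried in isolation on tiny made-up inputs (all arrays below are hypothetical). The sketch shows the explicit category order for the ordinal encoder, the effect of dropping the first one-hot column, and a scaler whose mean and standard deviation come from the training values only.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler

# Ordinal: the order given in `categories` defines the codes 0, 1, 2
difficulty = np.array([['Hard'], ['Easy'], ['Medium']], dtype=object)
ordinal = OrdinalEncoder(categories=[['Easy', 'Medium', 'Hard']])
codes = ordinal.fit_transform(difficulty)

# Nominal: categories are sorted, and drop='first' removes the first one ('Female')
gender = np.array([['Male'], ['Female'], ['Other']], dtype=object)
onehot = OneHotEncoder(drop='first')
dummies = onehot.fit_transform(gender).toarray()

# Scaling: mean and std are learned on the training column, then reused
train_col = np.array([[1.0], [2.0], [3.0]])
test_col = np.array([[2.0]])
scaler = StandardScaler().fit(train_col)
scaled_test = scaler.transform(test_col)

print(codes.ravel())
print(dummies)
print(scaler.mean_, scaled_test.ravel())
```

A test value equal to the training mean scales to 0, regardless of what the test set's own mean would be.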

y_test.isnull().sum()

np.int64(0)


X_train.drop(columns=['PlayerID'], inplace=True)
X_test.drop(columns=['PlayerID'], inplace=True)

from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

num_cols = X_train.select_dtypes(include=['int64', 'float64']).columns
cat_cols = X_train.select_dtypes(include=['object']).columns

# 3️⃣ Define categorical feature groups
ordinal_features = ['GameDifficulty']
ordinal_mapping = [['Easy', 'Medium', 'Hard']]

nominal_features = ['Gender', 'Location', 'GameGenre']

# 4️⃣ Create transformers
ordinal_transformer = OrdinalEncoder(categories=ordinal_mapping)
nominal_transformer = OneHotEncoder(drop='first', handle_unknown='ignore')

# 5️⃣ Build ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('ord', ordinal_transformer, ordinal_features),
        ('nom', nominal_transformer, nominal_features),
        ('num', 'passthrough', num_cols)  # numeric features go through directly
    ]
)

# 6️⃣ Create pipeline with scaling
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('scaler', StandardScaler())
])

# 7️⃣ Fit on train and transform both sets
X_train_transformed = pipeline.fit_transform(X_train)
X_test_transformed = pipeline.transform(X_test)

# Check shapes
print("Train shape:", X_train_transformed.shape)
print("Test shape:", X_test_transformed.shape)

# ✅ At this stage: X_train_transformed & X_test_transformed are fully processed & scaled

Train shape: (8000, 16)

Test shape: (2000, 16)

X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

Q11 [Marks: 10][NAT] Calculate the sum of all the values present in the first five rows of the transformed test feature matrix (up to 2 digits after the decimal).

Answer: -7.1728 (-7.19 , -7.15)

first_five = X_test_transformed[:5]

# Sum all values

total_sum = first_five.sum()

# Round to 2 decimal places

total_sum_2dec = round(total_sum, 2)

print(total_sum_2dec)

-7.17

Model Building

Set or change the parameters specified in the question, while keeping all other parameters at their default values.

Use the below link to access the dataset

Dataset V1: https://drive.google.com/file/d/1IybNLBUwfvsiyEBjDt4iDxa4WOKnyONh/view?usp=sharing

from google.colab import files

uploaded = files.upload()

Saving ModelBuilding V1 (1).csv to ModelBuilding V1 (1).csv

df = pd.read_csv('ModelBuilding V1 (1).csv')

Q1. [marks : 0] Which dataset are you using for this Model Building section?

Options:

A) MLP OPPE2 ModelBuilding V1.csv

B) MLP OPPE2 ModelBuilding V2.csv

C) MLP OPPE2 ModelBuilding V3.csv

Answer: V1: A, V2:B, V3: C

X = df.drop(columns=['EngagementLevel'])
y = df.EngagementLevel

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

Split the dataset into train dataset and test dataset in the following manner.

"EngagementLevel" is the target column(y).

All the columns except the target column are in feature matrix(X).
Use sklearn train_test_split function to split the data.
Use only 20% data as test_set and keep random_state = 42

Q2 [5 marks][NAT] Take the LogisticRegression estimator with the following parameters for training:

Use sag as the solver
Set random state to be equal to 42
Tolerance for stopping criteria to be 1e-3
Maximum number of iterations taken for the solvers to converge to be 100

Enter the recall score for class 1 of y_test for the given model using the test set (X_test, y_test).

No Changes

Answer: 0.7044 (0.690, 0.715)

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import recall_score

model = LogisticRegression(solver='sag', random_state=42, tol=1e-3, max_iter=100)
model.fit(X_train,y_train)

y_pred = model.predict(X_test)

print(recall_score(y_test, y_pred,labels=[1], average='macro'))
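Recall for a single class is the share of that class's true samples that the model predicted correctly. The sketch below uses made-up label lists (hypothetical, not model output) to check that a hand count agrees with `recall_score`, both with `labels=[1], average='macro'` as used above and with `average=None`.

```python
from sklearn.metrics import recall_score

# Toy labels (hypothetical): three samples truly belong to class 1
y_true_toy = [0, 1, 1, 2, 1, 2]
y_pred_toy = [0, 1, 0, 2, 1, 1]

# Hand count: true positives for class 1 over all true class-1 samples
tp = sum(t == 1 and p == 1 for t, p in zip(y_true_toy, y_pred_toy))
actual = sum(t == 1 for t in y_true_toy)
manual_recall = tp / actual

# Restricting to one label and macro-averaging returns that label's recall
single = recall_score(y_true_toy, y_pred_toy, labels=[1], average='macro')

# average=None returns one recall per class, ordered by sorted label
per_class = recall_score(y_true_toy, y_pred_toy, average=None)

print(manual_recall, single, per_class[1])
```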

Q3 [6 marks][NAT] Use SGDClassifier on the training dataset (X_train and y_train) to train the model. Use the following parameters:

1. log_loss is the loss function to be used,
2. apply ridge regularization,
3. maximum number of passes over the training data is 10,
4. constant learning rate of 0.01,
5. regularization rate value is 0.001,
6. Take random_state=42.
7. Set warm_start as False

Note: Please ignore the convergence warning.

Using the above model, calculate and write the correct value of f1_score for label class = 2 of the test set.

Version   rs=64                    rs=42
V1        0.8167 (0.800, 0.825)    0.817 (0.80, 0.825)

from sklearn.linear_model import SGDClassifier

from sklearn.metrics import f1_score

# Step 1: Train the model

sgd = SGDClassifier(
    loss='log_loss',           # logistic regression loss
    penalty='l2',              # ridge regularization
    max_iter=10,               # number of passes over the data
    learning_rate='constant',
    eta0=0.01,                 # learning rate
    alpha=0.001,               # regularization strength
    random_state=42,
    warm_start=False
)

sgd.fit(X_train, y_train)

# Step 2: Predictions
y_pred = sgd.predict(X_test)

# Step 3: f1-score for class label = 2

f1_class2 = f1_score(y_test, y_pred, labels=[2], average='macro')
print(f1_class2)

0.8170854271356784
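The F1 score of one class is the harmonic mean of that class's precision and recall, F1 = 2PR / (P + R). A small sketch on made-up labels (hypothetical, not the model's predictions) compares the formula with `f1_score` called the same way as above.

```python
from sklearn.metrics import f1_score

# Toy labels (hypothetical): three samples truly belong to class 2
y_true_toy = [0, 1, 1, 2, 1, 2, 2]
y_pred_toy = [0, 1, 0, 2, 1, 1, 2]

# Counts for class 2
tp = sum(t == 2 and p == 2 for t, p in zip(y_true_toy, y_pred_toy))
fp = sum(t != 2 and p == 2 for t, p in zip(y_true_toy, y_pred_toy))
fn = sum(t == 2 and p != 2 for t, p in zip(y_true_toy, y_pred_toy))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
manual_f1 = 2 * precision * recall / (precision + recall)

# Same call pattern as in the notebook, restricted to label 2
sk_f1 = f1_score(y_true_toy, y_pred_toy, labels=[2], average='macro')

print(manual_f1, sk_f1)
```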

Q4. [6 marks][MCQ] Tune the parameters using GridSearchCV for the below settings

estimator = KNeighborsClassifier

scoring = accuracy

cv = 5

Consider following parameters for KNeighborsClassifier:

n_neighbors = [19,23,27,31]
metric = "minkowski"
Set p value for minkowski = 2

Keep other parameter values as default values.

What is the best value of K you obtained using the above instructions?

Options

A) 19

B) 23

C) 27

D) 31

Ans: B

from sklearn.model_selection import GridSearchCV

from sklearn.neighbors import KNeighborsClassifier

# Step 1: Model
knn = KNeighborsClassifier(metric='minkowski', p=2)

# Step 2: Parameter grid

param_grid = {
    'n_neighbors': [19, 23, 27, 31]
}

# Step 3: GridSearchCV
grid = GridSearchCV(
    estimator=knn,
    param_grid=param_grid,
    scoring='accuracy',
    cv=5
)

grid.fit(X_train, y_train)

# Step 4: Best K

print(grid.best_params_) # {'n_neighbors': XX}

print(grid.best_score_)

{'n_neighbors': 23}
0.6859999999999999
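To see how close the candidates were, the per-candidate scores can be read from `cv_results_`. The sketch below runs the same grid on synthetic data from `make_classification` (an assumption made so the example is self-contained; the numbers it prints are unrelated to the exam dataset).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (hypothetical, not the exam dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

grid_demo = GridSearchCV(
    estimator=KNeighborsClassifier(metric='minkowski', p=2),
    param_grid={'n_neighbors': [19, 23, 27, 31]},
    scoring='accuracy',
    cv=5,
)
grid_demo.fit(X_demo, y_demo)

# One mean cross-validated accuracy per candidate K
results = grid_demo.cv_results_
for k, score in zip(results['param_n_neighbors'], results['mean_test_score']):
    print(k, round(float(score), 4))

best_k = grid_demo.best_params_['n_neighbors']
print("best K:", best_k)
```

`best_score_` is the highest of these mean scores, and `best_params_` is the candidate that produced it.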

Q.5 [3 Marks][MCQ] Fit an SVM classifier with the following parameters:

kernel='rbf'
decision_function_shape='ovr'
random_state=42
C=1

Train the model on training data, and choose the option with correct
confusion matrix on test data.

(A) [[438  17  64]
     [ 11 407  87]
     [ 36  50 890]]

(B) [[434  18  60]
     [ 13 409  87]
     [ 38  50 891]]

(C) [[449  16  75]
     [ 11 382 101]
     [ 38  45 883]]

(D) [[465  11  57]
     [ 13 424  97]
     [ 23  46 864]]

Answers: C

from sklearn.svm import SVC

from sklearn.metrics import confusion_matrix

# Step 1: Model
svm_model = SVC(
kernel='rbf',
decision_function_shape='ovr',
random_state=42,
C=1
)

# Step 2: Train
svm_model.fit(X_train, y_train)

# Step 3: Predict
y_pred = svm_model.predict(X_test)

# Step 4: Confusion matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
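When comparing the printed matrix against the options, it helps to remember that scikit-learn puts true classes on the rows and predicted classes on the columns, so the diagonal holds the correct predictions. A sketch on made-up labels (hypothetical values) shows the layout and how the misclassified count follows from it.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical): two samples per true class
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]

# Entry [i, j] counts samples of true class i predicted as class j
cm_toy = confusion_matrix(y_true_toy, y_pred_toy)

correct = int(cm_toy.trace())                # diagonal: correct predictions
misclassified = int(cm_toy.sum()) - correct  # everything off the diagonal

print(cm_toy)
print(correct, misclassified)
```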

Common Instruction for Question 6 to 8

Train a Decision Tree Classifier with the following properties:

criterion = 'entropy'
splitter = 'random'
min_samples_split = 4
min_impurity_decrease = 0.0001
random_state = 42

Q.6 [MCQ][3 Marks] What is the depth of the trained tree?

Answer: 20

from sklearn.tree import DecisionTreeClassifier

# 1️⃣ Train the Decision Tree

dt = DecisionTreeClassifier(
    criterion='entropy',
    splitter='random',
    min_samples_split=4,
    min_impurity_decrease=0.0001,
    random_state=42
)
dt.fit(X_train, y_train)


DecisionTreeClassifier(criterion='entropy', min_impurity_decrease=0.0001,
min_samples_split=4, random_state=42, splitter='random')

dt.get_depth()

Q.7 [NAT][3 Marks] How many nodes are there in the tree?

Answer: 2367 (Range: 2365, 2369)

dt.tree_.node_count

Q.8 [NAT][5 Marks] What is the value of entropy at the left child of the root node? (correct up to 2 digits after the decimal)

Answer: 1.1607 (1.15,1.17)

dt.tree_.impurity[1]
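Indexing `impurity[1]` relies on the left child of the root being stored as node 1. A slightly more explicit variant, sketched here on a four-sample toy problem (hypothetical data, default splitter), looks the child up through `tree_.children_left` instead of assuming its index.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy problem (hypothetical): one feature, two balanced classes, one clean split
X_toy = [[0], [1], [2], [3]]
y_toy = [0, 0, 1, 1]

toy_tree = DecisionTreeClassifier(criterion='entropy', random_state=42)
toy_tree.fit(X_toy, y_toy)

root = 0
left = toy_tree.tree_.children_left[root]    # node id of the root's left child
right = toy_tree.tree_.children_right[root]  # node id of the root's right child

root_entropy = toy_tree.tree_.impurity[root]
left_entropy = toy_tree.tree_.impurity[left]

print(toy_tree.get_depth(), toy_tree.tree_.node_count)
print(left, right, root_entropy, left_entropy)
```

With two balanced classes the root entropy is 1 bit, and the pure left child has entropy 0.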

(Common Instructions for Q9, Q10)

Take an AdaBoost model with the following hyperparameter values and tune it using GridSearchCV.
Use n_estimators as [10,20,30]
random_state = 42
Use learning_rate as [0.5,1,2]
Take cv value = 5

Q9 [Marks: 5][NAT] Train the 'model' using the above instructions and use the best estimator to calculate the total number of misclassified samples for the test data and submit the value.

Answer: 436 (Range: 434,438 )

from sklearn.ensemble import AdaBoostClassifier

from sklearn.model_selection import GridSearchCV

# Define model and parameters

ada = AdaBoostClassifier(random_state=42)
param_grid = {'n_estimators':[10,20,30], 'learning_rate':[0.5,1,2]}

# GridSearchCV
grid = GridSearchCV(ada, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Predict and count misclassified samples
y_pred = grid.predict(X_test)
misclassified = (y_test != y_pred).sum()
print(misclassified)

Q10 [Marks: 5][MCQ] Choose the value of n_estimators of the best model after training with GridSearchCV.

Options:

A) 10

B) 20

C) 30

D) None of these

Answer: C

grid.best_estimator_

Q11. [Marks: 5][NAT] Train with RandomizedSearchCV.

Keep the below settings for the RandomForestClassifier(random_state=42) estimator

n_estimators = range(2,100)
max_depth = range(1,11)
min_impurity_decrease = uniform(loc=0,scale=5)

Keep below settings for RandomizedSearchCV

estimator = RandomForestClassifier(random_state=42)
random_state = 42
n_iter=5
cv=3
n_jobs= -1
verbose=2

hint: from scipy.stats import uniform

Submit the best param value for n_estimators using this RandomizedSearchCV on train data.

Answer: 16

Q12 [Marks: 4][NAT] Write the best param value for min_impurity_decrease using this RandomizedSearchCV on train data (up to 2 digits after the decimal).

Answers: 3.9827 (Range: 3.95,4.00)

from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
import numpy as np

# Step 1: Define the model

rf = RandomForestClassifier(random_state=42)

# Step 2: Define the parameter distributions

param_dist = {
    'n_estimators': range(2, 100),
    'max_depth': range(1, 11),
    'min_impurity_decrease': uniform(loc=0, scale=5)
}

# Step 3: RandomizedSearchCV
rand_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
    n_jobs=-1,
    verbose=2
)

# Step 4: Fit on training data

rand_search.fit(X_train, y_train)

# Step 5: Best parameters

best_n_estimators = rand_search.best_params_['n_estimators']
best_min_impurity_decrease = rand_search.best_params_['min_impurity_decrease']

print("Best n_estimators:", best_n_estimators)

print("Best min_impurity_decrease:", round(best_min_impurity_decrease, 2))
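With `n_iter=5` the search only evaluates five sampled parameter combinations, so the reported best values are simply the best of those five. As a sketch, `ParameterSampler` (the sampler scikit-learn uses for randomized search) can list such candidates without fitting anything; the name `param_dist_demo` is a hypothetical copy of the distributions above, and `uniform(loc=0, scale=5)` draws from the interval [0, 5].

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Same distributions as in the question (copied under a hypothetical name)
param_dist_demo = {
    'n_estimators': range(2, 100),
    'max_depth': range(1, 11),
    'min_impurity_decrease': uniform(loc=0, scale=5),
}

# Draw five candidate settings with a fixed seed
candidates = list(ParameterSampler(param_dist_demo, n_iter=5, random_state=42))

for params in candidates:
    print(params)
```

Listing the candidates this way makes it easier to sanity-check a reported best value against what was actually tried.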
