SEETHIS2.ipynb - Colab
⚠️ Important Notice:
These questions are meant for practice purposes only. They do not represent the actual OPPE
paper pattern, question style, or any official material. Please avoid memorizing the question
patterns or the exact code solutions.
Instead, focus on developing the skill to tackle unfamiliar problems — explore library
documentation, understand concepts, and learn how to apply them in new contexts.
Regards
MLT TAs
MetaData
Feature Variables
PlayerID: Unique identifier for each player.
Age: Age of the player.
Gender: Gender of the player.
Location: Geographic location of the player.
GameGenre: Genre of the game the player is engaged in.
PlayTimeHours: Average hours spent playing per session.
InGamePurchases: Indicates whether the player makes in-game purchases (0 = No, 1 =
Yes).
GameDifficulty: Difficulty level of the game.
SessionsPerWeek: Number of gaming sessions per week.
AvgSessionDurationMinutes: Average duration of each gaming session in minutes.
PlayerLevel: Current level of the player in the game.
AchievementsUnlocked: Number of achievements unlocked by the player.
Target Variable:
EngagementLevel: Indicates the level of player engagement categorized as 'High', 'Medium', or
'Low'.
https://colab.research.google.com/drive/1vMyIheuwOUe9ECmxpbI0ZV2lIghtXhLZ#printMode=true 1/20
8/16/25, 10:38 AM SEETHIS2.ipynb - Colab
Preprocessing
import numpy as np
import pandas as pd
Saving preprocessing V1 (1).csv to preprocessing V1 (1) (1).csv
df = pd.read_csv('preprocessing V1 (1).csv')
Q1. [marks : 0][MCQ] Which dataset are you using for this exam?
Options:
(A) Age
(B) Gender
(C) Location
(D) GameGenre
(E) PlayTimeHours
(F) PlayTimeHours
Ans: B,C,D
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PlayerID 10000 non-null int64
1 Age 9200 non-null float64
2 Gender 10000 non-null object
3 Location 9202 non-null object
4 GameGenre 10000 non-null object
5 PlayTimeHours 10000 non-null float64
6 InGamePurchases 9107 non-null float64
7 GameDifficulty 9154 non-null object
8 SessionsPerWeek 10000 non-null int64
9 AvgSessionDurationMinutes 10000 non-null int64
10 PlayerLevel 10000 non-null int64
11 AchievementsUnlocked 10000 non-null int64
12 EngagementLevel 10000 non-null object
dtypes: float64(3), int64(5), object(5)
memory usage: 1015.8+ KB
df.head(2)
df.InGamePurchases
InGamePurchases
0 NaN
1 1.0
2 0.0
3 NaN
4 1.0
... ...
9995 0.0
9996 0.0
9997 0.0
9998 0.0
9999 1.0
dtype: float64
Q3 [Marks:4][NAT] In this dataset, how many "Males" from "Europe" have
made "InGamePurchases"?
Ans: 299
df[(df.Gender=='Male')&(df.Location=='Europe')&(df.InGamePurchases==1.0)].shape[0]
299
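A comparison against NaN evaluates to False, so rows with a missing "InGamePurchases" value drop out of a count like this without any extra handling. A minimal sketch on a made-up five-row frame (the data is hypothetical; only the column names come from the dataset description):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the exam dataset
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
    "Location": ["Europe", "Europe", "Europe", "Asia", "Europe"],
    "InGamePurchases": [1.0, np.nan, 1.0, 1.0, 0.0],
})

# Combine the three conditions element-wise; NaN == 1.0 is False
mask = (
    (toy["Gender"] == "Male")
    & (toy["Location"] == "Europe")
    & (toy["InGamePurchases"] == 1.0)
)

# Summing a boolean mask counts the True entries
count = int(mask.sum())
print(count)
```

Summing the mask gives the same number as filtering and reading `.shape[0]`, without building the intermediate frame.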
df.columns
Q4 [Marks:4][NAT] In your dataset, how many players with "Age" under
18 have strictly more than 10 "PlayTimeHours"?
Ans: 453
df[(df.Age<18)&(df.PlayTimeHours>10)].shape[0]
453
options
Ans : C
df.GameGenre.unique()
Q6. [Marks: 4][NAT] Create the feature matrix (X) and label vector (y) using
the following instructions:
Compare the correlation value (r) for every pair of numeric (int, float) dtype
features of the feature matrix (X) and write the highest positive correlation
value.
X = df.drop(columns='EngagementLevel')
y = df.EngagementLevel
corr = X.corr(numeric_only=True)
corr_v = corr.unstack()
corr_v = corr_v[corr_v<1]
corr_v.max()
0.025499335732023704
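Filtering with `corr_v < 1` removes the diagonal, but it would also remove any pair of distinct features that happened to be perfectly correlated, and every remaining pair still appears twice. An alternative sketch, on synthetic data with hypothetical column names, keeps only the entries above the diagonal:

```python
import numpy as np
import pandas as pd

# Synthetic data: "b" is built from "a", "c" is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

corr = toy.corr(numeric_only=True)

# Boolean mask that is True strictly above the diagonal, so each pair appears once
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

best_pair = pairs.idxmax()   # (row label, column label) of the largest r
best_value = pairs.max()
print(best_pair, round(best_value, 3))
```

`idxmax` also reports which two features produced the value, which is a useful sanity check before entering the answer.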
Q7 [4 Marks][NAT] How many total null values were present in the whole
dataset?
Ans: 3337
df.isnull().sum().sum()
np.int64(3337)
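The total can be cross-checked against the non-null counts in the `df.info()` output: only four columns report fewer than 10000 non-null entries. The arithmetic, using those reported counts:

```python
# Non-null counts copied from the df.info() output; all other columns are complete
n_rows = 10000
non_null = {
    "Age": 9200,
    "Location": 9202,
    "InGamePurchases": 9107,
    "GameDifficulty": 9154,
}

# Missing values per column, then the overall total
missing = {col: n_rows - n for col, n in non_null.items()}
total_missing = sum(missing.values())
print(missing, total_missing)
```

The per-column gaps (800, 798, 893 and 846) add up to the 3337 returned by `df.isnull().sum().sum()`.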
Q8 [Marks 5][MCQ] Split the dataset into train dataset and test dataset in
the following manner.
(A) Low
(B) Medium
(C) High
Ans: C
y_train.value_counts()
count
EngagementLevel
Medium 3983
Low 2021
High 1996
dtype: int64
Calculate the statistical values (such as mean, median, mode) for each column from the
training dataset.
Use these calculated values to replace missing (NaN) and unknown values
in both the training and test datasets.
Ensure that the calculation of the statistical values excludes any rows containing missing or
unknown values.
Replace unknown values in the "Age" feature with the mean of that feature.
Replace unknown values in the "Location" feature with the constant value "Other".
Replace unknown values in the "GameDifficulty" feature with the most frequent value of
that feature.
Replace unknown values in the "InGamePurchases" feature with the constant value 0.
Write the answers related to the above imputation in the questions below.
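One way to follow these instructions with scikit-learn is a `SimpleImputer` per column, fitted on the training split and then applied to both splits. This is only a sketch under assumptions: the toy frames below are hypothetical, and it treats NaN as the only marker of a missing value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy split; only the column names come from the dataset description
train = pd.DataFrame({
    "Age": [20.0, 30.0, np.nan, 40.0],
    "Location": ["Europe", np.nan, "Asia", "Europe"],
    "GameDifficulty": ["Easy", "Easy", np.nan, "Hard"],
    "InGamePurchases": [1.0, np.nan, 0.0, 1.0],
})
test = pd.DataFrame({
    "Age": [np.nan, 25.0],
    "Location": [np.nan, "USA"],
    "GameDifficulty": [np.nan, "Medium"],
    "InGamePurchases": [np.nan, 1.0],
})

# One imputer per column, matching the strategy named in the instructions
imputers = {
    "Age": SimpleImputer(strategy="mean"),
    "Location": SimpleImputer(strategy="constant", fill_value="Other"),
    "GameDifficulty": SimpleImputer(strategy="most_frequent"),
    "InGamePurchases": SimpleImputer(strategy="constant", fill_value=0),
}

train_filled, test_filled = train.copy(), test.copy()
for col, imputer in imputers.items():
    # Fit on the training column only, then reuse the learned value on the test column
    train_filled[col] = imputer.fit_transform(train[[col]]).ravel()
    test_filled[col] = imputer.transform(test[[col]]).ravel()

print(test_filled)
```

Here the missing test "Age" is filled with 30.0, the mean of the three observed training ages, rather than with anything computed from the test split.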
df.Location.unique()
df.InGamePurchases.unique()
# Learn the imputation values from the training set only
age_mean = X_train.Age.mean()
difficulty_mode = X_train.GameDifficulty.mode()[0]
fill_values = {'Age': age_mean, 'Location': 'Other',
               'GameDifficulty': difficulty_mode, 'InGamePurchases': 0}
# Reassign rather than calling fillna(..., inplace=True) on a column,
# which raises a FutureWarning and stops working in pandas 3.0
X_train = X_train.fillna(fill_values)
X_test = X_test.fillna(fill_values)
X_test.Age.sum()
np.float64(63585.239130434784)
Answer: 1102
difficulty_mode
X_test.GameDifficulty.value_counts()
count
GameDifficulty
Easy 1102
Medium 569
Hard 329
dtype: int64
Before applying any preprocessing, there should not be any missing or unknown values
present in the train and test datasets.
Learn the transformers' parameters using the training set only, and then use them to
transform both the train and test sets.
Ordinal Features
GameDifficulty Order
Easy 0
Medium 1
Hard 2
Nominal Features
Scaling Features
Scale all the features (transformed categorical and numerical) of the feature matrix using
the StandardScaler
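The notebook's own `preprocessor` definition is not shown, so the following is only a sketch of how such instructions are commonly wired together. The toy data, the name `toy_preprocessor`, the choice of "GameGenre" as a nominal feature and the use of one-hot encoding for it are all assumptions; only the ordinal order Easy < Medium < Hard and the final `StandardScaler` come from the text above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical toy split
train = pd.DataFrame({
    "GameDifficulty": ["Easy", "Hard", "Medium", "Easy"],
    "GameGenre": ["RPG", "Sports", "RPG", "Action"],
    "PlayTimeHours": [1.0, 5.0, 3.0, 7.0],
})
test = pd.DataFrame({
    "GameDifficulty": ["Medium", "Hard"],
    "GameGenre": ["Sports", "RPG"],
    "PlayTimeHours": [2.0, 6.0],
})

encode = ColumnTransformer(
    transformers=[
        # Explicit category order gives Easy=0, Medium=1, Hard=2
        ("ordinal", OrdinalEncoder(categories=[["Easy", "Medium", "Hard"]]), ["GameDifficulty"]),
        # Assumed nominal handling: one indicator column per genre
        ("nominal", OneHotEncoder(handle_unknown="ignore"), ["GameGenre"]),
    ],
    remainder="passthrough",  # numeric columns pass through unchanged
    sparse_threshold=0,       # always return a dense array
)

# Encode first, then scale every resulting column
toy_preprocessor = Pipeline([("encode", encode), ("scale", StandardScaler())])

X_train_t = toy_preprocessor.fit_transform(train)  # parameters learned here
X_test_t = toy_preprocessor.transform(test)        # and only applied here
print(X_train_t.shape, X_test_t.shape)
```

Calling `fit_transform` on the training frame and plain `transform` on the test frame is what keeps test-set statistics out of the encoder categories and the scaler's mean and variance.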
y_test.isnull().sum()
np.int64(0)
X_train.drop(columns=['PlayerID'], inplace=True)
X_test.drop(columns=['PlayerID'], inplace=True)
# Check shapes
print("Train shape:", X_train_transformed.shape)
print("Test shape:", X_test_transformed.shape)
# ✅ At this stage: X_train_transformed & X_test_transformed are fully processed & scaled
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)
five rows of transformed test feature matrix? (up to 2 digits after the
decimal)
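first_five = X_test_transformed[:5]
# Assumed missing step: total of all entries in the first five rows, rounded to 2 decimals
total_sum_2dec = round(first_five.sum(), 2)
print(total_sum_2dec)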
-7.17
Saving ModelBuilding V1 (1).csv to ModelBuilding V1 (1).csv
df = pd.read_csv('ModelBuilding V1 (1).csv')
Q1. [marks : 0] Which dataset are you using for this Model Building
section?
Options:
from sklearn.model_selection import train_test_split
X = df.drop(columns=['EngagementLevel'])
y = df.EngagementLevel
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Split the dataset into train dataset and test dataset in the following manner.
Enter the recall score for class 1 of y_test for the given model using test
set(X_test, y_test)
No Changes
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver='sag', random_state=42, tol=1e-3, max_iter=100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
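The question asks for the recall of a single class, which `recall_score` returns when `average=None` is passed. A small sketch with hypothetical labels, assuming the classes are encoded as 0, 1 and 2:

```python
from sklearn.metrics import recall_score

# Hypothetical true and predicted labels for a three-class problem
y_true = [0, 1, 1, 1, 2, 2, 1, 0]
y_hat = [0, 1, 0, 1, 2, 1, 1, 0]

# average=None returns one recall per label, in the order given by `labels`
per_class = recall_score(y_true, y_hat, labels=[0, 1, 2], average=None)
recall_class_1 = per_class[1]
print(recall_class_1)
```

Three of the four true class-1 samples are predicted as class 1, so the recall for that class is 0.75.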
(X_train and y_train) to train the model. Use the following
parameters:
Using above model, calculate and write the correct value of f1_score for
label class = 2 of the test set.
Version rs=64 rs=42
sgd.fit(X_train, y_train)
# Step 2: Predictions
y_pred = sgd.predict(X_test)
0.8170854271356784
Q4. [6 marks] [MCQ] Tune the parameters using GridSearchCV with the
settings below:
estimator = KNeighborsClassifier
scoring =accuracy
cv= 5
n_neighbors = [19,23,27,31]
metric = "minkowski"
Set p value for minkowski = 2
What is the best value of K you obtained using the above instructions?
Options
A) 19
B) 23
C) 27
D) 31
Ans: B
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Step 1: Model
knn = KNeighborsClassifier(metric='minkowski', p=2)
# Step 2: Parameter grid (values from the question)
param_grid = {'n_neighbors': [19, 23, 27, 31]}
# Step 3: GridSearchCV
grid = GridSearchCV(
    estimator=knn,
    param_grid=param_grid,
    scoring='accuracy',
    cv=5
)
grid.fit(X_train, y_train)
# Step 4: Best K
print(grid.best_params_)
print(grid.best_score_)
{'n_neighbors': 23}
0.6859999999999999
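To see how close the candidates were rather than only the winner, a fitted search object exposes `cv_results_`. The sketch below uses the iris dataset and a hypothetical grid of small K values purely for illustration, not the exam data:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data only
X_toy, y_toy = load_iris(return_X_y=True)

search = GridSearchCV(
    estimator=KNeighborsClassifier(metric="minkowski", p=2),
    param_grid={"n_neighbors": [3, 5, 7]},  # hypothetical grid
    scoring="accuracy",
    cv=5,
)
search.fit(X_toy, y_toy)

# One row per candidate, with its mean cross-validated accuracy and rank
results = pd.DataFrame(search.cv_results_)[
    ["param_n_neighbors", "mean_test_score", "rank_test_score"]
]
print(results)
print(search.best_params_, search.best_score_)
```

`best_score_` is the mean cross-validated score of the top-ranked row, not an accuracy measured on the held-out test set.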
Q.5 [3 Marks] [MCQ] Fit an SVM classifier with the following parameters:
kernel='rbf'
decision_function_shape='ovr'
random_state=42
C=1
Train the model on training data, and choose the option with correct
confusion matrix on test data.
(A) [[438,  17,  64],
     [ 11, 407,  87],
     [ 36,  50, 890]]
(B) [[434,  18,  60],
     [ 13, 409,  87],
     [ 38,  50, 891]]
(C) [[449,  16,  75],
     [ 11, 382, 101],
     [ 38,  45, 883]]
(D) [[465,  11,  57],
     [ 13, 424,  97],
     [ 23,  46, 864]]
Answers: C
from sklearn.svm import SVC
# Step 1: Model
svm_model = SVC(
    kernel='rbf',
    decision_function_shape='ovr',
    random_state=42,
    C=1
)
# Step 2: Train
svm_model.fit(X_train, y_train)
# Step 3: Predict
y_pred = svm_model.predict(X_test)
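The question asks for the confusion matrix on the test data, which the cell above stops short of computing. A minimal sketch of that last step with `confusion_matrix`, using hypothetical string labels (how the labels are encoded in the Model Building file is not shown here):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels
y_true = ["High", "Low", "Medium", "Medium", "Low", "High", "Medium"]
y_hat = ["High", "Medium", "Medium", "Medium", "Low", "Low", "Medium"]

# Fix the label order so rows and columns are unambiguous
labels = ["High", "Low", "Medium"]
cm = confusion_matrix(y_true, y_hat, labels=labels)

# Rows are true classes, columns are predicted classes
print(cm)
```

With the real model, `confusion_matrix(y_test, y_pred)` would give the matrix to compare against the options; the row sums equal the number of test samples in each true class, which is a quick way to rule out options.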
criterion = 'entropy'
splitter = 'random'
min_samples_split = 4
min_impurity_decrease = 0.0001
random_state = 42
Q.6 [MCQ][3 Marks] What is the depth of the trained tree?
Answer: 20
DecisionTreeClassifier(criterion='entropy', min_impurity_decrease=0.0001,
min_samples_split=4, random_state=42, splitter='random')
dt.get_depth()
20
Q.7 [NAT][3 Marks] How many nodes are there in the tree?
dt.tree_.node_count
Q.8 [NAT][5 Marks] What is the value of entropy at the left child of the root
node? (correct up to 2 digits after the decimal)
dt.tree_.impurity[1]
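Indexing `impurity[1]` works because scikit-learn's default depth-first tree builder stores the root's left child right after the root. Looking the child up through `children_left` makes that explicit. A sketch on the iris dataset (illustrative data and a hypothetical `toy_tree`, not the exam model):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only
X_toy, y_toy = load_iris(return_X_y=True)
toy_tree = DecisionTreeClassifier(criterion="entropy", random_state=42).fit(X_toy, y_toy)

tree = toy_tree.tree_
left_id = tree.children_left[0]    # node id of the root's left child
right_id = tree.children_right[0]  # node id of the root's right child

print("depth:", toy_tree.get_depth())
print("nodes:", tree.node_count)
print("leaves:", toy_tree.get_n_leaves())
print("left child entropy:", round(tree.impurity[left_id], 2))
```

Because every internal node has exactly two children, the node count always equals twice the number of leaves minus one, which gives a quick consistency check on the answers to Q.6 and Q.7.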
Q9[Marks: 5] [NAT] Train the 'model' using above instructions and use
# GridSearchCV
grid = GridSearchCV(ada, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)
# Predict and count misclassified samples
misclassified = (y_test != y_pred).sum()
print(misclassified)
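The misclassified count is the complement of the number of correct predictions, which `accuracy_score` reports directly when `normalize=False`. A short sketch with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_hat = np.array([0, 2, 2, 2, 1, 1, 2, 1])

# Element-wise comparison, then count the mismatches
misclassified = int((y_true != y_hat).sum())

# normalize=False returns the number of correct predictions instead of a fraction
correct = int(accuracy_score(y_true, y_hat, normalize=False))
print(misclassified, correct)
```

The two numbers always add up to the size of the test set, so either one can be derived from the other.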
Options:
A) 10
B) 20
C) 30
D) None of these
Answer: C
grid.best_estimator_
n_estimators = range(2,100)
max_depth = range(1,11)
min_impurity_decrease = uniform(loc=0,scale=5)
estimator = RandomForestClassifier(random_state=42)
random_state = 42
n_iter=5
cv=3
n_jobs= -1
verbose=2
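In `scipy.stats`, `uniform(loc=0, scale=5)` is the uniform distribution on the interval from `loc` to `loc + scale`, here 0 to 5. `RandomizedSearchCV` draws its candidates with `ParameterSampler`, so the sampler can be used on its own to preview which settings a search with these values would try. A sketch, with the dictionary name `param_dist` chosen for illustration:

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Search space built from the settings listed above
param_dist = {
    "n_estimators": range(2, 100),                     # integers 2..99
    "max_depth": range(1, 11),                         # integers 1..10
    "min_impurity_decrease": uniform(loc=0, scale=5),  # floats in [0, 5]
}

# Draw the same number of candidates as n_iter
candidates = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
for params in candidates:
    print(params)
```

Sequences such as `range` are sampled uniformly over their elements, while objects with an `rvs` method, such as the `uniform` distribution, are sampled by calling it.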
Answer: 16
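from scipy.stats import uniform
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
# Steps 1-2: Model and search space (values from the settings above)
rf = RandomForestClassifier(random_state=42)
param_dist = {
    'n_estimators': range(2, 100),
    'max_depth': range(1, 11),
    'min_impurity_decrease': uniform(loc=0, scale=5),
}
# Step 3: RandomizedSearchCV
rand_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
    n_jobs=-1,
    verbose=2
)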