Introduction To Machine Learning
Here are the answers to all questions from both the Winter-2024 and
Summer-2025 "Introduction to Machine Learning" exam papers.
(c) OR: List three popular tools or technologies used in machine learning and explain
their significance. (7 Marks)
1. Scikit-learn:
○ Description: A powerful and user-friendly Python library for general-purpose
machine learning.
○ Significance: It is extremely popular for beginners and experts because it
provides a wide range of algorithms for classification, regression, and
clustering with a simple and consistent interface (.fit(), .predict()). It's the go-to
tool for most traditional ML tasks.
2. TensorFlow & PyTorch:
○ Description: Open-source frameworks specifically designed for deep
learning and building neural networks.
○ Significance: These are the industry standards for complex tasks like image
recognition, natural language processing (NLP), and AI research. They allow
developers to build, train, and deploy large-scale neural network models
efficiently, especially with GPU support.
3. Pandas:
○ Description: A Python library for data manipulation and analysis.
○ Significance: Machine learning is impossible without clean, well-structured
data. Pandas provides the fundamental data structure, the DataFrame, which
allows data scientists to easily load, clean, filter, transform, and analyze data
before feeding it into a machine learning model.
(b) Create a bar plot using Matplotlib with the following data: x=['Rohit', 'Virat',
'Shikhar', 'Gill'], y=[45, 89, 13, 54]. Label the X-axis as "Player" and y-axis as "Score".
(4 Marks)
Python
import matplotlib.pyplot as plt
# Data
x = ['Rohit', 'Virat', 'Shikhar', 'Gill']
y = [45, 89, 13, 54]
# Create the bar plot
plt.bar(x, y)
# Add labels
plt.xlabel("Player")
plt.ylabel("Score")
# Add a title
plt.title("Player Scores")
# Display the plot
plt.show()
1) To find the maximum and minimum value of a given any single dimensional array
2) To compute the mean, standard deviation, and variance of a given array along the second
axis.
Python
import numpy as np
# 1. Maximum and minimum of a single-dimensional array (values assumed for illustration)
arr_1d = np.array([12, 5, 7, 19, 3, 26])
max_val = np.max(arr_1d)
min_val = np.min(arr_1d)
print(f"Array: {arr_1d}")
print(f"Maximum value: {max_val}")
print(f"Minimum value: {min_val}")
print("\n" + "="*40 + "\n")
# 2. Mean, standard deviation and variance along the second axis (values assumed)
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# The second axis (axis=1) means we compute stats for each row
mean_val = np.mean(arr_2d, axis=1)
std_dev = np.std(arr_2d, axis=1)
variance = np.var(arr_2d, axis=1)
print(f"2D Array:\n{arr_2d}")
print(f"Mean along axis=1 (for each row): {mean_val}")
print(f"Standard Deviation along axis=1: {std_dev}")
print(f"Variance along axis=1: {variance}")
(OR Q.2 a) Create a NumPy array with values [9,8,7,6,5,4]. Access the third element of
the array. (3 Marks)
Python
import numpy as np
# Create the array
arr = np.array([9, 8, 7, 6, 5, 4])
# The third element is at index 2, because NumPy indexing starts at 0
print(f"Third element: {arr[2]}")   # Output: 7
(OR Q.2 b) Write and explain syntax of following operation in Pandas Data Frame: 1)
Remove Duplicate Rows 2) Clean Empty Cells (NaN values). (4 Marks)
1. Remove Duplicate Rows
○ Syntax: df.drop_duplicates()
○ Explanation: This method returns a new DataFrame with duplicate rows
removed. By default, it considers all columns to identify duplicates. It keeps
the first occurrence of a duplicated row.
2. Clean Empty Cells (NaN values)
○ Syntax: df.dropna()
○ Explanation: This method removes rows (or columns) that contain missing
values (NaN). By default, it drops any row that has at least one missing value.
You can use df.dropna(axis=1) to drop columns with missing values instead.
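A minimal sketch (a small hypothetical DataFrame is assumed) showing both operations:
Python
import pandas as pd
import numpy as np

# Hypothetical DataFrame with a duplicated row and a missing value
df = pd.DataFrame({'Name': ['Asha', 'Asha', 'Ravi'], 'Score': [85, 85, np.nan]})
print(df.drop_duplicates())   # removes the duplicated 'Asha' row
print(df.dropna())            # removes the row containing NaN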
(OR Q.2 c) List and explain steps involved in building a model in scikit-learn? How
can you load a dataset in scikit-learn? (7 Marks)
Scikit-learn comes with several small, built-in datasets for practice. You can load them using
specific functions from the sklearn.datasets module.
Python
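# A minimal sketch of the typical scikit-learn workflow, using the built-in Iris
# dataset (the model and parameters below are one possible choice, not the only one).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load the dataset
X, y = load_iris(return_X_y=True)
# 2. Split it into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 3. Choose a model and train it with .fit()
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# 4. Predict with .predict() and evaluate
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")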
The main purpose of dimensionality reduction is to reduce the number of input variables
(features) in a dataset. This is important for several reasons:
● Reduces Overfitting: Fewer features can lead to a simpler model that generalizes
better to new data.
● Improves Performance: Training models is computationally faster with fewer
dimensions.
● Handles the "Curse of Dimensionality": In very high dimensions, data becomes
sparse, making it difficult for models to find patterns. Reducing dimensions can make
the data denser and patterns easier to find.
● Better Visualization: It's impossible to visualize data in more than 3 dimensions.
Reducing it to 2D or 3D allows for plotting and visual inspection.
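For illustration, a minimal sketch (random synthetic data assumed) of reducing 10 features to 2 with PCA from scikit-learn:
Python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)            # synthetic data: 100 samples, 10 features
pca = PCA(n_components=2)              # keep the 2 directions with the most variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # fraction of variance retained by each component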
(OR Q.3 c) Explain steps involved in Preparing the Model activity in machine learning.
(7 Marks)
This phase, also known as Data Preprocessing, is a crucial set of steps taken to prepare
the raw data for a machine learning model.
1. Data Cleaning: This involves handling imperfections in the data.
○ Handling Missing Values: Either removing rows/columns with missing data
(dropna()) or filling them in with a value like the mean, median, or mode
(fillna()).
○ Handling Outliers: Identifying and dealing with data points that are
abnormally different from others.
2. Data Transformation: This involves converting data into a suitable format.
○ Feature Scaling: Scaling numerical features to a common range (e.g., using
Standardization or Normalization) so that no single feature dominates the
learning process.
○ Encoding Categorical Data: Converting categorical features (like 'Red',
'Green') into numbers (e.g., using One-Hot Encoding or Label Encoding)
because models can only process numerical data.
3. Feature Selection / Engineering:
○ Selecting the most relevant features for the model to reduce complexity and
improve accuracy.
○ Creating new features from existing ones if needed.
4. Splitting the Dataset:
○ Dividing the processed data into a training set (used to train the model) and
a testing set (used to evaluate the model's performance on unseen data).
This is critical to check for overfitting.
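A minimal sketch of these steps (a small hypothetical dataset is assumed) using Pandas and scikit-learn:
Python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({'age': [25, 30, None, 45],
                   'city': ['Surat', 'Rajkot', 'Surat', 'Vadodara'],
                   'price': [100, 150, 120, 200]})
df['age'] = df['age'].fillna(df['age'].mean())              # 1. fill missing values with the mean
df = pd.get_dummies(df, columns=['city'])                   # 2. one-hot encode the categorical column
df[['age']] = StandardScaler().fit_transform(df[['age']])   # 3. scale the numerical feature
X_train, X_test, y_train, y_test = train_test_split(        # 4. split into train and test sets
    df.drop('price', axis=1), df['price'], test_size=0.25, random_state=1)
print(X_train.shape, X_test.shape)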
Support Vector Machine (SVM) is a powerful supervised algorithm used for classification.
● Main Goal: The primary objective of SVM is to find the optimal hyperplane that best
separates the data points of different classes in the feature space.
● Hyperplane: A hyperplane is a decision boundary. In a 2D space, it's a line; in a 3D
space, it's a plane.
● Margin: SVM doesn't just find any hyperplane; it finds the one that has the
maximum margin. The margin is the distance between the hyperplane and the
nearest data points from either class. A larger margin leads to better generalization
and a more robust classifier.
● Support Vectors: The data points that are closest to the hyperplane and which
define the margin are called support vectors. These are the most critical data points
in the dataset.
● Kernel Trick: For data that is not linearly separable, SVM can use a "kernel trick" to
map the data into a higher dimension where a linear separator can be found. This
makes SVM very effective for complex, non-linear problems.
● Hyperplane Equation: For a linear SVM, the separating hyperplane can be written as w · x + b = 0, where w is the weight vector and b is the bias term.
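A minimal sketch (synthetic two-class data assumed) of training an SVM classifier with scikit-learn:
Python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=0)
clf = SVC(kernel='rbf', C=1.0)         # the RBF kernel handles non-linearly separable data
clf.fit(X, y)
print(f"Support vectors per class: {clf.n_support_}")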
(OR Q.4 b) Define Decision Trees algorithm. Explain Terminologies of Decision Trees.
(4 Marks)
A Decision Tree is a supervised learning algorithm that works by splitting the data into
smaller and smaller subsets based on a series of questions about the features. It creates a
tree-like model of decisions.
● Terminologies:
○ Root Node: The topmost node in the tree, representing the entire dataset.
○ Decision Node: A node that splits into two or more sub-nodes. It represents
a test on a feature.
○ Leaf / Terminal Node: A node that does not split further. It represents the
final decision or class label.
○ Splitting: The process of dividing a node into sub-nodes.
○ Branch / Sub-Tree: A subsection of the entire tree.
○ Pruning: The process of removing sub-nodes of a decision node to reduce
the complexity of the tree and prevent overfitting.
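A minimal sketch (Iris dataset assumed) that fits a small decision tree and prints its structure:
Python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)   # limiting depth is a simple form of pruning
tree.fit(iris.data, iris.target)
print(export_text(tree, feature_names=iris.feature_names))   # shows root, decision and leaf nodes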
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Hypothetical training data: [weight (kg), height (cm)]; label 0 = 'Cat', 1 = 'Dog'
X = np.array([[4, 25], [5, 28], [3, 22], [20, 60], [25, 65], [18, 55]])
y = np.array([0, 0, 0, 1, 1, 1])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, knn.predict(X_test))}")
# Classify a new animal with assumed measurements [6 kg, 30 cm]
prediction = knn.predict([[6, 30]])
if prediction[0] == 0:
    print("The new animal is classified as a 'Cat'.")
else:
    print("The new animal is classified as a 'Dog'.")
(b) Give the difference between supervised and unsupervised machine learning. (4
Marks)
| Aspect | Supervised Learning | Unsupervised Learning |
| Input Data | Uses labeled data (features + correct answers). | Uses unlabeled data (only features). |
(OR Q.5 b) Explain Clustering and briefly list techniques of clustering. (4 Marks)
● Clustering: It is the task of grouping a set of objects (data points) in such a way that
objects in the same group (called a cluster) are more similar to each other than to
those in other groups. It is a fundamental technique in unsupervised learning.
● Clustering Techniques:
○ Partitioning Methods: These methods divide the data into a pre-determined
number of non-overlapping clusters. (e.g., K-Means).
○ Hierarchical Methods: These methods create a tree-like hierarchy of
clusters. They can be agglomerative (bottom-up) or divisive (top-down). (e.g.,
Agglomerative Clustering).
○ Density-Based Methods: These methods connect areas of high data point
density into clusters, allowing for arbitrarily shaped clusters and identifying
outliers. (e.g., DBSCAN).
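For example, a minimal K-Means sketch (small synthetic points assumed) using scikit-learn:
Python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(f"Cluster labels: {kmeans.labels_}")
print(f"Cluster centers:\n{kmeans.cluster_centers_}")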
(OR Q.5 c) Define Association. Explain step by step process of Association. (7 Marks)
● Definition: Association is a rule-based machine learning method for discovering
interesting relationships or "associations" between variables in large datasets. The
classic example is "Market Basket Analysis."
● Step-by-Step Process (using the Apriori algorithm concept):
1. Define Metrics: The process relies on three key metrics:
■ Support: How frequently an item or itemset appears in the dataset.
■ Confidence: The likelihood that item B is purchased when item A is
purchased.
■ Lift: The increase in the ratio of the sale of B when A is sold.
2. Set Minimum Thresholds: Define minimum thresholds for support and
confidence to filter out uninteresting rules.
3. Find Frequent Itemsets: The algorithm first scans the dataset to find all the
individual items that meet the minimum support threshold. These are the
"frequent 1-itemsets."
4. Generate Candidate Itemsets: It then joins the frequent 1-itemsets to create
"candidate 2-itemsets" and checks if they meet the minimum support. This
process is repeated iteratively (generating 3-itemsets from frequent
2-itemsets, and so on) until no more frequent itemsets can be found. This
uses the Apriori Principle: if an itemset is frequent, then all of its subsets
must also be frequent.
5. Generate Association Rules: From the final list of frequent itemsets, the
algorithm generates association rules (e.g., {A} -> {B}) that meet the minimum
confidence threshold. For example, the rule {Bread, Butter} -> {Milk} is
generated from the frequent itemset {Bread, Butter, Milk}.
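A minimal sketch (hypothetical transactions assumed) computing support, confidence, and lift for one candidate rule:
Python
transactions = [{'Milk', 'Bread'}, {'Bread', 'Diapers'},
                {'Milk', 'Diapers', 'Beer'}, {'Milk', 'Bread', 'Diapers'}]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / n

# Candidate rule: {Bread} -> {Milk}
sup_bm = support({'Bread', 'Milk'})
confidence = sup_bm / support({'Bread'})
lift = confidence / support({'Milk'})
print(f"Support: {sup_bm:.2f}")         # 0.50
print(f"Confidence: {confidence:.2f}")  # 0.67
print(f"Lift: {lift:.2f}")              # 0.89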
(b) Give differences between Machine Learning and Human Learning. (4 Marks)
● Speed & Scale: Machines can process vast amounts of data much faster than
humans.
● Accuracy & Consistency: For specific, repetitive tasks, ML models can be more
accurate and are perfectly consistent, whereas humans are prone to fatigue and
error.
● Adaptability: Humans are far superior at adapting to new, unseen situations and can
learn from very limited information. ML models need to be retrained on new data.
● Data Dependency: ML is highly dependent on the quality and quantity of data.
Human learning can occur through abstract reasoning and intuition with little data.
(c) Write a python program to implement any five maths functions in Numpy. (7 Marks)
Python
import numpy as np
# Array values assumed for illustration
a = np.array([1, 4, 9, 16, 25])
# 1. Square Root
sqrt_a = np.sqrt(a)
print(f"Original array a: {a}")
print(f"1. Square root of a: {sqrt_a}\n")
(OR Q.1 c) How to define an array in Numpy? Create two Numpy arrays: 1. Array filled
with all zeros and 2. Array filled with all ones. Combine both in single array and
display their elements. (7 Marks)
In NumPy, an array is defined using the np.array() function, which converts a list-like
structure into a NumPy array.
Python
import numpy as np
# 1. Create a 2x3 array filled with all zeros
zeros_array = np.zeros((2, 3))
print("1. Array of all zeros:")
print(zeros_array)
# 2. Create a 2x3 array filled with all ones
ones_array = np.ones((2, 3))
print("2. Array of all ones:")
print(ones_array)
# 3. Combine both into a single array (stacked row-wise)
combined_array = np.concatenate((zeros_array, ones_array))
print("3. Combined array:")
print(combined_array)
Python
import pandas as pd
(b) Which type of machine learning system should you use to learn spam email
detection? Brief about selected model. (4 Marks)
● Type of ML System: Supervised Learning, specifically a Classification task. This
is because we have historical data of emails that are already labeled as either 'spam'
or 'not spam'.
● Selected Model: A Naive Bayes classifier is an excellent and common choice for
spam detection.
○ Brief: It works on Bayes' theorem of probability. It calculates the probability of
an email being 'spam' given the presence of certain words in it (e.g., "free,"
"winner," "prize"). It's called "naive" because it assumes that the presence of
one word is independent of another, which is a simplifying assumption but
works very well in practice for text classification.
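A minimal sketch (a tiny hypothetical email set is assumed) of a Naive Bayes spam classifier:
Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at noon tomorrow",
          "free winner claim your prize", "project report attached"]
labels = [1, 0, 1, 0]                      # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(emails)              # word counts as features
model = MultinomialNB().fit(X, labels)
new_email = vec.transform(["claim a free prize"])
print(model.predict(new_email))            # expected to print [1], i.e. spam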
Difference:
● plt.show(): This function displays the plot in a pop-up window on your screen. It's
used for interactive viewing.
● plt.savefig('filename.png'): This function saves the current plot to a file on your
computer's disk (e.g., as a PNG, JPG, or PDF). It does not display the plot on the
screen.
Program:
Python
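# A minimal sketch (data assumed for illustration) showing both functions
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title("show() vs savefig()")
plt.savefig("my_plot.png")   # saves the figure to a file; nothing appears on screen
plt.show()                   # opens the figure in an interactive window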
Python
import pandas as pd
(OR Q.2 b) Which type of machine learning system should you use to make a robot
learn how to walk? Brief about selected model. (4 Marks)
● Type of ML System: Reinforcement Learning (RL).
● Brief: In RL, an agent (the robot) learns to behave in an environment (the physical
world) by performing actions (moving its motors) and observing the rewards or
penalties it receives.
○ For a walking robot, a positive reward could be given for moving forward
without falling over.
○ A negative reward (penalty) would be given for falling down.
○ Through countless trials and errors, the robot learns a policy (a strategy of
which actions to take) that maximizes its cumulative reward, which in this
case, corresponds to successful walking.
(OR Q.2 c) Differentiate between Numpy and Pandas. Create a series having the
names of 3 students in your class and assign their roll numbers as index values. Also
use attributes: index, dtype, shape, ndim with series. (7 Marks)
Difference:
| Feature | NumPy | Pandas |
| Primary Data Structure | ndarray (n-dimensional array) | Series (1D) and DataFrame (2D) |
| Data Type | Homogeneous (all elements must be the same type). | Heterogeneous (columns can have different types). |
| Use Case | Optimized for fast numerical and mathematical operations. | Best for data cleaning, manipulation, and analysis of tabular data. |
Program:
Python
import pandas as pd
# Hypothetical student names; their roll numbers are used as the index values
student_series = pd.Series(['Amit', 'Priya', 'Rahul'], index=[101, 102, 103])
print("Student Series:")
print(student_series)
print("-" * 20)
# Use attributes
print(f"Index: {student_series.index}")
print(f"Data Type (dtype): {student_series.dtype}")
print(f"Shape: {student_series.shape}")
print(f"Number of Dimensions (ndim): {student_series.ndim}")
Detailed Process:
1. Shuffle the Dataset: The first step is to randomly shuffle the entire dataset to ensure
that the data is not ordered in any way.
2. Split into K Folds: The dataset is then split into K equal-sized, non-overlapping
subsets called "folds." A common value for K is 5 or 10.
3. Iterate K Times: The process iterates K times. In each iteration:
○ Select Validation Fold: One of the K folds is chosen as the validation set (or
test set for that iteration).
○ Select Training Folds: The remaining K-1 folds are combined to form the
training set.
○ Train and Evaluate: The model is trained on the training set and then
evaluated on the validation set. The performance score (e.g., accuracy) for
that iteration is recorded.
4. Average the Scores: After all K iterations are complete, the final performance of the
model is calculated by taking the average of the K recorded scores. This average
score is a more reliable and less biased estimate of the model's performance than a
single train-test split.
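A minimal sketch (Iris dataset and a decision tree assumed) of 5-fold cross-validation in scikit-learn:
Python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)   # K = 5 folds
print(f"Score for each fold: {scores}")
print(f"Average accuracy: {scores.mean():.2f}")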
(OR Q.3 a) Explain the importance of Dimensionality reduction and Feature subset
selection in Data Pre-Processing. (3 Marks)
● Dimensionality Reduction: Important for reducing the "Curse of Dimensionality,"
where models struggle with sparse data in high dimensions. It speeds up model
training and can help with data visualization.
● Feature Subset Selection: Important for simplifying models by removing irrelevant
or redundant features. This can improve model accuracy, reduce overfitting, and
make the model easier to interpret.
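A minimal sketch (Iris dataset assumed) of feature subset selection with SelectKBest:
Python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2)   # keep the 2 most informative features
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)              # (150, 4) -> (150, 2)
print(selector.get_support())                       # mask showing which features were kept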
(OR Q.3 b) Write the difference between 1) Nominal and Ordinal data, 2) Interval and
Ratio data. (4 Marks)
1. Nominal vs. Ordinal Data (Both are Categorical):
○ Nominal Data: Categories that have no natural order or ranking. Examples:
Gender ('Male', 'Female'), Colors ('Red', 'Blue').
○ Ordinal Data: Categories that have a meaningful order or rank, but the
difference between them is not defined. Examples: Customer satisfaction
('Poor', 'Good', 'Excellent'), T-shirt size ('S', 'M', 'L').
2. Interval vs. Ratio Data (Both are Numerical):
○ Interval Data: Ordered data where the difference between two values is
meaningful, but there is no true zero point. Examples: Temperature in
Celsius (0°C is a temperature, not the absence of heat), IQ score.
○ Ratio Data: Ordered data with a meaningful difference and a true, absolute
zero. This means zero represents the absence of the attribute. Examples:
Height, weight, age, price.
(OR Q.3 c) Write a Pandas program to find and drop the missing values from the given
dataset. (7 Marks)
Python
import pandas as pd
import numpy as np
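# A small hypothetical dataset with missing (NaN) values
data = {'Name': ['Asha', 'Ravi', 'Meena', 'Kiran'],
        'Score': [85, np.nan, 78, np.nan],
        'Age': [21, 22, np.nan, 23]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Find the missing values (True marks a NaN)
print("\nMissing values per column:")
print(df.isnull().sum())
# Drop the rows that contain missing values
df_clean = df.dropna()
print("\nDataFrame after dropping missing values:")
print(df_clean)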
Choosing the value of 'k' (the number of neighbors) in the K-Nearest Neighbors algorithm is
a critical step that significantly affects the model's performance.
● Small k (e.g., k=1): The model is highly flexible and sensitive to noise and outliers.
This can lead to a complex decision boundary and overfitting (high variance).
● Large k: The model becomes less sensitive to individual points and has a smoother
decision boundary. This can lead to underfitting (high bias), as it might fail to
capture the local structure of the data.
● How to Choose k: There is no single best value for k. The optimal k is typically
found through experimentation and cross-validation. A common practice is to:
1. Test the model's performance with a range of k values (e.g., from 1 to 20).
2. Plot the accuracy for each k.
3. Choose the k value that provides the best accuracy (often found at the
"elbow" of the curve, where performance stabilizes).
4. It's also a good practice to use an odd number for k in binary classification to
avoid ties.
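A minimal sketch (Iris dataset assumed) of testing a range of k values with cross-validation:
Python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k = {k:2d}  mean accuracy = {scores.mean():.3f}")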
c) Define simple linear regression using a graph explaining slope. Find the slope of
the graph where the lower point on the line is represented as (-3, -2) and the higher
point on the line is represented as (2, 2). (7 Marks)
● Simple Linear Regression: It fits a straight line, y = mx + c, through the data points on a graph, where m is the slope (the change in y for a one-unit change in x) and c is the intercept. The slope describes how steep the regression line is.
● Slope Calculation: Using the lower point (x1, y1) = (-3, -2) and the higher point (x2, y2) = (2, 2):
slope m = (y2 - y1) / (x2 - x1) = (2 - (-2)) / (2 - (-3)) = 4 / 5 = 0.8
● The slope of the graph is 0.8.
(OR Q.4 a) Define regression with example in supervised learning. List types of
regression. (3 Marks)
● Definition: Regression is a supervised learning task where the objective is to predict
a continuous numerical value. The model learns the relationship between input
features and a continuous target variable.
● Example: Predicting the price of a house based on its size, location, and number of
bedrooms.
● Types of Regression:
○ Simple Linear Regression
○ Multiple Linear Regression
○ Polynomial Regression
○ Ridge Regression
○ Lasso Regression
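A minimal sketch (hypothetical house sizes and prices assumed) of simple linear regression:
Python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[50], [80], [120], [150]])   # house size in square metres (assumed values)
y = np.array([100, 150, 220, 280])         # price in thousands (assumed values)
model = LinearRegression().fit(X, y)
print(f"Slope: {model.coef_[0]:.2f}, Intercept: {model.intercept_:.2f}")
print(f"Predicted price for 100 sq. m: {model.predict([[100]])[0]:.1f}")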
The K-Nearest Neighbors (KNN) algorithm is a simple, yet effective, supervised machine
learning algorithm used for both classification and regression. It is a non-parametric and lazy
learning algorithm.
● Non-parametric: It makes no assumptions about the underlying data distribution.
● Lazy Learning: It does not build a model during the training phase. Instead, it simply
stores the entire training dataset. The computation is deferred until a prediction is
needed.
How it Works:
The core idea of KNN is that similar things exist in close proximity. To classify a new, unseen
data point, KNN follows these steps:
1. Choose a value for k: Decide on the number of nearest neighbors to consider (e.g.,
k=5).
2. Calculate Distances: Calculate the distance between the new data point and every
single point in the training dataset. The most common distance metric used is
Euclidean distance.
3. Identify the k Nearest Neighbors: Find the k data points from the training set that
have the smallest distances to the new point.
4. Make a Prediction:
○ For Classification: The new data point is assigned to the class that is most
common among its k nearest neighbors (this is called a "majority vote").
○ For Regression: The prediction for the new data point is the average of the
values of its k nearest neighbors.
The main difference between K-Means and K-Medoids lies in how they define the center of
a cluster.
● K-Means: The center, called a centroid, is the mean (average) of all the data points
in the cluster. This centroid is a calculated point and may not be an actual data point
in the dataset.
● K-Medoids: The center, called a medoid, is the most centrally located actual data
point within the cluster. It is chosen to minimize the total distance to all other points
in its cluster.
Because K-Medoids uses an actual data point as the center, it is more robust to outliers
than K-Means.
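A small numeric sketch (one-dimensional points with an outlier assumed) contrasting the two kinds of cluster centre:
Python
import numpy as np

points = np.array([1.0, 2.0, 3.0, 100.0])
centroid = points.mean()                                  # K-Means centre: 26.5, dragged towards the outlier
total_dist = [np.abs(points - p).sum() for p in points]   # total distance from each point to all others
medoid = points[int(np.argmin(total_dist))]               # K-Medoids centre: 2.0, an actual data point
print(f"Centroid: {centroid}, Medoid: {medoid}")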
This is a repeat of Q.5(b) from the Winter-2024 paper. Please refer to that answer.
(c) Explain how the Market Basket Analysis uses the concepts of association
analysis. (7 Marks)
Market Basket Analysis applies the core concepts of association analysis (support, confidence, and lift) to customer transaction data in order to find items that are frequently purchased together, for example a rule such as {Bread, Butter} -> {Milk}. By using these concepts, retailers can make practical business decisions like placing associated items close to each other in a store, creating targeted promotions, and offering product bundles.
This is a repeat of Q.5(a) OR from the Winter-2024 paper. Please refer to that answer.
(OR Q.5 b) What are the broad three categories of clustering techniques? Explain the
characteristics of each briefly. (4 Marks)
This is a repeat of Q.5(b) OR from the Winter-2024 paper. The three categories are:
1. Partitioning Clustering (e.g., K-Means): Divides data into a pre-set number of
non-overlapping clusters.
2. Hierarchical Clustering (e.g., Agglomerative): Creates a tree-like hierarchy of
clusters, not requiring the number of clusters to be specified beforehand.
3. Density-Based Clustering (e.g., DBSCAN): Groups dense regions of data points
into clusters and can identify arbitrarily shaped clusters and outliers.
(OR Q.5 c) List association methods and explain any one with example. (7 Marks)
● Association Methods:
1. Apriori
2. Eclat
3. FP-Growth
● Explanation of Apriori Algorithm:
The Apriori algorithm is a classic method for mining frequent itemsets and generating
association rules. Its core idea is based on the Apriori Principle: "If an itemset is
frequent, then all of its subsets must also be frequent." This principle helps to prune
the search space efficiently.
Steps with an Example:
Imagine a small set of transactions: {Milk, Bread}, {Bread, Diapers}, {Milk, Diapers,
Beer}, {Milk, Bread, Diapers}. Let's set a minimum support of 50% (must appear in at
least 2 transactions).
1. Find Frequent 1-Itemsets:
■ {Milk}: 3 times (75%) -> Frequent
■ {Bread}: 3 times (75%) -> Frequent
■ {Diapers}: 3 times (75%) -> Frequent
■ {Beer}: 1 time (25%) -> Infrequent. We discard {Beer}.
2. Generate and Prune 2-Itemsets: We generate pairs only from the frequent
1-itemsets.
■ {Milk, Bread}: 2 times (50%) -> Frequent
■ {Milk, Diapers}: 2 times (50%) -> Frequent
■ {Bread, Diapers}: 2 times (50%) -> Frequent
3. Generate and Prune 3-Itemsets: We join the frequent 2-itemsets. The only
candidate is {Milk, Bread, Diapers}.
■ {Milk, Bread, Diapers}: 2 times (50%) -> Frequent
4. Generate Association Rules: From the frequent itemsets, we generate rules
that meet a minimum confidence threshold (e.g., 60%).
■ From {Milk, Bread, Diapers}: One possible rule is {Diapers, Bread} ->
{Milk}.
■ Confidence Calculation: The itemset {Diapers, Bread} appears 2
times. The itemset {Milk, Bread, Diapers} also appears 2 times.
■ Confidence = (Support of {Milk, Bread, Diapers}) / (Support of
{Diapers, Bread}) = 2/2 = 100%.
■ Since 100% > 60%, this is a strong rule.