Solution
1. a) Differentiate between supervised and unsupervised learning.
Answer:
Supervised and unsupervised learning are two main ways machines learn from data. They are different in how they
use data.
• Supervised Learning:
o How it learns: The computer learns from "labeled" data. This means that for every piece of
information it gets, the correct answer or "label" is already given. Think of it like a student with a
teacher.
o Goal: To learn a rule to map inputs to known outputs. Then, it predicts outputs for new, unseen data.
o Data: Requires input data that has correct answers already attached.
o Example: Training a program to tell if an email is spam. You give it many emails, each already marked
as "spam" or "not spam." The program learns from these examples.
• Unsupervised Learning:
o How it learns: This type of learning uses "unlabeled" data. The computer gets data without any
correct answers. It has to find patterns or structures in the data on its own. It's like a student
exploring a new topic without a teacher.
o Goal: To find hidden patterns or groups within the data. It tries to organize the data into meaningful sets.
▪ Clustering: Grouping similar data points together (like grouping similar customers).
▪ Association: Finding rules about how things go together (like what products are often bought together).
o Example: Grouping customers for a store. You give the program data about their purchases, but you
don't tell it what groups to make. It finds similar customer groups on its own.
In Summary: Supervised learning uses data with answers to predict new answers. Unsupervised learning explores
data without answers to find hidden patterns or groups.
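To make the contrast concrete, here is a minimal Python sketch (a rough illustration, not part of the original answer): a scikit-learn classifier learns from labeled examples, while K-Means groups the same points without any labels. The tiny dataset and the "spam" interpretation of the labels are invented for illustration.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])   # labels: 0 = "not spam", 1 = "spam" (hypothetical)

# Supervised: learn from (X, y) pairs, then predict labels for new inputs.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[5.5, 8.5]]))        # predicted label for a new point

# Unsupervised: only X is given; the algorithm finds groups on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                       # cluster assignment per point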
1. b) Briefly explain the need for Inductive Bias in Decision Tree Learning.
Answer:
Inductive bias is a set of assumptions that a learning program uses to guess outputs for information it hasn't seen
before. It's very important for decision trees to learn well.
• Limited Training Data: Real-world data is often incomplete. Without inductive bias, a decision tree might just
memorize the training data perfectly. This is called "overfitting." When overfitting happens, the tree works
badly on new data. Inductive bias helps the tree make smart guesses for new situations.
• Too Many Possible Trees: There are often many, many different decision trees that could perfectly fit the
same training data. Inductive bias helps the program pick the best and often simplest tree from all these
choices. Without it, the program wouldn't know which tree is better.
• Guiding How to Split Data: When building a decision tree, the program needs to decide the best way to split
the data at each step. This choice is guided by a form of inductive bias. It helps the tree focus on features
that make the best splits.
• Preventing Overfitting: Inductive biases, like "pruning" (cutting off branches from the tree), help prevent the
tree from becoming too complicated and too focused on the training data. This makes the tree perform
better on data it hasn't seen.
In simple terms: Inductive bias gives the learning program a "hint" or a "preference" on how to learn and generalize
from limited data, so it doesn't just memorize and can make good predictions for new situations.
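As a small illustration of one such bias (a preference for simpler trees), the sketch below compares an unrestricted decision tree with a depth-limited (pre-pruned) one on synthetic data. The dataset and the max_depth value are arbitrary choices for demonstration; the deep tree typically memorizes the training set while the shallow one generalizes better.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)                 # no simplicity bias
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr) # pre-pruned tree

print("deep   : train", deep.score(X_tr, y_tr), "test", deep.score(X_te, y_te))
print("shallow: train", shallow.score(X_tr, y_tr), "test", shallow.score(X_te, y_te))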
2. a) What is Principal Component Analysis (PCA) in machine learning? How does PCA help in reducing the
dimensionality of data?
Answer:
Principal Component Analysis (PCA) is a common method in machine learning that helps simplify large datasets. It's
used when you have too many features (or columns) in your data. PCA works by turning many original features into a
smaller number of new, simpler features called Principal Components (PCs).
1. Finds Key Directions: Datasets often have features that are related to each other (like house size and number
of rooms). PCA finds new, independent "directions" in the data where the information is most spread out.
o The first Principal Component (PC1) captures the most information from the original data.
o The second Principal Component (PC2) captures the next most important information, in a direction
that is totally separate from the first PC.
2. Ranks by Importance: PCA ranks these new PCs by how much information they hold. The first few PCs
usually contain most of the important stuff.
3. Reduces Number of Features: You then choose to keep only the top PCs that hold most of the important
information. For example, if you had 10 original features, you might reduce them to just 2 or 3 PCs.
o This way, you simplify your data, making it much easier to work with, visualize, and faster for other
machine learning programs to process. You keep most of the important information but with fewer
features.
In simple terms: PCA finds the main patterns in your data and combines related features into fewer, stronger
features. This shrinks your data without losing too much important detail.
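A minimal sketch of this idea with scikit-learn's PCA, using randomly generated correlated features (all numbers below are illustrative, not real data):
```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))            # 2 hidden underlying factors
X = base @ rng.normal(size=(2, 10))         # 10 correlated observed features
X += 0.05 * rng.normal(size=X.shape)        # a little noise

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)            # 10 features compressed to 2 PCs
print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_)        # share of information kept by PC1 and PC2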
2. b) Evaluate the effectiveness of biologically inspired neural network architectures in solving complex machine
learning tasks.
Answer:
Neural networks, which are inspired by the human brain, are very effective at solving tough machine learning
problems. Their design is based on how our brains work.
1. Doing Many Things at Once (Parallel Processing): Our brains can handle many tasks at the same time. Neural
networks do this too. They have many small parts (like brain neurons) that work together. This makes them
very fast for big tasks like recognizing images or understanding speech in real-time.
2. Learning Complex Patterns: Our brain processes information in complex, non-linear ways. Neural
networks use special "activation functions" that let them learn these tricky, non-linear patterns in
data. This helps them solve hard problems like recognizing faces or understanding complex language.
3. Storing Information Widely: In the brain, information isn't stored in just one spot; it's spread out. Neural
networks also store their learning (weights) across the whole network. This means if a small part of the
network is damaged, it can often still work, making it tough and reliable.
4. Adapting to Incomplete Information: Neural networks can still make good guesses even if some of the input
data is missing or a bit messy. They learn general patterns, so they can adapt and fill in gaps.
5. Handling Different Data Types: They can work with many kinds of data, from numbers to pictures and
sounds. This makes them useful for a wide range of tasks like spam filters or identifying objects in photos.
Even though they can be hard to understand and need powerful computers, their ability to learn like a brain makes
them very good at many complex machine learning tasks.
Answer:
• Hypothesis Space:
o When we choose a machine learning program (like a neural network or a decision tree), we are also
choosing all the possible "rules" or "answers" that this program could learn.
o This entire collection of all possible answers is called the hypothesis space.
o The learning program then searches within this space to find the best rule that fits the training data
well and can make good predictions for new data.
• Inductive Bias:
o The "hypothesis space" can be huge, sometimes endless. So, a learning program needs some built-in
assumptions or preferences to help it pick one rule over others. This is especially true when it needs
to guess for data it has never seen.
o It helps the program generalize from the data it has learned from to make predictions for new
situations. Without these assumptions, the program would just memorize the training data
(overfitting) and wouldn't be able to guess for new things.
In short: Hypothesis space is all the possible solutions a program could find. Inductive bias is the program's built-in
"preference" or "guesswork" that helps it choose the best solution from that space, especially for new data.
Answer:
Machine Learning, while very powerful, faces several important problems or challenges that can affect how well its
models work.
1. Lack of Sufficient Data:
o Issue: ML models learn from data, so they need a lot of it. A big problem is not having enough high-quality and relevant data to train a model properly. If data is scarce, the model can't learn strong patterns.
2. Poor Data Quality:
o Issue: Even if there's a lot of data, if it's messy, incomplete (missing info), inaccurate (wrong info), or contains noise (unwanted details), the model's accuracy will suffer badly. This is often summed up as "garbage in, garbage out."
3. Non-Representative Training Data:
o Issue: For a model to work well on new, real-world data, the data it trained on must be a good sample of all the situations it will face. If the training data doesn't properly represent new situations, the model won't make good predictions for them.
4. Overfitting:
o Issue: This happens when a model learns the training data too perfectly, including all its little errors
and random quirks. It becomes overly specific and essentially just memorizes the training examples.
Because of this, it works great on the training data but performs very poorly on new, unseen data.
5. Underfitting:
o Issue: This is the opposite of overfitting. It happens when a model is too simple or hasn't been
trained enough. It fails to learn the important patterns in the training data. This leads to poor
performance on both the training data and new data.
6. Lack of Interpretability:
o Issue: For complex ML models, it can be very difficult to understand why the model made a certain prediction. This "black box" nature can make people less trusting of the model, especially in important areas like healthcare.
7. High Computational Cost:
o Issue: Training very advanced and large ML models requires a lot of computing power, including special processors and memory. This can be expensive and require significant resources.
Answer:
Principal Component Analysis (PCA) is a popular method in machine learning that helps simplify large datasets. It's
used when you have too many features (or columns) in your data. PCA works by turning many original features into a
smaller number of new, simpler features called Principal Components (PCs).
Imagine you have a dataset about different cars. Each car has many features like:
• Engine size
• Horsepower
• Weight
• Fuel efficiency
• Top speed
• Number of seats
Some of these features might be closely related. For example, "Engine size," "Horsepower," and "Top speed" might all
be high for a sports car and low for an economy car. "Weight" might also be related to "Engine size."
1. Finds New Directions (Principal Components): PCA looks for the main "directions" in your data where the
information is most spread out.
o The First Principal Component (PC1) captures the biggest chunk of information from the original
data. For our car example, PC1 might represent "overall performance" (a mix of engine size,
horsepower, and top speed).
o The Second Principal Component (PC2) captures the next biggest chunk of information, but it's
completely separate from the first PC. PC2 might represent "practicality" (a mix of fuel efficiency and
number of seats).
2. Ranks by Importance: PCA then ranks these new PCs by how much information they explain. The first few
PCs usually hold most of the important details.
3. Reduces Number of Features: You can then choose to keep only the top-ranked PCs (e.g., PC1 and PC2). By
doing this, you've reduced your data's features from many original ones (like 6 in our example) down to just a
few (like 2 PCs).
o Even though you have fewer features, you've still kept most of the important information about the
cars, but in a simpler form. This makes your data easier to understand, visualize, and faster for other
ML programs to use.
Key Points:
• It changes related features into new, unrelated features called Principal Components.
• It helps simplify data by keeping most of the important information with fewer variables.
Answer:
Machine Learning (ML) is a field of Artificial Intelligence (AI) that allows computers to learn from data without
needing to be told exactly what to do. It helps computers find patterns, make decisions, or predict things based on
past information.
Machine Learning models, despite their power, often face several common challenges:
1. Lack of Sufficient Data:
o Problem: ML models need to learn from data. A big issue is not having enough data that is relevant and of high quality. If there isn't enough good data, the model can't learn well and will give poor results.
2. Poor Data Quality:
o Problem: Even if there's a lot of data, its quality matters a lot. Data that is messy, incomplete (missing information), inaccurate (wrong details), or has noise (unwanted extra details) will make the model's predictions bad.
3. Non-Representative Training Data:
o Problem: For a model to work well in the real world, the data it learned from must truly represent what it will see later. If the training data is not like the new data, the model won't make good predictions.
4. Overfitting:
o Problem: This happens when a model learns the training data too perfectly, including all its small
errors and random quirks. It becomes overly specific and basically just memorizes the training
examples. Because of this, it works great on the training data but very badly on new, unseen data.
5. Underfitting:
o Problem: This is the opposite of overfitting. It happens when a model is too simple or hasn't been
trained enough. It fails to learn the important patterns in the training data. This leads to poor
performance on both the training data and new data.
6. Lack of Interpretability:
o Problem: For complex ML models, it can be very difficult to know why the model made a certain prediction. This "black box" problem can make people trust the model less, especially in critical areas like healthcare.
7. High Computational Cost:
o Problem: Training very advanced and large ML models needs a lot of computer power, including special processors and memory. This can be costly and requires significant resources.
Unit II: Neural Networks
Artificial Neural Networks (ANNs) are like computer models that are inspired by the human brain. Our brain has
billions of tiny parts called neurons that are all connected. These connections help us learn, recognize things, and
make decisions very quickly.
ANNs try to copy this brain structure. They have "nodes" (like brain neurons) that are connected to each other in
layers. This design helps ANNs learn from data and adjust their connections to do tasks. It shows how computers can
process many things at once, just like our brains.
Key Ideas:
• The human brain is good at doing many things at once (parallel processing).
• Information in the brain is stored in many places, which helps us use our memory.
A Neural Network (NN) is a collection of connected "nodes" (like tiny processors) arranged in layers.
The Layers:
• Input Layer: This is where the data first enters the network.
• Hidden Layer(s): These layers are between the input and output. They do most of the complex calculations
to find patterns in the data.
• Output Layer: This layer gives the final result or prediction from the network.
Every connection between nodes has a "weight." This weight shows how strong or important that connection is.
When the network learns, it changes these weights. Each node takes inputs from the previous layer, adds them up
(with their weights), includes a "bias" (an extra value), and then puts it through an "activation function" to decide
what to pass on.
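A tiny sketch of what a single node computes, assuming a sigmoid activation and made-up inputs, weights, and bias (the numbers are purely illustrative):
```python
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias     # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid activation decides what to pass on

x = np.array([0.5, 0.2, 0.1])              # inputs from the previous layer
w = np.array([0.4, -0.6, 0.9])             # connection weights (learned during training)
b = 0.1                                    # bias
print(neuron(x, w, b))                     # value passed to the next layer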
Key Points:
Parallel processing means doing many things at the same time. Neural Networks are built to do this naturally.
Each node in a Neural Network can work independently, but also together with other nodes. When you give data to
the network, information travels through all the connections and nodes at the same time. This makes NNs very fast
and efficient, especially for big and complex tasks. This is like how our brain's billions of neurons work together
simultaneously.
Key Points:
• This involves distributing the network itself (model parallelism) or the training data (data parallelism) across
different processors.
What is a Perceptron?
The Perceptron is the simplest kind of Artificial Neural Network. It was one of the very first models and is used for
basic "yes" or "no" (binary) decisions.
How it Works:
A perceptron takes several inputs. Each input is multiplied by a "weight" (its importance). All these weighted inputs
are added together. Then, it uses a simple rule (an "activation function") to decide if the final output should be 0 or 1.
How it Learns:
If the perceptron makes a wrong guess, its weights are adjusted slightly. This helps it make better guesses next time.
This process repeats until it learns to correctly classify the data. However, a single perceptron can only solve "linearly
separable" problems, meaning data that can be perfectly split by a straight line.
Key Points:
• Can only solve linearly separable problems (data that can be split by a straight line).
Training a perceptron means repeatedly adjusting its weights and bias (an extra value) so it can correctly guess for
the training data.
The Steps:
1. Start with Guesses: Give the weights and bias small random starting numbers.
2. Make a Prediction: For each training example, the perceptron makes a guess (0 or 1) based on its current
weights.
3. Check the Guess: Compare the perceptron's guess to the actual correct answer.
4. Adjust if Wrong: If the guess is wrong, the weights and bias are changed using a special "learning rule." This
rule helps reduce the error for the next guess.
5. Repeat: Do this for all training examples many times until the perceptron guesses correctly for most or all of
them.
Key Points:
• The perceptron makes a prediction, checks it, and then updates its weights if wrong.
• The learning rate controls how big the weight adjustments are.
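A minimal sketch of this training loop in plain Python/NumPy, on the linearly separable logical AND problem; the learning rate and number of epochs are arbitrary illustrative choices.
```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                        # AND truth table

w = np.zeros(2)                                   # step 1: start with initial weights
b = 0.0                                           # ... and bias
lr = 0.1                                          # learning rate: size of each adjustment

for epoch in range(10):                           # step 5: repeat over the data
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0  # step 2: make a prediction
        error = target - pred                     # step 3: check the guess
        w += lr * error * xi                      # step 4: adjust weights only if wrong
        b += lr * error

print(w, b)                                                  # learned weights and bias
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])       # predictions for each input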
What is an MLP?
A Multilayer Perceptron (MLP) is a more advanced type of Neural Network. It's called "multilayer" because it has one
or more "hidden layers" between the input and output layers.
Unlike a simple perceptron, MLPs can learn much more complex patterns in data, even patterns that can't be
separated by a straight line. This is because the hidden layers, combined with special "activation functions," allow the
network to find very complicated relationships. Information flows forward through these layers, building up
understanding step by step.
Key Points:
• Has one or more hidden layers between the input and output layers.
• Can learn complex, non-linear patterns that a single perceptron cannot.
What is Backpropagation?
Backpropagation is a key algorithm used to train neural networks like MLPs. It's how the network learns from its
mistakes.
1. Forward Pass: The input data is passed forward through the network, layer by layer, to produce a prediction.
2. Backward Pass:
o First, we figure out how wrong the prediction was (the "error").
o Then, this error is sent backward through the network, from the output layer all the way to the input
layer.
o As the error moves backward, the algorithm calculates how much each connection's "weight"
contributed to the mistake.
o Finally, all the weights are adjusted a little bit to reduce the error for next time.
This process repeats many times until the network becomes good at making correct predictions. It uses calculus to
figure out the right adjustments.
Key Points:
• Involves a forward pass (making a prediction) and a backward pass (calculating and distributing error).
• Needs special "activation functions" that can be used for these calculations.
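A tiny numerical sketch of one forward-and-backward step for a 2-2-1 network with sigmoid activations; all inputs, weights, the target, and the learning rate are made up for illustration, and the loss is the simple squared error.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1])                      # input
t = 1.0                                       # target output
W1 = np.array([[0.4, -0.2], [0.3, 0.8]])      # input -> hidden weights (illustrative)
W2 = np.array([0.7, -0.5])                    # hidden -> output weights (illustrative)
lr = 0.5                                      # learning rate

# Forward pass: compute the prediction.
h = sigmoid(W1 @ x)                           # hidden activations
y = sigmoid(W2 @ h)                           # network output

# Backward pass: send the error backward and compute each weight's share of it.
delta_out = (y - t) * y * (1 - y)             # error signal at the output node
delta_hid = delta_out * W2 * h * (1 - h)      # error signals at the hidden nodes

W2 -= lr * delta_out * h                      # adjust output weights
W1 -= lr * np.outer(delta_hid, x)             # adjust hidden weights

print("before update:", y)
print("after update :", sigmoid(W2 @ sigmoid(W1 @ x)))   # prediction moves toward the target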
Training:
This is the process where the Neural Network learns from the labeled data. The goal is for the network to adjust its
internal weights and biases so it can predict outputs very well. It keeps changing its settings to make the errors as
small as possible.
Validation:
Validation is a very important step during or just after training. We use a separate part of the data, called the
"validation set," that the network has never seen before.
• Checking Performance: We use this set to see how well the model is really learning and if it's getting too
specific to the training data (overfitting).
• Adjusting Settings: It helps us choose the best settings for the network (like how fast it learns).
• Stopping Early: If the model starts doing worse on the validation data, we can stop training early. This
prevents overfitting.
Key Points:
• Validation: Checks how well the network performs on new, unseen data.
Activation functions are special mathematical rules used inside each node (neuron) of a Neural Network. After a
node adds up all its weighted inputs, the activation function decides if that node should "turn on" or "fire."
• Add Non-Linearity: Without them, even a many-layered network would just act like a simple straight line.
Activation functions allow the network to learn complex, curvy, non-linear patterns.
• Decide "On" or "Off": They determine if a node's signal is strong enough to be passed to the next layer.
• Scale Output: They can also make sure the output of a node stays within a certain range (like between 0 and
1).
Common Types:
• ReLU (Rectified Linear Unit): Simple rule: If input is positive, output is the input; otherwise, output is 0. Very
popular because it's fast and helps solve some learning problems.
• Softmax: Used in the final output layer for problems with multiple categories (e.g., predicting if an image is a
cat, dog, or bird, giving probabilities for each).
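A short sketch of these two functions in NumPy (the input values are chosen only to show the behaviour):
```python
import numpy as np

def relu(z):
    return np.maximum(0, z)          # positive inputs pass through, negatives become 0

def softmax(z):
    e = np.exp(z - np.max(z))        # subtract the max for numerical stability
    return e / e.sum()               # probabilities that sum to 1

z = np.array([2.0, -1.0, 0.5])
print(relu(z))                       # [2.  0.  0.5]
print(softmax(z))                    # e.g. scores for "cat", "dog", "bird" as probabilities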
These are issues that can happen when training deep Neural Networks, making them learn very slowly or become
unstable. They happen because of how errors are sent backward through many layers (backpropagation).
• Vanishing Gradients:
o What: The error signals become extremely tiny as they go backward through many layers.
o Why: This often happens when many small numbers are multiplied together, for example the derivatives of saturating activation functions (like sigmoid), which are close to zero when their inputs are very large or very small.
o Effect: The layers closer to the start of the network barely learn at all, stopping the overall learning
process.
• Exploding Gradients:
o What: The error signals become extremely large, blowing up during backpropagation.
o Why: This happens if the network's weights are too large, causing the error to grow out of control
with each step backward.
o Effect: The network becomes very unstable, weights change drastically, and learning breaks down.
How to Fix Them:
• For Vanishing: Use different activation functions (like ReLU), or special network designs (like LSTMs, which handle long-term memory).
• For Exploding: Use "gradient clipping" (limiting how large the error signals can get), or careful ways to set
initial weights.
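A toy illustration of why this happens: repeatedly multiplying per-layer factors smaller or larger than 1 makes the backpropagated signal shrink toward zero or blow up. The factors and the clipping threshold below are arbitrary numbers chosen only to show the effect.
```python
import numpy as np

layers = 50
small_factors = np.full(layers, 0.5)    # e.g. derivatives of a saturating activation
large_factors = np.full(layers, 1.5)    # e.g. the effect of overly large weights

print(np.prod(small_factors))   # ~8.9e-16  -> the gradient vanishes
print(np.prod(large_factors))   # ~6.4e+08  -> the gradient explodes

# Gradient clipping caps the size of the signal before the weight update.
grad = np.prod(large_factors)
clipped = np.clip(grad, -5.0, 5.0)
print(clipped)                  # 5.0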
Unit III: Supervised Learning Techniques
A Decision Tree is a supervised learning model that looks like a tree. It helps make decisions or predictions by breaking down data into smaller and smaller parts. It can be used for both "yes/no" type answers (classification) and guessing numbers (regression).
• Structure of a Decision Tree:
o Internal Node (Decision Node): This is where a question is asked about a feature (like "Is age > 30?"). It's labeled with an input feature.
o Branch (Link): The lines coming out of a node are "branches," representing the answers to the question or a decision rule.
o Leaf Node (Terminal Node): These are the very end points of the tree. They give the final answer or prediction, which is a class label or a continuous value. In classification, a leaf node's label is determined by the majority vote of the training examples that reach it.
o Root Node: This is the starting node at the very top of the tree.
• The tree keeps asking questions and splitting the data until it reaches a final answer. The goal is to make the purest groups possible at the end.
• Classification: A classification tree learns a set of logical "if-then" conditions for assigning categories, for example, discriminating between flower types based on their features.
• Regression: Used when the target variable is a number (continuous). Each split is chosen to minimize the sum of squared errors.
Key Ideas:
• Used for both classification and regression.
• Made of a root node, internal (decision) nodes, branches, and leaf nodes.
• Works by repeatedly splitting the data with simple questions until pure groups are reached. (See the sketch below.)
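A minimal sketch with scikit-learn, training a small tree on the classic iris flower dataset and printing its learned if-then rules; the depth limit is an arbitrary choice for readability.
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))  # the if-then rules
print(tree.predict(iris.data[:1]))                                # predicted class for one flower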
Naive Bayes is a classification algorithm that works based on probability, using a rule called Bayes' Theorem. It's "naive" because it assumes that all features (like a person's age, income, and job) are independent of each other when predicting a class (like whether they will buy a product). This means it treats each feature as if it does not affect the others.
How it Works:
It calculates the probability that a given input belongs to each possible category, and then picks the category with the highest probability. Even with its simple independence assumption, Naive Bayes often performs surprisingly well, especially with a lot of data, and it is very fast and efficient.
Key Ideas:
• It's a probabilistic classification algorithm.
• Based on Bayes' Theorem, with a "naive" assumption that features are independent.
• Good for tasks like spam detection or analyzing the sentiment of text. (See the sketch below.)
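A small sketch of a Naive Bayes spam filter, using word counts as features; the four-email corpus and its labels are invented purely for illustration.
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "cheap money offer", "meeting at noon", "lunch tomorrow?"]
labels = [1, 1, 0, 0]                           # 1 = spam, 0 = not spam (hypothetical)

vec = CountVectorizer().fit(emails)             # turn text into word counts
model = MultinomialNB().fit(vec.transform(emails), labels)

print(model.predict(vec.transform(["free money offer"])))            # likely [1] (spam)
print(model.predict_proba(vec.transform(["team meeting tomorrow"]))) # probability per class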
What is Classification?
Classification is a type of supervised learning where the computer learns to put things into predefined groups or categories. The answer it gives is always one of these specific labels. It is one of the main tasks of supervised learning.
How it Differs:
• Output: The main thing about classification is that its output is always a category (like "yes" or "no", "cat" or "dog").
• Learning: The model learns from data that already has these categories marked (labeled data).
Examples:
• Is this email spam or not spam?
• Is this image a cat or a dog?
Key Points:
• Predicts a category (class label), not a continuous number.
• Learns from labeled training data.
An SVM is a powerful supervised learning algorithm used for both classification and regression problems. Its main goal is to find the best way to separate different groups of data by drawing a clear boundary.
Imagine you have data points scattered on a graph, and you want to draw a line to separate two different types of points (like circles and squares).
• Hyperplane: The SVM tries to find the "best" line (or plane, if you have more features) that separates these groups. This line is called a hyperplane. The dimensions of the hyperplane depend on the features in the dataset.
• Maximum Margin: The "best" line is the one that has the largest possible gap (or "margin") between it and the closest data points from each group. A large margin is considered a good margin.
• Support Vectors: The data points that are closest to this separating line are called "support vectors." These are the critical points that "support" or define the position of the hyperplane.
Key Points:
• Goal: To find the hyperplane with the maximum margin separating the classes.
• The support vectors are the points that define this boundary. (See the sketch below.)
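A minimal sketch with scikit-learn's SVC on invented 2-D points, showing the support vectors it selects; the kernel and C value are default-style illustrative choices.
```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.support_vectors_)            # the points that define the margin
print(svm.predict([[3, 3], [7, 5]]))   # classify two new points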
A Random Forest is a very popular and powerful ML algorithm for both classification and regression problems. It's based on an idea called "Ensemble Learning," which means combining many simpler models (specifically, decision trees) to get a better, more robust result. As its name suggests, it builds a "forest" of decision trees.
How it Works:
• It builds many decision trees, each trained on a different random subset of the given data.
• For classification problems, the Random Forest then takes a "vote" from all the trees and chooses the prediction that the majority of trees agreed on.
• Having many trees helps improve overall accuracy and prevents overfitting (where a single tree might be too specific to the training data).
Key Ideas:
• Combines many decision trees (ensemble learning).
• Uses random subsets of the data to train each tree.
• Takes a majority vote (for classification) or an average (for regression) across the trees.
• Helps achieve higher accuracy and less overfitting than a single tree. (See the sketch below.)
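A minimal Random Forest sketch on synthetic data; the number of trees and the dataset size are arbitrary illustrative choices.
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))
print("trees in the forest:", len(forest.estimators_))   # the ensemble that gets to "vote"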
What is Linear Regression?
Linear Regression is a supervised learning algorithm used for regression problems. Its goal is to find the best straight line that describes the relationship between an input feature (or features) and a continuous numerical output.
Imagine you have data points on a graph (like house size vs. house price). Linear regression tries to draw a straight line that comes closest to all these points.
• Minimizing Errors: It does this by minimizing the "residuals" or "errors," which are the distances between each actual data point and the line. Specifically, it minimizes the sum of the squared differences between the observed values and the predicted values from the model. This is called the "Ordinary Least Squares (OLS)" method.
• Equation of the Line: The fitted line has an equation. For a single input it is Y = β0 + β1X + e; with several inputs it becomes Y = β0 + β1X1 + β2X2 + ... + βnXn + e. Here, Y is the output (dependent variable), the X values are inputs (independent variables), the β values are the coefficients (showing how much each input affects the output), and 'e' is the error term.
Before using Linear Regression, certain things should ideally be true about your data for the model to be reliable:
• Linearity: There should be a straight-line relationship between the input features and the output.
• Independence of Errors: The errors (differences between predicted and actual values) should not be related
to each other.
• Homoscedasticity: The spread of errors should be roughly constant across all levels of the input variables
(constant variance of errors).
• Normality of Errors: The errors should be normally distributed (follow a bell-shaped curve).
• No Multicollinearity: The independent input features should not be too highly correlated with each other.
Key Points:
• Used for predicting continuous numerical outputs.
• Finds the best-fitting straight line through the data.
• Minimizes the sum of squared errors (residuals).
• Has assumptions (linearity, independent errors, constant error variance, normal errors, no multicollinearity) that should hold for reliable results.
What is OLS?
Ordinary Least Squares (OLS) is the most common technique used for regression analysis. It's specifically how the "best-fitting line" in Linear Regression is found.
How it Works:
The main idea of OLS is to make the differences between the actual data points and the line as small as possible.
• Errors/Residuals: These are the differences between the actual value and the value predicted by the model.
• Sum of Squared Residuals (RSS): OLS squares each of these errors and then adds them up. Squaring the errors means larger errors are penalized more, and positive and negative errors don't cancel each other out.
• Minimizing This Sum: The OLS method then finds the line (by figuring out the best slope and intercept) that makes this "sum of squared residuals" as small as possible. This line is called the "Regression Line" and represents the best fit for the data.
Key Points:
• A standard technique for fitting a linear regression model.
• Its core is minimizing the sum of squared residuals (RSS) between actual and predicted values.
• Results in the best-fitting "Regression Line." (See the sketch below.)
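A small sketch of OLS on invented data, solving for the intercept and slope that minimize the sum of squared residuals (NumPy's least-squares solver does the minimization here):
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # e.g. house size (made-up values)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])        # e.g. house price (made-up values)

X = np.column_stack([np.ones_like(x), x])       # add a column of 1s for the intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares (OLS) solution
intercept, slope = beta

print(intercept, slope)                         # fitted line: y = intercept + slope * x
print(((y - X @ beta) ** 2).sum())              # the sum of squared residuals being minimized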
What is Logistic Regression?
Logistic Regression is a supervised learning algorithm used for classification; despite its name, it predicts the probability that an input belongs to a category rather than a continuous number.
How it Works:
• Instead of fitting a straight line to the data (like linear regression), Logistic Regression uses a special S-shaped
curve called the "sigmoid function."
• This curve squashes any input value into a probability between 0 and 1.
• If the calculated probability is above a certain cutoff (e.g., 0.5), the model assigns it to one class; otherwise, it
assigns it to the other.
Key Points:
• A classification algorithm that uses the sigmoid (S-shaped) function to output probabilities between 0 and 1.
• A cutoff (commonly 0.5) turns the probability into a class label. (See the sketch below.)
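A small sketch showing the sigmoid squashing scores into probabilities, and a fitted logistic model applying the 0.5 cutoff; the 1-D data is invented for illustration.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-3.0, 0.0, 3.0])))   # ~[0.05, 0.5, 0.95]: any score becomes a probability

X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

print(model.predict_proba([[4.5]]))          # probability of each class
print(model.predict([[4.5]]))                # class chosen with the 0.5 cutoff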
4.1 Clustering
What is Clustering?
Clustering is a task in unsupervised learning. It's about taking a group of unlabeled data points and dividing them into different "clusters" or groups. The goal is to put similar data points into the same cluster, and points that are different into different clusters.
Imagine you run a store and want to understand your customers. You can't look at every single customer's details. Instead, clustering can group your customers into, say, 10 groups based on their buying habits. Then, you can create different marketing plans for each group. This helps make sense of lots of data without knowing the answers beforehand.
Key Points:
• It's an unsupervised learning task performed on unlabeled data.
• Goal: Put similar data points in the same cluster and dissimilar points in different clusters.
• Helps to make sense of large amounts of data without known answers.
• Example: Grouping a store's customers by their buying habits (customer segmentation); in finance, clustering is used for market segmentation rather than stock price prediction (which is usually supervised).
Advantages:
• Discover Hidden Patterns: Can find groups in data without prior knowledge.
• Anomaly Detection: Outliers might form their own small clusters or sit far away from existing clusters.
• No Labeled Data Needed: Works with unlabeled data, which is often abundant.
Disadvantages:
• Requires Choosing K (for some methods): For algorithms like K-Means, you need to decide the number of clusters (K) beforehand, which can be difficult.
• Sensitivity to Initial Conditions: Some algorithms (like K-Means) can be sensitive to where the cluster centers start.
• Difficulty with Irregular Shapes: Many algorithms struggle with clusters that are not round or clearly separated.
• Interpretation Can Be Hard: Understanding what each cluster truly means can sometimes be challenging.
K-Means Clustering is a very popular unsupervised learning algorithm. It groups unlabeled data into a specific number of clusters, which we call "K". For example, if K=2, it will create two clusters; if K=3, it will create three clusters. It is an iterative algorithm that divides unlabeled data into K different clusters.
1. Choose K: You first decide how many clusters (K) you want.
2. Pick Centers: The algorithm randomly picks K starting points called "centroids" (center points for each cluster).
3. Assign Points: Each data point is assigned to the cluster whose centroid is closest to it.
4. Update Centers: Once all points are assigned, the centroid of each cluster is recalculated to be the actual center of all points in that cluster.
5. Repeat: Steps 3 and 4 are repeated. Points might move to different clusters, and centroids keep shifting until they stop moving much, meaning the clusters are stable.
The main aim is to minimize the sum of distances between the data points and their corresponding cluster centers. Each data point will belong to only one group.
Key Points:
• An unsupervised algorithm that divides unlabeled data into K predefined clusters.
• It's an iterative algorithm.
• Each cluster is represented by its centroid.
• Goal: Minimize the sum of distances between data points and their cluster centers.
Example of K-Means Clustering (showing the process):
Let's cluster the data {2, 4, 10, 12, 3, 20, 30, 11, 25} into two groups (K=2), starting with centroids m1 = 2 and m2 = 4.
• Iteration 1: Assign each point to its nearest centroid: Cluster 1 = {2, 3}, Cluster 2 = {4, 10, 12, 20, 30, 11, 25}. Update the centroids: m1 = (2+3)/2 = 2.5, m2 = (4+10+12+20+30+11+25)/7 = 16.
• Iteration 2: Reassign the points: Cluster 1 = {2, 3, 4}, Cluster 2 = {10, 11, 12, 20, 25, 30}. Update: m1 = (2+3+4)/3 = 3, m2 = (10+11+12+20+25+30)/6 = 18.
• Iteration 3: Point 10 is now closer to m1 (|10-3| = 7) than to m2 (|10-18| = 8), so it moves: Cluster 1 = {2, 3, 4, 10}, Cluster 2 = {11, 12, 20, 25, 30}. Update: m1 = 4.75, m2 = 19.6.
• Iteration 4: Points 11 and 12 also move: Cluster 1 = {2, 3, 4, 10, 11, 12}, Cluster 2 = {20, 25, 30}. Update: m1 = 7, m2 = 25.
• Iteration 5: No points change clusters, so the algorithm stops.
• Final Clusters: Cluster 1 = {2, 3, 4, 10, 11, 12}, Cluster 2 = {20, 25, 30}.
(Note: The exact intermediate steps depend on the initial centroids chosen, but the assign-and-update process remains the same.)
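For comparison, a quick sketch that runs the same data through scikit-learn's KMeans (the exact result can depend on the random initialization):
```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([2, 4, 10, 12, 3, 20, 30, 11, 25]).reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

print(km.labels_)            # cluster assignment for each point
print(km.cluster_centers_)   # final centroids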
Hierarchical Clustering (also known as HCA, hierarchical cluster analysis) is another unsupervised learning algorithm. It's used to group unlabeled data into clusters by building a hierarchy (like a tree) of clusters. This tree-like structure is called a "Dendrogram."
How it Differs from K-Means:
• In K-Means, you need to tell it how many clusters (K) you want upfront.
• In Hierarchical Clustering, you don't need to determine the number of clusters in advance. You can decide the number of clusters later by "cutting" the dendrogram at different levels.
Two Approaches:
1. Agglomerative (Bottom-Up):
o It starts by treating each data point as its own cluster.
o Then, it repeatedly merges the two closest clusters until all data points are merged into one big cluster. This is the most popular form of HCA.
2. Divisive (Top-Down):
o It starts with all data points in one big cluster and repeatedly splits it into smaller and smaller clusters.
Steps of Agglomerative Clustering:
1. Start: Treat each data point as a single cluster. If there are 'n' data points, there are 'n' clusters.
2. Merge Closest: Take the two closest data points or clusters and merge them to form one new cluster.
3. Repeat Merging: Continue taking the two closest clusters and merging them together. The number of clusters decreases by one each time.
4. Final Cluster: Repeat step 3 until only one single cluster is left (containing all data points).
5. Build Dendrogram: Once all clusters are combined, a Dendrogram (tree structure) is developed. You can then use this dendrogram to decide where to divide the clusters based on your problem.
Key Points:
• An unsupervised ML algorithm.
• Builds a hierarchy of clusters, shown as a dendrogram, which can be cut to get the desired number of clusters. (See the sketch below.)
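A minimal sketch with SciPy, building the merge hierarchy (agglomerative, single linkage) for the same toy data and cutting it into two clusters:
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([2, 4, 10, 12, 3, 20, 30, 11, 25]).reshape(-1, 1)
Z = linkage(data, method="single")                 # merge the closest clusters step by step
labels = fcluster(Z, t=2, criterion="maxclust")    # "cut" the dendrogram into 2 clusters
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree itself.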
Gaussian Mixture Models (GMMs) are a type of ML algorithm used for clustering and are based on probability. They assume that your data points come from a mix of different "Gaussian distributions" (which are like bell curves). Each bell curve represents a different cluster.
Instead of just finding a center for each cluster (like K-Means), GMMs try to figure out the shape (spread and direction) of each cluster, assuming points in a cluster follow a bell curve.
• Probabilistic: GMMs are "probabilistic" models. They estimate the probability that each data point belongs to each cluster. This means a point can belong to a cluster with a certain probability, not just 100%.
• Unknown Parameters: They assume data points are generated from Gaussian distributions with unknown parameters. The goal is to estimate these parameters and the proportion of data points from each distribution.
• Robust to Outliers: GMMs are generally good at handling unusual data points (outliers) because they can assign such points a low probability of belonging to any cluster, yielding accurate results even with outliers.
Advantages over K-Means:
• They can find clusters that are not perfectly round or equally sized, unlike K-Means.
• They give you the probability of a data point belonging to a cluster, which can be more informative.
Key Points:
• A probability-based model used for clustering.
• It's a probabilistic model: each point gets a probability of belonging to each cluster.
• Can find clusters of different shapes and sizes.
• Relatively robust to outliers. (See the sketch below.)
GMMs vs. K-Means:
• GMMs: Assume data points are generated from a mixture of Gaussian distributions with unknown parameters. The goal is to estimate these parameters and the proportion of data points from each distribution.
• K-Means: Makes no assumptions about the underlying distribution of data points. It simply divides the data into K clusters, where each cluster is defined by its centroid.
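A minimal GMM sketch with scikit-learn on the same toy data, showing the soft (probabilistic) assignments that K-Means does not provide:
```python
import numpy as np
from sklearn.mixture import GaussianMixture

data = np.array([2, 4, 10, 12, 3, 20, 30, 11, 25], dtype=float).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

print(gmm.means_.ravel())           # estimated centers of the two bell curves
print(gmm.predict(data))            # hard assignment for each point
print(gmm.predict_proba(data)[:3])  # probability of belonging to each cluster (soft assignment)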
These are methods in machine learning inspired by how nature evolves, using ideas like "survival of the fittest". They are used to find the best solutions for difficult problems, especially optimization tasks (finding the best settings or values for something). They try many solutions, combine them, and keep the best ones, allowing them to "evolve" over time towards an optimal answer.
Typical Steps:
1. Initialize: Start with a population of random candidate solutions.
2. Evaluate: Score each solution using a "fitness" function.
3. Select: Keep the fittest solutions as parents for the next generation.
4. Reproduction/Mutation: Create new solutions by combining (crossing over) parts of the best ones and adding small random changes (mutations).
5. Repeat: Go back to step 2 and keep evolving the solutions over generations until a good answer is found.
Key Points:
• Inspired by natural evolution ("survival of the fittest").
• Used for optimization problems (finding the best settings or values). (See the sketch below.)
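A toy genetic-algorithm sketch that evolves 8-bit strings toward "all ones"; the fitness function, population size, crossover scheme, and mutation rate are all invented for illustration.
```python
import random

random.seed(0)

def fitness(bits):
    return sum(bits)                       # step 2: score a solution (more 1s = fitter)

pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]   # step 1: random population

for generation in range(30):               # step 5: repeat over generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                     # step 3: keep the fittest half
    children = []
    for _ in range(10):                    # step 4: crossover + mutation
        a, b = random.sample(parents, 2)
        cut = random.randint(1, 7)
        child = a[:cut] + b[cut:]          # combine parts of two good solutions
        if random.random() < 0.1:
            i = random.randrange(8)
            child[i] = 1 - child[i]        # small random change (mutation)
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
print(best, fitness(best))                 # best solution found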
In clustering, especially K-Means, you often need to decide how many clusters (K) to create. This is an important choice because it affects how the data is grouped.
• Trial and Error: Sometimes, you try different values of K and see which one makes the most sense for your data or problem.
• Application Defined: For some problems, the number of clusters is already known or makes practical sense (e.g., if you want to group customers into 3 specific loyalty tiers).
• Evaluation Metrics: There are methods that help determine a good K by looking at how "tight" the clusters
are or how well separated they are.
o Elbow Method: You plot a graph showing how much error decreases as you add more clusters.
Often, the graph forms an "elbow" shape, and the "elbow point" suggests a good K.
o Silhouette Score: This measures how similar an object is to its own cluster compared to other
clusters. A higher score is better.
Key Points:
• Deciding the number of clusters (K) is an important choice that shapes the final grouping.
• Can be chosen by trial and error, by the needs of the application, or with evaluation methods such as the Elbow Method and the Silhouette Score. (See the sketch below.)
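A small sketch of the Elbow Method: fit K-Means for several values of K and watch how the total within-cluster error (inertia) drops; the toy data is reused from the earlier example.
```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([2, 4, 10, 12, 3, 20, 30, 11, 25]).reshape(-1, 1)
for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    print(k, round(km.inertia_, 1))   # the "elbow" in these numbers suggests a good K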
These methods define how the distance between two clusters is calculated in hierarchical clustering:
• Single Linkage: The distance between two clusters is the shortest distance between any two points from the different clusters.
• Complete Linkage: The distance between two clusters is the longest distance between any two points from the different clusters. This method tends to form tighter clusters than single linkage.
• Average Linkage: The distance between two clusters is the average distance over all possible pairs of points, one point from each cluster.
• Centroid Linkage: The distance between two clusters is the distance between their centroids (their center points).
Key Points:
• Linkage defines how the distance between two clusters is measured when deciding which clusters to merge.
• Common choices are single, complete, average, and centroid linkage.
The Expectation-Maximization (EM) algorithm is a powerful method used for finding hidden (or "latent") variables in data. It's often used to train models like Gaussian Mixture Models (GMMs) when some data is missing or when we don't know which cluster each data point belongs to.
Imagine you have a bag of coins, but you don't know if they're fair or biased. If you knew which coin was which, it would be easy to flip them and count heads/tails. But you don't know. EM helps in these situations where there's "missing information" or "hidden variables" that make direct calculation hard. It helps estimate parameters for models like GMMs, especially when clusters are not clearly defined.
1. Expectation (E-step): In this step, the algorithm guesses the "missing information" or the values of the hidden variables. For example, in GMMs, it guesses the probability that each data point belongs to each cluster, based on the current (guessed) cluster properties.
2. Maximization (M-step): In this step, the algorithm uses the guessed information from the E-step to update the model and maximize the likelihood of the observed data. For example, in GMMs, this step recalculates the best cluster properties (like their centers and shapes) based on the probabilities assigned in the E-step.
These two steps are repeated over and over. With each repetition, the algorithm's guesses get better and better,
leading to a good final model.
Key Points:
• Often used to fit models such as Gaussian Mixture Models.
• It's an iterative two-step algorithm (E-step, then M-step).
• Needed when the data has missing information or hidden (latent) variables.
In Machine Learning, an "experiment" is a planned series of actions where we try different settings to see how they affect a model's performance. It's like a scientific test: we want to understand what makes our ML model work best.
Why Do We Do Experiments?
• To learn about the model: We feed data to a learner (our ML model) and see what output it gives.
• To identify important factors: We vary different factors (like the algorithm used, the training set, or input features) to see how they change the outcome.
• To find the best settings: The goal might be to find the setup that makes the model perform its best.
• To get reliable results: We want to be sure our findings are statistically meaningful and not just due to chance.
Key Elements of an Experiment:
• Hypothesis Formulation: Clearly stating what we expect to happen or what we want to test.
• Variable Manipulation: Changing certain factors (inputs) to see their effect on the output.
• Control: Keeping other factors constant to make sure our changes are truly causing the observed effects.
Key Points:
• A good ML experiment looks for a model with high accuracy, minimal complexity, and one that is not easily affected by outside changes.
• Factors (Inputs): These are the things we can change or that vary in an ML experiment. They can be:
o The algorithm we choose.
o The training set used.
o The input features (representation) used.
• Response (Output): This is what we measure to see the result of our changes. It's usually the model's performance (e.g., accuracy, error rate).
Strategy of Experimentation:
• Planning: It's important to plan experiments carefully to identify the most important factors.
• Observation: We observe how the model's performance (response) changes when we adjust the factors.
• Information Extraction: The goal is to extract information and identify the most important factors.
• Eliminating Chance: We design experiments to make sure our conclusions are statistically sound and not just due to random luck.
Key Points:
• We change the factors (inputs) and measure the response (output).
• A trained learner can be shown as taking controllable and uncontrollable factors as input and producing an output.
General Guidelines:
• Clear Goals: Define what you want to achieve with the experiment (e.g., higher accuracy, faster training).
• Reproducibility: Make sure your experiment can be repeated by others (or yourself later) to get the same
results. This means keeping good records of your data, code, and settings.
• Baselines: Always compare your new model or approach against a simple, existing model (a "baseline") to
see if it's actually better.
• Proper Evaluation: Use appropriate evaluation metrics for your problem (e.g., accuracy for classification,
MSE for regression) and always test on unseen data.
Key Points:
• Good experiments have clear goals, are reproducible, are compared against baselines, and are evaluated with appropriate metrics on unseen data.
Resampling methods are techniques that involve repeatedly drawing samples from a training dataset. They are used to train and evaluate ML models, giving a more robust estimate of performance than a single train-test split. Cross-validation is a key resampling method.
Cross-Validation:
(We covered this in detail in Unit 1.8, but here's a recap in this context.)
• Purpose: It's a technique for checking how well a model will work on new, unseen data. It helps test the model's stability.
• How it works (e.g., k-Fold): The data is split into 'k' parts. The model is trained 'k' times. Each time, a different part is used for testing, and the rest for training. The final score is the average of all tests.
• Benefit: Gives a more reliable performance estimate and helps prevent overfitting.
Why Use Resampling?
• Better Performance Estimate: It helps get a more accurate idea of how the model will perform in the real world.
• Hyperparameter Tuning: It's used to find the best settings (hyperparameters) for the model.
• Model Selection: When comparing different models, resampling helps reliably identify which one is best.
Key Points:
• It helps estimate real-world performance, tune hyperparameters, and compare models reliably. (See the sketch below.)
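A minimal 5-fold cross-validation sketch with scikit-learn; the classifier and the synthetic dataset are arbitrary illustrative choices.
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=5)

print(scores)           # accuracy on each of the 5 held-out folds
print(scores.mean())    # a more reliable estimate than a single train-test split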
Once a classification model is trained, we need to know how well it predicts categories. This is crucial for trusting the model and deciding if it's good enough for its purpose.
• Confusion Matrix: A table that shows all types of correct and incorrect predictions:
o True Positives (TP): Correctly predicted positive.
o True Negatives (TN): Correctly predicted negative.
o False Positives (FP): Predicted positive, but actually negative.
o False Negatives (FN): Predicted negative, but actually positive.
• Precision: Out of all the times the model predicted "positive," how many were actually correct.
• Recall: Out of all the actual "positives" in the data, how many did the model correctly identify.
Key Points:
• Essential to understand not just how often a model is right, but where and how it makes mistakes. (See the sketch below.)
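A small sketch computing these metrics for an invented set of labels and predictions:
```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (invented)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model's predictions (invented)

print(confusion_matrix(y_true, y_pred))   # rows: actual class, columns: predicted class
print(precision_score(y_true, y_pred))    # of predicted positives, how many were right
print(recall_score(y_true, y_pred))       # of actual positives, how many were found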
Hypothesis testing is a statistical method used to make decisions about a whole group (called a "population") based on data from a smaller part of that group (called a "sample"). We start with a belief or a proposed explanation (a "hypothesis") and then use data to see if there's enough evidence to support or reject it.
Why it Matters in ML:
• Confirm Observations: Data science projects often start by exploring data. Hypothesis testing helps confirm whether what we observe in our sample data is true for the larger population.
• Statistical Significance: It tells us if differences in model performance (e.g., between two algorithms) are truly meaningful or just due to random chance. If a difference is "statistically significant," it is probably not just a coincidence.
• Algorithm Selection: It's used to determine whether differences in performance between two data samples or metrics are statistically significant or just noise.
Key Points:
• A statistical method to decide whether an observed difference (e.g., between two models) is statistically significant or just due to chance. (See the sketch below.)
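A small sketch of such a test: comparing two models' cross-validation scores with a t-test from SciPy (the scores below are invented for illustration).
```python
from scipy.stats import ttest_ind

scores_a = [0.81, 0.79, 0.83, 0.80, 0.82]   # model A, 5 folds (hypothetical)
scores_b = [0.76, 0.78, 0.75, 0.77, 0.74]   # model B, 5 folds (hypothetical)

t_stat, p_value = ttest_ind(scores_a, scores_b)
print(t_stat, p_value)   # a small p-value suggests the difference is not just chance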
Comparing different ML algorithms and models is vital. It helps us choose the best one for a specific task and dataset. There are also non-obvious benefits to comparing effectively.
Goals of Comparison:
• Better Performance: The primary objective is to find the algorithm that gives the best results for our problem (e.g., highest accuracy, lowest error).
• Longer Lifetime: We want a model that understands the underlying patterns in data, so its predictions remain good even with new, unseen data, reducing the need for constant retraining.
• Easier Retraining: By comparing thoroughly, we record details that help us understand why a model was chosen or why it failed, making retraining quicker if needed.
• Speedy Production: Comparison helps identify models that are fast and use computer resources optimally, which is important for real-world use.
Methods of Comparison:
• Statistical Tests: Since ML models are based on statistics, we use statistical tests to compare them.
o Null Hypothesis Testing: Tests whether the differences in performance between two models are statistically significant or just random noise.
o ANOVA (Analysis of Variance): Checks whether the means (averages) of different groups are similar or not, often using one or more categorical features and one continuous target.
o Chi-square: Used for categorical features to evaluate the likelihood of association or correlation based on frequency distributions.
o Student's t-test: Compares the averages of different samples to determine if differences are statistically significant.
• Learning Curves: Plotting model performance over time (training vs. validation) helps see whether the model is learning well or overfitting.
o Training Learning Curve: Plots the evaluation metric score over time during training, tracking progress.
o Validation Learning Curve: Plots the evaluation score on the validation set over time, showing how well the model generalizes and helping identify overfitting.
• Bias-Variance Tradeoff: Comparing models involves understanding their bias (assumptions used to simplify learning) and variance (how much predictions change with changes in training data). The ultimate goal is to keep both bias and variance as low as possible.
Key Points:
• Crucial for choosing the best model for a given task and dataset.
• Aims for better performance, a longer model lifetime, easier retraining, and efficient production.
• Uses statistical tests (ANOVA, t-test, Chi-square, null hypothesis testing, ten-fold cross-validation) for objective comparison.
• Uses learning curves to track training progress and generalization.
• Involves balancing the bias-variance tradeoff.