SVM Assignment Answers

The document discusses Support Vector Machines (SVM), highlighting the importance of maximizing the margin for better classification and generalization. It differentiates between hard and soft margin SVMs, explaining that soft margins are necessary in cases of noisy or ambiguous data, such as spam classification. Additionally, it outlines a step-by-step k-means clustering example with k=2 and explains how Kernel PCA addresses the limitations of linear PCA by mapping data into a higher-dimensional space to capture non-linear relationships.

Assignment for CA2

Q1. Explain the concept of SVM. Why is maximizing the margin beneficial?
Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks. The main idea of SVM is to find a hyperplane (decision
boundary) that best separates the data points of different classes. Among all possible
hyperplanes, SVM chooses the one that maximizes the margin — the distance between
the hyperplane and the closest data points (called support vectors).
Why is maximizing the margin beneficial?
• A larger margin reduces the risk of misclassification.
• It improves generalization ability, meaning the model performs better on unseen data.
• It makes the classifier more robust to noise in the training data.
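
As a minimal sketch (using scikit-learn and an illustrative toy dataset, neither of which is part of the assignment), the margin of a trained linear SVM can be inspected directly, since the margin width equals 2/||w||:

    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable classes (illustrative toy data)
    X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
    y = np.array([0, 0, 0, 1, 1, 1])

    # A very large C approximates a hard-margin (maximum-margin) SVM
    clf = SVC(kernel="linear", C=1e6)
    clf.fit(X, y)

    # The margin width is 2 / ||w||, where w is the weight vector
    w = clf.coef_[0]
    print("support vectors:", clf.support_vectors_)
    print("margin width:", 2 / np.linalg.norm(w))

Only the support vectors (the points closest to the hyperplane) determine w; moving any other point would not change the decision boundary.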

Q2. What is the difference between soft and hard margin? Give an example where
soft margin is necessary.
Hard Margin SVM:
• Assumes data is linearly separable without any errors.
• Finds a hyperplane that perfectly separates the classes.
• Works only when there are no outliers or noise.
Soft Margin SVM:
• Allows some misclassifications by introducing slack variables (ξ) with a penalty parameter C.
• Balances between maximizing margin and minimizing classification errors.
• Useful when data is noisy or not perfectly separable.
Example: In spam classification, some emails may be mislabeled or ambiguous. A soft
margin allows the model to tolerate a few misclassifications while still achieving good
accuracy.
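
A small sketch of this trade-off, assuming scikit-learn and invented toy data with one noisy point: the parameter C controls the misclassification penalty, so a small C gives a soft margin and a very large C approaches a hard margin.

    import numpy as np
    from sklearn.svm import SVC

    # Toy data with one noisy/mislabeled point, as in the spam example
    X = np.array([[1, 1], [2, 2], [2, 1], [6, 6], [7, 7], [3, 3]])
    y = np.array([0, 0, 0, 1, 1, 1])  # (3,3) is labeled 1 but sits among class-0 points

    # Small C -> soft margin: tolerate the noisy point, keep a wider margin
    soft = SVC(kernel="linear", C=0.1).fit(X, y)
    # Large C -> near hard margin: tries to classify every training point correctly
    hard = SVC(kernel="linear", C=1e6).fit(X, y)

    print("soft-margin training accuracy:", soft.score(X, y))
    print("hard-margin training accuracy:", hard.score(X, y))

The soft-margin model typically sacrifices the noisy point to keep a wide, well-generalizing boundary, which is exactly the behavior wanted for spam data.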

Q3. Run k-means step by step on the dataset (2,3), (3,3), (6,6), (7,7) with k=2.
We apply k-means clustering with k=2.

Step 1: Choose k=2 initial centroids (say (2,3) and (7,7)).

Step 2: Assign each point to its nearest centroid (Euclidean distance).
- (2,3) → Cluster 1 (distance 0 to (2,3) vs √41 ≈ 6.40 to (7,7))
- (3,3) → Cluster 1 (distance 1 vs √32 ≈ 5.66)
- (6,6) → Cluster 2 (distance 5 vs √2 ≈ 1.41)
- (7,7) → Cluster 2 (distance √41 ≈ 6.40 vs 0)

Step 3: Recalculate each centroid as the mean of its cluster.
- Cluster 1 mean = ((2+3)/2, (3+3)/2) = (2.5, 3)
- Cluster 2 mean = ((6+7)/2, (6+7)/2) = (6.5, 6.5)

Step 4: Reassign points using the new centroids. No assignment changes, so the
algorithm has converged.
Final Clusters:
Cluster 1: (2,3), (3,3)
Cluster 2: (6,6), (7,7)
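
A minimal sketch that reproduces this run with scikit-learn (passing an explicit init array pins the starting centroids to the ones chosen in Step 1):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[2, 3], [3, 3], [6, 6], [7, 7]])

    # Pin the initial centroids to (2,3) and (7,7), as in Step 1
    init = np.array([[2, 3], [7, 7]])
    km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)

    print("labels:", km.labels_)               # [0 0 1 1]
    print("centroids:", km.cluster_centers_)   # [[2.5 3. ] [6.5 6.5]]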

Q4. Explain how Kernel PCA overcomes the limitations of linear PCA.
Linear PCA captures only linear structure: each principal component is a linear
combination of the original variables, so it can only find directions of maximum
variance along straight lines. It fails when the data lies on a non-linear
manifold (e.g., concentric circles).
Kernel PCA uses the kernel trick to implicitly map the input data into a
higher-dimensional feature space: inner products in that space are computed with a
kernel function (e.g., RBF or polynomial) without ever constructing the mapping
explicitly. PCA is then performed in this feature space, capturing non-linear
relationships and extracting meaningful features from complex datasets.
Example: For a dataset shaped like concentric circles, linear PCA fails, but kernel PCA
with an RBF kernel can separate the data effectively.
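
A minimal sketch of the concentric-circles example, assuming scikit-learn (the gamma value here is an illustrative choice, not prescribed by the assignment):

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.decomposition import PCA, KernelPCA

    # Concentric circles: the classic case where linear PCA fails
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    lin = PCA(n_components=2).fit_transform(X)       # just rotates the circles
    kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

    # With a suitable gamma, the two circles roughly separate along
    # the first kernel principal component
    print("inner-circle mean of component 1:", kpca[y == 1, 0].mean())
    print("outer-circle mean of component 1:", kpca[y == 0, 0].mean())

Because the RBF kernel depends only on pairwise distances, points at similar radii map close together in feature space, which is why the projected data becomes approximately linearly separable.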
