Machine Learning Questions

Uploaded by

krishukrishna38
Copyright
© All Rights Reserved

1. Compare Linear & Non-Linear SVMs with Suitable Example


Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks. It works by finding the hyperplane that separates
data points of different classes with the largest possible margin.

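Linear SVM:
- Used when data is linearly separable, or nearly so.
- Separates data using a straight line (2D), plane (3D), or hyperplane (nD).
- Hyperplane Equation: w·x + b = 0, where w is the weight vector and b is the bias. A point x falls in one class if w·x + b > 0 and in the other if w·x + b < 0.
- Example: Classifying emails as spam or not spam using keyword count and link count as features.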
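
The hyperplane rule can be sketched in a few lines. This is only an illustration: the weights `w` and bias `b` below are invented values, not parameters learned from real email data, and the function names are hypothetical.

```python
# Hypothetical weights for two features: keyword count and link count.
# These numbers are made up for illustration, not learned from data.
w = [0.8, 1.2]
b = -4.0

def decision_value(x):
    """Return w·x + b, the signed score relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    """Label an email 'spam' if it falls on the positive side."""
    return "spam" if decision_value(x) >= 0 else "not spam"

print(classify([5, 3]))  # 0.8*5 + 1.2*3 - 4 is positive -> spam
print(classify([1, 0]))  # 0.8*1 + 1.2*0 - 4 is negative -> not spam
```

In a trained SVM, w and b would be chosen so that the margin between the two classes is as large as possible.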

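Non-Linear SVM:
- Used when data is not linearly separable.
- Uses the kernel trick to implicitly map data into a higher-dimensional space where a linear separator exists, without computing the mapping explicitly.
- Common kernels: Polynomial, RBF (Gaussian), Sigmoid.
- Example: Tumor classification using size and shape, where the classes form circular clusters that no straight line can separate.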
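
A small numeric check can make the kernel trick concrete. The sketch below assumes a degree-2 polynomial kernel without a constant term, k(x, z) = (x·z)^2, and shows that it equals an ordinary dot product after an explicit 3-D mapping. The function names and sample points are chosen only for illustration.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel (no constant term): k(x, z) = (x·z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit 3-D mapping whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x, z = (1.0, 2.0), (3.0, 0.5)
k_direct = poly_kernel(x, z)                     # computed in 2-D
k_mapped = dot(feature_map(x), feature_map(z))   # computed in 3-D
print(math.isclose(k_direct, k_mapped))          # True: both equal 16

# In the mapped space, the circle x1^2 + x2^2 = 1 becomes the plane
# phi[0] + phi[2] = 1, so points inside and outside it are linearly separable.
inside, outside = feature_map((0.5, 0.5)), feature_map((2.0, 2.0))
print(inside[0] + inside[2] < 1 < outside[0] + outside[2])  # True
```

The kernel gives the same result as the mapped dot product while working only with the original 2-D points, which is why the mapping never has to be computed.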

Comparison Table:
| Feature     | Linear SVM         | Non-Linear SVM                  |
|-------------|--------------------|---------------------------------|
| Data Type   | Linearly separable | Not linearly separable          |
| Kernel Used | Not required       | Required (Polynomial, RBF, ...) |
| Complexity  | Low                | High                            |
| Speed       | Fast               | Slower                          |
| Example     | Spam detection     | Image classification            |

2. Short Note on LDA


Latent Dirichlet Allocation (LDA) is a generative probabilistic model used to discover topics
in a collection of documents. Each document is considered a mixture of topics, and each
topic is a distribution over words.

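Key Concepts:
- Document: A collection of words, treated as a bag of words (word order is ignored).
- Topic: A probability distribution over the vocabulary; related words receive high probability under the same topic.
- Dirichlet Distribution: Used as the prior for both the per-document topic mixtures and the per-topic word distributions.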
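
The generative process behind these concepts can be written compactly. The symbols below (α, β, θ, φ, z, w) follow a common convention for LDA and are not named in the notes themselves:

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{topic mixture of document } d \\
\phi_k &\sim \mathrm{Dirichlet}(\beta) && \text{word distribution of topic } k \\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d) && \text{topic of the } n\text{-th word in document } d \\
w_{d,n} &\sim \mathrm{Categorical}(\phi_{z_{d,n}}) && \text{the observed word}
\end{align*}
```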

Working of LDA:
1. Choose the number of topics, K.
2. Randomly assign a topic to each word in every document.
3. Iteratively reassign topics using statistical inference (for example, Gibbs sampling or variational inference) until the assignments stabilize.
4. Output the topic distribution for each document and the word distribution for each topic.

Example:
Documents:
- Doc1: "apple banana mango"
- Doc2: "football cricket hockey"
- Doc3: "apple mango football"

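LDA Topics (one plausible result with two topics; LDA outputs only word probabilities, and labels such as "fruits" are assigned by the reader):
- Topic 1: apple, banana, mango (fruits)
- Topic 2: football, cricket, hockey (sports)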
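
The four working steps can be sketched with collapsed Gibbs sampling on the three example documents. This is one possible inference method, not necessarily the one the notes have in mind. The hyperparameters `ALPHA` and `BETA`, the iteration count, and the random seed are all assumptions, and on a corpus this small the sampled topics may not split cleanly into fruits and sports.

```python
import random

docs = [
    "apple banana mango".split(),
    "football cricket hockey".split(),
    "apple mango football".split(),
]
K = 2                      # Step 1: number of topics
ALPHA, BETA = 0.1, 0.01    # assumed Dirichlet hyperparameters
vocab = sorted({w for d in docs for w in d})
V = len(vocab)
rng = random.Random(0)     # fixed seed so runs are repeatable

# Step 2: randomly assign a topic to each word and build count tables.
z = [[rng.randrange(K) for _ in d] for d in docs]
doc_topic = [[0] * K for _ in docs]                      # words in doc d with topic k
topic_word = [{w: 0 for w in vocab} for _ in range(K)]   # times word w has topic k
topic_total = [0] * K                                    # words assigned to topic k
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        doc_topic[d][k] += 1
        topic_word[k][w] += 1
        topic_total[k] += 1

# Step 3: resample each word's topic given all other assignments.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] -= 1
            topic_word[k][w] -= 1
            topic_total[k] -= 1
            weights = [
                (doc_topic[d][t] + ALPHA)
                * (topic_word[t][w] + BETA) / (topic_total[t] + V * BETA)
                for t in range(K)
            ]
            k = rng.choices(range(K), weights=weights)[0]
            z[d][i] = k
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

# Step 4: smoothed estimates of the two output distributions.
theta = [
    [(doc_topic[d][t] + ALPHA) / (len(doc) + K * ALPHA) for t in range(K)]
    for d, doc in enumerate(docs)
]
phi = [
    {w: (topic_word[t][w] + BETA) / (topic_total[t] + V * BETA) for w in vocab}
    for t in range(K)
]
for t in range(K):
    top = sorted(vocab, key=lambda w: -phi[t][w])[:3]
    print(f"Topic {t}: {top}")
```

Here `theta[d]` is the topic distribution of document d and `phi[t]` is the word distribution of topic t, matching the two outputs named in step 4.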

Applications:
- Topic modeling, recommender systems, search engines, summarization.

3. Demonstrate K-Nearest Neighbour Algorithm with Use Case


K-Nearest Neighbors (KNN) is a supervised learning algorithm used for classification and
regression. It classifies a data point by the majority label of its K nearest neighbors; for
regression, it averages their values.

How it Works:
1. Choose K (number of neighbors).
2. Calculate the distance (commonly Euclidean) from the new point to all training points.
3. Select the K nearest neighbors.
4. Assign the class label by majority voting.

Use Case: Customer Classification


Classify a new customer as High or Low spender based on Age and Income.

Example:
| Customer | Age | Income | Class        |
|----------|-----|--------|--------------|
| C1       | 25  | 25K    | Low spender  |
| C2       | 45  | 70K    | High spender |
| C3       | 30  | 30K    | Low spender  |

New customer: Age 28, Income 28K. The two closest customers, C3 and C1, are both Low spenders, and with K = 3 the vote is two Low against one High → KNN predicts: Low spender
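
The same prediction can be reproduced with a short sketch. It assumes Euclidean distance on the raw (age, income in thousands) values and K = 3. The function name `knn_predict` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Training data from the table above: (age, income in thousands, class).
training = [
    (25, 25, "Low spender"),
    (45, 70, "High spender"),
    (30, 30, "Low spender"),
]

def knn_predict(point, data, k):
    """Classify `point` by majority vote among its k nearest neighbours."""
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [(math.dist(point, (age, income)), label)
                 for age, income, label in data]
    # Step 3: keep the k closest.
    nearest = sorted(distances)[:k]
    # Step 4: majority vote among their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((28, 28), training, k=3))  # Low spender
```

Age and income happen to have comparable ranges here; with features on very different scales, they would normally be rescaled before computing distances.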

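Advantages:
- Simple to implement, no explicit training phase (the model only stores the data), works well on small datasets.

Disadvantages:
- Slow at prediction time on large datasets, sensitive to irrelevant features, needs feature scaling because distances depend on feature units.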

4. Random Forests Algorithm


Random Forest is an ensemble learning algorithm used for classification and regression. It
builds multiple decision trees and combines their outputs.

How it Works:
1. Create random bootstrap samples (drawn with replacement) from the dataset.
2. Build a decision tree on each sample.
3. While building each tree, consider only a random subset of features at each split.
4. Aggregate the results (majority vote for classification, average for regression).
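
The steps above can be sketched in plain Python. To keep the example short, each "tree" is simplified to a one-split decision stump, which is a deliberate simplification rather than a full decision tree. The patient data is invented for illustration, and all function names are hypothetical.

```python
import random
from collections import Counter

# Made-up toy data: (age, systolic blood pressure); 1 = disease, 0 = healthy.
X = [(25, 110), (30, 115), (35, 120), (40, 118),
     (55, 150), (60, 160), (65, 155), (70, 165)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def majority(labels, default=0):
    """Most common label, or `default` if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(X, y, features):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = majority(left)
            right_label = majority(right, majority(y))
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (draw n rows with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Step 3: consider only a random subset of features at the split.
        features = rng.sample(range(n_features), 1)
        # Step 2: build a tree (here a one-split stump) on the sample.
        forest.append(fit_stump(Xb, yb, features))
    return forest

def predict(forest, row):
    # Step 4: aggregate the trees' votes by majority.
    votes = [left_label if row[f] <= t else right_label
             for f, t, left_label, right_label in forest]
    return majority(votes)

forest = fit_forest(X, y)
print(predict(forest, (20, 100)))  # young patient, low blood pressure
print(predict(forest, (80, 180)))  # older patient, high blood pressure
```

For regression, step 4 would average the trees' numeric outputs instead of taking a vote.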

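Key Concepts:
- Bagging (bootstrap aggregating): Combines the results of multiple models, each trained on a random sample of the data.
- Feature randomness: Choosing a random feature subset at each split reduces correlation between trees, which makes the combined prediction more stable.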

Use Case: Disease Prediction


Predict whether a patient has a disease based on features such as age, blood pressure, and sugar levels.

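Advantages:
- High accuracy, less prone to overfitting than a single decision tree, works for classification and
regression. Some implementations can also handle missing data.

Disadvantages:
- Slower to train and predict than a single tree, less interpretable, high memory usage.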
