1. Decision Tree Learning
1.1 Decision Tree Representation
• Decision Trees: A decision tree is a flowchart-like structure used for decision making.
Each internal node represents a "test" or "decision" based on an attribute, each branch
represents the outcome of the test, and each leaf node represents a class label (in
classification) or a value (in regression).
• Structure:
o Root Node: The starting point of the tree, where the first decision is made.
o Internal Nodes: Nodes representing decisions or tests.
o Edges: Branches linking nodes, each representing the outcome of a decision.
o Leaf Nodes: Final nodes representing the predicted output (class label or
continuous value).
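To make this structure concrete, here is a minimal Python sketch of a decision tree as linked nodes; the names (Node, predict) and the tiny hand-built tree are illustrative only, not taken from any particular library.

```python
# Minimal sketch of a decision tree as nested nodes (illustrative only).
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # index of the attribute tested at this internal node
        self.threshold = threshold  # split point for the test "x[feature] <= threshold"
        self.left = left            # subtree for the "yes" branch
        self.right = right          # subtree for the "no" branch
        self.value = value          # class label (or number) stored at a leaf

    def is_leaf(self):
        return self.value is not None

def predict(node, x):
    """Follow the tests from the root down to a leaf and return its value."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Example: a tiny hand-built tree "if x[0] <= 2.5 predict 'A', else 'B'".
root = Node(feature=0, threshold=2.5,
            left=Node(value="A"), right=Node(value="B"))
print(predict(root, [1.7]))  # -> 'A'
```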
1.2 Appropriate Problems for Decision Tree Learning
Decision trees can be used for:
• Classification: Predicting a categorical class label.
• Regression: Predicting a continuous output value.
• Decision trees are best suited to problems where:
o The data is structured (tabular format).
o The features are easily interpretable.
o The relationships between the features and the target are non-linear.
1.3 Univariate Trees (Classification and Regression)
• Univariate Trees: Each decision node in the tree checks a single attribute (or feature)
at a time.
o Classification: A classification tree predicts a discrete class label; the single feature tested at each node may be discrete or continuous.
o Regression: A regression decision tree predicts a continuous output, with the
leaves containing the predicted value.
1.4 Multivariate Trees
• Multivariate Trees: A decision node can test multiple features simultaneously (for
example, using linear combinations of features).
o Advantage: Can produce smaller trees than univariate trees because each node makes a more
expressive (e.g., oblique) decision.
o Disadvantage: More computationally expensive and harder to interpret.
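As a quick illustration of the difference, a univariate node tests one feature against a threshold, while a multivariate node can test a linear combination of features; the weights and thresholds below are made-up values.

```python
# Univariate test: one feature against a threshold (axis-parallel split).
def univariate_test(x):
    return x[0] <= 2.5

# Multivariate test: a linear combination of features (oblique split).
# The weights 0.4, 0.6 and the threshold 3.0 are illustrative only.
def multivariate_test(x):
    return 0.4 * x[0] + 0.6 * x[1] <= 3.0

x = [2.0, 4.0]
print(univariate_test(x), multivariate_test(x))  # -> True False
```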
1.5 Basic Decision Tree Learning Algorithms
• ID3 (Iterative Dichotomiser 3):
o ID3 uses information gain to select the feature that best splits the data at each
node.
o Information Gain is computed using entropy (a measure of uncertainty); a sketch computing both appears after this list.
• CART (Classification and Regression Trees):
o CART constructs binary trees using Gini Impurity (for classification) or Mean
Squared Error (for regression) to determine the best split.
• C4.5:
o C4.5 extends ID3 by using gain ratio instead of information gain, which reduces the bias
towards features with many distinct values.
• CHAID (Chi-squared Automatic Interaction Detection):
o Uses Chi-square statistics to decide the best split at each node.
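The sketch below shows, for a single categorical feature and a toy dataset (both made up for illustration), how the split criteria named above can be computed: entropy and information gain (ID3) and Gini impurity (CART).

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_c p_c * log2(p_c), the ID3 measure of uncertainty."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity = 1 - sum_c p_c^2, the CART classification criterion."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(labels)
    children = {}
    for value, label in zip(feature_values, labels):
        children.setdefault(value, []).append(label)
    weighted = sum(len(child) / n * entropy(child) for child in children.values())
    return entropy(labels) - weighted

# Toy example: does "Outlook" help predict "Play"?
play    = ["no", "no", "yes", "yes", "yes", "no"]
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain"]
print(round(entropy(play), 3), round(gini(play), 3))     # 1.0, 0.5
print(round(information_gain(play, outlook), 3))         # about 0.541
```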
1.6 Hypothesis Space Search in Decision Tree Learning
• Hypothesis Space: The set of all possible trees that can be formed using the features.
• Search Process: The algorithm searches through this hypothesis space, trying to find
the tree that best classifies the data.
• Greedy Approach: Decision tree algorithms often use greedy techniques (e.g.,
choosing the best feature at each step) to explore this space.
• Problem: The greedy approach might lead to suboptimal trees because it does not
consider future splits.
1.7 Inductive Bias in Decision Tree Learning
• Inductive Bias: The assumptions made by the algorithm to generalize from the training
data to unseen data.
• Inductive Bias in Decision Trees:
o Preference for smaller (shorter) trees.
o Preference for trees that place highly informative features near the root.
o The assumption that the classes are separable by axis-parallel splits (in univariate trees).
1.8 Issues in Decision Tree Learning
• Overfitting: Trees that are too complex may fit noise in the data.
o Solution: Pruning, cross-validation, or using simpler models (see the sketch after this list).
• Handling Missing Data: Decision trees might struggle when some values are missing.
This can be handled by imputation or surrogate splits.
• Bias and Variance: Deep, unpruned trees tend to have low bias but high variance (overfitting),
while heavily pruned or shallow trees have low variance but high bias (underfitting).
• Computational Complexity: The number of nodes can grow exponentially with tree depth, so
building and storing very deep trees becomes expensive, especially on large datasets.
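Assuming scikit-learn is available, the sketch below illustrates two common ways of controlling overfitting mentioned above: limiting tree depth and cost-complexity pruning. The dataset choice and the parameter values (max_depth=3, ccp_alpha=0.01) are arbitrary.

```python
# Sketch: controlling overfitting in a decision tree (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training set (high variance).
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting depth and applying cost-complexity pruning trade a little
# training accuracy for better generalization (values are illustrative).
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01,
                                random_state=0).fit(X_train, y_train)

print("full  :", full.score(X_train, y_train), full.score(X_test, y_test))
print("pruned:", pruned.score(X_train, y_train), pruned.score(X_test, y_test))
```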
2. Bayesian Learning
2.1 Bayes' Theorem and Concept Learning
• Bayes’ Theorem:
P(H|D) = \frac{P(D|H) \cdot P(H)}{P(D)}
Where:
o P(H|D) is the posterior probability of hypothesis H given the data D.
o P(D|H) is the likelihood of the data D given hypothesis H.
o P(H) is the prior probability of hypothesis H.
o P(D) is the probability of the data (the evidence).
• Concept Learning: Concept learning involves learning a concept (e.g., classification
rule) based on examples. Bayesian learning can be applied to derive the probability of
a hypothesis based on observed data.
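A small worked example of the formula, with made-up numbers: suppose 20% of emails are spam, the word 'free' appears in 60% of spam and 5% of non-spam emails, and we want the posterior probability of spam given that 'free' appears.

```python
# Worked Bayes' theorem example with made-up numbers.
p_spam = 0.20                 # prior P(H): probability an email is spam
p_free_given_spam = 0.60      # likelihood P(D|H)
p_free_given_ham = 0.05       # P(D | not H)

# Evidence P(D) via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(H|D) = P(D|H) * P(H) / P(D).
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.12 / 0.16 = 0.75
```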
2.2 Bayes Optimal Classifier
• The Bayes Optimal Classifier is the classifier that minimizes the expected
classification error.
• It is defined as: C^*(x) = \arg\max_c P(c|x), where c is a class label and x is the input feature vector.
• This optimal classifier uses the true posterior probabilities but is impractical for real-
world problems because it requires knowledge of the entire probability distribution.
2.3 Gibbs Sampling Algorithm
• Gibbs Sampling is a Markov Chain Monte Carlo (MCMC) method used to
approximate the distribution of parameters in Bayesian networks when exact sampling
is difficult.
• It iteratively samples from the conditional distribution of each variable given the others.
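As a minimal illustration of the idea (not a general Bayesian-network implementation), the sketch below Gibbs-samples a bivariate standard normal with correlation rho by repeatedly drawing each coordinate from its conditional distribution given the other; the value of rho and the number of iterations are arbitrary choices.

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_samples=5000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each step draws one variable from its conditional given the other:
    x | y ~ N(rho * y, 1 - rho**2), and symmetrically for y | x."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
        y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
        samples[i] = (x, y)
    return samples

s = gibbs_bivariate_normal()
print(np.corrcoef(s[1000:].T)[0, 1])  # close to 0.8 after discarding burn-in
```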
2.4 Naive Bayes Classifier
• Naive Bayes is a classification algorithm based on applying Bayes' theorem with the
naive assumption that all features are conditionally independent given the class.
• The classifier calculates:
P(c|x) = \frac{P(c) \cdot \prod_{i=1}^{n} P(x_i|c)}{P(x)}
Where:
o c is the class label.
o x_i is the i-th feature.
o The conditional independence assumption makes this factored form computationally efficient.
• Assumptions:
o All features are conditionally independent given the class.
o Features are discrete (though the model can be adapted for continuous features, e.g., Gaussian Naive Bayes).
• Advantages:
o Simple and efficient.
o Works well with small datasets and high-dimensional data.
• Limitations:
o The independence assumption is often unrealistic.
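A minimal sketch of the calculation above for binary features, with Laplace smoothing added to avoid zero probabilities (the smoothing constant and the toy spam/ham data are illustrative additions, not part of the notes above).

```python
def train_naive_bayes(X, y, alpha=1.0):
    """Estimate P(c) and P(x_i=1|c) for binary features with Laplace smoothing."""
    classes = set(y)
    n = len(y)
    priors = {c: sum(1 for label in y if label == c) / n for c in classes}
    likelihoods = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        likelihoods[c] = [(sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                          for i in range(len(X[0]))]
    return priors, likelihoods

def predict(x, priors, likelihoods):
    """Choose the class with the highest (unnormalized) posterior."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for i, xi in enumerate(x):
            p *= likelihoods[c][i] if xi else 1 - likelihoods[c][i]
        scores[c] = p
    return max(scores, key=scores.get)

# Toy data: features = [contains 'free', contains 'offer'], label = spam or ham.
X = [[1, 1], [1, 0], [0, 1], [0, 0], [0, 0]]
y = ["spam", "spam", "spam", "ham", "ham"]
priors, likelihoods = train_naive_bayes(X, y)
print(predict([1, 0], priors, likelihoods))  # -> 'spam'
```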
2.5 Bayesian Belief Networks
• Bayesian Belief Networks (BBNs) are graphical models that represent probabilistic
relationships among variables.
• Nodes represent random variables, and edges represent probabilistic dependencies.
• These networks can be used for reasoning under uncertainty and making predictions
based on known evidence.
2.6 Expectation Maximization (EM) Algorithm
• The EM algorithm is an iterative method for finding maximum likelihood estimates
of parameters in models with latent (hidden) variables.
• Steps:
1. E-step: Estimate the expected value of the latent variables given the observed
data and current parameter estimates.
2. M-step: Re-estimate the parameters by maximizing the expected complete-data
log-likelihood computed in the E-step.
• Applications: Clustering (e.g., Gaussian Mixture Models), handling missing data, and
other problems with hidden variables.
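Assuming numpy is available, the sketch below runs EM on a 1-D mixture of two Gaussians: the E-step computes responsibilities (the expected values of the hidden component assignments), and the M-step re-estimates the means, variances, and mixing weights. The synthetic data, initial values, and iteration count are arbitrary.

```python
import numpy as np

def em_two_gaussians(x, n_iter=50):
    """EM for a 1-D mixture of two Gaussians with hidden component labels."""
    # Arbitrary initial parameters: means, variances, mixing weights.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
print(em_two_gaussians(x))  # means close to -2 and 3
```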
2 Marks Questions:
1. What is a Decision Tree in machine learning?
o A Decision Tree is a tree-like model used for decision-making and classification
tasks. Each internal node represents a decision based on a feature, and the
branches represent the outcomes of those decisions. The leaf nodes represent
the class label or predicted value.
2. What is the purpose of pruning in decision trees?
o Pruning is the process of removing branches from a decision tree that have little
significance or contribute to overfitting. It helps in improving the tree's
generalization by reducing complexity and avoiding overfitting.
3. What is Naive Bayes Classifier?
o Naive Bayes Classifier is a probabilistic classifier based on Bayes' Theorem,
assuming that features are conditionally independent given the class label. It is
particularly efficient for large datasets and is used for both binary and multi-
class classification tasks.
4. Define Gini Impurity.
o Gini Impurity is a metric used in decision trees to measure the impurity of a
node. It calculates the probability of incorrectly classifying a randomly chosen
element from the dataset. Lower Gini Impurity values indicate better splits.
5. What is the role of Bayes' Theorem in Bayesian Learning?
o Bayes' Theorem provides a method for updating the probability of a hypothesis
given new evidence. It is fundamental to Bayesian Learning, allowing the
model to learn and update beliefs about data as more evidence becomes
available.
4 Marks Questions:
6. Explain the difference between Univariate and Multivariate Decision Trees.
o Univariate Decision Trees test only one feature at each decision node, using
that feature's value to split the data into subsets. In contrast, Multivariate
Decision Trees can use multiple features to create more complex decision
boundaries at each node, often leading to better classification performance at
the cost of increased complexity.
7. What is Information Gain in Decision Tree Learning?
o Information Gain is used to measure the effectiveness of a feature in
classifying the data. It is calculated as the difference between the entropy of the
parent node and the weighted sum of the entropies of the child nodes. A higher
information gain indicates a better feature for splitting the data.
8. What are the advantages and disadvantages of the Naive Bayes classifier?
o Advantages:
▪ Simple and fast to train.
▪ Works well with high-dimensional data and small datasets.
▪ Handles missing data well.
o Disadvantages:
▪ Assumes all features are conditionally independent, which is often
unrealistic.
▪ Doesn't perform well with highly correlated features.
9. How does the Expectation-Maximization (EM) algorithm work?
o The EM algorithm works by iterating between two steps:
1. E-step: Estimate the expected value of the latent (hidden) variables, given the observed
data and current parameter estimates.
2. M-step: Re-estimate the parameters by maximizing the expected complete-data log-likelihood
from the E-step.
o The algorithm continues to iterate until convergence.
10. What are Bayesian Belief Networks and their applications?
o Bayesian Belief Networks (BBNs) are graphical models that represent
probabilistic dependencies among variables using a directed acyclic graph
(DAG). They are used for reasoning under uncertainty, making predictions, and
modeling complex probabilistic relationships. Applications include diagnostic
systems, risk analysis, and decision support systems.
6 Marks Questions:
11. Describe the basic algorithm for Decision Tree Learning.
o The basic steps of the Decision Tree Learning algorithm are:
1. Select the Best Feature: Choose the feature that best splits the data
(e.g., using Information Gain, Gini Index, or Chi-squared test).
2. Split the Data: Partition the dataset into subsets based on the selected
feature.
3. Create a Node: Create a decision node for the chosen feature.
4. Repeat for Each Subset: Recursively apply the decision tree algorithm
to each subset of the data, until all data is classified.
5. Stop when:
▪ All data in a node belongs to a single class.
▪ There are no more features to split on.
▪ The tree reaches a predefined maximum depth.
6. Prune the Tree: Optional step to remove branches that do not improve
the model's performance.
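Under the simplifying assumptions of categorical features, information-gain splitting, and no pruning (ID3-style), the steps above can be sketched as a small recursive function; the helper names and the toy dataset are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """Recursive ID3-style construction for categorical features (illustrative)."""
    # Stop: all labels identical, or no features left -> leaf with the majority label.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # 1. Select the best feature by information gain.
    def gain(f):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[f], []).append(label)
        weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - weighted
    best = max(features, key=gain)
    # 2-4. Split the data on the best feature and recurse on each subset.
    tree = {"feature": best, "branches": {}}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        remaining = [f for f in features if f != best]
        tree["branches"][value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return tree

# Toy dataset: feature 0 = Outlook, feature 1 = Windy.
rows   = [["sunny", "no"], ["sunny", "yes"], ["rain", "no"], ["rain", "yes"]]
labels = ["yes", "yes", "yes", "no"]
print(build_tree(rows, labels, [0, 1]))
```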
12. Explain the concept of Bayesian Learning with an example.
o Bayesian Learning is a method of learning where we update our beliefs about
a model’s parameters based on observed data using Bayes' Theorem.
o Example: Suppose we have a binary classification problem where we need to
predict whether an email is spam or not. Initially, we have a prior belief about
the probabilities of spam and non-spam emails. As we observe new emails
(data), we update these probabilities (posterior) to improve the classification of
future emails. The update uses Bayes' Theorem, which incorporates both the
prior knowledge and the likelihood of the observed data.
13. What is the difference between Classification and Regression in Decision Trees?
o Classification Trees: These trees predict a categorical class label (e.g., yes/no,
spam/ham). The decision at each node is made based on the best split of
categorical or numerical features that maximizes the homogeneity of the class
labels in the child nodes.
o Regression Trees: These trees predict a continuous value. Instead of class
labels, each leaf node contains a numeric value representing the predicted
output. The splitting criteria focus on minimizing the error in predicting the
target value (e.g., minimizing squared error).
14. Discuss the advantages and disadvantages of Decision Trees.
o Advantages:
▪ Easy to understand and interpret.
▪ Can handle both numerical and categorical data.
▪ Non-linear relationships between the features and the target do not require feature transformation.
▪ Can handle missing data through surrogate splits.
o Disadvantages:
▪ Prone to overfitting, especially with deep trees.
▪ Sensitive to noisy data and outliers.
▪ Can be computationally expensive for large datasets.
▪ Tends to create biased trees if some classes dominate the dataset.
15. Explain the working of a Naive Bayes Classifier with an example.
o A Naive Bayes Classifier works by calculating the posterior probability of each
class given the features of the data, using Bayes' Theorem. It assumes that the
features are independent, simplifying the computation.
o Example: For a spam email classifier, let’s assume we have two features:
"contains 'free'" and "contains 'offer'". The algorithm calculates the probability
of an email being spam or not, given these features:
P(\text{spam}|X) = \frac{P(X|\text{spam}) \cdot P(\text{spam})}{P(X)}
Here, X is the feature vector (contains 'free', contains 'offer'). The classifier computes the
likelihood of each feature given the class, multiplies by the prior probability of each class,
and chooses the class with the highest posterior probability.
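The calculation described above, carried out with made-up counts: suppose the training set contains 4 spam and 6 ham emails, 'free' appears in 3 of the spam and 1 of the ham emails, and 'offer' appears in 2 of the spam and 1 of the ham emails.

```python
# Worked Naive Bayes spam example with made-up counts (no smoothing).
p_spam, p_ham = 4 / 10, 6 / 10                 # priors from 4 spam, 6 ham emails
p_free_spam, p_free_ham = 3 / 4, 1 / 6         # P(contains 'free' | class)
p_offer_spam, p_offer_ham = 2 / 4, 1 / 6       # P(contains 'offer' | class)

# Email containing both 'free' and 'offer': compare unnormalized posteriors.
score_spam = p_spam * p_free_spam * p_offer_spam
score_ham = p_ham * p_free_ham * p_offer_ham
print(score_spam, score_ham)                    # 0.15 vs about 0.017
print("spam" if score_spam > score_ham else "ham")
```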