Machine Learning
Chapter 1 - Introduction
Lecturer: Duc Dung Nguyen, PhD.
Contact:
[email protected]
Faculty of Computer Science and Engineering
Ho Chi Minh City University of Technology
Machine Learning
What is Machine learning?
• Arthur Samuel (1959): "Field of study that gives computers the ability to learn without
being explicitly programmed"
• Tom Mitchell (1997): "A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its performance at tasks in
T, as measured by P, improves with experience E".
Machine Learning
• How to construct programs that automatically improve with experience.
• The scientific study of algorithms and statistical models that computer systems use to
perform a specific task without using explicit instructions, relying on patterns and
inference instead.
• A subset of artificial intelligence.
Example
Experience (training examples):
Example  Gray?  Mammal?  Large?  Vegetarian?  Wild?  Elephant
1        +      +        +       +            +      +
2        +      +        +       -            +      +
3        +      +        -       +            +      -
4        -      +        +       +            +      -
5        +      -        +       -            +      -
6        +      +        +       +            -      +
Prediction (unseen examples):
7        +      +        +       -            +      ?
8        +      -        +       -            +      ?
9        +      +        +       -            -      ?
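To make "learning from experience" concrete, here is a minimal Python sketch of Find-S, a classic algorithm from Mitchell's textbook that these slides do not cover. It assumes the target concept is a conjunction of attribute constraints: it starts from the first positive example and replaces every attribute on which a later positive example disagrees with "?" (represented here by `None`). Encoding "+" as `True` and "-" as `False` is our own choice.

```python
# Training examples from the table above:
# (Gray?, Mammal?, Large?, Vegetarian?, Wild?) -> Elephant, with "+" = True, "-" = False.
TRAIN = [
    ((True, True, True, True, True), True),     # 1
    ((True, True, True, False, True), True),    # 2
    ((True, True, False, True, True), False),   # 3
    ((False, True, True, True, True), False),   # 4
    ((True, False, True, False, True), False),  # 5
    ((True, True, True, True, False), True),    # 6
]

ANY = None  # plays the role of "?": the attribute may take any value


def find_s(examples):
    """Return the most specific conjunctive hypothesis that covers every positive example."""
    hypothesis = None
    for features, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(features)  # start from the first positive example
        else:
            # Generalize every attribute that disagrees with this positive example to "?".
            hypothesis = [h if h == f else ANY for h, f in zip(hypothesis, features)]
    return hypothesis


def predict(hypothesis, features):
    """An instance is positive if it satisfies every constraint that is not "?"."""
    return all(h is ANY or h == f for h, f in zip(hypothesis, features))


h = find_s(TRAIN)
print(h)  # [True, True, True, None, None]
for number, x in [(7, (True, True, True, False, True)),
                  (8, (True, False, True, False, True)),
                  (9, (True, True, True, False, False))]:
    print(number, predict(h, x))
```

On the six training examples the sketch generalizes to "gray, mammal, and large", predicts Yes for examples 7 and 9 and No for example 8, and still rejects all three negative training examples. A learner built on different assumptions could answer differently.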
Machine Learning
What is learning?
Learning is an (endless) generalization or
induction process.
Types of Machine Learning
• Supervised learning: the learner (learning algorithm) is trained on labeled examples (data + label), i.e., inputs for which the desired output is known.
• Unsupervised learning: the learner operates on unlabeled examples, i.e., inputs for which the desired output is unknown; a typical task is grouping (clustering) similar inputs.
• Reinforcement learning: lies between supervised and unsupervised learning. The learner is told when an answer is wrong, but not how to correct it.
• Evolutionary learning: biological evolution can be seen as a learning process that improves survival rates and the chance of having offspring.
• The most common type: supervised learning.
– Classification: to find the class of an instance given its selected features.
– Regression: to find a function whose curve passes as close as possible to all of the given
data points.
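As an illustration of the regression bullet, the sketch below fits a straight line y ≈ a·x + b by ordinary least squares, which is one common reading of "as close as possible": it minimizes the sum of squared vertical distances between the line and the points. The straight-line model and the sample points are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares: the a, b that minimize sum((a*x + b - y)**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx  # slope (requires at least two distinct x values)
    b = mean_y - a * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


def sum_squared_error(a, b, xs, ys):
    """How far the fitted line is from the data points."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points that do not lie exactly on one line.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 1.0]
a, b = fit_line(xs, ys)
print(a, b, sum_squared_error(a, b, xs, ys))
```

For these points the fit gives slope 0.5 and intercept about 0.167, with a squared error of about 0.167; any other line through the same points has a larger squared error.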
Phases of Machine Learning
How many phases are there in machine learning?
[Figure: the phases of machine learning, each using its own portion of the data; one portion is the validation set]
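• K-fold cross validation (suited to small models and small data sets):
– Randomly partition the data into k equal-sized subsamples.
– Use k - 1 subsamples for training and the remaining one for testing.
– Repeat k times (folds), so that each subsample is used exactly once for testing, and report the average performance over the k folds.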
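The procedure can be sketched in a few lines of Python. The learner and the scoring function are passed in as parameters; `train_majority` and `accuracy` below are hypothetical stand-ins, and taking every k-th shuffled index (`indices[i::k]`) is just one simple way to obtain nearly equal folds.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition the indices 0..n-1 into k folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, train_fn, score_fn, seed=0):
    """Train on k - 1 folds, test on the held-out fold, repeat k times, and average."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train_fn([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(score_fn(model, [xs[j] for j in test_idx], [ys[j] for j in test_idx]))
    return sum(scores) / k, scores


def train_majority(xs, ys):
    """Hypothetical baseline learner: always predict the most frequent training label."""
    return max(set(ys), key=ys.count)


def accuracy(model, xs, ys):
    """Fraction of test labels equal to the baseline's constant prediction."""
    return sum(1 for y in ys if y == model) / len(ys)


xs = list(range(10))
ys = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
mean_score, per_fold = cross_validate(xs, ys, k=5, train_fn=train_majority, score_fn=accuracy)
print(per_fold, mean_score)
```

With 10 examples and k = 5 each fold holds 2 examples. The baseline always predicts 1 here, so its mean accuracy is 0.7 (the fraction of 1s in the data) whatever the shuffle, even though the individual fold scores vary.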
Statistical significance test: used to decide whether the difference between two compared systems is real, by trying to reject the null hypothesis that the two systems perform equally well even though their measured performances differ.
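The slides do not prescribe a particular test. As one possible choice, the sketch below runs an exact paired sign-flip permutation test on per-fold scores of two systems (for instance, the fold results of k-fold cross validation). Under the null hypothesis each paired difference is equally likely to be positive or negative, so the p-value is the fraction of all 2^n sign patterns whose total difference is at least as large as the observed one. A paired t-test is another common option; the scores below are made-up numbers.

```python
from itertools import product


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on paired scores; returns the p-value.

    Null hypothesis: both systems perform equally well, so every paired
    difference could have had either sign with equal probability.
    Enumerates all 2**n sign patterns, so keep n small (e.g. n <= 20).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    patterns = 0
    for signs in product((1, -1), repeat=len(diffs)):
        patterns += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
    return extreme / patterns


# Hypothetical per-fold accuracies of two systems on the same 5 folds.
system_a = [0.82, 0.85, 0.80, 0.84, 0.83]
system_b = [0.78, 0.80, 0.79, 0.81, 0.80]
p = paired_permutation_test(system_a, system_b)
print(p)  # 0.0625
```

Here all five differences favor system A, yet p = 2/32 = 0.0625, so the null hypothesis cannot be rejected at the 5% level: with only five paired scores, no outcome of this test can reach p < 0.05.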
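[Figure: loss during training. At first the model is underfitting; once the test loss starts to increase, it is overfitting. The good checkpoint lies just before that increase.]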
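A common way to act on this curve is early stopping: keep the checkpoint with the lowest loss on held-out data and stop once that loss has not improved for a few epochs. The sketch assumes the per-epoch validation losses are already available as a list (in a real training loop they arrive one epoch at a time); the `patience` value and the loss numbers are hypothetical. Selecting the checkpoint on a validation set rather than the final test set keeps the test set untouched for the final evaluation.

```python
def best_checkpoint(val_losses, patience=3):
    """Return the epoch with the lowest validation loss.

    Scanning stops early once `patience` epochs have passed without a new minimum,
    which is taken as a sign that the model has started to overfit.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss  # new "good checkpoint"
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Hypothetical validation losses: falling (underfitting), then rising (overfitting).
losses = [0.90, 0.70, 0.55, 0.48, 0.46, 0.47, 0.50, 0.56, 0.63]
print(best_checkpoint(losses))  # 4
```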
Overfitting can occur when:
• There is noise in the data.
• The number of training examples is too small to produce a representative sample of the
target concept.
Performance Measures
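• Precision:
$$P = \frac{\text{number of correct system answers}}{\text{number of system answers}}$$
For example, when predicting who is sick: the number of people predicted to be sick who really are sick, divided by the number of people the system predicts to be sick.
• Recall:
$$R = \frac{\text{number of correct system answers}}{\text{number of correct problem answers}}$$
For example: the number of people predicted to be sick who really are sick, divided by the number of sick people in the dataset.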
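Trade-off: precision vs. recall
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives.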
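The formulas translate directly into code. In this sketch the labels are 1 (positive, e.g. sick) and 0 (negative), the screening data are hypothetical, and returning 0.0 when a denominator is zero is our own convention rather than something the slides specify.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    return num / den if den else 0.0  # convention for an empty denominator


def precision(tp, fp):
    return safe_div(tp, tp + fp)


def recall(tp, fn):
    return safe_div(tp, tp + fn)


def accuracy(tp, fp, fn, tn):
    return safe_div(tp + tn, tp + tn + fp + fn)


# Hypothetical screening of 10 people: 1 = sick, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 3 1 1 5
print(precision(tp, fp), recall(tp, fn), accuracy(tp, fp, fn, tn))  # 0.75 0.75 0.8
```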
F1 score: seeks a balance between precision and recall. It is useful when the class distribution is uneven.
$$F_1 = 2\,\frac{P \cdot R}{P + R}$$
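To see why F1 helps with an uneven class distribution, the sketch below compares two hypothetical classifiers on 100 examples of which only 5 are positive. The counts are invented for illustration, and the zero-denominator convention is our own.

```python
def scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    def div(num, den):
        return num / den if den else 0.0  # convention when a denominator is zero

    p = div(tp, tp + fp)
    r = div(tp, tp + fn)
    f1 = div(2 * p * r, p + r)  # harmonic mean: F1 = 2PR / (P + R)
    acc = div(tp + tn, tp + tn + fp + fn)
    return acc, p, r, f1


# Classifier A always answers "negative"; classifier B finds 4 of the 5 positives.
clf_a = scores(tp=0, fp=0, fn=5, tn=95)
clf_b = scores(tp=4, fp=6, fn=1, tn=89)
print("A: acc=%.2f P=%.2f R=%.2f F1=%.2f" % clf_a)
print("B: acc=%.2f P=%.2f R=%.2f F1=%.2f" % clf_b)
```

Classifier A reaches 95% accuracy while never finding a positive case, and its F1 is 0. Classifier B has slightly lower accuracy (93%) but an F1 of about 0.53, which better reflects its usefulness on the rare class.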
Inductive Bias
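[Figure: the examples on a grid with axes Quality and Price; one cell is marked Y (buy), one N (do not buy), and the two remaining cells ??]
Example  Quality  Price  Buy
1        Good     Low    Yes
2        Bad      High   No
3        Good     High   ?
4        Bad      Low    ?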
• A learner that makes no prior assumptions regarding the identity of the target
concept cannot classify any unseen instances.
• A learner that makes no a priori assumptions regarding the identity of the target concept
has no rational basis for classifying any unseen instances.
• The inductive bias (learning bias): the set of assumptions that the learner uses to predict
outputs given inputs that it has not encountered.
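The Quality/Price example can be checked mechanically. The sketch below plays the role of a learner with no inductive bias: its hypothesis space is every possible Boolean labelling of the four instances (2^4 = 16 hypotheses). It keeps the hypotheses consistent with examples 1 and 2 and lets them vote on examples 3 and 4. The names `INSTANCES`, `TRAINING`, and `vote` are illustrative.

```python
from itertools import product

INSTANCES = [("Good", "Low"), ("Bad", "High"), ("Good", "High"), ("Bad", "Low")]
TRAINING = {("Good", "Low"): True, ("Bad", "High"): False}  # examples 1 and 2


def all_hypotheses():
    """Every possible labelling of the four instances: no prior assumptions at all."""
    for labels in product((False, True), repeat=len(INSTANCES)):
        yield dict(zip(INSTANCES, labels))


def vote(instance):
    """Return (yes, no) votes of the hypotheses consistent with the training data."""
    consistent = [h for h in all_hypotheses()
                  if all(h[x] == label for x, label in TRAINING.items())]
    yes = sum(1 for h in consistent if h[instance])
    return yes, len(consistent) - yes


for unseen in [("Good", "High"), ("Bad", "Low")]:
    print(unseen, vote(unseen))  # each unseen instance gets 2 yes and 2 no votes
```

Because the consistent hypotheses always split evenly, this unbiased learner has no reason to prefer Yes or No for examples 3 and 4; any preference has to come from an inductive bias.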
Common inductive biases in ML:
• Maximum conditional independence: if the hypothesis can be cast in a Bayesian
framework, try to maximize conditional independence (Naive Bayes classifier).
• Minimum cross-validation error: when trying to choose among hypotheses, select the
hypothesis with the lowest cross-validation error.
• Maximum margin: when drawing a boundary between two classes, attempt to maximize
the width of the boundary (SVM). The assumption is that distinct classes tend to be
separated by wide boundaries.
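The maximum-conditional-independence bias is what makes a Naive Bayes classifier tractable: it assumes the features are independent given the class, so P(x | c) becomes a product of per-feature probabilities. The sketch below applies this to the elephant example; the add-one (Laplace) smoothing and the 1/0 encoding of "+"/"-" are assumptions of the sketch, not part of the slides.

```python
import math

# Elephant example: (Gray?, Mammal?, Large?, Vegetarian?, Wild?) -> Elephant, 1 = "+", 0 = "-".
TRAIN = [
    ((1, 1, 1, 1, 1), 1),
    ((1, 1, 1, 0, 1), 1),
    ((1, 1, 0, 1, 1), 0),
    ((0, 1, 1, 1, 1), 0),
    ((1, 0, 1, 0, 1), 0),
    ((1, 1, 1, 1, 0), 1),
]


def train_naive_bayes(examples):
    """Estimate P(c) and P(feature_i = 1 | c) for c in {0, 1}, with add-one smoothing."""
    n_features = len(examples[0][0])
    model = {}
    for c in (0, 1):
        rows = [x for x, y in examples if y == c]  # assumes each class occurs at least once
        prior = len(rows) / len(examples)
        p_one = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2) for i in range(n_features)]
        model[c] = (prior, p_one)
    return model


def predict(model, x):
    """Choose the class maximizing log P(c) + sum_i log P(x_i | c) (independence assumption)."""
    def log_score(c):
        prior, p_one = model[c]
        return math.log(prior) + sum(math.log(p if xi else 1 - p) for xi, p in zip(x, p_one))
    return max((0, 1), key=log_score)


model = train_naive_bayes(TRAIN)
for number, x in [(7, (1, 1, 1, 0, 1)), (8, (1, 0, 1, 0, 1)), (9, (1, 1, 1, 0, 0))]:
    print(number, predict(model, x))
```

With these six examples the sketch predicts Elephant for examples 7 and 9 and not for example 8. Working in log space avoids underflow when there are many features.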
• Minimum description length: when forming a hypothesis, attempt to minimize the
length of the description of the hypothesis. The assumption is that simpler hypotheses are
more likely to be true.
• Minimum features: unless there is good evidence that a feature is useful, it should be
deleted.
• Nearest neighbors: assume that most of the cases in a small neighborhood in feature
space belong to the same class.
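The nearest-neighbors bias leads directly to the k-nearest-neighbor classifier. Below is a minimal sketch for the elephant example, assuming Hamming distance (the number of differing attributes) as the distance in feature space and a majority vote among the k closest examples. Ties in distance are broken by the order of the training examples because Python's `sorted` is stable, and an odd k avoids ties in the vote.

```python
from collections import Counter

# Elephant example: (Gray?, Mammal?, Large?, Vegetarian?, Wild?) -> Elephant, 1 = "+", 0 = "-".
TRAIN = [
    ((1, 1, 1, 1, 1), 1),
    ((1, 1, 1, 0, 1), 1),
    ((1, 1, 0, 1, 1), 0),
    ((0, 1, 1, 1, 1), 0),
    ((1, 0, 1, 0, 1), 0),
    ((1, 1, 1, 1, 0), 1),
]


def hamming(a, b):
    """Number of attributes on which two instances differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def knn_predict(examples, query, k=1):
    """Majority label among the k training examples closest to `query`."""
    neighbors = sorted(examples, key=lambda ex: hamming(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


for number, x in [(7, (1, 1, 1, 0, 1)), (8, (1, 0, 1, 0, 1)), (9, (1, 1, 1, 0, 0))]:
    print(number, knn_predict(TRAIN, x, k=1), knn_predict(TRAIN, x, k=3))
```

With k = 1 the sketch predicts Elephant for examples 7 and 9 and not for example 8, which is identical to training example 5. With k = 3, example 5 is outvoted by examples 2 and 1, so example 8 is predicted to be an elephant: how small the "small neighborhood" is counts as part of the bias.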