Machine Learning
Part 2
Supervised Learning
Supervised learning is the machine learning task of inferring a function from supervised (labeled) training data. (Wikipedia)
The training data consist of a set of training examples.
In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value.
A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier (if the output is discrete) or a regression function (if the output is continuous).
Learning a Class from Examples
Class C of a family car
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Output:
Positive (+) and negative (−) examples
Input representation:
x1: price, x2: engine power
Training set X:
$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$$
$$r = \begin{cases} 1 & \text{if } x \text{ is positive} \\ 0 & \text{if } x \text{ is negative} \end{cases}$$
[Figure: training examples plotted in the (x1: price, x2: engine power) plane]
Class C:
$$C: (p_1 \le \text{price} \le p_2) \text{ AND } (e_1 \le \text{engine power} \le e_2)$$
Hypothesis class H:
$$h(x) = \begin{cases} 1 & \text{if } h \text{ says } x \text{ is positive} \\ 0 & \text{if } h \text{ says } x \text{ is negative} \end{cases}$$
Error of h on X:
$$E(h \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \mathbb{1}\left(h(x^t) \ne r^t\right)$$
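A minimal Python sketch of this setup; the feature names, training points, and rectangle bounds below are illustrative assumptions, not values from the slides:

```python
import numpy as np

def h(x, p1, p2, e1, e2):
    """Rectangle hypothesis: 1 if x = (price, engine power) lies inside the box."""
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

def empirical_error(X, r, p1, p2, e1, e2):
    """E(h|X): fraction of training examples that h misclassifies."""
    return np.mean([h(x, p1, p2, e1, e2) != rt for x, rt in zip(X, r)])

# Assumed training set: (price, engine power) pairs with labels r.
X = [(16000, 150), (22000, 180), (9000, 90), (40000, 300)]
r = [1, 1, 0, 0]
print(empirical_error(X, r, p1=12000, p2=30000, e1=120, e2=220))  # -> 0.0
```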
S, G, and the Version Space
Most specific hypothesis, S
Most general hypothesis, G
Any h ∈ H between S and G is consistent, and together these hypotheses make up the version space (Mitchell, 1997).
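As a rough illustration, S for the rectangle hypothesis class can be taken as the tightest axis-aligned box around the positive examples; a sketch with assumed data:

```python
# Most specific hypothesis S for the rectangle class: the tightest
# axis-aligned box that contains every positive example.
positives = [(16000, 150), (22000, 180), (18000, 160)]  # assumed (price, power)
prices = [p for p, _ in positives]
powers = [e for _, e in positives]
S = (min(prices), max(prices), min(powers), max(powers))
print(S)  # (16000, 22000, 150, 180); any h between S and G is consistent
```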
Margin
Choose the h with the largest margin, i.e., the distance between the boundary of h and the instances closest to it.
Steps in Supervised Learning
Determine the type of training examples. In the
case of handwriting analysis, for example, this
might be a single handwritten character, an
entire handwritten word, or an entire line of
handwriting.
Gather a training set. The training set needs to
be representative of the real-world use of the
function. Thus, a set of input objects is gathered
and corresponding outputs are also gathered,
either from human experts or from
measurements.
Determine the input feature representation of the
learned function. The accuracy of the learned
function depends strongly on how the input
object is represented. Typically, the input object
is transformed into a feature vector, which
contains a number of features that are
descriptive of the object. The number of features
should not be too large, because of the curse of dimensionality, but should contain enough information to accurately predict the output.
Determine the structure of the learned function
and corresponding learning algorithm. For
example, the engineer may choose to
use support vector machines or decision trees.
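For instance, a hedged scikit-learn sketch of this choice; the feature encoding and toy data are assumed for illustration:

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy feature vectors: [price, engine_power] per car, with 0/1 labels.
X = [[16000, 150], [22000, 180], [9000, 90], [40000, 300]]
y = [1, 1, 0, 0]

# The engineer chooses the structure of the learned function here:
model = DecisionTreeClassifier(max_depth=2)   # or: SVC(kernel="rbf")
model.fit(X, y)
print(model.predict([[18000, 160]]))  # e.g. [1]
```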
Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation.
Evaluate the accuracy of the learned function.
After parameter adjustment and learning, the
performance of the resulting function should be
measured on a test set that is separate from the
training set.
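A minimal sketch of this tune-then-test loop, assuming scikit-learn and a synthetic dataset (all parameter values here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out a test set, then carve a validation set out of the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.33, random_state=0)

# Adjust a control parameter (max_depth) by validation-set performance...
best_depth, best_acc = None, -1.0
for depth in (1, 2, 3, 5, 8):
    m = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    acc = accuracy_score(y_val, m.predict(X_val))
    if acc > best_acc:
        best_depth, best_acc = depth, acc

# ...then measure accuracy once on the untouched test set.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, final.predict(X_test)))
```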
Facts in Supervised Learning
A wide range of supervised learning algorithms is
available, each with its strengths and
weaknesses.
There is no single learning algorithm that works
best on all supervised learning problems (No free
lunch theorem).
Noise and Model Complexity
Given comparable empirical error, use the simpler model because it:
is simpler to use (lower computational complexity),
is easier to train (lower space complexity),
is easier to explain (more interpretable), and
generalizes better (lower variance - Occam's razor: simpler explanations are more plausible, and any unnecessary complexity should be shaved off).
Multiple Classes, C_i, i = 1, ..., K
$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$$
$$r_i^t = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}$$
Train K hypotheses h_i(x), i = 1, ..., K:
$$h_i(x^t) = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}$$
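A hedged sketch of this one-hypothesis-per-class (one-vs-rest) scheme, using scikit-learn and synthetic data; the model choice and parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic K=3 class problem.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)
K = 3

# Train one hypothesis h_i per class: r_i = 1 for class i, 0 otherwise.
hypotheses = []
for i in range(K):
    r_i = (y == i).astype(int)
    hypotheses.append(LogisticRegression(max_iter=1000).fit(X, r_i))

# Predict by taking the class whose hypothesis is most confident.
scores = np.column_stack([h.predict_proba(X)[:, 1] for h in hypotheses])
y_pred = scores.argmax(axis=1)
print((y_pred == y).mean())  # training accuracy of the one-vs-rest ensemble
```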
Regression
$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}, \quad r^t \in \mathbb{R}, \quad r^t = f(x^t) + \varepsilon$$
Linear model:
$$g(x) = w_1 x + w_0$$
Quadratic model:
$$g(x) = w_2 x^2 + w_1 x + w_0$$
Empirical error:
$$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \left[ r^t - g(x^t) \right]^2$$
$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \left[ r^t - (w_1 x^t + w_0) \right]^2$$
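A minimal numpy sketch fitting both models by least squares on synthetic noisy data; the underlying function and noise level are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 40)
r = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)  # r^t = f(x^t) + noise

# Fit g(x) = w1*x + w0 and g(x) = w2*x^2 + w1*x + w0 by least squares.
w_lin = np.polyfit(x, r, deg=1)
w_quad = np.polyfit(x, r, deg=2)

def empirical_error(w, x, r):
    """E(g|X): mean squared residual over the training set."""
    return np.mean((r - np.polyval(w, x)) ** 2)

print(empirical_error(w_lin, x, r), empirical_error(w_quad, x, r))
```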
Model Selection & Generalization
Learning is an ill-posed problem; data is
not sufficient to find a unique solution
Generalization: How well a model
performs on new data
Overfitting: H more complex than C or f
Underfitting: H less complex than C or f
Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data
As N increases, E decreases.
As c(H) increases, E first decreases and then increases.
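A hedged sketch of the second trade-off: sweeping model complexity (polynomial degree) and watching the error on held-out data first fall and then rise. The data and degrees are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 3, 30)
r = np.sin(2 * x) + rng.normal(0, 0.2, size=x.size)
x_new = np.linspace(0, 3, 100)                    # "new data"
r_new = np.sin(2 * x_new) + rng.normal(0, 0.2, size=x_new.size)

for degree in (1, 3, 5, 9, 15):                   # growing c(H)
    w = np.polyfit(x, r, deg=degree)
    E = np.mean((r_new - np.polyval(w, x_new)) ** 2)
    print(degree, round(E, 4))                    # E typically dips, then climbs
```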
Cross-Validation
To estimate generalization error, we need data unseen during training. We split the data as:
Training set (50%)
Validation set (25%)
Test (publication) set (25%)
Resampling (e.g., K-fold cross-validation) when data are scarce
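A short sketch of the split plus K-fold resampling for the scarce-data case, assuming scikit-learn; the placeholder data and the 50/25/25 proportions follow the slide:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.arange(100).reshape(-1, 1)   # placeholder data
y = (X.ravel() > 50).astype(int)

# 50% train / 25% validation / 25% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# With few data, resample instead: each point serves in training and validation.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X):
    pass  # fit on X[train_idx], validate on X[val_idx]
```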
Dimensions of a Supervised Learner
1. Model:
$$g(x \mid \theta)$$
2. Loss function:
$$E(\theta \mid \mathcal{X}) = \sum_t L\left(r^t, g(x^t \mid \theta)\right)$$
3. Optimization procedure:
$$\theta^* = \arg\min_{\theta} E(\theta \mid \mathcal{X})$$
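A minimal sketch instantiating the three dimensions for a linear model with squared loss, optimized by gradient descent; the data, learning rate, and iteration count are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 5, 50)
r = 1.5 * x + 0.5 + rng.normal(0, 0.3, size=x.size)

# 1. Model: g(x | theta), with theta = (w1, w0)
def g(x, theta):
    return theta[0] * x + theta[1]

# 2. Loss function: E(theta | X) = sum_t L(r^t, g(x^t | theta)), squared loss here
def E(theta):
    return np.sum((r - g(x, theta)) ** 2)

# 3. Optimization procedure: theta* = argmin_theta E(theta | X), via gradient descent
theta = np.zeros(2)
lr = 1e-4
for _ in range(5000):
    residual = r - g(x, theta)
    grad = np.array([-2 * np.sum(residual * x), -2 * np.sum(residual)])
    theta -= lr * grad
print(theta)  # close to the assumed (1.5, 0.5)
```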