Instance-Based Learning
Lecture # 19
25th April, 2012
(Chapter 8)
Instance-Based Learning
Learning in these algorithms consists of storing the presented training
data
When a new query instance is encountered,
a set of similar related instances is retrieved
and used to classify the new query instance
Instance-based approaches can construct a different approximation to
the target function for each distinct query instance
In fact, many techniques construct only a local approximation to the
target function
Instance-Based Learning
One disadvantage is that the cost of classifying new instances can be
high
Nearly all computation takes place at classification time rather than
when the training examples are first encountered
A second disadvantage is that they typically consider all attributes of
the instances when attempting to retrieve similar training examples
If the target concept depends on only a few attributes, then the
instances that are truly most "similar" may well be a large distance
apart
K-nearest Neighbor Learning
This algorithm assumes all instances correspond to points in the n-
dimensional space
The nearest neighbors of an instance are defined in terms of the standard
Euclidean distance
Let an arbitrary instance x be described by the feature vector
⟨a_1(x), a_2(x), ..., a_n(x)⟩, where a_r(x) denotes the value of the r-th attribute of x
Then the distance between two instances x_i and x_j is defined to be
d(x_i, x_j) = sqrt( Σ_{r=1..n} ( a_r(x_i) − a_r(x_j) )² )
K-nearest Neighbor Learning (contd…)
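A minimal sketch of the discrete-valued k-nearest neighbor classifier described above, using the Euclidean distance just defined; the function names and the small toy data set are illustrative assumptions, not part of the lecture.

```python
import math
from collections import Counter

def euclidean_distance(xi, xj):
    # d(xi, xj) = sqrt( sum_r (a_r(xi) - a_r(xj))^2 )
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_classify(query, training_data, k=3):
    # training_data: list of (feature_vector, label) pairs
    # Sort the stored examples by distance to the query instance
    neighbors = sorted(training_data,
                       key=lambda ex: euclidean_distance(query, ex[0]))[:k]
    # Return the most common label among the k nearest neighbors
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Illustrative toy data: two attributes, two classes
training_data = [((1.0, 1.0), '+'), ((1.2, 0.8), '+'),
                 ((4.0, 4.2), '-'), ((4.5, 3.9), '-')]
print(knn_classify((1.1, 0.9), training_data, k=3))  # expected: '+'
```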
Voronoi Diagrams
The decision surface is a combination of convex
polyhedra surrounding each of the training
examples
Each polyhedron indicates the set of query points
whose classification will be completely
determined by that training example
Query points outside the polyhedron are closer
to some other training example
This kind of diagram is often called the Voronoi
diagram of the set of training examples
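As an informal illustration (an assumption of this writeup, not from the lecture), the Voronoi cell that a query falls in can be found by locating its single nearest training example; the 1-NN classification of that query is simply the label attached to that cell's training example.

```python
import math

def voronoi_owner(query, training_points):
    # The training example whose Voronoi cell contains the query
    # is its nearest neighbor under Euclidean distance.
    return min(range(len(training_points)),
               key=lambda i: math.dist(query, training_points[i]))

points = [(0.0, 0.0), (3.0, 0.0), (1.5, 2.5)]
print(voronoi_owner((2.8, 0.2), points))  # index of the owning cell, here 1
```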
Continuous-valued Target Functions
We have the algorithm calculate the mean value of the k nearest
training examples rather than their most common value
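A minimal sketch of this continuous-valued variant, assuming the same Euclidean distance and toy-style data as above; the names are illustrative.

```python
import math

def knn_regress(query, training_data, k=3):
    # training_data: list of (feature_vector, real_valued_target) pairs
    # Approximate f(query) by the mean target value of the k nearest examples
    neighbors = sorted(training_data,
                       key=lambda ex: math.dist(query, ex[0]))[:k]
    return sum(target for _, target in neighbors) / k
```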
Distance-Weighted NN Algorithm
Weight the contribution of each of the k neighbors according to their distance to
the query point, giving greater weight to closer neighbors (e.g. by the inverse
square of the distance, as sketched after this slide)
Once we add distance weighting, there is no harm in allowing all
training examples to have an influence on the classification of the query instance x
The only disadvantage of considering all examples is that the classifier will
run more slowly
If all training examples are considered, we call the algorithm a global
method
If only the nearest training examples are considered, we call it a local
method
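A sketch of distance-weighted k-NN voting, assuming the common inverse-squared-distance weighting mentioned above; the exact-match shortcut and names are illustrative.

```python
import math
from collections import defaultdict

def weighted_knn_classify(query, training_data, k=3):
    # training_data: list of (feature_vector, label) pairs
    neighbors = sorted(training_data,
                       key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(query, features)
        if d == 0.0:
            return label                 # query coincides with a training example
        votes[label] += 1.0 / d ** 2     # closer neighbors get larger weights
    return max(votes, key=votes.get)
```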
NN Learning
Robust to noisy training data and quite effective when provided with a
sufficiently large set of training data
What is the inductive bias?
The inductive bias corresponds to an assumption that the classification of
an instance x will be most similar to the classification of other instances
that are nearby in Euclidean distance
One practical issue is that the distance between instances is calculated
based on all attributes of the instance – Curse of dimensionality
To overcome this problem, each attribute can be weighted differently when
computing the distance (see the sketch below)
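A minimal sketch of attribute weighting in the distance metric, assuming per-attribute weights chosen by the user (or tuned, e.g. by cross-validation); the weights shown are illustrative.

```python
import math

def weighted_distance(xi, xj, weights):
    # Stretch or shrink each axis by its weight before measuring distance;
    # a weight of 0 effectively drops an irrelevant attribute.
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, xi, xj)))

# Illustrative: the second attribute is judged irrelevant and gets weight 0
print(weighted_distance((1.0, 9.0), (1.5, 2.0), weights=(1.0, 0.0)))  # 0.5
```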
Lazy Learners
Lazy learning methods
defer the decision of how to generalize beyond the training data
until a new query instance is observed
Classify new query instances
by analyzing similar instances while ignoring instances that are
very different from the query
They represent instances
as real-valued points in an n-dimensional Euclidean space
Case-based Reasoning (CBR)
Case-based reasoning (CBR) is a learning paradigm based on the
first two of these principles, but not the third
In CBR, instances are typically represented using richer
symbolic descriptions
The methods used to retrieve similar instances are
correspondingly more elaborate
CBR has been applied to problems such as conceptual design of
mechanical devices based on a stored library of previous designs, and
reasoning about new legal cases based on previous rulings
Lazy vs Eager Learners
There are differences in computation time between eager and
lazy methods
Lazy methods will generally require less computation during
training, but more computation when they predict
Lazy methods may consider the query instance x when deciding
how to generalize beyond the training data D.
Eager methods cannot. By the time they observe the query
instance x, they have already chosen their (global) approximation
to the target function.
Lazy vs Eager Learners (contd…)
A lazy learner has the option of (implicitly) representing the
target function by a combination of many local approximations
An eager learner must commit at training time to a single global
approximation