MODULE 2
Concept Learning
Topics
• Concept learning task
• Concept learning as search through a hypothesis space
• Maximally Specific Hypotheses (FIND-S Algorithm)
• Version Spaces
• The Candidate Elimination Algorithm
• Inductive Bias
Concept Learning
• Learning involves acquiring general concepts from specific training examples.
Example: People continually learn general concepts or categories such as "bird,"
"car," "situations in which I should study more in order to pass the exam," etc.
• Each such concept can be viewed as describing some subset of objects or events defined over a larger set.
• Alternatively, each concept can be thought of as a Boolean-valued function
defined over this larger set. (Example: A function defined over all animals, whose
value is true for birds and false for other animals).
• Concept learning: inferring a Boolean-valued function from training examples of its input and output.
Notations
• The set of items over which the concept is defined is called the set of instances, which
we denote by X.
Example: X is the set of all possible days to play water sport, each represented by the
attributes: Sky, AirTemp, Humidity, Wind, Water, and Forecast
• The concept or function to be learned is called the target concept, which we denote by c; c can be any Boolean-valued function defined over the instances X, i.e., c : X → {0, 1}.
Example: The target concept corresponds to the value of the attribute EnjoySport
(i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
• Instances for which c(x) = 1 are called positive examples, or members of the
target concept.
• Instances for which c(x) = 0 are called negative examples, or non-members of the
target concept.
• The ordered pair (x, c(x)) describes the training example consisting of the instance x and its target concept value c(x).
• We use D to denote the set of available training examples.
• The symbol H denotes the set of all possible hypotheses that the learner may consider regarding the identity of the target concept. Each hypothesis h in H represents a Boolean-valued function defined over X, i.e., h : X → {0, 1}.
• The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.
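As a concrete illustration, this notation maps directly onto simple data structures. A minimal Python sketch (the tuple encoding and the variable names are assumptions for illustration, not part of the task definition):

# An instance x is a tuple of attribute values (Sky, AirTemp, Humidity,
# Wind, Water, Forecast); its label is c(x); D collects the (x, c(x)) pairs.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),   # positive example
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),   # negative example
]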
A Concept Learning Task
Consider the example task of learning the target concept
"Days on which my friend Vasu enjoys his favorite water sport."
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes
The attribute EnjoySport indicates whether or not a person enjoys his favorite water sport on this day.
The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
What hypothesis representation is provided to the learner?
Let’s consider a simple representation in which each hypothesis consists of a
conjunction of constraints on the instance attributes.
Let each hypothesis be a vector of six constraints, specifying the values of the six
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
For each attribute, the hypothesis will either
• Indicate by a "?" that any value is acceptable for this attribute,
• Specify a single required value (e.g., Warm) for the attribute, or
• Indicate by a "Φ" that no value is acceptable
If some instance x satisfies all the constraints of hypothesis h, then h classifies
x as a positive example (h(x) = 1).
The hypothesis that a person enjoys his favorite sport only on cold days with high humidity (independent of the values of the other attributes) is represented by the expression
<?, Cold, High, ?, ?, ?>
The most general hypothesis-that every day is a positive example-is represented by
<?, ?, ?, ?, ?, ?>
The most specific possible hypothesis-that no day is a positive example-is
represented by
<Φ , Φ, Φ, Φ, Φ, Φ>
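This matching rule is easy to state in code. A minimal Python sketch (the function name satisfies and the string encodings "?" and "Φ" are illustrative assumptions):

def satisfies(h, x):
    # x satisfies h iff every constraint is "?" or equals x's value;
    # a "Φ" constraint never matches, so such an h classifies every x negative.
    return all(a == "?" or a == v for a, v in zip(h, x))

h = ("?", "Cold", "High", "?", "?", "?")
x = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")
print(satisfies(h, x))  # True, i.e. h(x) = 1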
EnjoySport Concept Learning Task
Given the instances X, the hypothesis space H, the target concept c (EnjoySport), and the training examples D defined above, determine a hypothesis h in H such that h(x) = c(x) for all x in X.
The Inductive Learning Hypothesis
Any hypothesis found to approximate the target
function well over a sufficiently large set of training
examples will also approximate the target function well
over other unobserved examples.
Concept learning as Search
• Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined
by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training
examples.
For example, consider the instances X and hypotheses H in the EnjoySport learning task. The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly
• 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances, and
• 5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H (each attribute may additionally take the values "?" and "Φ").
Every hypothesis containing one or more "Φ" symbols represents the empty set of instances; that is, it classifies every instance as negative. Hence there are
• 1 + (4 · 3 · 3 · 3 · 3 · 3) = 973 semantically distinct hypotheses within H.
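These counts are plain products over the per-attribute choices and can be checked mechanically (a quick illustrative Python check):

sky, other = 3, 2                            # values of Sky vs. each of the other five attributes
instances = sky * other ** 5                 # 3·2·2·2·2·2 = 96
syntactic = (sky + 2) * (other + 2) ** 5     # add "?" and "Φ" per attribute: 5·4^5 = 5120
semantic = 1 + (sky + 1) * (other + 1) ** 5  # all Φ-hypotheses collapse to one empty concept: 973
print(instances, syntactic, semantic)        # 96 5120 973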
General-to-Specific Ordering of Hypotheses
• Consider the two hypotheses
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
• Consider the sets of instances that are classified positive by h1 and by h2.
• Because h2 imposes fewer constraints on the instance, it classifies more instances as positive: any instance classified positive by h1 will also be classified positive by h2. Therefore, h2 is more general than h1.
• Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance that satisfies hk also satisfies hj.
Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥ hk) if and only if
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
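For the conjunctive representation used here, this test reduces to an attribute-wise comparison. A minimal Python sketch (illustrative names; it ignores the corner case of "Φ" constraints, which satisfy no instance at all):

def more_general_or_equal(hj, hk):
    # hj ≥ hk iff every constraint of hj is "?" or equals hk's constraint
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 ≥ h1
print(more_general_or_equal(h1, h2))  # False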
• In the figure, the box on the left represents the set X of all instances, and the box on the right represents the set H of all hypotheses.
• Each hypothesis corresponds to some subset of X: the subset of instances that it classifies positive.
• The arrows connecting hypotheses represent the more-general-than relation, with the arrow pointing toward the less general hypothesis.
• Note that the subset of instances characterized by h2 subsumes the subset characterized by h1; hence h2 is more-general-than h1.
FIND-S: Finding a Maximally Specific Hypothesis
FIND-S Algorithm:
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
      For each attribute constraint ai in h
           If the constraint ai is satisfied by x
           Then do nothing
           Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
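Before tracing the algorithm by hand, here is a minimal runnable Python sketch of FIND-S for the conjunctive representation above (the function name find_s and the string encodings "Φ" and "?" are illustrative assumptions):

def find_s(examples, n_attrs=6):
    h = ["Φ"] * n_attrs                 # 1. most specific hypothesis in H
    for x, label in examples:
        if label != 1:
            continue                    # negative examples are ignored
        for i in range(n_attrs):        # 2. generalize just enough to cover x
            if h[i] == "Φ":
                h[i] = x[i]             # adopt the first observed value
            elif h[i] != x[i]:
                h[i] = "?"              # next more general constraint
    return tuple(h)                     # 3. output hypothesis h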
To illustrate this algorithm, assume the learner is given the sequence of training
examples from the EnjoySport task
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes
The first step of FIND-S is to initialize h to the most specific hypothesis in H
h = <Φ, Φ, Φ, Φ, Φ, Φ>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
Observing the first training example, it is clear that our hypothesis is too specific. In particular, none of the "Φ" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example:
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
This h is still very specific; it asserts that all instances are negative except for the single
positive training example
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
The second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example:
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example.
h3 = < Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
The fourth example leads to a further generalization of h
h4 = < Sunny, Warm, ?, Strong, ?, ? >
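Running the find_s sketch given earlier on these four examples reproduces the trace:

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), 1),
]
print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?'), i.e. h4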
The key properties of the FIND-S algorithm:
• FIND-S is guaranteed to output the most specific hypothesis within H that is
consistent with the positive training examples
• The FIND-S algorithm's final hypothesis will also be consistent with the negative examples, provided the correct target concept is contained in H and the training examples are correct.
Unanswered by FIND-S
1. Has the learner converged to the correct target concept?
2. Why prefer the most specific hypothesis?
3. Are the training examples consistent?
4. What if there are several maximally specific consistent hypotheses?