Course : Artificial Intelligence (COMP6065)
Non-official Slides
Knowledge in Learning
Session 20
Revised by Williem, S. Kom., Ph.D.
1
Learning Outcomes
At the end of this session, students will be able to:
• LO 5 : Apply various techniques to an agent when acting under
uncertainty
• LO 6 : Apply how to process natural language and other
perceptual signals so that an agent can interact
intelligently with the world
2
Outline
1. A Logical Formulation of Learning
2. Knowledge in Learning
3. Explanation Based Learning
4. Learning Using Relevance Information
5. Inductive Logic Programming
6. Summary
3
A Logical Formulation of Learning
• In this session, the hypotheses are represented by a set of
logical sentences
– Allows for incremental construction of hypotheses
– Allows for prior knowledge
• Example
– The restaurant learning problem can be described in this
form, as sketched below
4
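A sketch of this formulation, following the restaurant example from Russell & Norvig (the attribute vocabulary below is assumed from that example): the goal predicate is WillWait(x), each example consists of a description plus a classification, and a hypothesis is a candidate definition of the goal predicate.

  Description of example X1:   Alternate(X1) ∧ ¬Bar(X1) ∧ Patrons(X1, Some) ∧ ...
  Classification of X1:        WillWait(X1)
  Hypothesis:                  ∀x WillWait(x) ⇔ Cj(x), where Cj(x) is a candidate definition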
A Logical Formulation of Learning
• Case study
– Converting the restaurant example into a logical
representation
5
A Logical Formulation of Learning
• Thus, a decision tree can be interpreted as a set of logical
sentences: each path to a positive leaf becomes one disjunct
of the goal predicate's definition (a sketch follows below)
6
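As a hedged illustration (the tree recalled here is the one induced for the textbook's restaurant example):

  ∀x WillWait(x) ⇔ Path1(x) ∨ Path2(x) ∨ ...

where each Pathi(x) is the conjunction of attribute tests along one path to a Yes leaf. For the restaurant tree, one such disjunct is Patrons(x, Some); another is roughly Patrons(x, Full) ∧ Hungry(x) ∧ Type(x, Thai) ∧ Fri/Sat(x).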
A Logical Formulation of Learning
• As examples arrive, hypotheses that are not consistent with
them can be ruled out
• When is an example inconsistent with a hypothesis? There are
two cases:
– False negative: the hypothesis says the example is
negative, but in fact it is positive
– False positive: the hypothesis says the example is positive,
but in fact it is negative
7
A Logical Formulation of Learning
(Figure: the extension of a hypothesis shown as a region of examples, with a false negative example lying outside the region and a false positive example lying inside it)
8
A Logical Formulation of Learning
• How do we find a logically consistent hypothesis?
– Current-best-hypothesis search
• Maintain a single hypothesis, and adjust it as new
examples arrive
– Least-commitment search
• The candidate elimination algorithm has an incremental
property: one never has to go back and reexamine the
old examples
9
Current-best-hypothesis Search
• Suppose we have some hypothesis hr
– As long as each new example is consistent, we do nothing
– If there is a false negative example?
• Generalization: We extend the hypothesis region
– If there is a false positive example?
• Specialization: We reduce the hypothesis region
10
Current-best-hypothesis Search
(Figure: extensions of the hypothesis hr, showing the initial hypothesis, a generalization after a false negative, and a specialization after a false positive)
• Generalization, e.g. via dropping conditions:
Alternate(x) ∧ Patrons(x, Some) becomes Patrons(x, Some)
• Specialization, e.g. via adding conditions or via removing disjuncts:
Alternate(x) becomes Alternate(x) ∧ Patrons(x, Some)
11
Least-commitment search
• Assuming the original hypothesis space contains the right
answer, the reduced disjunction must still contain the right
answer, because only incorrect hypotheses have been removed
• The set of hypotheses remaining is called the version space
• The learning algorithm is called version space learning or the
candidate elimination algorithm
• The problem: how do we deal with all the remaining hypotheses?
– The set can be HUGE!
12
Least-commitment search
• The problem is similar to how we represent all real numbers
between 1 and 2
– Use boundaries!
• The version space is likewise represented by boundaries in the
generalization/specialization ordering:
– A most general boundary (G-set)
– A most specific boundary (S-set)
• Everything in between is guaranteed to be consistent with the
examples
13
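Stated more precisely (the standard version-space property, as in the textbook): a hypothesis h belongs to the version space exactly when

  s ≤ h ≤ g  for some s in the S-set and some g in the G-set,

where ≤ is the generalization ordering; the two boundary sets are therefore enough to represent the whole space.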
Version Space
14
Version Space Learning
• Algorithm (a Python sketch appears after the candidate
elimination steps below):
– False positive for Si: Si is too general, but there are no
consistent specializations of Si, so throw Si out of the S-set
– False negative for Si: Si is too specific, so replace it with
all its immediate generalizations
– False positive for Gi: Gi is too general, so replace it with
all its immediate specializations
– False negative for Gi: Gi is too specific, but there are no
consistent generalizations of Gi, so throw Gi out of the G-set
15
Version Space Learning
• We continue for each example until:
– We have exactly one hypothesis left
– The version space collapses – either S-set or G-set empty
– We run out of examples and have several hypotheses
remaining
16
Candidate Elimination Algorithm
• Given : A representation language and a set of positive
and negative examples expressed in that language.
• Compute : A concept description that is consistent with all
the positive examples and none of the negative examples.
1. Initialize G to contain one element : the null description (all
features are variables)
2. Initialize S to contain one element : the first positive
example
17
Candidate Elimination Algorithm
3. Accept a new training example.
• If it is a positive example
– Remove from G any descriptions that do not cover the
example
– Update the S set to contain the most specific set of
descriptions in the version space that cover the example
and the current element of the S set
18
Candidate Elimination Algorithm
• If it is a negative example
– Remove from S any descriptions that cover the example
– Update the G set to contain the most general set of
descriptions in the version space that do not cover the
example
• If S and G are both singleton sets, then if they are identical,
output their value and halt.
• If they are both singleton sets but they are different, then the
training cases were inconsistent. Output this result and halt.
• Otherwise, go to step 3.
19
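A minimal Python sketch of these steps for fixed-length attribute descriptions (None plays the role of a variable, i.e. a "don't care" slot). This is an illustrative simplification under those assumptions, not a general first-order implementation; all function and variable names here are ours.

# Minimal candidate elimination sketch for fixed-length attribute tuples.
# None stands for a variable ("don't care"); any other value must match exactly.

def covers(hypothesis, example):
    # A hypothesis covers an example if every non-variable slot matches.
    return all(h is None or h == e for h, e in zip(hypothesis, example))

def generalize(s, example):
    # Most specific generalization of s that also covers the example.
    return tuple(a if a == b else None for a, b in zip(s, example))

def specializations(g, negative, seed):
    # Most general specializations of g that exclude the negative example
    # while staying more general than the S-set member `seed`.
    result = []
    for i, slot in enumerate(g):
        if slot is None and seed[i] is not None and seed[i] != negative[i]:
            candidate = list(g)
            candidate[i] = seed[i]
            result.append(tuple(candidate))
    return result

def candidate_elimination(examples):
    # examples: list of (attribute_tuple, is_positive); the first must be positive.
    # Simplification: assumes the S-set stays a non-empty singleton, which holds
    # for the conjunctive attribute examples used on these slides.
    first, _ = examples[0]
    S = [tuple(first)]                        # step 2: S = first positive example
    G = [tuple(None for _ in first)]          # step 1: G = the null description
    for attrs, is_positive in examples[1:]:   # step 3: accept a new training example
        if is_positive:
            G = [g for g in G if covers(g, attrs)]   # drop G members that miss it
            S = [generalize(s, attrs) for s in S]    # generalize S to cover it
        else:
            S = [s for s in S if not covers(s, attrs)]   # drop S members that cover it
            new_G = []
            for g in G:
                if covers(g, attrs):
                    # replace g by specializations that exclude the negative example
                    new_G.extend(sp for sp in specializations(g, attrs, S[0])
                                 if not covers(sp, attrs))
                else:
                    new_G.append(g)
            G = new_G
    return S, G

# The ball example from the next slides:
ball_examples = [
    (("small", "red",  "ball"), True),
    (("small", "blue", "ball"), False),
    (("large", "red",  "ball"), True),
    (("large", "red",  "cube"), False),
]
print(candidate_elimination(ball_examples))
# Expected: S and G both reduce to (None, 'red', 'ball'), i.e. obj(X, red, ball)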
Candidate Elimination Algorithm
Worked example:

G : { obj(X,Y,Z) }, S : { }                         Pos : obj(small,red,ball)
Initialize G to the most general concept in the space, and
initialize S to the first positive training instance.

G : { obj(X,Y,Z) }, S : { obj(small,red,ball) }     Neg : obj(small,blue,ball)
Delete all members of S that match n.
For each g that matches n, replace g with its most general
specializations that do not match n.

G : { obj(X,red,Z) }, S : { obj(small,red,ball) }   Pos : obj(large,red,ball)
Delete all members of G that fail to match p.
For every s that does not match p, replace s with its most
specific generalizations that match p.

G : { obj(X,red,Z) }, S : { obj(X,red,ball) }       Neg : obj(large,red,cube)
Delete all members of S that match n.
For each g that matches n, replace g with its most general
specializations that do not match n.

G : { obj(X,red,ball) }, S : { obj(X,red,ball) }
23
Candidate Elimination Algorithm
Positive and Negative Examples of the Concept "Japanese economy car"

1. (+) origin: Japan, mfr: Honda,    color: Blue,  decade: 1980, type: Economy
2. (−) origin: Japan, mfr: Toyota,   color: Green, decade: 1970, type: Sports
3. (+) origin: Japan, mfr: Toyota,   color: Blue,  decade: 1990, type: Economy
4. (−) origin: USA,   mfr: Chrysler, color: Red,   decade: 1980, type: Economy
5. (+) origin: Japan, mfr: Honda,    color: White, decade: 1980, type: Economy
24
Candidate Elimination Algorithm
A Version Space Example

After example 1 (+):
  G = {(x1, x2, x3, x4, x5)}
  S = {(Japan, Honda, Blue, 1980, Economy)}
After example 2 (−):
  G = {(x1, Honda, x3, x4, x5), (x1, x2, Blue, x4, x5),
       (x1, x2, x3, 1980, x5), (x1, x2, x3, x4, Economy)}
  S = {(Japan, Honda, Blue, 1980, Economy)}
After example 3 (+):
  G = {(x1, x2, Blue, x4, x5), (x1, x2, x3, x4, Economy)}
  S = {(Japan, x2, Blue, x4, Economy)}
After example 4 (−):
  G = {(x1, x2, Blue, x4, x5), (x1, x2, Blue, x4, Economy),
       (Japan, x2, x3, x4, Economy)}
  S = {(Japan, x2, Blue, x4, Economy)}
After example 5 (+):
  G = {(Japan, x2, x3, x4, Economy)}
  S = {(Japan, x2, x3, x4, Economy)}
25
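Using the Python sketch given after the candidate elimination steps above, this trace can be reproduced directly (attribute order: origin, mfr, color, decade, type; values taken from the table on the previous slide):

car_examples = [
    (("Japan", "Honda",    "Blue",  1980, "Economy"), True),
    (("Japan", "Toyota",   "Green", 1970, "Sports"),  False),
    (("Japan", "Toyota",   "Blue",  1990, "Economy"), True),
    (("USA",   "Chrysler", "Red",   1980, "Economy"), False),
    (("Japan", "Honda",    "White", 1980, "Economy"), True),
]
S, G = candidate_elimination(car_examples)
# Both boundary sets should converge to ("Japan", None, None, None, "Economy"),
# i.e. (Japan, x2, x3, x4, Economy) as on this slide.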
Knowledge in Learning
• Let Descriptions denote the conjunction of all the example
descriptions in the training set, and let Classifications denote
the conjunction of all the example classifications.
• Then a Hypothesis that "explains the observations" must
satisfy the following property (recall that |= means "logically
entails"):
Hypothesis ∧ Descriptions |= Classifications
26
Knowledge in Learning
27
Knowledge in Learning
• There are three kinds of learning using prior knowledge (their
entailment constraints are sketched below):
– Explanation-based learning (EBL)
– Relevance-based learning (RBL)
– Knowledge-based inductive learning (KBIL)
• Inductive logic programming (ILP)
28
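The three kinds can be distinguished by their entailment constraints, recalled here (slightly paraphrased) from Russell & Norvig:

  EBL:  Hypothesis ∧ Descriptions |= Classifications
        Background |= Hypothesis
  RBL:  Hypothesis ∧ Descriptions |= Classifications
        Background ∧ Descriptions ∧ Classifications |= Hypothesis
  KBIL: Background ∧ Hypothesis ∧ Descriptions |= Classifications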
Explanation Based Learning
• A method for extracting general rules from individual
observations
• As an example, consider the problem of symbolic differentiation:
– d/dX (X²) = 2X
• Compare a student with calculus knowledge to one
without
• Which one solves it faster?
29
Explanation Based Learning
• The technique of memoization has long been used in
computer science to speed up programs by saving the results
of computation
• The basic idea of memo functions is to accumulate a database
of input-output pairs; when the function is called, it first
checks the database to see whether it can avoid solving the
problem from scratch (a small sketch follows below)
• Explanation-based learning takes this a good deal further, by
creating general rules that cover an entire class of cases
30
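A minimal memoization sketch in Python (a standard technique, independent of the textbook; the Fibonacci function is just a convenient illustration):

from functools import lru_cache

@lru_cache(maxsize=None)        # caches input-output pairs of the function
def fib(n: int) -> int:
    # Naive recursive Fibonacci; with the cache, each value is computed only once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # fast, because intermediate results are looked up, not recomputed

Memoization stores answers for specific inputs; EBL instead stores a general rule that covers a whole class of inputs.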
Learning Using Relevance Information
• A method that uses prior knowledge about the relevance of
certain features, together with the observations, to infer
general rules that explain the observations
• RBL does not produce hypotheses that go beyond the logical
content of the background knowledge
– It cannot create new knowledge from scratch
31
Learning Using Relevance Information
• Functional dependencies or determinations occur commonly
in certain kinds of applications (an example is sketched below)
• Determinations specify a sufficient basis vocabulary from which
to construct hypotheses concerning the target predicate
32
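For illustration, a determination written P ≻ Q reads "P determines Q". One example recalled from the textbook is that (in a simplified world) a person's nationality determines the language they speak:

  Nationality(x, n) ≻ Language(x, l)

which expands, roughly following the textbook's formulation, to:

  ∀x, y, n, l  Nationality(x, n) ∧ Nationality(y, n) ∧ Language(x, l) ⇒ Language(y, l)

i.e. any two people of the same nationality speak the same language.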
Inductive Logic Programming
• A method for finding new hypotheses that, combined with the
background knowledge, explain the examples
• Inductive logic programming (ILP) combines inductive
methods with the power of first-order representations,
concentrating in particular on the representation of
hypotheses as logic programs.
33
Inductive Logic Programming
• It has gained popularity for three reasons :
1. ILP offers a rigorous approach to the general knowledge-
based inductive learning problem.
2. ILP offers complete algorithms for inducing general, first-
order theories from examples, which can therefore learn
successfully in domains where attribute-based algorithms
are hard to apply.
3. Inductive logic programming produces hypotheses that
are (relatively) easy for humans to read.
34
Inductive Logic Programming
• Example
– parent_of(charles,george)
– parent_of(george,diana)
– parent_of(bob,harry)
– parent_of(harry,elizabeth)
– grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y)
• Query: grandparent_of(X,Y)?
• Answers: grandparent_of(charles,diana),
grandparent_of(bob,elizabeth)
35
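A tiny Python sketch of applying the grandparent rule to the facts above (a plain join over the parent_of facts; this only illustrates using the rule, not the ILP step of inducing it from examples):

# Facts from the slide: parent_of(X, Y) is read as "X is a parent of Y".
parent_of = {
    ("charles", "george"),
    ("george", "diana"),
    ("bob", "harry"),
    ("harry", "elizabeth"),
}

# grandparent_of(X, Y) :- parent_of(X, Z), parent_of(Z, Y)
grandparent_of = {
    (x, y)
    for (x, z1) in parent_of
    for (z2, y) in parent_of
    if z1 == z2
}

print(sorted(grandparent_of))
# [('bob', 'elizabeth'), ('charles', 'diana')]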
Summary
• The use of prior knowledge in learning leads to a picture of
cumulative learning
• Prior knowledge helps learning by
– Eliminating otherwise consistent hypotheses
– Filling in the explanation of examples, allowing shorter
hypotheses
36
References
• Stuart Russell, Peter Norvig. 2010. Artificial Intelligence: A
Modern Approach. Pearson Education, New Jersey.
ISBN: 9780132071482
• http://aima.cs.berkeley.edu
37
Version Space
Case Study
38
Candidate Elimination Algorithm
Exercises: How About The Concept of Elephant?
39