Artificial Intelligence, VII SEM, CSE, Module - V

Module V
CHAPTER-17
LEARNING

 WHAT IS LEARNING?

 Learning denotes changes in the system that are adaptive in the sense that they enable the
system to do the same task or tasks drawn from the same population more efficiently and
more effectively the next time.

 ROTE LEARNING

 When a computer stores a piece of data, it is performing a rudimentary (simple) form of


learning. After all, this act of storage most probably allows the program to perform better in
the future. In the case of data caching, we store computed values so that we do not have to
recompute them later. When computation is more expensive than recall, this strategy can save a
significant amount of time. Caching has been used in AI programs to produce some surprising
performance improvements. Such caching is known as rote learning.
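 As a rough illustration, rote learning of computed values is essentially memoization. The sketch below (in Python, with an invented stand-in for the costly computation, not anything from the text) stores each computed value so that later requests become cheap lookups:

    cache = {}                                   # stored results, keyed by input

    def expensive_evaluation(position):
        # stand-in for a costly computation, e.g. a static evaluation function
        return sum(ord(ch) for ch in position)

    def evaluate(position):
        if position in cache:                    # recall is cheaper than recomputation
            return cache[position]
        value = expensive_evaluation(position)
        cache[position] = value                  # the rudimentary act of "learning"
        return value

    print(evaluate("board-17"), evaluate("board-17"))   # second call hits the cache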
 Samuel’s checkers program learned to play checkers well enough to beat its creator. It
exploited two kinds of learning: rote learning and parameter (or coefficient) adjustment.
Samuel’s program used the minimax search procedure to explore checkers game trees. Here,
time constraints permitted it to search only a few levels in the tree. When it could search no
deeper, it applied its static evaluation function to the board position and used that score to
continue its search of the game tree. When it finished searching the tree and propagating the
values backward, it had a score for the position represented by the root of the tree. It could
then choose the best move and make it. But it also recorded the board position at the root of
the tree and the backed up score that had just been computed for it. This situation is shown in
Figure below (a).
Figure: Storing Backed-Up Values

Now suppose that in a later game, the situation shown in Figure above (b) were to arise.
Instead of using the static evaluation function to compute a score for position A, the stored

value for A can be used. This creates the effect of having searched an additional several ply
since the stored value for A was computed by backing up values from exactly such a search.
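 A minimal sketch of this idea, assuming a toy game rather than checkers (the move generator and static evaluation below are invented stand-ins), is to consult a table of previously backed-up scores before expanding a position:

    stored_scores = {}                            # position -> backed-up score

    def moves(position):                          # toy move generator (not checkers)
        return [position - 1, position + 1] if abs(position) < 3 else []

    def static_eval(position):                    # toy static evaluation function
        return -abs(position)

    def minimax(position, depth, maximizing=True):
        if position in stored_scores:             # reuse acts like extra search depth
            return stored_scores[position]
        if depth == 0 or not moves(position):
            return static_eval(position)
        children = [minimax(m, depth - 1, not maximizing) for m in moves(position)]
        return max(children) if maximizing else min(children)

    root = 0
    stored_scores[root] = minimax(root, depth=2)  # record the backed-up root score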
 The capabilities of rote learning which are important in more complex learning include:

Organized Storage of Information: In order to use a stored value faster than re-
computation of it, there must be a way to access the appropriate stored value quickly. In
Samuel’s program, this was done by indexing board positions by a few important
characteristics, such as the number of pieces. But as the complexity of the stored
information increases, more sophisticated techniques are necessary.
Generalization: The number of distinct objects that might potentially be stored can be very
large. To keep the number of stored objects down to a manageable level, some kind of
generalization is necessary. In Samuel’s program, for example, the number of distinct objects
that could be stored was equal to the number of different board positions that can arise in a
game.

 LEARNING BY TAKING ADVICE

 A computer can do very little without a program for it to run. When a programmer writes a
series of instructions into a computer, a rudimentary kind of learning takes place. The
programmer is a sort of teacher, and the computer is a sort of student. After being
programmed, the computer is now able to do something it previously could not. Suppose the
program is written in a high-level language like LISP. Some interpreter or compiler must
intervene to change the teacher’s instructions into code that the machine can execute directly.
 People process advice in a similar way. In chess, the advice “fight for control of the center of
the board” is useless unless the player can translate the advice into concrete moves and
plans. A computer program might make use of the advice by adjusting its static evaluation
function to include a factor based on the number of center squares attacked by its own pieces.
 Mostow [1983] describes a program called FOO, which accepts advice for playing hearts, a
card game. A human user first translates the advice from English into a representation that
FOO can understand. For example, “Avoid taking points” becomes:
(avoid (take-points me) (trick))
 FOO must operationalize this advice by turning it into an expression that contains concepts
and actions FOO can use when playing the game of hearts. One strategy FOO can follow is to
UNFOLD (clarify) an expression by replacing some term by its definition. By UNFOLDing the
definition of avoid, FOO comes up with:
(achieve (not (during (trick) (take-points me))))
 FOO considers the advice to apply to the player called “me”. Next, FOO UNFOLDs the
definition of trick:
(achieve (not
  (during
    (scenario
      (each p1 (players) (play-card p1))
      (take-trick (trick-winner)))
    (take-points me))))

 In other words, the player should avoid taking points during the scenario consisting of (1)
players playing cards and (2) one player taking the trick. FOO then uses case analysis to
determine which steps could cause one to take points. It rules out step 1 on the basis that it
knows of no intersection of the concepts take-points and play-card. But step 2 could affect
taking points, so FOO UNFOLDs the definition of take-points:
(achieve (not (there-exists c1 (cards-played)
                (there-exists c2 (point-cards)
                  (during (take (trick-winner) c1)
                          (take me c2))))))
 This advice says that the player should avoid taking point-cards during the process of the
trick-winner taking the trick. The question for FOO now is: Under what conditions does (take
me c2) occur during (take (trick-winner) c1)? By using a technique called partial match, FOO
hypothesizes that points will be taken if me = trick-winner and c2 =c1. It transforms the advice
into:
(achieve (not (and (have-points (cards-played))
(= (trick-winner) me))))
 This means “Do not win a trick that has points”. We have not traveled very far conceptually
from “avoid taking points,” but it is important to note that the current vocabulary is one that
FOO can understand in terms of actually playing the game of hearts. Through a number of other
transformations, FOO eventually settles on:
(achieve (>= (and (in-suit-led (card-of me))
                  (possible (trick-has-points)))
             (low (card-of me))))
 In other words, when playing a card that is the same suit as the card that was played first, if the
trick possibly contains points, then play a low card. At last, FOO has translated the rather vague
advice “avoid taking points” into a specific, usable form. FOO is able to play a better game
of hearts after receiving this advice.

 LEARNING IN PROBLEM-SOLVING

 The problem-solver could improve its performance by taking advice from a teacher. Can a
program get better without the aid of a teacher? It can, by generalizing from its own
experiences.

 Learning by Parameter Adjustment


 Many programs rely on an evaluation procedure that combines information from several
sources into a single summary statistic. Game-playing programs do this in their static
evaluation functions, in which a variety of factors are combined into a single score reflecting
the desirability of the board position.
 In designing such programs, it is often difficult to know a priori how much weight should be
attached to each feature being used. One way of finding the correct weights is to begin with
some estimate of the correct settings and then to let the program modify the settings on the

basis of its experience. Features that appear to be good predictors of overall success will have
their weights increased, while those that do not will have their weights decreased, maybe
even to the point of being dropped entirely.
 Samuel’s checkers program exploited this kind of learning in addition to the rote learning
described above, and it provides a good example of its use. As its static evaluation function,
the program used a polynomial of the form c1t1 + c2t2 + ... + c16t16.
 The t terms are the values of the sixteen features that contribute to the evaluation. The c
terms are the coefficients (weights) that are attached to each of these values. As learning
progresses, the c values will change.
 The most important question in the design of a learning program based on parameter
adjustment is “when should the value of a coefficient be increased and when should it be
decreased?” The second question to be answered is then “By how much should the value be
changed?” The simple answer to the first question is that the coefficients of terms that
predicted the final outcome accurately should be increased, while the coefficients of poor
predictors should be decreased.
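 As a small illustration (not Samuel's actual update scheme), an error-driven adjustment of the coefficients might look like the following sketch, in which terms that helped predict the outcome are strengthened and the rest are weakened; the training values are hypothetical:

    def evaluate(weights, features):
        # evaluation function of the form c1*t1 + c2*t2 + ... + cn*tn
        return sum(c * t for c, t in zip(weights, features))

    def adjust(weights, features, actual_outcome, rate=0.01):
        error = actual_outcome - evaluate(weights, features)
        # increase weights of terms that under-predicted success, decrease the others
        return [c + rate * error * t for c, t in zip(weights, features)]

    weights = [0.5, 0.5, 0.5]
    game_features, outcome = [1.0, -2.0, 0.5], 1.0    # hypothetical training signal
    weights = adjust(weights, game_features, outcome)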

 Learning with Macro-Operators


 The idea is the same: to avoid expensive re-computation. For example, suppose you are
faced with the problem of getting to the downtown post office. The solution may involve
getting in the car, starting it, and driving along a certain route. Substantial planning may go
into choosing the appropriate route, but you need not plan how to go about starting the car.
You can treat START-CAR as an atomic action, even though it really consists of several actions:
sitting down, adjusting the mirror, inserting the key, and turning the key. Sequences of actions
that can be treated as a whole are called macro-operators.
 Macro-operators were used in the early problem-solving system STRIPS. Suppose we are
given an initial blocks world situation in which ON(C, B) and ON(A, Table) are both true.
STRIPS can achieve the goal ON(A, B) by devising a plan with the four steps UNSTACK(C,
B), PUTDOWN(C), PICKUP(A), STACK(A, B). STRIPS now builds a MACROP with
preconditions ON(C, B), ON(A, Table) and postconditions ON(C, Table), ON(A,B). The
body of the MACROP consists of the four steps just mentioned. In future planning, STRIPS is
free to use this complex macro-operator just as it would use any other operator.
Start: ON(C, B), ON(A, Table)                    Goal: ON(A, B)


 Generalization is not so easy, however. Sometimes constants must retain their specific values.
Suppose our domain included an operator called STACK-ON-B(x), with preconditions that
both x and B be clear, and with postcondition ON(x, B). Consider the same problem as
above:
- STRIPS might come up with the plan UNSTACK(C, B), PUTDOWN(C), STACK-ON-
B(A). Let’s generalize this plan and store it as a MACROP. The precondition becomes

ON(x3,x2), the postcondition becomes ON(x1,x2), and the plan itself becomes
UNSTACK(x3, x2), PUTDOWN(x3), STACK-ON-B(x1). Now, suppose we encounter a
slightly different problem:
Start: ON(E, C), ON(D, B)                        Goal: ON(A, C)
- Using the generalized MACROP, we let x1 = A, x2 = C, and x3 = E. Its preconditions are satisfied,
so we construct the plan UNSTACK(E, C), PUTDOWN(E), STACK-ON-B(A). But
this plan does not work. The problem is that the postcondition of the MACROP is over
generalized. In this case, the difficulty will be discovered when the last step is attempted.
Although we cleared C, which is where we wanted to put A, we failed to clear B, which
is where the MACROP is going to try to put it. Since B is not clear, STACK-ON-B
cannot be executed. If B had happened to be clear, the MACROP would have executed to
completion, but it would not have accomplished the stated goal.
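 A minimal sketch of a macro-operator as a data structure, assuming states are represented as sets of ground literals (a simplification of STRIPS), is shown below; the stored body is reused as if it were a single step:

    from dataclasses import dataclass

    @dataclass
    class MacroOp:
        preconditions: frozenset
        postconditions: frozenset
        body: tuple                    # the stored sequence of primitive steps

        def applicable(self, state):
            return self.preconditions <= state

    macrop = MacroOp(
        preconditions=frozenset({"ON(C,B)", "ON(A,Table)"}),
        postconditions=frozenset({"ON(C,Table)", "ON(A,B)"}),
        body=("UNSTACK(C,B)", "PUTDOWN(C)", "PICKUP(A)", "STACK(A,B)"),
    )

    state = {"ON(C,B)", "ON(A,Table)", "CLEAR(C)", "CLEAR(A)"}
    if macrop.applicable(state):
        print(macrop.body)             # replay the stored plan as one operator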

 Learning by Chunking
 Chunking is a process similar in flavor to macro-operators. The idea of chunking comes from
the psychological literature on memory and problem solving. SOAR is an example which
exploits chunking so that its performance can increase with experience. In fact, the designers
of SOAR hypothesize that chunking is a universal learning method, i.e., it can account for all
types of learning in intelligent systems. SOAR solves problems by firing productions, which
are stored in long-term memory. When SOAR detects a useful sequence of production firings,
it creates a chunk, which is essentially a large production that does the work of an entire
sequence of smaller ones.
 The Utility Problem
 The utility problem was identified in work on PRODIGY, a general problem-solving
architecture that acquires control knowledge automatically.

 PRODIGY employs several learning mechanisms. One mechanism uses explanation-based
learning (EBL). PRODIGY
can examine a trace of its own problem-solving behavior and try to explain why certain paths
failed. The program uses those explanations to formulate control rules that help the problem
solver avoid those paths in the future. So while SOAR learns primarily from examples of
successful problem solving, PRODIGY also learns from its failures. A major contribution of
the work on EBL in PRODIGY was the identification of the utility problem in learning
systems: learned control rules are not free, and if many are acquired, the cost of matching
them against each new situation can outweigh the search time they save.


 LEARNING FROM EXAMPLES: INDUCTION


 Learning from examples is a form of supervised learning (sometimes called discovery learning):
the program is given training pairs of the form (x, f(x)) and must discover a rule that predicts
the output for new inputs.
 It discovers rules by observing examples.

The idea is to produce a classification program that can evolve its own class definitions. This
task of constructing class definitions is called concept learning or induction.

 Winston’s Learning Program


 Winston [1975] describes an early structural concept learning program. This program
operated in a simple blocks world domain. Its goal was to construct representations of the
definitions of concepts in the blocks domain. For example, it learned the concepts House,
Tent, and Arch shown in Figure below. The figure also shows an example of a near miss for
each concept. A near miss is an object that is not an instance of the concept but that is very
similar to such instances.
Figure: Some Blocks World Concepts

 The program started with a line drawing of a blocks world structure. This structural
description was then provided as input to the learning program. An example of such a
structural description for the House of Figure above is shown in Figure (a) below. Node A
represents the entire structure, which is composed of two parts: node B - Wedge, and node C
- Brick. Figures (b) and (c) show descriptions of the two Arch structures of Figure above.
These descriptions are identical except for the types of the objects on the top; one is a Brick
while the other is a Wedge. Notice that the two supporting objects are related not only by left-
of and right-of links, but also by a does-not-marry link, which says that the two objects do
not marry. Two objects marry if they have faces that touch and they have a common edge.
Figure: Structural Descriptions

 The basic approach that Winston’s program took to the problem of concept formation can be
described as follows:
- Begin with a structural description of one known instance of the concept. Call that
description the concept definition.
- Examine descriptions of other known instances of the concept. Generalize the definition
to include them.
- Examine descriptions of near misses of the concept. Restrict the definition to exclude
these.
 To see how this approach works, we trace it through the process of learning what an arch is.
Suppose that the arch description of Figure (b) above is presented first. It then becomes the
definition of the concept Arch. Then suppose that the arch description of Figure (c) above is
presented. The comparison routine will return a structure similar to the two input structures
except that it will note that the objects represented by the nodes labeled C are not identical.
This structure is shown as Figure below.
 Figure: The Comparison of Two Arches

 The c-note link from node C describes the difference found by the comparison routine. It
notes that the difference occurred in the isa link and that in the first structure the isa link
pointed to Brick, and in the second it pointed to Wedge. At this point, a new description of
the concept Arch can be generated. This description could say simply that node C must be
either a Brick or a Wedge. Assuming that both Brick and Wedge are instances of the more
general node Object, the Arch definition shown in Figure below can be built.
Figure: The Arch Description after Two Examples

 The near miss arch shown in Figure above (first figure) is presented. This time, the
comparison routine will note that the only difference between the current definition and the
near miss is in the does-not-marry link between nodes B and D. But since this is a near miss,
we do not want to broaden the definition to include it. Instead, we want to restrict the definition
so that it is specifically excluded. To do this, we modify the does-not-marry link, which until now may
simply have been recording something that happened by chance to be true of the small number of
examples presented. It must now say must-not-marry. The Arch description at
this point is shown in Figure below.
Figure: The Arch Description after a Near Miss

 Version Spaces
 The goal is the same: to produce a description that is consistent with all positive examples
but no negative examples in the training set. Version spaces work by maintaining a set of
possible descriptions and evolving that set as new examples and near misses are presented.
Consider Figure below, a frame representing an individual car.
Figure: An Example of the Concept Car

 Now, suppose that each slot may contain only the discrete values shown in Figure below.
The choice of features and values is called the bias of the learning system. In this example,
the bias is fairly simple: e.g., we can learn concepts that have to do with car manufacturers.
A clear statement of the bias of a learning system is very important to its evaluation.
Figure: Representation Language for Cars


 Concept descriptions and training examples can be stated in terms of these slots and values.
For example, the concept “Japanese economy car” can be represented as in Figure below.
The names x1, x2, and x3 are variables. The presence of x2, for example, indicates that the
color of a car is not relevant to whether the car is a Japanese economy car. Now the learning
program is: Given a representation language such as in Figure above (second one) and given
positive and negative training examples such as those in Figure above (first one), how can
we produce a concept description such as that in Figure below, that is consistent with all the
training examples?
Figure: The Concept “Japanese economy car”

 For example, the description in Figure above is more general than in figure above (first one).
In fact, the representation language defines a partial ordering of descriptions. A portion of that
partial ordering is shown in Figure below.
 Figure: Partial Ordering of Concepts Specified by the Representation Language

The entire partial ordering is called the concept space, and can be depicted as in Figure below.
Figure: Concept and Version Spaces

Algorithm: Candidate Elimination


Given: A representation language and a set of positive and negative examples expressed in
that language.
Compute: A concept description that is consistent with all the positive examples and none of the
negative examples.
1. Initialize G to contain one element: the null description (all features are variables).
2. Initialize S to contain one element: the first positive example.
3. Accept a new training example.
If it is a positive example, first remove from G any descriptions that do not cover the
example. Then, update the S set to contain the most specific set of descriptions in the version
space that cover the example and the current elements of the S set.
- That is, generalize the elements of S as little as possible so that they cover the new training
example.
- If it is a negative example, first remove from S any descriptions that cover the example.
Then, update the G set to contain the most general set of descriptions in the version space
that do not cover the example. That is, specialize the elements of G as little as possible so
that the negative example is no longer covered by any of the elements of G.
4. If S and G are both singleton sets, then if they are identical, output their value and halt. If
they are both singleton sets but they are different, then the training cases were inconsistent.
Output this result and halt. Otherwise, go to step 3.
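 The two key operations of the algorithm can be sketched for the attribute-vector language used here, with a wildcard standing in for variables such as x1 (the demonstration values below are hypothetical, not the figure's training set):

    WILD = "?"                               # plays the role of a variable like x1

    def covers(description, example):
        return all(d == WILD or d == e for d, e in zip(description, example))

    def generalize(description, positive):
        # minimal generalization: replace mismatching constants with variables
        return tuple(d if d == e else WILD for d, e in zip(description, positive))

    def specializations(description, negative, values_per_slot):
        # replace one variable with any constant that excludes the negative example
        out = []
        for i, d in enumerate(description):
            if d == WILD:
                out += [description[:i] + (v,) + description[i + 1:]
                        for v in values_per_slot[i] if v != negative[i]]
        return out

    S = ("Japan", "Honda", "Blue", "1980", "Economy")
    print(generalize(S, ("Japan", "Toyota", "Blue", "1975", "Economy")))
    # -> ('Japan', '?', 'Blue', '?', 'Economy')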

 Example: Trace the operation of the candidate elimination algorithm. Suppose we want to
learn the concept of “Japanese economy car” from the examples as in figure below.
Figure: Positive and Negative Examples of the Concept “Japanese economy car”.


 The first example is positive, so after processing it the sets are:
G = {(x1, x2, x3, x4, x5)}
S = {(Japan, Honda, Blue, 1980, Economy)}
 Now we process the second example, which is negative. The G set must be specialized in such a way
that the negative example is no longer in the version space. In this representation language,
specialization involves replacing variables with constants. The available specializations form the
new G set:
G = {(x1, Honda, x3, x4, x5), (x1, x2, Blue, x4, x5),
     (x1, x2, x3, 1980, x5), (x1, x2, x3, x4, Economy)}
The S set is unaffected by the negative example.
 Now we come to the third example, a positive one. The first order of business is to remove
from the G set any descriptions that are inconsistent with the positive example. Our new G
set is:
G = {(x1, x2, Blue, x4, x5), (x1, x2, x3, x4, Economy)}
 We must now generalize the S set to include the new example. This involves replacing
constants with variables. Here is the new S set:
S = {(Japan, x2, Blue, x4, Economy)}
 At this point, the S and G sets specify a version space that can be translated roughly into
English as: “The target concept may be as specific as ‘Japanese, blue economy car,’ or as
general as either ‘blue car’ or ‘economy car.’ ”
 Next, we get another negative example, a car whose origin is USA. The S set is unaffected,
but the G set must be specialized to avoid covering the new example. The new G set is:
G = {(Japan, x2, Blue, x4, x5), (Japan, x2, x3, x4, Economy)}
 We now know that the car must be Japanese, because all of the descriptions in the version
space contain Japan as origin. Our final example is a positive one. We first remove from
the G set any descriptions that are inconsistent with it, leaving:
G = {(Japan, x2, x3, x4, Economy)}
 We then generalize the S set to include the new example:
S = {(Japan, x2, x3, x4, Economy)}
 S and G are both singletons, so the algorithm has converged on the target concept. No more
examples are needed.


 Decision Trees
 A third approach to concept learning is the induction of decision trees, as exemplified by the
ID3 program of Quinlan. ID3 uses a tree representation for concepts, such as the one
shown in Figure below. To classify a particular input, we start at the top of the tree and
answer questions until we reach a leaf, where the classification is stored. Figure below
represents the familiar concept “Japanese economy car”. ID3 is a program that builds
decision trees automatically, given positive and negative instances of a concept.
 ID3 uses an iterative method to build up decision trees, preferring simple trees over complex
ones, on the theory that simple trees are more accurate classifiers of future inputs. It begins by
choosing a random subset of the training examples. This subset is called the window.
Figure: A Decision Tree
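 A sketch of the attribute-selection step at the heart of ID3-style tree building is shown below; the entropy and information-gain formulation is the standard one, and the tiny data set is hypothetical:

    from math import log2
    from collections import Counter

    def entropy(labels):
        counts, total = Counter(labels), len(labels)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def information_gain(examples, labels, attribute):
        by_value = {}
        for ex, lab in zip(examples, labels):
            by_value.setdefault(ex[attribute], []).append(lab)
        remainder = sum(len(s) / len(labels) * entropy(s) for s in by_value.values())
        return entropy(labels) - remainder

    examples = [("Japan", "Economy"), ("Japan", "Sports"),
                ("USA", "Economy"), ("USA", "Sports")]
    labels = ["+", "+", "-", "-"]
    best = max(range(2), key=lambda i: information_gain(examples, labels, i))
    print(best)    # 0: the origin attribute separates the classes perfectly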

 EXPLANATION-BASED LEARNING

 A number of independent studies led to the characterization of this approach as explanation-


based learning. An EBL system attempts to learn from a single example X by explaining why
X is an example of the target concept. The explanation is then generalized, and the system’s
performance is improved through the availability of this knowledge.
 We can think of EBL programs as accepting the following as input:
- A Training Example: What the learning program “sees” in the world. E.g., an Example of
the Concept Car above.
- A Goal Concept: A high-level description of what the program is supposed to learn
- An Operationality Criterion: A description of which concepts are usable
- A Domain Theory: A set of rules that describe relationships between objects and actions in the domain
 Output is: EBL computes a generalization of the training example that is sufficient to
describe the goal concept, and also satisfies the operationality criterion.
 Explanation-based generalization (EBG) is an algorithm for EBL. It has two steps: (1) explain
and (2) generalize. During the first step, the domain theory is used to prune away all the
unimportant aspects of the training example with respect to the goal concept. What is left is an
explanation of why the training example is an instance of the goal concept. This explanation
is expressed in terms that satisfy the operationality criterion. The next step is to generalize the
explanation as far as possible while still describing the goal concept.
 We want to be able to generalize from a single example of a cup. Suppose the example is:
Training Example:
owner(Object23, Ralph) ∧ has-part(Object23, Concavity12) ∧
is(Object23, Light) ∧ color(Object23, Brown) ∧ …
 Clearly, some of the features of Object23 are more relevant to its being a cup than others. In
EBL we rely on domain knowledge, such as:
Domain Knowledge:
is(x, Light) ∧ has-part(x, y) ∧ isa(y, Handle) → liftable(x)
has-part(x, y) ∧ isa(y, Bottom) ∧ is(y, Flat) → stable(x)
has-part(x, y) ∧ isa(y, Concavity) ∧ is(y, Upward-Pointing) → open-vessel(x)
 We also need a goal concept to operationalize:
Goal Concept: Cup
X is a Cup if X is liftable, stable, and open-vessel.
 Operationality Criterion: Concept definition must be expressed in purely structural terms
(e.g., Light, Flat, etc.).
 Given a training example and a functional description, we want to build a general structural
description of a cup. The first step is to explain why Object23 is a cup. We do this by
constructing a proof, as shown in Figure below.
Figure: An Explanation

 If we gather up all the assumptions and replace constants with variables, we get the
following description of a cup:
has-part(x, y) ∧ isa(y, Concavity) ∧ is(y, Upward-Pointing) ∧
has-part(x, z) ∧ isa(z, Bottom) ∧ is(z, Flat) ∧
has-part(x, w) ∧ isa(w, Handle) ∧ is(x, Light)
 This definition satisfies the operationality criterion and could be used by a robot to classify
objects.
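 The explain step can be sketched as backward chaining over the domain rules, collecting only the operational facts that actually support the proof; the rule and fact encodings below are simplified stand-ins for the logical forms above:

    rules = {                                  # derived concept -> its subgoals
        "liftable":    ["is-light", "has-handle"],
        "stable":      ["has-flat-bottom"],
        "open-vessel": ["has-upward-concavity"],
        "cup":         ["liftable", "stable", "open-vessel"],
    }
    facts = {"is-light", "has-handle", "has-flat-bottom",
             "has-upward-concavity", "owner-is-ralph", "color-brown"}

    def explain(goal, support):
        if goal in rules:                      # derived concept: expand its subgoals
            return all(explain(sub, support) for sub in rules[goal])
        if goal in facts:                      # operational leaf: keep it
            support.add(goal)
            return True
        return False

    support = set()
    if explain("cup", support):
        print(sorted(support))   # the irrelevant facts (owner, color) are pruned away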

 DISCOVERY

 Discovery is certainly learning. Suppose that we want to build a program to discover things,
for example in mathematics. We expect that such a program would have to rely heavily on the
problem-solving techniques we have discussed.


 AM: Theory-Driven Discovery


 One such program, called AM, worked from a few basic concepts of set theory to
discover a good deal of standard number theory.
 AM exploited a variety of general-purpose AI techniques. It used a frame system to represent
mathematical concepts. One of the major activities of AM is to create new concepts and fill
in their slots. An example of an AM concept is shown in Figure below.
Figure: An AM Concept: Prime Number

 AM also uses heuristic search, guided by a set of 250 heuristic rules representing hints
about activities that are likely to lead to “interesting” discoveries. Examples of the kind of
heuristics AM used are:
- If f is a function from A to B and B is ordered, then consider the elements of A that are
mapped into extremal elements of B. Create a new concept representing this subset of A.
- If some examples of some concept X are also examples of another concept Y, create a
new concept representing the intersection of X and Y.
- If very few examples of a concept X are found, then, add to the agenda the task of finding
a generalization of X.
 Generate-and-test is used to form hypotheses on the basis of a small number of examples and
then to test the hypotheses on a larger set to see if they still appear to hold. Finally, an agenda
controls the entire discovery process. AM operates in cycles, each time choosing the most
promising task from the agenda and performing it.

 BACON: Data-Driven Discovery


 A model of data-driven scientific discovery that has been implemented as a program called
BACON, named after Sir Francis Bacon, an early philosopher of science.
 BACON begins with a set of variables for a problem. For example, in the study of the
behavior of gases, some variables are p, the pressure on the gas, V, the volume of the gas,
n, the amount of gas in moles, and T, the temperature of the gas. Physicists have long
known a law, called the ideal gas law that relates these variables. BACON is able to derive
this law on its own. First, BACON holds the variables n and T constant, performing
experiments at different pressures p1, p2, and p3. BACON notices that as the pressure
increases, the volume V decreases. Therefore, it creates a theoretical term pV. This term is
constant. BACON systematically moves on to vary the other variables. It tries an experiment
with different values of T, and finds that pV changes. The two terms are linearly
related with an intercept of 0, so BACON creates a new term pV/T. Finally, BACON varies
the term n and finds another linear relation between n and pV/T. For all values of n, p, V,
and T, pV/nT = 8.32. This is, in fact, the ideal gas law. Figure below shows BACON’s
reasoning in a tabular format.
Figure: BACON Discovering the Ideal Gas Law

n    T      p      V        pV        pV/T     pV/nT
1    300    100    24.96
1    300    200    12.48
1    300    300    8.32     2496
1    310                    2579.2
1    320                    2662.4    8.32
2    320                              16.64
3    320                              24.96    8.32
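 BACON's first step above, noticing that p and V vary inversely and introducing the product pV as a new theoretical term, can be sketched as follows; the data follows the n = 1, T = 300 rows in the table, and the numeric tolerance is an arbitrary choice:

    def nearly_constant(values, tolerance=1e-6):
        return max(values) - min(values) <= tolerance

    p_values = [100.0, 200.0, 300.0]
    v_values = [24.96, 12.48, 8.32]

    products = [p * v for p, v in zip(p_values, v_values)]
    if nearly_constant(products):
        print("new theoretical term pV =", products[0])    # 2496.0 at n = 1, T = 300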

 Clustering
 A third type of discovery, called clustering, is very similar to induction. In inductive learning, a
program learns to classify objects based on the labels provided by a teacher. In clustering,
no class labels are provided. The program must discover for itself the natural classes that
exist for the objects, in addition to a method for classifying instances.
 AUTOCLASS is one program that accepts a number of training cases and hypothesizes a set
of classes. For any given case, the program provides a set of probabilities that predict into
which classes the case is likely to fall.

 ANALOGY

 Analogy is a powerful inference tool. Humans often solve problems by making analogies to
things they already understand how to do. Two methods of analogical problem solving that
have been studied in AI are transformational and derivational analogy.

 Transformational Analogy
 Suppose you are asked to prove a theorem in plane geometry. The idea is to transform a solution
to a previous problem into a solution for the current problem. Figure below shows this process.

Figure: Transformational Analogy

 An example of transformational analogy is shown in Figure below. The program has seen
proofs about points and line segments. For example, it knows a proof that the line segment RN
is exactly as long as the line segment OY, given that RO is exactly as long as NY. The program
is now asked to prove a theorem about angles, namely that the angle BD is equivalent to the
angle CE, given that angles BC and DE are equivalent. The proof about line segments is
retrieved and transformed into a proof about angles by substituting line for point and angle
for line segment, with AB for R, AC for O, AD for N, and AE for Y.
Figure: Solving a Problem by Transformational Analogy

 Derivational Analogy
 The detailed history of a problem-solving episode is called its derivation. Analogical reasoning
that takes these histories into account is called derivational analogy, as shown in the figure below.
 The derivational analogy is a necessary component in the transfer of skills in complex domains.
For example, suppose you have coded an efficient sorting routine in Pascal, and then you are
asked to recode the routine in LISP. A line-by-line translation is not appropriate, but you will
reuse the major structure and control decisions you made when you constructed the Pascal
program. One way to model this behavior is to have a problem-solver “replay” the previous
derivation and modify it when necessary.
 If the original reasons and assumptions for a step’s existence still hold in the new problem, the
step is copied over. If some assumption is no longer valid, another assumption must be found. If
one cannot be found, then we can try to find justification for some alternative stored in the
derivation of the original problem. Or perhaps we can try some step marked as leading to search
failure in the original derivation, if the reasons for that failure do not hold in the current
derivation.
Figure: Derivational Analogy

 FORMAL LEARNING THEORY

 A “theory of the learnable” which classifies problems by how difficult they are to learn.
Formally, a device learns a concept if it can, given positive and negative examples, produce an
algorithm that will classify future examples correctly with probability 1/h. The complexity of
learning a concept is a function of three factors: the error tolerance (h), the number of binary
features present in the examples (t), and the size of the rule necessary to make the
discrimination (f). If the number of training examples required is polynomial in h, t, and f, then
the concept is said to be learnable.
 Some interesting results have been demonstrated for concept learning. Consider the problem of
learning conjunctive feature descriptions. For example, from the list of positive and negative
examples of elephants shown in the Figure below, we want to induce the description “gray,
mammal, large”. It has been shown that in conjunctive learning the number of randomly
chosen training examples is proportional to the logarithm of the total number of features.
Figure: Six Positive and Negative Examples of the Concept Elephant

gray?   mammal?   large?   vegetarian?   wild?    example
  +        +         +          +           +      + (Elephant)
  +        +         +          -           +      + (Elephant)
  +        +         -          +           +      - (Mouse)
  -        +         +          +           +      - (Giraffe)
  +        -         +          -           +      - (Dinosaur)
  +        +         +          +           -      + (Elephant)
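 The standard procedure for learning such a conjunctive description is simply to keep the features that are true in every positive example; the sketch below encodes the table above (the final column being the classification):

    features = ["gray", "mammal", "large", "vegetarian", "wild"]
    examples = [
        ([1, 1, 1, 1, 1], True),     # elephant
        ([1, 1, 1, 0, 1], True),     # elephant
        ([1, 1, 0, 1, 1], False),    # mouse
        ([0, 1, 1, 1, 1], False),    # giraffe
        ([1, 0, 1, 0, 1], False),    # dinosaur
        ([1, 1, 1, 1, 0], True),     # elephant
    ]

    positives = [x for x, label in examples if label]
    concept = [f for i, f in enumerate(features) if all(x[i] for x in positives)]
    print(concept)    # ['gray', 'mammal', 'large']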

 NEURAL NET LEARNING AND GENETIC LEARNING

 The very first efforts in machine learning tried to mimic animal learning at a neural level.
These efforts were quite different from the symbolic manipulation methods we have seen so
far. Collections of idealized neurons were presented with stimuli and pushed into changing

their behavior via forms of reward and punishment. Researchers hoped that by imitating the
learning mechanisms of animals, they might build learning machines from very simple parts.
Such hopes proved elusive. However, the field of neural network learning has seen a resurgence in
recent years, partly as a result of the discovery of powerful new learning algorithms.
 While neural network models are based on a computational “brain metaphor,” a number of
other learning techniques make use of a metaphor based on evolution. In this work, learning
occurs through a selection process that begins with a large population of random programs.
Learning algorithms inspired by evolution are called “Genetic Algorithms”.

CHAPTER – 20

EXPERT SYSTEMS

Expert systems solve problems that are normally solved by human “experts”. To solve expert-level
problems, expert systems need access to a substantial domain knowledge base, which must be built as
efficiently as possible. They also need to exploit one or more reasoning mechanisms to apply their
knowledge to the problems they are given. Then they need a mechanism for explaining what they have
done to the users who rely on them. One way to look at expert systems is that they represent applied AI
in a very broad sense.
The problems that expert systems deal with are highly diverse. There are some general issues that arise
across these varying domains. But it also turns out that there are powerful techniques that can be defined
for specific classes of problems.

 REPRESENTING AND USING DOMAIN KNOWLEDGE

 Expert systems are complex AI programs. Almost all the techniques we have studied have been
exploited in at least one expert system. However, the most widely used way of representing
domain knowledge in expert systems is as a set of production rules, which are often coupled
with a frame system that defines the objects that occur in the rules. We saw one example of an
expert system rule, which was taken from the MYCIN system. Let’s look at a few additional
examples drawn from some other representative expert systems. All the rules we show are
English versions of the actual rules that the systems use. Differences among these rules illustrate
some of the important differences in the ways that expert systems operate.

R1 or XCON is a program that configures DEC VAX systems. Its rules look like this:
If: the most current active context is distributing massbus devices, and
there is a single-port disk drive that has not been assigned to a massbus, and
there are no unassigned dual-port disk drives, and
the number of devices that each massbus should support is known, and
there is a massbus that has been assigned at least one disk drive and that should support
additional disk drives, and the type of cable needed to connect the disk drive to the previous
device on the massbus is known
then: assign the disk drive to the massbus.

 Notice that R1’s rules, unlike MYCIN’s, contain no numeric measures of certainty. In the
task domain with which R1 deals, it is possible to state exactly the correct thing to be done
in each particular set of circumstances. One reason for this is that there exists a good deal of
human expertise in this area. Another is that since R1 is doing a design task it is not
necessary to consider all possible alternatives, one good one is enough. As a result,
probabilistic information is not necessary in R1.
PROSPECTOR is a program that provides advice on mineral exploration. Its rules look like
this:

If: magnetite or pyrite in disseminated or veinlet form is present
Then: (2, -4) there is favorable mineralization and texture for the propylitic stage.

 In PROSPECTOR, each rule contains two confidence estimates. The first indicates the
extent to which the presence of the evidence described in the condition part of the rule
suggests the validity of the rule’s conclusion. In the PROSPECTOR rule shown above, the
number 2 indicates that the presence of the evidence is mildly encouraging. The second
confidence estimate measures the extent to which the evidence is necessary to the validity
of the conclusion, or stated another way, the extent to which the lack of the evidence
indicates that the conclusion is not valid. In the example rule shown above, the number -4
indicates that the absence of the evidence is strongly discouraging for the conclusion.

DESIGN ADVISOR is a system that critiques chip designs. Its rules look like:
If: the sequential level count of ELEMENT is greater than 2,
UNLESS the signal of ELEMENT is resettable
then: critique for poor resettability
DEFEAT: poor resettability of ELEMENT
due to: sequential level count of ELEMENT greater than 2
by: ELEMENT is directly resettable

 The DESIGN ADVISOR gives advice to a chip designer, who can accept or reject the
advice. If the advice is rejected, the system can exploit a justification-based truth
maintenance system to revise its model of the circuit.

Reasoning with the Knowledge


As these example rules have shown, expert systems exploit many of the representation and
reasoning mechanisms that we have discussed. Because these programs are usually written
primarily as rule-based systems, forward chaining, backward chaining, or some combination of
the two, is usually used. For example, MYCIN used backward chaining to discover what
organisms were present; then it used forward chaining to reason from the organism to a treatment
regime. R1, on the other hand, used forward chaining. As the field of expert systems matures, more
systems that exploit other kinds of reasoning mechanisms are being developed. The DESIGN
ADVISOR is an example of such a system. In addition to exploiting rules, it makes extensive use of
a justification-based truth maintenance system.
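As a rough sketch of the rule-based reasoning these systems share, the loop below forward-chains over if-then productions until no new conclusions appear; the rules and facts are toy stand-ins, not actual XCON or MYCIN knowledge:

    rules = [   # (set of conditions, conclusion)
        ({"single-port-drive-unassigned", "massbus-has-capacity"},
         "assign-drive-to-massbus"),
        ({"assign-drive-to-massbus", "cable-type-known"}, "order-cable"),
    ]
    facts = {"single-port-drive-unassigned", "massbus-has-capacity",
             "cable-type-known"}

    changed = True
    while changed:                    # keep firing rules until nothing new is added
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True

    print(sorted(facts))              # includes the two derived conclusions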

 EXPERT SYSTEM SHELLS

 Initially, each expert system that was built was created from scratch, usually in LISP. But,
after several systems had been built this way, it became clear that these systems often had a
lot in common. In particular, since the systems were constructed as a set of declarative
representations combined with an interpreter for those representations, it was possible to
separate the interpreter from the domain-specific knowledge and thus to create a system
that could be used to construct new expert systems by adding new knowledge
corresponding to the new problem domain. The resulting interpreters are called shells. One
influential example of such a shell is EMYCIN which was derived from MYCIN.
 Early expert system shells provided mechanisms for knowledge representation, reasoning, and
explanation. Later, tools for knowledge acquisition were added. But as experience with
using these systems to solve real world problems grew, it became clear that expert system
shells needed to do something else as well. They needed to make it easy to integrate expert
systems with other kinds of programs. Expert systems cannot operate in a vacuum, any
more than their human counterparts can. They need access to corporate databases, and
access to them needs to be controlled just as it does for other systems. They are often
embedded within larger application programs that use primarily conventional programming
techniques. So one of the important features that a shell must provide is an easy-to-use
interface between an expert system that is written with the shell and a larger, more
conventional programming environment.

 EXPLANATION

 In order for an expert system to be an effective tool, people must be able to interact with it
easily. To facilitate this interaction, the expert system must have the following two
capabilities:
- Explain its reasoning: In many of the domains in which expert systems operate, people
will not accept results unless they have been convinced of the accuracy of the reasoning
process that produced those results. This is particularly true, for example, in medicine,
where a doctor must accept ultimate responsibility for a diagnosis, even if that diagnosis
was arrived at with considerable help from a program. Thus it is important that the
reasoning process used in such programs proceed in understandable steps and that enough
meta-knowledge (knowledge about the reasoning process) be available so the
explanations of those steps can be generated.
- Acquire new knowledge and modifications of old knowledge: Since expert systems
derive their power from the richness of the knowledge bases they exploit, it is extremely
important that those knowledge bases be as complete and as accurate as possible. But
often there exists no standard codification of that knowledge. Rather it exists only inside
the heads of human experts. One way to get this knowledge into a program is through
interaction with the human expert. Another way is to have the program learn expert
behavior from raw data.

 TEIRESIAS was the first program to support explanation and knowledge acquisition.
TEIRESIAS served as a front-end for the MYCIN expert system. A fragment of a
TEIRESIAS-MYCIN conversation with a user (a doctor) is shown in Figure below. The
program has asked for a piece of information that it needs in order to continue its reasoning.
The doctor wants to know why the program wants the information, and later asks how the
program arrived at a conclusion that it claimed it had reached. We can now return to the
trace of TEIRESIAS-MYCIN’s behavior shown in Figure below.
Figure: A Portion of a Dialogue with TEIRESIAS


 The first question that the user asks is a “WHY” question, which is assumed to mean “Why
do you need to know that?” Particularly for clinical tests that are either expensive or
dangerous, it is important for the doctor to be convinced that the information is really needed
before ordering the test. Because MYCIN is reasoning backward, the question can easily be
answered by examining the goal tree. Doing so provides two kinds of information:
- What higher-level question might the system be able to answer if it had the requested
piece of information? Ex. In this case, it could help determine the category of
ORGANISM-1.
- What other information does the system already have that makes it think that the
requested piece of knowledge would help? Ex. In this case, facts [2.1] to [2.4].

 KNOWLEDGE ACQUISITION
 How are expert systems built? Usually, a knowledge engineer interviews a domain expert to
elicit expert knowledge, which is then translated into rules. After the initial system is built,

it must be iteratively refined until it approximates expert-level performance. This process is
expensive and time-consuming, so it is worthwhile to look for more automatic ways of
constructing expert knowledge bases. While no totally automatic knowledge acquisition
systems yet exist, there are many programs that interact with domain experts to extract expert
knowledge efficiently. These programs provide support for the following activities:
- Entering knowledge
- Maintaining knowledge base consistency
- Ensuring knowledge base completeness
 The most useful knowledge acquisition programs are those that are restricted to a particular
problem-solving paradigm, e.g., diagnosis or design. It is important to be able to enumerate the
roles that knowledge can play in the problem-solving process. For example, if the paradigm is
diagnosis, then the program can structure its knowledge base around symptoms, hypotheses,
and causes. It can identify symptoms for which the expert has not yet provided causes. Since
one symptom may have multiple causes, the program can ask for knowledge about how to decide
when one hypothesis is better than another. If we move to another type of problem-solving, say
designing artifacts, then these acquisition strategies no longer apply, and we must look for other
ways of profitably interacting with an expert. We now examine two knowledge acquisition
systems in detail.

1. MOLE is a knowledge acquisition system for heuristic classification problems, such as


diagnosing diseases. In particular, it is used in conjunction with the cover-and-differentiate
problem-solving method. An expert system produced by MOLE accepts input data, comes up
with a set of candidate explanations or classifications that cover (or explain) the data, then
uses differentiating knowledge to determine which one is best. The process is iterative, since
explanations must themselves be justified, until causes are ascertained.
 MOLE interacts with a domain expert to produce a knowledge base that a system called
MOLE-p (MOLE-performance) uses to solve problems. The acquisition proceeds through
several steps:
1. Initial knowledge base construction: MOLE asks the expert to list common symptoms
or complaints that might require diagnosis. For each symptom, MOLE prompts for a list
of possible explanations. MOLE then iteratively seeks out higher-level explanations
until it comes up with a set of ultimate causes.
2. Refinement of the knowledge base: MOLE now tries to identify the weaknesses of the
knowledge base. One approach is to find holes and prompt the expert to fill them. It is
difficult, in general, to know whether a knowledge base is complete, so instead MOLE
lets the expert watch MOLE-p solving sample problems. Whenever MOLE-p makes an
incorrect diagnosis, the expert adds new knowledge. There are several ways in which
MOLE-p can reach the wrong conclusion. It may incorrectly reject a hypothesis because
it does not feel that the hypothesis is needed to explain any symptom. It may advance a
hypothesis because it is needed to explain some otherwise inexplicable hypothesis. Or it
may lack differentiating knowledge for choosing between alternative hypotheses. For
example, suppose we have a patient with symptoms A and B. Further suppose that
symptom A could be caused by events X and Y, and that symptom B can be caused by
Y and Z. MOLE-p might conclude Y, since it explains both A and B. If the expert
indicates that this decision was incorrect, then MOLE will ask what evidence should be
used to prefer X and/or Z over Y.
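 The cover-and-differentiate idea can be sketched roughly as follows, echoing the X/Y/Z example above; the coverage-count preference is a stand-in for real differentiating knowledge:

    causes = {"A": {"X", "Y"}, "B": {"Y", "Z"}}       # symptom -> possible causes

    def covering_hypotheses(symptoms):
        candidates = set().union(*(causes[s] for s in symptoms))
        return {h: {s for s in symptoms if h in causes[s]} for h in candidates}

    covered = covering_hypotheses({"A", "B"})
    best = max(covered, key=lambda h: len(covered[h]))
    print(best, sorted(covered[best]))    # Y covers both A and B, so it is preferred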

2. SALT is another knowledge acquisition system; it builds a dependency network as it converses
with the expert. Each node stands for a value of a parameter that must be acquired or
generated. There are three kinds of links: contributes-to, constrains, and suggests-revision-
of. Associated with the first type of link are procedures that allow SALT to generate a value for
one parameter based on the value of another. The second type of link, constrains, rules out
certain parameter values. The third link, suggests-revision-of, points to ways in which a
constraint violation can be fixed. SALT uses the following heuristics to guide the acquisition
process:
1. Every non-input node in the network needs at least one contributes- to link coming into it.
If links are missing, the expert is prompted to fill them in.
2. No contributes-to loops are allowed in the network. Without a value for at least one
parameter in the loop, it is impossible to compute values for any parameter in that loop. If a
loop exists, SALT tries to transform one of the contributes-to links into a constrains link.
3. Constraining links should have suggests-revision-of links associated with them, including the
constrains links that are created when dependency loops are broken.

 Control knowledge is also important. It is critical that the system propose extensions and
revisions that lead toward a design solution. SALT allows the expert to rate revisions in
terms of how much trouble they tend to produce.
 SALT compiles its dependency network into a set of production rules. An expert can
watch the production system solve problems and can override the system’s decision. At that
point, the knowledge base can be changed or the override can be logged for future inspection.

 META-DENDRAL was the first program to use learning techniques to construct rules for
an expert system automatically. It built rules to be used by DENDRAL, whose job was to
determine the structure of complex chemical compounds. META-DENDRAL was able to
induce its rules based on a set of mass spectrometry data. It was then able to identify
molecular structures with very high accuracy. META-DENDRAL used the version space
learning algorithm. Another popular method for automatically constructing expert systems is
the induction of decision trees. Decision tree expert systems have been built for assessing
consumer credit applications, analyzing hypothyroid conditions, and diagnosing soybean
diseases, among many other applications.


Question Bank

1. Explain various learning Techniques with examples.


2. What is an analogy? Explain two types of Analogy with examples.
3. What is formal learning theory? Explain with examples.
4. What is an expert system? List and explain various expert systems.
5. What is an expert system shell? Explain with an example.
6. What is knowledge Acquisition? Discuss various tools for KA.
7. List and describe the steps in Natural language processing (NLP)
8. What is Syntactic and Semantic analysis? Explain.
9. Explain Focus in understanding.
10. What are spell checking techniques? Explain.
11. List and Describe the steps in Natural language processing (NLP)
12. What are spell checking techniques? Explain.
13. Explain rote learning with an example.
14. What do you mean by learning? Discuss Inductive learning with an example.
15. What is an analogy? Explain derivational Analogy.
16. Write short notes on the following Expert systems: Prospector, Design Advisor

