(R18A1205) Artificial Intelligence
III Year B.Tech. IT - I Sem    L: 3   T/P/D: -/-/-   C: 3
1. Artificial Intelligence, Elaine Rich, Kevin Knight, Shivashankar B. Nair, McGraw-Hill,
Third Edition, 2009.
2. George F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem
Solving, Pearson Education, 6th ed., 2009.
COURSE OUTCOMES:
2. To design knowledge-based systems for a given problem and apply reasoning techniques.
Introduction:
1) Game Playing
Deep Blue Chess program beat world champion Gary Kasparov
2) Speech Recognition
PEGASUS spoken language interface to American Airlines' EAASY SABRE reservation
system, which allows users to obtain flight information and make reservations over the
telephone.
Applications of AI:
AI algorithms have attracted the close attention of researchers and have also been applied successfully
to solve problems in engineering. Nevertheless, for large and complex problems, AI algorithms
consume considerable computation time due to the stochastic nature of their search approaches.
Building AI Systems:
1) Perception
Intelligent biological systems are physically embodied in the world and experience the
world through their sensors (senses). For an autonomous vehicle, input might be images
from a camera and range information from a rangefinder. For a medical diagnosis system,
perception is the set of symptoms and test results that have been obtained and input to the
system manually.
a) "[The automation of] activities that we associate with human thinking, activities such as
decision-making, problem solving, learning..." (Bellman, 1978)
b) "The study of the computations that make it possible to perceive, reason, and act"
(Winston, 1992)
c) "The art of creating machines that perform functions that require intelligence when
performed by people" (Kurzweil, 1990)
d) "A field of study that seeks to explain and emulate intelligent behavior in terms of
computational processes" (Schalkoff, 1990)
e) "The study of how to make computers do things at which, at the moment, people are
better" (Rich and Knight, 1991)
f) "The branch of computer science that is concerned with the automation of intelligent
behavior" (Luger and Stubblefield, 1993)
These definitions vary along two dimensions: whether they are concerned with thinking or with
behavior, and whether they measure success against human performance (human-like) or against
an ideal standard (rationality).
b. Focus is not just on behavior and I/O, but also on the reasoning process.
c. Goal is not just to produce human-like behavior but to produce a sequence of steps of the
reasoning process, similar to the steps followed by a human in solving the same task.
a. The study of mental faculties through the use of computational models; that is, the
study of the computations that make it possible to perceive, reason, and act.
c. Goal is to formalize the reasoning process as a system of logical rules and procedures of
inference.
a. The art of creating machines that perform functions requiring intelligence when
performed by people; that is, the study of how to make computers do things which, at
the moment, people do better.
b. Focus is on action, and not intelligent behavior centered around the representation of the world
o The interrogator can communicate with the other two by teletype (so that the
machine cannot imitate the appearance or voice of the person).
o The interrogator tries to determine which is the person and which is the
machine.
o The machine tries to fool the interrogator to believe that it is the human,
and the person also tries to convince the interrogator that it is the human.
o If the machine succeeds in fooling the interrogator, then conclude that the
machine is intelligent.
a. Tries to explain and emulate intelligent behavior in terms of computational processes;
that is, it is concerned with the automation of intelligent behavior.
✓ A human agent has eyes, ears, and other organs for sensors and hands, legs, mouth,
and other body parts for actuators.
✓ A robotic agent might have cameras and infrared range finders for sensors and
various motors for actuators.
✓ A software agent receives keystrokes, file contents, and network packets as sensory
inputs and acts on the environment by displaying on the screen, writing files, and
sending network packets.
Percept:
We use the term percept to refer to the agent's perceptual inputs at any given instant.
Percept Sequence:
An agent's percept sequence is the complete history of everything the agent has ever perceived.
Agent function:
Mathematically speaking, we say that an agent's behavior is described by the agent function
that maps any given percept sequence to an action.
Agent program
Internally, the agent function for an artificial agent will be implemented by an agent
program. It is important to keep these two ideas distinct: the agent function is an abstract
mathematical description, while the agent program is a concrete implementation running
within some physical system.
Fig 2.1.6(i): The REFLEX-VACUUM-AGENT program is invoked for each new percept
(location, status) and returns an action each time.
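The REFLEX-VACUUM-AGENT described above can be sketched in Python; the location names and action strings below are illustrative assumptions, not fixed by the text:

```python
# Sketch of a simple reflex vacuum agent: it looks only at the current
# percept (location, status) and returns an action via condition-action rules.
def reflex_vacuum_agent(percept):
    location, status = percept          # e.g. ("A", "Dirty")
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"                  # move to the other square
    else:
        return "Left"

print(reflex_vacuum_agent(("A", "Dirty")))   # Suck
print(reflex_vacuum_agent(("B", "Clean")))   # Left
```

Note that the agent keeps no state at all: each call depends only on the current percept, which is exactly what makes it a simple reflex agent.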
• A rational agent is one that does the right thing. We say that the right action is the one that will
cause the agent to be most successful. That leaves us with the problem of deciding how and
when to evaluate the agent's success.
We use the term performance measure for the how—the criteria that determine how successful
an agent is.
✓ Ex-Agent cleaning the dirty floor
✓ Performance Measure-Amount of dirt collected
✓ When to measure-Weekly for better results
ENVIRONMENTS:
The performance measure, the environment, and the agent's actuators and sensors come under the
heading of the task environment. We also call this PEAS (Performance, Environment, Actuators, Sensors).
➢ The job of AI is to design the agent program: a function that implements the agent mapping
from percepts to actions. We assume this program will run on some sort of computing
device, which we will call the architecture.
➢ The architecture might be a plain computer, or it might include special-purpose hardware for
certain tasks, such as processing camera images or filtering audio input. It might also include
software that provides a degree of insulation between the raw computer and the agent program,
so that we can program at a higher level. In general, the architecture makes the percepts from
the sensors available to the program, runs the program, and feeds the program's action choices
to the effectors as they are generated.
➢ The relationship among agents, architectures, and programs can be summed up as follows:
agent = architecture + program
Agent programs:
➢ Intelligent agents accept percepts from an environment and generates actions. The early
versions of agent programs will have a very simple form (Figure 2.4)
Types of agents:
Agents can be grouped into four classes based on their degree of perceived intelligence and capability:
➢ Simple Reflex Agents
➢ Model-Based Reflex Agents
➢ Goal-Based Agents
➢ Utility-Based Agents
Simple reflex agents:
➢ Simple reflex agents ignore the rest of the percept history and act only on the basis of
the current percept.
➢ The agent function is based on the condition-action rule.
➢ If the condition is true, then the action is taken, else not. This agent function only succeeds when
the environment is fully observable.
Goal-based agents:
Utility-based agents:
Together the initial state, actions, and transition model implicitly define the state space of the problem.
State space: set of all states reachable from the initial state by any sequence of actions
➢ The goal test, determining whether the current state is a goal state. Here, the goal state is {In:
Bucharest}
➢ The path cost function, which determines the cost of each path, reflecting the
performance measure.
we define the cost function as c(s, a, s’), where s is the current state and a is the action performed by the
agent to reach state s’.
Goal State
➢ States: a state description specifies the location of each of the eight tiles in one of the nine
squares. For efficiency, it is useful to include the location of the blank.
➢ Actions: blank moves left, right, up, or down.
➢ Transition Model: Given a state and action, this returns the resulting state. For example if we
apply left to the start state the resulting state has the 5 and the blank switched.
➢ Goal test: state matches the goal configuration shown in fig.
➢ Path cost: each step costs 1, so the path cost is just the length of the path.
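The 8-puzzle formulation above (states, actions, transition model, goal test, unit step cost) can be sketched as follows; the flattened 3x3 tuple encoding with 0 for the blank is an assumption of this sketch:

```python
# 8-puzzle problem formulation: states are 9-tuples read row by row,
# with 0 standing for the blank; actions move the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def actions(state):
    i = state.index(0)                  # position of the blank
    r, c = divmod(i, 3)
    moves = []
    if c > 0: moves.append("Left")
    if c < 2: moves.append("Right")
    if r > 0: moves.append("Up")
    if r < 2: moves.append("Down")
    return moves

def result(state, action):
    """Transition model: swap the blank with the neighbouring tile."""
    i = state.index(0)
    delta = {"Left": -1, "Right": 1, "Up": -3, "Down": 3}
    j = i + delta[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def goal_test(state):
    return state == GOAL

start = (1, 2, 3, 4, 5, 6, 7, 0, 8)
print(actions(start))                     # ['Left', 'Right', 'Up']
print(goal_test(result(start, "Right")))  # True
```

Each step costs 1, so the path cost of a solution is simply the number of moves.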
Which search algorithm one should use will generally depend on the problem
domain. There are four important factors to consider:
1. Completeness – Is the algorithm guaranteed to find a solution if one exists?
2. Optimality – Is the solution found guaranteed to be the best (or lowest cost) solution if there
exists more than one solution?
3. Time Complexity – The upper bound on the time required to find a solution, as a function of
the complexity of the problem.
4. Space Complexity – The upper bound on the storage space (memory) required at any point
during the search, as a function of the complexity of the problem.
Uninformed Search:
Also called blind, exhaustive, or brute-force search; it uses no information about the problem to guide
the search and therefore may not be very efficient.
Informed Search:
Also called heuristic or intelligent search; it uses information about the problem to guide the search,
usually by estimating the distance to a goal state, and is therefore more efficient. However, such
guidance may not always be available.
Uninformed Search (Blind searches):
➢ One simple search strategy is a breadth-first search. In this strategy, the root node is
expanded first, then all the nodes generated by the root node are expanded next, and then
their successors, and so on.
➢ In general, all the nodes at depth d in the search tree are expanded before the nodes at depth d
+ 1.
BFS illustrated:
Step 1: Initially frontier contains only one node corresponding to the source state A.
Step 2: A is removed from fringe. The node is expanded, and its children B and C are generated.
They are placed at the back of fringe.
Figure 2
Frontier: B C
Step 3: Node B is removed from fringe and is expanded. Its children D, E are generated and put
at the back of fringe.
Figure 3
Frontier: C D E
Step 4: Node C is removed from fringe and is expanded. Its children D and G are added to the
back of fringe.
Step 5: Node D is removed from fringe. Its children C and F are generated and added to the back
of fringe.
Figure 5
Frontier: E D G C F
Figure 6
Frontier: D G C F
Figure 7
Step 8: G is selected for expansion. It is found to be a goal node. So the algorithm returns the
path A C G by following the parent pointers of the node corresponding to G. The algorithm
terminates.
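The trace above can be sketched as code; the successor lists are read off the illustrated steps, and the search returns the same path A C G:

```python
from collections import deque

# Breadth-first search on the example graph of the trace above.
# A is the source and G the goal; parent pointers reconstruct the path.
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["D", "G"],
         "D": ["C", "F"], "E": [], "F": [], "G": []}

def bfs(start, goal):
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        node = frontier.popleft()       # FIFO: shallowest node first
        if node == goal:                # follow parent pointers back to start
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in graph[node]:
            if child not in parent:     # do not re-add already-seen nodes
                parent[child] = node
                frontier.append(child)
    return None

print(bfs("A", "G"))  # ['A', 'C', 'G']
```

Because the frontier is a FIFO queue, all nodes at depth d are expanded before any node at depth d + 1, exactly as described above.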
Disadvantages:
➢ Requires the generation and storage of a tree whose size is exponential in the depth of the
shallowest goal node.
➢ The breadth first search algorithm cannot be effectively used unless the search space is
quite small.
Applications Of Breadth-First Search Algorithm
GPS Navigation systems: Breadth-First Search is one of the best algorithms used to find neighboring
locations by using the GPS system.
Broadcasting: Networking makes use of what we call packets for communication. These packets
follow a traversal method to reach various networking nodes. One of the most commonly used
traversal methods is Breadth-First Search.
DFS illustrated:
Figure 1
Step 2: A is removed from fringe. A is expanded and its children B and C are put in front of
fringe.
Figure 2
FRINGE: B C
Step 3: Node B is removed from fringe, and its children D and E are pushed in front of fringe.
Figure 3
FRINGE: D E C
Step 4: Node D is removed from fringe. C and F are pushed in front of fringe.
Figure 4
FRINGE: C F E C
Step 5: Node C is removed from fringe. Its child G is pushed in front of fringe.
Figure 5
FRINGE: G F E C
Step 6: Node G is expanded and found to be a goal node.
Figure 6
FRINGE: G F E C
Note that the time taken by the algorithm is related to the maximum depth of the search tree. If the search
tree has infinite depth, the algorithm may not terminate. This can happen if the search space is infinite. It
can also happen if the search space contains cycles. The latter case can be handled by checking for cycles
in the algorithm. Thus Depth First Search is not complete.
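The same example graph can be searched depth-first; the sketch below adds the cycle check mentioned above, so the loop between C and D cannot cause non-termination:

```python
# Depth-first search with a visited set (cycle check): children are
# explored before siblings, matching the stack-based trace above.
def dfs(graph, start, goal, visited=None):
    """Return a path from start to goal, or None if unreachable."""
    if visited is None:
        visited = set()
    if start == goal:
        return [start]
    visited.add(start)
    for child in graph[start]:
        if child not in visited:        # the cycle check that makes DFS terminate
            path = dfs(graph, child, goal, visited)
            if path:
                return [start] + path
    return None

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["D", "G"],
         "D": ["C", "F"], "E": [], "F": [], "G": []}
print(dfs(graph, "A", "G"))  # ['A', 'B', 'D', 'C', 'G']
```

Note that DFS reaches G by a longer path than BFS did: it commits to the B branch first, which is why DFS is not optimal in general.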
Advantages:
➢ It combines the benefits of the BFS and DFS search algorithms in terms of fast search and
memory efficiency.
Disadvantages:
➢ The main drawback of IDDFS is that it repeats all the work of the previous phase.
Complete: Yes
Time: O(b^d)
Space: O(bd)
Optimal: Yes, if step cost = 1 or increasing function of depth.
Conclusion:
We can conclude that IDS is a hybrid search strategy between BFS and DFS inheriting
their advantages.
IDS is faster than BFS and DFS.
It is said that "IDS is the preferred uninformed search method when there is a large search space and
the depth of the solution is not known."
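Iterative deepening on the earlier example graph can be sketched as a depth-limited DFS repeated with an increasing cutoff, combining DFS-like memory use with BFS-like completeness:

```python
# Iterative deepening depth-first search (IDDFS): run depth-limited DFS
# with limit 0, 1, 2, ... until a solution is found. The depth limit also
# guarantees termination even though the graph contains a C-D cycle.
def depth_limited(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in graph[node]:
        path = depth_limited(graph, child, goal, limit - 1)
        if path:
            return [node] + path
    return None

def iddfs(graph, start, goal, max_depth=20):
    for limit in range(max_depth + 1):
        path = depth_limited(graph, start, goal, limit)
        if path:
            return path
    return None

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["D", "G"],
         "D": ["C", "F"], "E": [], "F": [], "G": []}
print(iddfs(graph, "A", "G"))  # ['A', 'C', 'G']
```

Unlike plain DFS, IDDFS finds the shallowest goal (path A C G here), because no node at depth d + 1 is tried before every node at depth d has been.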
• Might not always find the best solution but is guaranteed to find a good solution in
reasonable time. By sacrificing completeness it increases efficiency.
• Useful in solving tough problems which
o could not be solved any other way, or
o whose solutions would take an infinite or very long time to compute.
Source state
1 3 2
6 5 4
8 7
Destination state
1 2 3
4 5 6
7 8
Then the Manhattan distance would be the sum of the number of moves required to move each
tile from the source state to the destination state.
Tile in 8-puzzle:            1  2  3  4  5  6  7  8
Moves to reach destination:  0  2  1  2  0  2  2  0
Source state
1 3 2
6 5 4
8 7
Destination state
1 2 3
4 5 6
7 8
Here we just calculate the number of tiles that have to be moved to reach the goal state.
Here 1, 5, 8 need not be changed;
2, 3, 4, 6, 7 should be changed, so the heuristic value will be 5 (because 5 tiles have to be changed).
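Both heuristics can be sketched directly from the grids above. This assumes the blank occupies the first cell of the bottom row (which reproduces the misplaced-tile count of 5 worked above) and encodes states as flattened tuples with 0 for the blank:

```python
# Two 8-puzzle heuristics: misplaced tiles and Manhattan distance.
# SOURCE/GOAL follow the grids above, with 0 assumed to be the blank
# in the first cell of the bottom row of the source state.
SOURCE = (1, 3, 2, 6, 5, 4, 0, 8, 7)
GOAL   = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def misplaced_tiles(state, goal=GOAL):
    # Count tiles (not the blank) that are out of place.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal=GOAL):
    # Sum of horizontal + vertical moves each tile needs to reach its goal cell.
    total = 0
    for tile in range(1, 9):
        r1, c1 = divmod(state.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total

print(misplaced_tiles(SOURCE))  # 5, matching the count worked above
print(manhattan(SOURCE))
```

Manhattan distance dominates the misplaced-tiles count for any state, which is why it usually guides the search better.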
✓ Hill climbing algorithm is a local search algorithm which continuously moves in the
direction of increasing elevation/value to find the peak of the mountain or best solution to
the problem. It terminates when it reaches a peak value where no neighbor has a higher
value.
✓ It is also called greedy local search as it only looks to its good immediate neighbor state
and not beyond that.
✓ Hill Climbing is mostly used when a good heuristic is available.
✓ In this algorithm, we don't need to maintain and handle the search tree or graph as it only
keeps a single current state.
Local Maximum: Local maximum is a state which is better than its neighbor states, but there is also another state which is
higher than it.
Global Maximum: Global maximum is the best possible state of state space landscape. It has the highest value of objective
function.
Flat local maximum: It is a flat space in the landscape where all the neighbor states of current states have the same value.
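The hill-climbing loop can be sketched as follows; the objective function and the neighbour generator below are illustrative assumptions, not from the text:

```python
# Hill climbing: repeatedly move to the best neighbour until no
# neighbour improves the objective. Only the current state is kept,
# so no search tree is maintained.
def hill_climb(start, objective, neighbors):
    current = start
    while True:
        best = max(neighbors(current), key=objective)
        if objective(best) <= objective(current):
            return current              # a peak (possibly only a local maximum)
        current = best

# Illustrative problem: maximize f(x) = -(x - 3)^2 over integers, step 1.
f = lambda x: -(x - 3) ** 2
step = lambda x: [x - 1, x + 1]
print(hill_climb(0, f, step))  # 3
```

Because the loop stops as soon as no neighbour is better, it can get stuck on a local maximum or a flat region, exactly the drawbacks described above.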
OPEN is a priority queue of nodes that have been evaluated by the heuristic function but which
have not yet been expanded into successors. The most promising nodes are at the front.
CLOSED are nodes that have already been generated and these nodes must be stored because a
graph is being used in preference to a tree.
Algorithm:
• If it has been generated before change the parent if this new path is better
and in that case update the cost of getting to any successor nodes.
Example:
1. It is not optimal.
The A* search algorithm (pronounced "Ay-star") is a tree search algorithm that finds a path
from a given initial node to a given goal node (or one passing a given goal test). It employs a
"heuristic estimate" which ranks each node by an estimate of the best route that goes through that
node. It visits the nodes in order of this heuristic estimate.
Similar to greedy best-first search but is more accurate because A* takes into account the nodes
that have already been traversed.
g is a measure of the distance/cost to go from the initial node to the current node
h is an estimate of the distance/cost to go from the current node to the goal node
Thus f = g + h is an estimate of how long it takes to go from the initial node to the solution
Algorithm:
save n in CLOSED
b) If m ∉ [OPEN ∪ CLOSED]
Move m to OPEN.
Description:
• A* begins at a selected node. Applied to this node is the "cost" of entering this node
(usually zero for the initial node). A* then estimates the distance to the goal node from
the current node. This estimate and the cost added together are the heuristic which is
assigned to the path leading to this node. The node is then added to a priority queue, often
called "open".
• The algorithm then removes the next node from the priority queue (because of the way a
priority queue works, the node removed will have the lowest heuristic). If the queue is
empty, there is no path from the initial node to the goal node and the algorithm stops. If
the node is the goal node, A* constructs and outputs the successful path and stops.
• If the node is not the goal node, new nodes are created for all admissible adjoining nodes;
the exact way of doing this depends on the problem at hand. For each successive node,
A* calculates the "cost" of entering the node and saves it with the node. This cost is
calculated from the cumulative sum of costs stored with its ancestors, plus the cost of the
operation which reached this new node.
• The algorithm also maintains a 'closed' list of nodes whose adjoining nodes have been
checked. If a newly generated node is already in this list with an equal or lower cost, no
further processing is done on that node or with the path associated with it. If a node in
the closed list matches the new one, but has been stored with a higher cost, it is removed
from the closed list, and processing continues on the new node.
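The description above (an OPEN priority queue ordered by f = g + h, and a CLOSED record of expanded nodes) can be sketched as follows; the example graph and heuristic values are illustrative assumptions:

```python
import heapq

# A* search: OPEN is a priority queue ordered by f = g + h; CLOSED
# records the best g found for each expanded node.
def a_star(graph, h, start, goal):
    open_heap = [(h[start], 0, start, [start])]   # (f, g, node, path)
    closed = {}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        if node in closed and closed[node] <= g:
            continue                    # a cheaper copy was already expanded
        closed[node] = g
        for child, cost in graph[node]:
            heapq.heappush(open_heap,
                           (g + cost + h[child], g + cost, child, path + [child]))
    return None, float("inf")

# Illustrative graph with step costs, and an admissible heuristic.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)],
         "C": [("D", 3)], "D": []}
h = {"A": 6, "B": 5, "C": 3, "D": 0}
path, cost = a_star(graph, h, "A", "D")
print(path, cost)   # a lowest-cost path from A to D, total cost 6
```

Because h never overestimates the true remaining cost here, the first solution popped off OPEN is optimal, which is the admissibility property stated below.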
▪ The algorithm A* is admissible. This means that provided a solution exists, the first solution
found by A* is an optimal solution. A* is admissible under the following conditions:
▪ A* is also complete.
Sometimes a problem is not embedded in a long set of action sequences but requires picking the
best option from available choices. A good general-purpose problem solving technique is to list the
constraints of a situation (either negative constraints, like limitations, or positive elements that you
want in the final solution). Then pick the choice that satisfies most of the constraints.
Formally speaking, a constraint satisfaction problem (or CSP) is defined by a set of variables,
X1, X2, ..., Xn, and a set of constraints, C1, C2, ..., Cm. Each variable Xi has a nonempty domain Di of
possible values. Each constraint Ci involves some subset of the variables and specifies the allowable
combinations of values for that subset. A state of the problem is defined by an assignment of values
to some or all of the variables, {Xi = vi, Xj = vj, ...}. An assignment that does not violate any
constraints is called a consistent or legal assignment. A complete assignment is one in which every
variable is mentioned, and a solution to a CSP is a complete assignment that satisfies all the
constraints. Some CSPs also require a solution that maximizes an objective function.
1. Initial state: the empty assignment {}, in which all variables are unassigned.
2. Successor function: a value can be assigned to any unassigned variable, provided that it does
not conflict with previously assigned variables.
Examples:
We are given the task of coloring each region red, green, or blue in such a way that no
neighboring regions have the same color.
To formulate this as CSP, we define the variable to be the regions: WA, NT, Q, NSW, V, SA, and
T. The domain of each variable is the set {red, green, blue}. The constraints require
neighboring regions to have distinct colors: for example, the allowable combinations for
WA and NT are the pairs
{(red,green),(red,blue),(green,red),(green,blue),(blue,red),(blue,green)}. (The constraint
can also be represented as the inequality WA ≠ NT). There are many possible solutions,
as follows:
> Initial state : the empty assignment {},in which all variables are unassigned.
> Successor function: a value can be assigned to any unassigned variable, provided
that it does not conflict with previously assigned variables.
> Goal test: the current assignment is complete.
> Path cost: a constant cost (e.g., 1) for every step.
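The CSP formulation above can be sketched as a simple backtracking search for the map-coloring example; the successor function assigns a color to the next unassigned region only if it conflicts with no neighbor:

```python
# Backtracking search for the Australia map-coloring CSP: variables are
# regions, domains are {red, green, blue}, constraints are WA != NT etc.
NEIGHBORS = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
             "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": []}
COLORS = ["red", "green", "blue"]

def consistent(region, color, assignment):
    # The inequality constraint: no neighbor may already have this color.
    return all(assignment.get(n) != color for n in NEIGHBORS[region])

def backtrack(assignment):
    if len(assignment) == len(NEIGHBORS):       # goal test: complete assignment
        return assignment
    region = next(r for r in NEIGHBORS if r not in assignment)
    for color in COLORS:
        if consistent(region, color, assignment):
            assignment[region] = color
            result = backtrack(assignment)
            if result:
                return result
            del assignment[region]              # undo and try the next color
    return None

solution = backtrack({})
print(solution["WA"] != solution["NT"])  # True: neighbors get distinct colors
```

Three colors suffice for this map, so the search returns a complete, consistent assignment without exhausting any domain.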
Advanced Search: Constructing Search Trees, Stochastic Search, AO* Search Implementation, Minimax
Search, Alpha-Beta Pruning Basic Knowledge Representation and Reasoning: Propositional Logic, First-
Order Logic, Forward Chaining and Backward Chaining, Introduction to Probabilistic Reasoning, Bayes
Theorem
Game Playing
Adversarial search, or game-tree search, is a technique for analyzing an adversarial game in order to try to
determine who can win the game and what moves the players should make in order to win. Adversarial
search is one of the oldest topics in Artificial Intelligence. The original ideas for adversarial search were
developed by Shannon in 1950 and independently by Turing in 1951, in the context of the game of
chess—and their ideas still form the basis for the techniques used today.
2-Person Games:
o Players: We call them Max and Min.
o Initial State: Includes board position and whose turn it is.
Example:
• Pruning: eliminating a branch of the search tree from consideration without exhaustive
examination of each node
• Alpha-Beta Pruning: the basic idea is to prune portions of the search tree that cannot improve
the utility value of the max or min node, by just considering the values of nodes seen so
far.
• Alpha-beta pruning is used on top of minimax search to detect paths that do not need to
be explored. The intuition is:
• The MAX player is always trying to maximize the score. Call this alpha.
• The MIN player is always trying to minimize the score. Call this beta.
• Alpha cutoff: Given a Max node n, cutoff the search below n (i.e., don't generate or
examine any more of n's children) if alpha(n) >= beta(n)
(alpha increases and passes beta from below)
• Beta cutoff.: Given a Min node n, cutoff the search below n (i.e., don't generate or
examine any more of n's children) if beta(n) <= alpha(n)
(beta decreases and passes alpha from above)
• Carry alpha and beta values down during the search. Pruning occurs whenever alpha >= beta.
Algorithm:
1) Setup phase: Assign to each left-most (or right-most) internal node of the tree,
variables: alpha = -infinity, beta = +infinity
2) Look at first computed final configuration value. It’s a 3. Parent is a min node, so
set the beta (min) value to 3.
4) Look at next value, 2. Since parent node is min with b=+inf, 2 is smaller, change b.
6) Max node is now done and we can set the beta value of its parent and propagate node
state to sibling subtree’s left-most path.
8) The next node is 4. Smallest value goes to the parent min node. Min subtree is done, so
the parent max node gets the alpha (max) value from the child. Note that if the max node
had a 2nd subtree, we can prune it since a>b.
10) Next value is a 2. We set the beta (min) value of the min parent to 2. Since no other
children exist, we propagate the value up the tree.
12) Finally, no more nodes remain, we propagate values up the tree. The root has a value
of 3 that comes from the left-most child. Thus, the player should choose the left-most
child’s move in order to maximize his/her winnings. As you can see, the result is the same
as with the mini-max example, but we did not visit all nodes of the tree.
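The pruning rules above can be sketched as follows; the game tree is a small illustrative example of three Min subtrees whose leaves are utility values, and the root value 3 comes from the left-most subtree, as in the walkthrough:

```python
# Minimax with alpha-beta pruning. A node is either a leaf (a number)
# or a list of child nodes; Max and Min levels alternate.
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):  # leaf: final configuration value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:           # beta cutoff: prune remaining children
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:           # alpha cutoff: prune remaining children
                break
        return value

# Max root over three Min subtrees of leaf values (illustrative).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # 3
```

On this tree the middle subtree is cut off after its first leaf (2 <= alpha = 3), so the result matches plain minimax without visiting every node.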
The depth-first search and breadth-first search given earlier for OR trees or graphs can be easily
adapted to AND-OR graphs. The main difference lies in the way termination conditions are
determined, since all goals following an AND node must be realized, whereas a single goal node
following an OR node will do. For this purpose we use the AO* algorithm.
Like the A* algorithm, here we will use two arrays and one heuristic function.
OPEN:
It contains the nodes that have been traversed but have not yet been marked solvable or unsolvable.
CLOSE:
It contains the nodes that have already been processed.
h(n): The distance from the current node to the goal node.
Algorithm:
Step 1: Place the starting node into OPEN.
Step 2: Compute the most promising solution tree say T0.
Step 3: Select a node n that is both on OPEN and a member of T0. Remove it from OPEN and place it
in
CLOSE
Step 4: If n is the terminal goal node, then label n as solved and label all the ancestors of n as
solved. If the starting node is marked as solved, then return success and exit.
Step 5: If n is not a solvable node, then mark n as unsolvable. If the starting node is marked as
unsolvable, then return failure and exit.
Step 6: Expand n. Find all its successors, compute their h(n) values, and push them into OPEN.
Step 7: Return to Step 2.
Step 8: Exit.
Step 1:
In the above graph, the solvable nodes are A, B, C, D, E, F and the unsolvable nodes are G, H. Take A as
the starting node. So place A into OPEN.
• Humans are best at understanding, reasoning, and interpreting knowledge. Humans know things
(knowledge), and on the basis of that knowledge they perform various actions in the real world.
• How machines do all these things comes under knowledge representation.
• There are three factors which are put into the machine, which makes it valuable:
• Knowledge: The information related to the environment is stored in the machine.
• Reasoning: The ability of the machine to understand the stored knowledge.
• Intelligence: The ability of the machine to make decisions on the basis of the stored information.
• A knowledge representation language is defined by two aspects:
• The syntax of a language describes the possible configurations that can constitute sentences.
• The semantics determines the facts in the world to which the sentences refer.
• For example, the syntax of the language of arithmetic expressions says that if x and y are
expressions denoting numbers, then x > y is a sentence about numbers. The semantics of the language
says that x > y is false when y is a bigger number than x, and true otherwise From the syntax and
semantics, we can derive an inference mechanism for an agent that uses the language.
• Recall that the semantics of the language determine the fact to which a given sentence refers.
Facts are part of the world,
• whereas their representations must be encoded in some way that can be physically stored within an
agent. We cannot put the world inside a computer (nor can we put it inside a human), so all
reasoning mechanisms must operate on representations of facts, rather than on the facts themselves.
Because sentences are physical configurations of parts of the agent,
Reasoning must be a process of constructing new physical configurations from old ones. Proper
reasoning should ensure that the new configurations represent facts that actually follow from the
facts that the old configurations represent.
• We want to generate new sentences that are necessarily true, given that the old sentences are true.
• In mathematical notation, the relation of entailment between a knowledge base KB and a sentence
α is written KB ⊨ α.
PROPOSITIONAL LOGIC:
• Propositional logic (PL) is the simplest form of logic where all the statements are made by
propositions.
• A proposition is a declarative statement which is either true or false.
It is a technique of knowledge representation in logical and mathematical form.
1. Conjunction: A sentence which has the ∧ connective, such as P ∧ Q, is called a conjunction,
where P and Q are the propositions. Example: "Rohan is intelligent and hardworking" can be
written as P = Rohan is intelligent, Q = Rohan is hardworking → P ∧ Q.
2. Disjunction: A sentence which has the ∨ connective, such as P ∨ Q, is called a disjunction, where
P and Q are the propositions.
Example: "Ritika is a doctor or an Engineer".
Semantics
• The semantics of propositional logic is also quite straightforward. We define it by specifying the
interpretation of the proposition symbols and constants, and specifying the meanings of the logical
connectives.
Validity
• Truth tables can be used not only to define the connectives, but also to test for valid sentences.
• Given a sentence, we make a truth table with one row for each of the possible combinations of
truth values for the proposition symbols in the sentence.
• If the sentence is true in every row, then the sentence is valid. For example, the sentence
((P ∨ H) ∧ ¬H) ⇒ P
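The validity test can be sketched by enumerating the truth table mechanically; the sentence below is the example ((P ∨ H) ∧ ¬H) ⇒ P from the text:

```python
from itertools import product

# Truth-table validity test: a sentence is valid iff it is true in every
# row, i.e. under every assignment of truth values to its symbols.
def implies(a, b):
    return (not a) or b

def is_valid(sentence, symbols):
    return all(sentence(*row)
               for row in product([True, False], repeat=len(symbols)))

sentence = lambda P, H: implies((P or H) and (not H), P)
print(is_valid(sentence, ["P", "H"]))  # True
```

The four rows of the table all come out true, so the sentence is valid; a non-valid sentence such as P ∨ H would fail on the row where both symbols are false.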
Translating English into logic:
• User defines semantics of each propositional symbol
• P: It is Hot
• Q: It is Humid
• R:It is raining
1. If it is humid then it is hot:
Q → P
2. If it is hot and humid, then it is raining:
(P ∧ Q) → R
Limitations of Propositional logic:
• In propositional logic, we can only represent the facts, which are either true or false.
• PL is not sufficient to represent the complex sentences or natural language statements.
• The propositional logic has very limited expressive power.
• Consider the following sentence, which we cannot represent using PL logic.
• "Some humans are intelligent", or "Sachin likes cricket
➢ Propositional logic is declarative: knowledge and inference are separate, and inference is
entirely domain-independent. Its semantics is based on a truth relation between sentences
and possible worlds.
➢ It also has sufficient expressive power to deal with partial information, using disjunction and
negation.
For example, we were forced to write a separate rule about breezes and pits for each square, such as
B1,1⇔ (P1,2 ∨ P2,1) .
In English, it seems easy enough to say, “Squares adjacent to pits are breezy.”
The syntax and semantics of English somehow make it possible to describe the environment concisely.
The models of a logical language are the formal structures that constitute the possible worlds under
consideration. Each model links the vocabulary of the logical sentences to elements of the possible world,
so that the truth of any sentence can be determined. Thus, models for propositional logic link proposition
symbols to predefined truth values. Models for first-order logic have objects. The domain of a model is
the set of objects or domain elements it contains. The domain is required to be nonempty—every possible
world must contain at least one object.
A relation is just the set of tuples of objects that are related.
For Example:
Richard the Lionheart, King of England from 1189 to 1199; His younger brother, the evil King John, who
ruled from 1199 to 1215; the left legs of Richard and John; crown
Unary relation: John is a king. Binary relation: the crown is on the head of John; Richard is the
brother of John. The unary "left leg" function includes the following mappings:
(Richard the Lionheart) → Richard's left leg; (King John) → John's left leg.
Symbols are the basic syntactic elements of first-order logic. Symbols stand for objects, relations, and
functions.
The symbols are of three kinds: Constant symbols which stand for objects; Example: John, Richard
Predicate symbols, which stand for relations; Example: OnHead, Person, King, and Crown
Function symbols, which stand for functions. Example: LeftLeg.
Symbols will begin with uppercase letters.
Interpretation
The semantics must relate sentences to models in order to determine truth. For this to happen, we
need an interpretation that specifies exactly which objects, relations and functions are referred to by
the constant, predicate, and function symbols.
Richard refers to Richard the Lionheart and John refers to the evil King John. Brother refers to the
brotherhood relation; OnHead refers to the "on head" relation that holds between the crown and King
John; Person, King, and Crown refer to the sets of objects that are persons, kings, and crowns. LeftLeg
refers to the "left leg" function.
The truth of any sentence is determined by a model and an interpretation for the sentence's symbols.
Therefore, entailment, validity, and so on are defined in terms of all possible models and all possible
interpretations. The number of domain elements in each model may be unbounded; for example, the
domain elements may be integers or real numbers. Hence, the number of possible models is unbounded,
as is the number of interpretations.
Term
A term is a logical expression that refers to an object. Constant symbols are therefore terms.
Complex Terms
A complex term is just a complicated kind of name. A complex term is formed by a function symbol followed by a parenthesized list of terms as arguments to the function symbol. For example, instead of using a constant symbol for "King John's left leg", we use LeftLeg(John).
The formal semantics of terms: Consider a term f(t1, . . . , tn). The function symbol f refers to some function F in the model; the argument terms refer to objects in the domain (call them d1, . . . , dn); and the term as a whole refers to the object that is the value of the function F applied to d1, . . . , dn. For example, if the LeftLeg function symbol refers to the function "(King John) → John's left leg" and John refers to King John, then LeftLeg(John) refers to King John's left leg. In this way, the interpretation fixes the referent of every term.
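The way an interpretation fixes the referent of every term can be sketched in a few lines of code. This is an illustrative model only; the domain objects and the dictionary layout are invented for the example:

```python
# A minimal sketch (not from the text) of a model plus interpretation.
# Domain objects are plain strings; the interpretation maps symbols to them.
interpretation = {
    # Constant symbols -> objects
    "Richard": "RichardObj",
    "John": "JohnObj",
    # Function symbols -> mappings over the domain ("left leg" function)
    "LeftLeg": {"RichardObj": "RichardsLeftLeg", "JohnObj": "JohnsLeftLeg"},
}

def referent(term, interp):
    """Return the domain object a term refers to under an interpretation."""
    if isinstance(term, str):      # a constant symbol
        return interp[term]
    fname, arg = term              # a complex term: (function symbol, argument)
    return interp[fname][referent(arg, interp)]

# LeftLeg(John) refers to King John's left leg:
print(referent(("LeftLeg", "John"), interpretation))  # -> JohnsLeftLeg
```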
Atomic sentences
An atomic sentence is formed from a predicate symbol followed by a parenthesized list of terms: For
Example: Brother(Richard, John).
Atomic sentences can have complex terms as arguments. For Example: Married (Father(Richard),
Mother( John)).
An atomic sentence is true in a given model, under a given interpretation, if the relation referred to by the
predicate symbol holds among the objects referred to by the arguments
Complex sentences
Complex sentences can be constructed using logical connectives, just as in propositional calculus.
Universal quantification (∀)
Consider the sentence ∀x King(x) ⇒ Person(x). The sentence says, "For all x, if x is a king, then x is a person." The symbol x is called a variable. Variables are lowercase letters. A variable is a term all by itself, and can also serve as the argument of a function. A term with no variables is called a ground term.
Assume we can extend the interpretation in different ways: x→ Richard the Lionheart, x→ King John, x→ Richard’s
left leg, x→ John’s left leg, x→ the crown
The universally quantified sentence ∀x King(x) ⇒ Person(x) is true in the original model if the sentence King(x) ⇒ Person(x) is true under each of the five extended interpretations. That is, the universally quantified sentence is equivalent to asserting the following five sentences:
Richard the Lionheart is a king ⇒ Richard the Lionheart is a person.
King John is a king ⇒ King John is a person.
Richard's left leg is a king ⇒ Richard's left leg is a person.
John's left leg is a king ⇒ John's left leg is a person.
The crown is a king ⇒ the crown is a person.
Universal quantification makes statements about every object. Similarly, we can make a statement about some object
in the universe without naming it, by using an existential quantifier.
“The sentence ∃x P says that P is true for at least one object x. More precisely, ∃x P is true in a given model if P is true
in at least one extended interpretation that assigns x to a domain element." ∃x is pronounced "There exists an x such that . . ." or "For some x . . .".
For example, the sentence ∃x Crown(x) ∧ OnHead(x, John) is true if at least one of the following assertions is true:
Richard the Lionheart is a crown ∧ Richard the Lionheart is on John's head;
King John is a crown ∧ King John is on John's head;
Richard's left leg is a crown ∧ Richard's left leg is on John's head;
John's left leg is a crown ∧ John's left leg is on John's head;
The crown is a crown ∧ the crown is on John's head.
The fifth assertion is true in the model, so the original existentially quantified sentence is true in the model. Just as ⇒ appears to be the natural connective to use with ∀, ∧ is the natural connective to use with ∃.
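The extended-interpretation semantics above maps directly onto Python's all() and any(). The predicate extensions below are the ones this model implies (only King John is a king; Richard and John are the persons; the crown is on John's head):

```python
# Hedged sketch: evaluating quantified sentences by trying every "extended
# interpretation", i.e. every assignment of the variable x to a domain element.
domain = ["Richard", "KingJohn", "RichardsLeftLeg", "JohnsLeftLeg", "TheCrown"]
kings = {"KingJohn"}                 # extension of King
persons = {"Richard", "KingJohn"}    # extension of Person
crowns = {"TheCrown"}                # extension of Crown
on_johns_head = {"TheCrown"}         # extension of OnHead(_, John)

# ∀x King(x) ⇒ Person(x): the implication must hold under every assignment.
forall_true = all((x not in kings) or (x in persons) for x in domain)

# ∃x Crown(x) ∧ OnHead(x, John): the conjunction must hold under at least one.
exists_true = any((x in crowns) and (x in on_johns_head) for x in domain)

print(forall_true, exists_true)  # -> True True
```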
Nested quantifiers
For example, "Brothers are siblings" can be written as ∀x ∀y Brother(x, y) ⇒ Sibling(x, y). Consecutive quantifiers of the same type can be written as one quantifier with several variables.
For example, to say that siblinghood is a symmetric relationship, we can write ∀x, y Sibling(x, y) ⇔ Sibling(y, x).
For example: 1. "Everybody loves somebody" means that for every person, there is someone that person loves: ∀x ∃y Loves(x, y). 2. On the other hand, to say "There is someone who is loved by everyone," we write ∃y ∀x Loves(x, y).
Universal and Existential quantifiers are actually intimately connected with each other, through negation.
Example assertions:
1. "Everyone dislikes medicine" is the same as asserting "there does not exist someone who likes medicine", and vice versa: ∀x ¬Likes(x, Medicine) is equivalent to ¬∃x Likes(x, Medicine).
2. "Everyone likes ice cream" means that "there is no one who does not like ice cream": ∀x Likes(x, IceCream) is equivalent to ¬∃x ¬Likes(x, IceCream).
Because ∀ is really a conjunction over the universe of objects and ∃ is a disjunction, they obey De Morgan's rules. The De Morgan rules for quantified and unquantified sentences are as follows:
∀x ¬P ≡ ¬∃x P        ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
¬∀x P ≡ ∃x ¬P        ¬(P ∨ Q) ≡ ¬P ∧ ¬Q
∀x P ≡ ¬∃x ¬P        P ∧ Q ≡ ¬(¬P ∨ ¬Q)
∃x P ≡ ¬∀x ¬P        P ∨ Q ≡ ¬(¬P ∧ ¬Q)
Equality
First-order logic includes one more way to make atomic sentences, other than using a predicate and terms. We can use the equality symbol to signify that two terms refer to the same object.
For example,
"Father(John) = Henry" says that the object referred to by Father(John) and the object referred to by Henry are the same.
Because an interpretation fixes the referent of any term, determining the truth of an equality sentence is simply a
matter of seeing that the referents of the two terms are the same object.The equality symbol can be used to state facts
about a given function.It can also be used with negation to insist that two terms are not the same object.
For example,
"Richard has at least two brothers" can be written as ∃x, y Brother(x, Richard) ∧ Brother(y, Richard) ∧ ¬(x = y).
The sentence
∃x, y Brother(x, Richard) ∧ Brother(y, Richard) does not have the intended meaning. In particular, it is true even in a model where Richard has only one brother, under the extended interpretation in which both x and y are assigned to King John. The addition of ¬(x = y) rules out such models.
Assertions:
Sentences are added to a knowledge base using TELL, exactly as in propositional logic. Such sentences are
called assertions.
For example,
John is a king: TELL(KB, King(John)). Richard is a person: TELL(KB, Person(Richard)). All kings are persons: TELL(KB, ∀x King(x) ⇒ Person(x)).
Asking Queries:
We can ask questions of the knowledge base using ASK. Questions asked with ASK are called queries or
goals.
For example,
Any query that is logically entailed by the knowledge base should be answered affirmatively. For example, ASK(KB, King(John)) returns true.
The answer is true, but this is perhaps not as helpful as we would like. It is rather like answering
“Can you tell me the time?” with “Yes.”
If we want to know what value of x makes the sentence true, we will need a different function,
ASKVARS, which we call with ASKVARS (KB, Person(x)) and which yields a stream of answers.
In this case there will be two answers: {x/John} and {x/Richard}. Such an answer is called a substitution
or binding list.
ASKVARS is usually reserved for knowledge bases consisting solely of Horn clauses, because in such
knowledge bases every way of making the query true will bind the variables to specific values.
We use functions for Mother and Father, because every person has exactly one of each of these.
We can represent each function and predicate, writing down what we know in terms of the other symbols.
For example:-
1. One's mother is one's female parent: ∀m, c Mother(c) = m ⇔ Female(m) ∧ Parent(m, c).
2. One's husband is one's male spouse: ∀w, h Husband(h, w) ⇔ Male(h) ∧ Spouse(h, w).
4. Parent and child are inverse relations: ∀p, c Parent(p, c) ⇔ Child(c, p).
5. A grandparent is a parent of one's parent: ∀g, c Grandparent(g, c) ⇔ ∃p Parent(g, p) ∧ Parent(p, c).
Axioms:
Each of these sentences can be viewed as an axiom of the kinship domain. Axioms are commonly associated
with purely mathematical domains. They provide the basic factual information from which useful
conclusions can be derived.
Kinship axioms are also definitions; they have the form ∀x, y P(x, y) ⇔. . ..
The axioms define the Mother function, Husband, Male, Parent, Grandparent, and Sibling predicates in terms
of other predicates.
Our definitions “bottom out” at a basic set of predicates (Child, Spouse, and Female) in terms of which the
others are ultimately defined. This is a natural way in which to build up the representation of a
domain, and it is analogous to the way in which software packages are built up by successive
definitions of subroutines from primitive library functions.
Theorems:
Not all logical sentences about a domain are axioms. Some are theorems—that is, they are entailed by the
axioms.
For example, consider the assertion that siblinghood is symmetric: ∀x, y Sibling(x, y) ⇔Sibling(y, x) .
It is a theorem that follows logically from the axiom that defines siblinghood. If we ASK the knowledge
base this sentence, it should return true. From a purely logical point of view, a knowledge base need
contain only axioms and no theorems, because the theorems do not increase the set of conclusions that
follow from the knowledge base. From a practical point of view, theorems are essential to reduce the
computational cost of deriving new sentences. Without them, a reasoning system has to start from first
principles every time.
Not all axioms are definitions. Some provide more general information about certain predicates without completely defining them. Indeed, some predicates have no complete definition; for example, it is hard to complete the sentence
∀x Person(x) ⇔ . . .
Fortunately, first-order logic allows us to make use of the Person predicate without completely defining it.
Instead, we can write partial specifications of properties that every person has and properties that make
something a person:
∀x Person(x) ⇒ . . .    ∀x . . . ⇒ Person(x).
Axioms can also be “just plain facts,” such as Male (Jim) and Spouse (Jim, Laura).Such facts form the
descriptions of specific problem instances, enabling specific questions to be answered. The answers to
these questions will then be theorems that follow from the axioms
Natural numbers are perhaps the most vivid example of how a large theory can be built up from a tiny kernel of axioms. We describe here the theory of natural numbers or non-negative integers. We need a predicate NatNum that will be true of natural numbers, one constant symbol, 0, and one function symbol, S (successor). The Peano axioms define natural numbers:
NatNum(0). ∀n NatNum(n) ⇒ NatNum(S(n)).
That is, 0 is a natural number, and for every object n, if n is a natural number, then S(n) is a natural number.
So the natural numbers are 0, S(0), S(S(0)), and so on. We also need axioms to constrain the successor function:
∀n 0 ≠ S(n). ∀m, n m ≠ n ⇒ S(m) ≠ S(n).
Now we can define addition in terms of the successor function: ∀m NatNum(m) ⇒ + (0, m) = m .
∀m, n NatNum(m) ∧ NatNum(n) ⇒ + (S(m), n) = S(+(m, n))
To make our sentences about numbers easier to read, we allow the use of infix notation. We can also write S(n) as n + 1, so the second axiom becomes:
∀m, n NatNum(m) ∧ NatNum(n) ⇒ (m + 1) + n = (m + n) + 1.
This axiom reduces addition to repeated application of the successor function. Once we have addition, it is
straightforward to define multiplication as repeated addition, exponentiation as repeated multiplication,
integer division and remainders, prime numbers, and so on. Thus, the whole of number theory (including
cryptography) can be built up from one constant, one function, one predicate and four axioms.
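The successor-function axioms above can be exercised directly. The encoding below is an illustrative sketch (nested tuples standing in for applications of S), not part of the text:

```python
# Natural numbers as nested applications of a successor function S,
# with addition defined exactly by the two axioms above:
#   +(0, n) = n        and        +(S(m), n) = S(+(m, n))
ZERO = 0

def S(n):
    """Successor: wrap n one level deeper."""
    return ("S", n)

def plus(m, n):
    if m == ZERO:              # +(0, n) = n
        return n
    return S(plus(m[1], n))    # +(S(m'), n) = S(+(m', n))

def to_int(n):
    """Decode S(S(...S(0)...)) back to a Python int, for display only."""
    count = 0
    while n != ZERO:
        n = n[1]
        count += 1
    return count

two, three = S(S(ZERO)), S(S(S(ZERO)))
print(to_int(plus(two, three)))  # -> 5
```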
Sets
The domain of sets is also fundamental to mathematics as well as to commonsense reasoning. We want to be able to represent individual sets, including the empty set; to know whether an element is a member of a set; and to distinguish sets from objects that are not sets.
The empty set is a constant written as { }. There is one unary predicate, Set, which is true of sets. The binary predicates are x ∈ s (x is a member of set s) and s1 ⊆ s2 (set s1 is a subset of set s2). The binary functions are s1 ∩ s2 (the intersection of two sets), s1 ∪ s2 (the union of two sets), and {x|s} (the set resulting from adjoining element x to set s).
Inference engine:
The inference engine is the component of the intelligent system in artificial intelligence, which
applies logical rules to the knowledge base to infer new information from known facts. The first inference
engine was part of the expert system. Inference engine commonly proceeds in two modes, which are:
a. Forward chaining
b. Backward chaining
Horn clauses and definite clauses are forms of sentences that enable a knowledge base to use a more restricted and efficient inference algorithm. Logical inference algorithms use forward and backward chaining approaches, which require the KB to be in the form of first-order definite clauses.
Definite clause: A clause which is a disjunction of literals with exactly one positive literal is known as a definite clause or strict Horn clause.
Horn clause: A clause which is a disjunction of literals with at most one positive literal is known as a Horn clause. Hence all definite clauses are Horn clauses.
For example, the clause (¬p ∨ ¬q ∨ k) has exactly one positive literal, k, so it is a definite clause. It is equivalent to p ∧ q → k.
A. Forward Chaining
Forward chaining is also known as a forward deduction or forward reasoning method when using an inference engine. Forward chaining is a form of reasoning which starts with the atomic sentences in the knowledge base and applies inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and adds their conclusions to the known facts. This process repeats until the problem is solved.
Properties of Forward-Chaining:
Consider the following famous example which we will use in both approaches:
Step-1:
In the first step we will start with the known facts and will choose the sentences which do not have
implications, such as: American(Robert), Enemy(A, America), Owns(A, T1), and Missile(T1). All
these facts will be represented as below.
Step-2:
At the second step, we will see which facts can be inferred from the available facts with satisfied premises.
Rule-(1) does not have its premises satisfied yet, so nothing is added from it in this iteration.
Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added, which is inferred from the conjunction of statements (2) and (3).
Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added, which is inferred from statement (7).
Step-3:
At step-3, as we can check, Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add Criminal(Robert), which is inferred from all the available facts. And hence we reached our goal statement.
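The three forward-chaining steps above can be sketched as a toy program. The rule list below is the usual statement of this example (the full numbered list is not shown in the excerpt), with variables written in lowercase and constants capitalized; the premise-free statements appear as initial facts:

```python
# A toy forward chainer for the "Robert is a criminal" example (a sketch,
# not a full first-order inference engine).
facts = {("American", "Robert"), ("Enemy", "A", "America"),
         ("Owns", "A", "T1"), ("Missile", "T1")}

# Each rule is (premise patterns, conclusion pattern).
rules = [
    ([("American", "p"), ("Weapon", "q"), ("Sells", "p", "q", "r"),
      ("Hostile", "r")], ("Criminal", "p")),                # Rule-(1)
    ([("Missile", "p"), ("Owns", "A", "p")],
     ("Sells", "Robert", "p", "A")),                        # Rule-(4)
    ([("Missile", "p")], ("Weapon", "p")),                  # Rule-(5)
    ([("Enemy", "p", "America")], ("Hostile", "p")),        # Rule-(6)
]

def is_var(symbol):
    return symbol[0].islower()

def match(pattern, fact, theta):
    """Extend substitution theta so pattern matches fact, or return None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta

def satisfy(premises, theta):
    """Yield every substitution that makes all premises match known facts."""
    if not premises:
        yield theta
        return
    for fact in list(facts):
        t2 = match(premises[0], fact, theta)
        if t2 is not None:
            yield from satisfy(premises[1:], t2)

# Repeatedly fire every rule whose premises are satisfied, until fixpoint.
while True:
    new_facts = set()
    for premises, conclusion in rules:
        for theta in satisfy(premises, {}):
            fact = (conclusion[0],) + tuple(theta.get(a, a)
                                            for a in conclusion[1:])
            if fact not in facts:
                new_facts.add(fact)
    if not new_facts:
        break
    facts |= new_facts

print(("Criminal", "Robert") in facts)  # -> True
```

Running it reproduces the three steps: the first pass adds Sells(Robert, T1, A), Weapon(T1), and Hostile(A); the second pass adds Criminal(Robert).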
B. Backward Chaining
Backward chaining is also known as a backward deduction or backward reasoning method when using an inference engine. A backward-chaining algorithm starts from the goal and works backward, chaining through rules to find known facts that support the goal.
Example:
In backward chaining, we will use the same example as above, and will rewrite all the rules.
Backward-Chaining proof:
In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then infer
further rules.
Step-1:
At the first step, we will take the goal fact. And from the goal fact, we will infer other facts, and at last, we
will prove those facts true. So our goal fact is "Robert is Criminal," so following is the predicate of it.
Step-2:
At the second step, we will infer other facts from the goal fact which satisfy the rules. As we can see in Rule-(1), the goal predicate Criminal(Robert) is present with the substitution {p/Robert}. So we will add all the conjunctive facts below the first level and will replace p with Robert.
Here we can see American (Robert) is a fact, so it is proved here.
Step-3:
At step-3, we will extract the further fact Missile(q), which is inferred from Weapon(q), as it satisfies Rule-(5). Weapon(q) is also true with the substitution of the constant T1 for q.
Step-4:
At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r), which satisfies Rule-(4) with the substitution of A in place of r. So these two statements are proved here.
Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies Rule-(6). And hence all the statements are proved true using backward chaining.
Forward chaining is a bottom-up approach, whereas backward chaining is a top-down approach.
BAYES Theorem:
• Bayes' Theorem is a way of finding a probability when we know certain other probabilities. The formula is:
P(A|B) = P(A) × P(B|A) / P(B)
• which tells us how often A happens given that B happens, written P(A|B),
• when we know how often B happens given that A happens, written P(B|A),
• and how likely A is on its own, written P(A),
• and how likely B is on its own, written P(B).
Example:
Suppose dangerous fires are rare (1%), but smoke is fairly common (10%), and 90% of dangerous fires make smoke. We can then discover the probability of a dangerous Fire when there is Smoke:
P(Fire|Smoke) = P(Fire) × P(Smoke|Fire) / P(Smoke) = 1% × 90% / 10% = 9%.
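The arithmetic can be checked in a few lines. The three input probabilities are the ones the 1% × 90% / 10% calculation assumes (dangerous fires rare at 1%, smoke fairly common at 10%, and 90% of dangerous fires producing smoke):

```python
# Bayes' theorem with the Fire/Smoke numbers:
# P(Fire|Smoke) = P(Fire) * P(Smoke|Fire) / P(Smoke)
p_fire = 0.01               # dangerous fires are rare (1%)
p_smoke = 0.10              # smoke is fairly common (10%)
p_smoke_given_fire = 0.90   # 90% of dangerous fires make smoke

p_fire_given_smoke = p_fire * p_smoke_given_fire / p_smoke
print(p_fire_given_smoke)   # ≈ 0.09, i.e. a 9% chance
```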
Example 2:
But cloudy mornings are common (about 40% of days start cloudy)
And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%)
We will use Rain to mean rain during the day, and Cloud to mean cloudy morning.
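The text gives P(Cloud) = 0.4 and P(Rain) = 0.1, but the excerpt omits P(Cloud|Rain); the sketch below assumes a value of 0.5 (half of rainy days start off cloudy) purely to complete the calculation:

```python
# Completing the Cloud/Rain example with Bayes' theorem.
# P(Cloud|Rain) = 0.5 is an assumed value, not stated in the text.
p_rain = 0.10               # 3 of 30 days tend to be rainy
p_cloud = 0.40              # about 40% of days start cloudy
p_cloud_given_rain = 0.50   # ASSUMPTION: half of rainy days start cloudy

p_rain_given_cloud = p_rain * p_cloud_given_rain / p_cloud
print(p_rain_given_cloud)   # ≈ 0.125, i.e. a 12.5% chance of rain
```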
Artificial intelligence is concerned with the study of understanding, designing, and implementing ways of representing knowledge in computers.
In any intelligent system, representing the knowledge is supposed to be an important technique to encode
the knowledge.
The main objective of an AI system is to design programs that provide information to the computer, which can be helpful for interacting with humans and solving problems in various fields that require human intelligence.
What is Knowledge?
Knowledge is a useful term to judge the understanding of an individual on a given subject.
In intelligent systems, domain is the main focused subject area. So, the system specifically focuses on
acquiring the domain knowledge.
Issues in knowledge representation
The main objective of knowledge representation is to draw the conclusions from the knowledge, but there
are many issues associated with the use of knowledge representation techniques.
ii. What should be the number (small or large) of low-level primitives or high-level facts?
High-level facts may be insufficient to draw conclusions, while low-level primitives may require a lot of storage.
For example: Suppose that we are interested in following facts:
John spotted Alex.
Now, this could be represented as "Spotted (agent(John), object (Alex))"
Such a representation can make it easy to answer questions such as: Who spotted Alex?
Suppose we want to know : "Did John see Sue?"
Given only one fact, the user cannot discover that answer.
Hence, the user can add other facts, such as "Spotted(x, y) → Saw(x, y)".
4. Representing sets of objects.
There are some properties of objects which hold of a set of objects together but not of each individual member.
Example: Consider the assertion made in the sentences:
"There are more sheep than people in Australia", and "English speakers can be found all over the world."
These facts can be described by attaching assertions to the sets representing people, sheep, and English speakers.
5. Finding the right structure as needed
To describe a particular situation, it is always important to gain access to the right structure. This can be done by selecting an initial structure and then revising the choice.
While selecting and revising the right structure, it is necessary to solve the following problem statements. They include the process on how to:
● A logic is said to be non-monotonic if some conclusions can be invalidated by adding more knowledge to our knowledge base.
● "Human perception of various things in daily life" is a general example of non-monotonic reasoning.
Example: Let suppose the knowledge base contains the following knowledge:
● Birds can fly
● Pitty is a bird
So from the above sentences, we can conclude that Pitty can fly.
However, if we add another sentence to the knowledge base, "Pitty is a penguin", which concludes "Pitty cannot fly", it invalidates the above conclusion.
ACTING UNDER UNCERTAINTY
Agents may need to handle uncertainty, whether due to partial observability, nondeterminism, or a combination of the two.
Summarizing Uncertainity:
Consider the following Simple rule:
Toothache ⇒ Cavity
Not all patients with toothaches have cavities; some of them may have gum disease, an abscess, or other problems.
Trying to use first-order logic to cope with a domain like medical diagnosis thus fails for three main
reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an
exceptionless rule, and too hard to use the enormous rules that result.
Theoretical ignorance: Medical science has no complete theory for the domain.
Practical ignorance: Even if we know all the rules, we may be uncertain about a particular patient
because all the necessary tests have not or cannot be run.
The agent's knowledge can at best provide only a degree of belief in the relevant sentences. Our main tool
for dealing with degrees of belief will be probability theory, which assigns a numerical degree of belief
between 0 and 1 to sentences.
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance.
We may not know for sure what afflicts a particular patient, but we believe that there is, say, an 80%
chance—that is, a probability of 0.8—that the patient has a cavity if he or she has a toothache
Prior probability
We will use the notation P(A) for the unconditional or prior probability that the proposition A is true.
For example, if Cavity denotes the proposition that a particular patient has a cavity,
P(Cavity) = 0.1
means that in the absence of any other information, the agent will assign a probability of 0.1 (a 10%
chance) to the event of the patient's having a cavity.
It is important to remember that P(A) can only be used when there is no other information. As soon as
some new information B is known, we have to reason with the conditional probability of A given B
instead of P(A).
The proposition that is the subject of a probability statement can be represented by a proposition symbol,
as in the P(A) example. Propositions can also include equalities involving so-called random variables. For example, if we are concerned about the random variable Weather, we might have
P(Weather = Sunny) = 0.7, P(Weather = Rain) = 0.2, P(Weather = Cloudy) = 0.08, P(Weather = Snow) = 0.02.
We can view proposition symbols as random variables as well, if we assume that they have a domain {true, false}. Thus, the expression P(Cavity) can be viewed as shorthand for P(Cavity = true). Similarly, P(¬Cavity) is shorthand for P(Cavity = false). Usually, we will use the letters A, B, and so on for Boolean random variables, and the letters X, Y, and so on for multivalued variables.
Sometimes, we will want to talk about the probabilities of all the possible values of a random variable.
In this case, we will use an expression such as P(Weather), which denotes a vector of values for the
probabilities of each individual state of the weather.
Given the preceding values, for example, we would write P(Weather) = (0.7,0.2,0.08,0.02)
This statement defines a probability distribution for the random variable Weather.
We will also use expressions' such as P(Weather, Cavity) to denote the probabilities of all combinations
of the values of a set of random variables.
In this case, P(Weather, Cavity) denotes a 4 x 2 table of probabilities. We will see that this notation
simplifies many equations. We can also use logical connectives to make more complex sentences and
assign probabilities to them. For example, P(Cavity ∧ ¬Insured) = 0.06 says there is a 6% chance that a patient has a cavity and has no insurance.
Conditional probability:
➢ Once the agent has obtained some evidence concerning the previously unknown propositions
making up the domain, prior probabilities are no longer applicable. Instead, we use conditional or
posterior probabilities, with the notation P(A|B).
➢ This is read as "the probability of A given that all we know is B."
For example, P(Cavity | Toothache) = 0.8 indicates that if a patient is observed to have a toothache, and no other information is yet available, then the probability of the patient having a cavity will be 0.8.
➢ It is important to remember that P(A|B) can only be used when all we know is B. As soon as we know some new evidence C, we must compute P(A | B ∧ C) instead of P(A|B).
Axioms of Probability:
➢ All probabilities are between 0 and 1:
0 ≤ P(A) ≤ 1
➢ Necessarily true (i.e., valid) propositions have probability 1, and necessarily false (i.e., unsatisfiable) propositions have probability 0: P(True) = 1, P(False) = 0.
➢ The probability of a disjunction is given by P(A ∨ B) = P(A) + P(B) - P(A ∧ B).
Adding across a row or column gives the unconditional probability of a variable, for example, P(Cavity)
= 0.06 + 0.04 = 0.10.
P(Cavity ∨ Toothache) = 0.04 + 0.01 + 0.06 = 0.11
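These sums come from a 2 × 2 joint distribution table over Cavity and Toothache. A sketch of that table and both computations (the fourth entry, 0.89, is not quoted above; it follows because the entries must sum to 1):

```python
# The 2x2 joint distribution implied by the figures above.
joint = {
    ("cavity", "toothache"): 0.04,
    ("cavity", "no_toothache"): 0.06,
    ("no_cavity", "toothache"): 0.01,
    ("no_cavity", "no_toothache"): 0.89,  # remainder: table sums to 1
}

# Adding across a row: P(Cavity) = 0.04 + 0.06 = 0.10
p_cavity = sum(p for (c, t), p in joint.items() if c == "cavity")

# P(Cavity ∨ Toothache): sum every entry where either proposition holds
p_cavity_or_toothache = sum(p for (c, t), p in joint.items()
                            if c == "cavity" or t == "toothache")

print(round(p_cavity, 2), round(p_cavity_or_toothache, 2))  # -> 0.1 0.11
```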
Bayes Rule:
Consider the following situation. You have a new burglar alarm installed at home. It is fairly reliable at
detecting a burglary, but also responds on occasion to minor earthquakes. (This example is due to Judea
Pearl, a resident of Los Angeles; hence the acute interest in earthquakes.) You also have two neighbors,
John and Mary, who have promised to call you at work when they hear the alarm. John always calls when
he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.
Mary, on the other hand, likes rather loud music and sometimes misses the alarm altogether. Given the
evidence of who has or has not called, we would like to estimate the probability of a burglary.
This simple domain is described by the belief network in Figure 15.2
o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional probabilities between random variables. These directed links or arrows connect pairs of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed link, the nodes are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
o If we are considering node B, which is connected with node A by a directed arrow, then node A is called the
parent of Node B.
o Node C is independent of node A.
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional probability. So let's first
understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3,....., xn, then the probabilities of a different combination of x1, x2, x3.. xn, are
known as Joint probability distribution.
P[x1, x2, x3,....., xn], it can be written as the following way in terms of the joint probability distribution.
In general for each variable Xi, we can write the equation as:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility to inform Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls at that time too. On the other hand, Sophia likes to listen to loud music, so she sometimes misses the alarm. Here we would like to compute the probability of the burglary alarm.
Problem:
Calculate the probability that the alarm has sounded, but there is neither a burglary nor an earthquake, and David and Sophia both called Harry.
Solution:
The Bayesian network for the above problem is given below. The network structure shows that burglary and earthquake are the parent nodes of the alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on the alarm.
The network also represents our assumptions: the neighbors do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
The conditional distributions for each node are given as conditional probabilities table or CPT.
Each row in the CPT must sum to 1, because all the entries in the table represent an exhaustive set of cases for the variable.
In a CPT, a Boolean variable with k Boolean parents contains 2^k probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.
List of all events occurring in this network:
Burglary (B)
Earthquake(E)
Alarm(A)
David Calls(D)
Sophia calls(S)
We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and can rewrite this probability statement using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]
= P[D | A] P[S | A] P[A | B, E] P[B] P[E]
(using the conditional independencies encoded in the network: D and S depend only on A, and B and E are independent).
Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake
P(E= False)= 0.999, Which is the probability that an earthquake not occurred.
The conditional probability that Sophia calls depends on its parent node, "Alarm."
From the formula of joint distribution, we can write the problem statement in the form of probability
distribution:
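A sketch of the final computation, P(D, S, A, ¬B, ¬E) = P(D|A) P(S|A) P(A|¬B, ¬E) P(¬B) P(¬E). Only P(B) and P(E) appear in the excerpt; the three CPT entries below are commonly used illustrative values for this example and should be treated as assumptions:

```python
# Joint probability of one complete assignment in the burglary network.
p_b = 0.002                     # P(Burglary), from the text
p_e = 0.001                     # P(Earthquake), from the text
p_a_given_not_b_not_e = 0.001   # ASSUMED CPT entry P(A | ¬B, ¬E)
p_d_given_a = 0.91              # ASSUMED P(David calls | Alarm)
p_s_given_a = 0.75              # ASSUMED P(Sophia calls | Alarm)

# Chain-rule factorization along the network structure:
# P(D, S, A, ¬B, ¬E) = P(D|A) P(S|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = p_d_given_a * p_s_given_a * p_a_given_not_b_not_e * (1 - p_b) * (1 - p_e)
print(p)  # ≈ 0.00068
```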
What is learning?
One of the most often heard criticisms of AI is that machines cannot be called intelligent until they are able to learn to do new things and adapt to new situations, rather than simply doing as they are told to do.
Some critics of AI have been saying that computers cannot learn!
Definitions of Learning: changes in the system that are adaptive in the sense that they enable the system to
do the same task or tasks drawn from the same population more efficiently and more effectively the next
time.
Learning covers a wide range of phenomenon:
Skill refinement: Practice makes skills improve. The more you play tennis, the better you get.
Knowledge acquisition: Knowledge is generally acquired through experience
Rote learning:
Rote Learning is basically memorisation.
• Saving knowledge so it can be used again.
• Retrieval is the only problem.
Samuel's Checkers program employed rote learning (it also used parameter adjustment
which will be discussed shortly).
A minimax search was used to explore the game tree.
Time constraints do not permit complete searches, so the program records board positions and scores at the points where search ends.
Now, if the same board position arises later in the game, the stored value can be recalled, and the end effect is that deeper searches have effectively occurred.
Organisation
-- access of the stored value must be faster than it would be to recompute it. Methods
such as hashing, indexing and sorting can be employed to enable this.
E.g. Samuel's program indexed board positions by noting the number of pieces.
Generalisation
-- The number of potentially stored objects can be very large. We may need to
generalise some information to make the problem manageable.
E.g. Samuel's program stored game positions only for white to move. Also, positions that are rotations along the diagonals are combined.
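The caching idea behind rote learning can be sketched with a dictionary standing in for Samuel's indexed store of board positions (the scoring function here is a dummy stand-in):

```python
# Rote learning as caching: store a computed value the first time, then
# retrieve it instead of recomputing, so lookup must be faster than
# recomputation (cf. hashing/indexing above).
evaluations = 0

def evaluate_position(position):
    """Stand-in for an expensive minimax evaluation."""
    global evaluations
    evaluations += 1
    return sum(position)          # dummy scoring function

cache = {}

def rote_evaluate(position):
    if position not in cache:     # fast lookup replaces recomputation
        cache[position] = evaluate_position(position)
    return cache[position]

rote_evaluate((1, 2, 3))   # computed and stored
rote_evaluate((1, 2, 3))   # retrieved: no second computation
print(evaluations)         # -> 1
```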
FOO (First Operational Operationaliser), for example, is a learning system which is used to learn
the game of Hearts.
It converts the advice which is in the form of principles, problems, and methods into effective
executable (LISP) procedures (or knowledge). Now this knowledge is ready to use.
A human user first translates the advice from English into a representation that FOO can understand.
The ‘t’ terms are the values that contribute to the evaluation. The ‘C’ terms are the coefficients
(weights) that are attached to these values.
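The scoring polynomial described is a weighted sum of feature values. A minimal sketch, with feature names and weights invented for illustration:

```python
# Linear evaluation function: score = sum of c_i * t_i, where the t terms
# are feature values and the c terms are learned coefficients (weights).
def score(features, weights):
    return sum(c * t for c, t in zip(weights, features))

features = [3, -1, 2]        # e.g. piece advantage, mobility, center control
weights = [1.0, 0.5, 0.25]   # coefficients adjusted during learning

print(score(features, weights))  # -> 3.0
```

Parameter adjustment then means nudging the weights after each game so that positions on the winning side score higher.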
But many moves must have contributed to that final outcome; even if the program wins, it may have made some wrong moves along the way.
Because of these limitations, Samuel's program did two things:
When the program is in learning mode, it plays against a copy of itself. At the end of the game, if the modified function won, then the modified version is accepted; otherwise the old one is retained.
Periodically, one term in the scoring function was eliminated and replaced by another.
Macro operators can be used for games like the 8-Puzzle (for example, we have correctly placed 4 tiles and our job is to place the fifth without disturbing the earlier tiles).
A macro will not disturb the 4 tiles externally (though within the macro, tiles are disturbed).
Expert Systems: Representing and Using Domain Knowledge, Shell, Explanation, Knowledge
Acquisition.
PROSPECTOR is a program that provides advice on mineral exploration. Its rules look like this:
IF: magnetite or pyrite in disseminated or veinlet form is present
THEN (2, -4): there is favourable mineralization and texture for the propylitic stage.
Here each rule contains two estimates.
The first indicates the extent to which the presence of the evidence described in the condition part of the rule suggests the validity of the rule's conclusion.
The second measures the extent to which the evidence is necessary to the validity of the conclusion.
2 indicates that the presence of the evidence is mildly encouraging.
-4 indicates that the absence of the evidence is strongly discouraging.
■ Because these programs are usually written primarily as rule based systems, forward chaining and
backward chaining are usually used .
• Knowledge acquisition system: It is the first and fundamental step. It helps to collect the expert's knowledge required to solve the problems and build the knowledge base.
• Knowledge Base: This component is the heart of the expert system. It stores all factual and heuristic
knowledge about the application domain. It provides various representation techniques for all the
data.
• Inference mechanism: The inference engine is the brain of the expert system. This component is
responsible for drawing inferences from the knowledge in the knowledge base and producing the line
of reasoning that yields the result of the user's query.
• Explanation subsystem: This part of the shell is responsible for explaining or justifying the final or
intermediate results of a user query. It is also responsible for justifying the need for additional knowledge.
• User interface: It is the means of communication with the user. It decides the utility of expert system.
• To build an expert system using system shell, one needs to enter all necessary knowledge about a task
domain into the shell.
Explanation:
An expert system is said to be effective when people can interact with it easily.
To facilitate the interaction, the expert system must have the following two properties:
1. Explain its reasoning: In many of the domains in which expert systems operate, people will not
accept results unless they have been convinced of the accuracy of the reasoning process that
produced those results.
2. Acquire new knowledge and modify old knowledge: Since expert systems derive their
power from the richness of the knowledge bases they exploit, it is extremely important that those
knowledge bases be as complete and as accurate as possible.
One way to get the knowledge into a program is through interaction with a human expert. Another is to have a
program that learns the expert's behaviour from raw data.
• TEIRESIAS was the first program to support explanation and knowledge acquisition.
● The question can be answered by looking at the goal tree and chaining backward from the stated fact to
the evidence that allowed the rule that established the fact to fire.
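Such an explanation can be sketched as a walk over a recorded goal tree (the tree contents, fact names, and rule names below are illustrative assumptions, not TEIRESIAS's actual representation):

```python
# Sketch: answering "how was this fact concluded?" by chaining backward
# through a recorded goal tree from the fact to its supporting evidence.
goal_tree = {
    # fact: (rule that fired, supporting evidence facts)
    "prescribe-drug-X": ("rule-42", ["infection-is-bacterial"]),
    "infection-is-bacterial": ("rule-7", ["gram-stain-positive"]),
}

def explain(fact, tree):
    """List each rule and the evidence that allowed it to fire."""
    lines = []
    while fact in tree:
        rule, evidence = tree[fact]
        lines.append(f"{fact} was concluded by {rule} from {evidence}")
        fact = evidence[0]  # follow the first supporting fact downward
    return lines

trace = explain("prescribe-drug-X", goal_tree)
```

Each line of the trace justifies one step of the reasoning, ending at raw evidence supplied by the user.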
Knowledge Acquisition:
● How are expert systems built?
A knowledge engineer interviews domain experts to elicit their knowledge, which is then
translated into rules. This process is expensive and time-consuming.
● Look for Automatic ways of constructing expert knowledge bases, but no automatic knowledge
acquisition systems exist yet.
● But there are programs that interact with domain experts to extract knowledge efficiently. They support:
1. Entering knowledge
2. Maintaining knowledge base consistency
3. Ensuring knowledge base completeness.
● The most useful knowledge acquisition programs are those that are restricted to a particular problem-solving
paradigm, e.g., diagnosis or design.
● If the paradigm is diagnosis, then the program can structure its knowledge base around symptoms,
hypotheses, and causes.
● Since one symptom can have multiple causes, the program can ask for knowledge about how to decide
when one hypothesis is better than another.
● MOLE is a knowledge acquisition system for heuristic classification problems, such as diagnosing
diseases.
● An expert system produced by MOLE accepts input data, comes up with a set of candidate explanations
or classifications that cover (explain) the data, then uses differentiating knowledge to determine which one
is best.
● MOLE interacts with the human expert to produce a knowledge base that a system called MOLE-P
(performance) uses to solve problems.
The acquisition proceeds through several steps:
1. Initial knowledge base construction. MOLE asks the expert to list common symptoms or
complaints that might require diagnosis.
For each symptom, MOLE prompts for a list of possible explanations.
Whenever an event has multiple explanations, MOLE tries to determine the conditions under which the
explanation is correct.
The expert provides COVERING knowledge, that is, the knowledge that if a hypothesized event occurs,
then the symptom will definitely appear.
2. Refinement of knowledge Base:
MOLE now tries to identify the weaknesses of the knowledge base. One approach is to find holes and prompt
the expert to fill them.
MOLE lets the expert watch MOLE-P solving sample problems.
Whenever MOLE-P makes an incorrect diagnosis, the expert adds new knowledge.
For example: suppose we have a patient with symptoms A and B. Further suppose that symptom A could be
caused by the events X and Y, and that symptom B could be caused by Y and Z.
MOLE-P may conclude Y, since it explains both A and B.
If the expert indicates that this decision was incorrect, then MOLE will ask what evidence should be used
to prefer X and/or Z over Y.
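The covering step in the example above can be sketched as follows (the data structures are illustrative assumptions, not MOLE's actual representation):

```python
# Sketch of MOLE-P's choice: prefer the hypothesis that covers
# (explains) the most symptoms, as in the A/B example above.
causes = {"A": {"X", "Y"}, "B": {"Y", "Z"}}

def best_explanations(symptoms, causes):
    """Return the hypotheses that explain the largest number of symptoms."""
    scores = {}
    for s in symptoms:
        for h in causes[s]:
            scores[h] = scores.get(h, 0) + 1
    top = max(scores.values())
    return {h for h, n in scores.items() if n == top}

# Y explains both A and B, so MOLE-P would conclude Y; differentiating
# knowledge from the expert is what could later overturn this choice.
assert best_explanations(["A", "B"], causes) == {"Y"}
```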
● These parameters must be consistent with each other and they must result in the design that satisfies
external constraints imposed by cost factors, the type of building involved and the expected payloads.
● One problem-solving method useful for design tasks is called propose-and-revise.
● Here the system first proposes an extension to the current design. Then it checks whether the extension
violates any global or local constraints.
● It turns out that domain experts are good at listing overall design constraints and providing local
constraints on the individual parameters, but not so good at explaining how to arrive at global solutions.
● The SALT program provides mechanisms for eliciting this knowledge from the expert.
● Each node stands for a value of a parameter that must be acquired or generated.
● Contributes-to links are procedures that allow SALT to generate a value for one parameter based on
the value of another.
2. No contributes-to loops are allowed in the network. If a loop exists, SALT tries to transform one of
the contributes-to links into a constraints link.
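The propose-and-revise cycle can be sketched as below (the parameter names, proposal order, constraint, and fix procedure are illustrative assumptions, not SALT's actual elevator-design knowledge):

```python
# Sketch of propose-and-revise: extend the design one parameter at a
# time; when a constraint is violated, apply that constraint's fix
# procedure and re-check until the design is consistent.
def propose_and_revise(proposals, constraints, fixes):
    design = {}
    for name, propose in proposals:
        design[name] = propose(design)  # propose an extension
        for check in constraints:
            while not check(design):    # revise on violation
                fixes[check](design)
    return design

proposals = [
    ("load", lambda d: 1200),                  # proposed payload (kg)
    ("motor_hp", lambda d: d["load"] / 150),   # derived from load
]
big_enough = lambda d: d.get("motor_hp", 99) >= 10
constraints = [big_enough]
fixes = {big_enough: lambda d: d.update(motor_hp=d["motor_hp"] + 1)}

design = propose_and_revise(proposals, constraints, fixes)
# The proposed 8 hp motor violates the constraint and is revised to 10.
```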
● One approach to automate this task is to interview loan officers in an attempt to extract domain
knowledge.
● Another approach is to inspect the records of loans the bank has made in the past and try to generate
rules automatically that will maximize the number of good loans and minimize the number of bad ones in
the future.
● META-DENDRAL was the first program to use learning techniques to construct rules for an
expert system automatically.