ARTIFICIAL INTELLIGENCE
Uncertain Knowledge
and Reasoning
ARTIFICIAL INTELLIGENCE
2020/2021 Semester 2
Quantifying Uncertainty:
Chapter 13
So far, the course has rested on the following "perfect" assumptions:
• The state of the world (problem) is fully observable:
  the current state is known exactly
• The actions (decisions) we execute are precisely described:
  the next state reached can be predicted exactly
Now we study how an agent copes with so-called "imperfect" information.
Sometimes we also need to consider a dynamic problem world.
Introductory Example
A robot with imperfect sensing must reach a goal location among moving obstacles (dynamic world)
Outline
• Acting under Uncertainty
• Probability
• Inference using Full Joint Distribution
• Independence
• Bayes' Rule and Its Use
Acting under Uncertainty
Let action At = leave for airport t minutes before flight
Will At get me there on time?
Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
Hence a purely logical approach either
1. risks falsehood: “A25 will get me there on time”, or
2. leads to conclusions that are too weak for decision making:
“A25 will get me there on time if there's no accident on the bridge
and it doesn't rain and my tires remain intact etc etc.”
Methods for handling uncertainty
• Default or nonmonotonic logic:
– Assume my car does not have a flat tire
– Assume A25 works unless contradicted by evidence
• Issues: What assumptions are reasonable? How to handle
contradiction?
• Rules with fudge factors:
– A25 |→0.3 get there on time
– Sprinkler |→ 0.99 WetGrass
– WetGrass |→ 0.7 Rain
• Issues: Problems with combination, e.g., Sprinkler causes Rain??
• Probability
– Model the agent's degree of belief, given the available evidence
– A25 will get me there on time with probability 0.04
Probability
• Probabilistic assertions summarize effects of
– laziness: failure to enumerate exceptions,
qualifications, etc.
– ignorance: lack of relevant facts, initial conditions, etc.
• Theoretical ignorance: Medical science has no complete
theory for the domain.
• Practical ignorance: Even if we know all the rules, we might
be uncertain about a particular patient because not all the
necessary tests have been or can be run.
degree of belief
Probability
• Probability provides a way of summarizing the
uncertainty that comes from our laziness and
ignorance.
• Subjective probability:
– Probabilities relate propositions to agent's own state
of knowledge
e.g., P(A25 | no reported accidents) = 0.06
– These are not assertions about the world
– Probabilities of propositions change with new
evidence:
e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
Making decisions under uncertainty
Suppose I believe the following:
P(A25 gets me there on time | …) = 0.04
P(A90 gets me there on time | …) = 0.70
P(A120 gets me there on time | …) = 0.95
P(A1440 gets me there on time | …) = 0.9999
• Which action to choose?
Depends on my preferences for missing flight vs. time spent
waiting, etc.
– Utility theory is used to represent and infer preferences
– Decision theory = probability theory + utility theory
Making decisions under uncertainty
Basic concept of probability
• Basic element: random variable
• Similar to propositional logic: possible worlds defined by
assignment of values to random variables.
• Boolean random variables
e.g., Cavity (do I have a cavity?)
• Discrete random variables
e.g., Weather is one of <sunny, rainy, cloudy, snow>
– Domain values must be exhaustive and mutually exclusive
• Elementary proposition constructed by assignment of a value
to a random variable:
– e.g., Weather = sunny, Cavity = false (abbreviated as cavity)
• Complex propositions formed from elementary propositions
and standard logical connectives
– e.g., Weather = sunny ∧ Cavity = false
Basic concept of probability
• Atomic event: A complete specification of the state of
the world about which the agent is uncertain
E.g., if the world consists of only two Boolean variables Cavity
and Toothache, then there are 4 distinct atomic events:
Cavity = false ∧ Toothache = false
Cavity = false ∧ Toothache = true
Cavity = true ∧ Toothache = false
Cavity = true ∧ Toothache = true
• Atomic events are mutually exclusive and exhaustive
Axioms of probability
• For any propositions A, B
– Axiom 1: 0 ≤ P(A) ≤ 1
– Axiom 2: P(true) = 1 and P(false) = 0
– Axiom 3: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Using the axioms of probability
• P(a ∨ ¬a) = P(a) + P(¬a) - P(a ∧ ¬a)   (by axiom 3 with b = ¬a)
• P(true) = P(a) + P(¬a) - P(false)       (by logical equivalence)
• 1 = P(a) + P(¬a)                        (by axiom 2)
• P(¬a) = 1 - P(a)                        (by algebra)
Prior probability
• Prior or unconditional probabilities of propositions
e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72 correspond to
belief prior to arrival of any (new) evidence
• Probability distribution gives values for all possible assignments:
P(Weather) = <0.72,0.1,0.08,0.1> (normalized, i.e., sums to 1)
• Joint probability distribution for a set of random variables gives
the probability of every atomic event on those random variables
P(Weather,Cavity) = a 4 × 2 matrix of values:
Weather = sunny rainy cloudy snow
Cavity = true 0.144 0.02 0.016 0.02
Cavity = false 0.576 0.08 0.064 0.08
• Every question about a domain can be answered by the joint
distribution
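A minimal Python sketch (values taken from the table above), showing how the priors P(Weather) and P(Cavity) fall out of the joint by summing over the other variable:

# Joint distribution P(Weather, Cavity) as a dict keyed by (weather, cavity).
joint = {
    ('sunny', True): 0.144, ('rainy', True): 0.02, ('cloudy', True): 0.016, ('snow', True): 0.02,
    ('sunny', False): 0.576, ('rainy', False): 0.08, ('cloudy', False): 0.064, ('snow', False): 0.08,
}
p_weather, p_cavity = {}, {}
for (weather, cavity), p in joint.items():
    p_weather[weather] = p_weather.get(weather, 0.0) + p   # sum out Cavity
    p_cavity[cavity] = p_cavity.get(cavity, 0.0) + p       # sum out Weather
print(p_weather)            # ≈ {'sunny': 0.72, 'rainy': 0.1, 'cloudy': 0.08, 'snow': 0.1}
print(p_cavity)             # ≈ {True: 0.2, False: 0.8}
print(sum(joint.values()))  # ≈ 1.0, i.e. the joint is normalized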
Conditional probability
• Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know
• (Notation for conditional distributions:
P(Cavity | Toothache) = 2-element vector of 2-element vectors)
• If we know more, e.g., cavity is also given, then we have
P(cavity | toothache,cavity) = 1
• New evidence may be irrelevant, allowing simplification, e.g.,
P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
• This kind of inference, sanctioned by domain knowledge, is
crucial
Conditional probability
• Definition of conditional probability:
P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
• Product rule gives an alternative formulation:
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
• A general version holds for whole distributions, e.g.,
P(Weather,Cavity) = P(Weather | Cavity) P(Cavity)
• (View as a set of 4 × 2 equations, not matrix mult.)
• Chain rule is derived by successive application of product rule:
P(X1, …, Xn) = P(X1, ..., Xn-1) P(Xn | X1, ..., Xn-1)
= P(X1, ..., Xn-2) P(Xn-1 | X1, ..., Xn-2) P(Xn | X1, ..., Xn-1)
= …
= ∏i=1..n P(Xi | X1, …, Xi-1)
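A small Python sketch of the product rule on the P(Weather, Cavity) table shown earlier, read as a set of 4 × 2 separate equations, one per pair of values:

joint = {
    ('sunny', True): 0.144, ('rainy', True): 0.02, ('cloudy', True): 0.016, ('snow', True): 0.02,
    ('sunny', False): 0.576, ('rainy', False): 0.08, ('cloudy', False): 0.064, ('snow', False): 0.08,
}
p_cavity = {c: sum(p for (_, cc), p in joint.items() if cc == c) for c in (True, False)}
for (w, c), p in joint.items():
    p_w_given_c = p / p_cavity[c]                       # P(Weather = w | Cavity = c)
    assert abs(p - p_w_given_c * p_cavity[c]) < 1e-12   # product rule holds entry by entry
print(joint[('sunny', True)] / p_cavity[True])          # P(sunny | cavity) ≈ 0.72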
Conditional probability
P(Cavity, Toothache)
Cavity = false ∧ Toothache = false   0.8
Cavity = false ∧ Toothache = true    0.1
Cavity = true ∧ Toothache = false    0.05
Cavity = true ∧ Toothache = true     0.05

P(Cavity):                  P(Toothache):
Cavity = false   0.9        Toothache = false   0.85
Cavity = true    0.1        Toothache = true    0.15
• What is P(Cavity = true | Toothache = false)?
0.05 / 0.85 = 0.059
• What is P(Cavity = false | Toothache = true)?
0.1 / 0.15 = 0.667
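A minimal Python sketch of the two queries above, using the P(Cavity, Toothache) table as given:

joint = {
    (False, False): 0.8,    # Cavity = false, Toothache = false
    (False, True):  0.1,
    (True,  False): 0.05,
    (True,  True):  0.05,
}
def cond(cavity, toothache):
    # P(Cavity = cavity | Toothache = toothache) = P(cavity, toothache) / P(toothache)
    p_toothache = sum(p for (c, t), p in joint.items() if t == toothache)
    return joint[(cavity, toothache)] / p_toothache

print(round(cond(True, False), 3))   # 0.059
print(round(cond(False, True), 3))   # 0.667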
Conditional distributions
• A conditional distribution is a distribution over the values of
one variable given fixed values of other variables
P(Cavity, Toothache):
Cavity = false ∧ Toothache = false   0.8
Cavity = false ∧ Toothache = true    0.1
Cavity = true ∧ Toothache = false    0.05
Cavity = true ∧ Toothache = true     0.05

P(Cavity | Toothache = true):        P(Cavity | Toothache = false):
Cavity = false   0.667               Cavity = false   0.941
Cavity = true    0.333               Cavity = true    0.059

P(Toothache | Cavity = true):        P(Toothache | Cavity = false):
Toothache = false   0.5              Toothache = false   0.889
Toothache = true    0.5              Toothache = true    0.111
Inference by enumeration
• Start with the joint probability distribution:
              toothache             ¬toothache
              catch    ¬catch       catch    ¬catch
  cavity      0.108    0.012        0.072    0.008
  ¬cavity     0.016    0.064        0.144    0.576
• For any proposition φ, sum the atomic events where it is true:
P(φ) = Σω:ω⊨φ P(ω)
Inference by enumeration
• Start with the joint probability distribution:
• For any proposition φ, sum the atomic events where it is true:
P(φ) = Σω:ω⊨φ P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
Inference by enumeration
• Start with the joint probability distribution:
• Can also compute conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
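A Python sketch of this enumeration, assuming the full joint table shown above (variable order: Toothache, Catch, Cavity):

joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
def prob(holds):
    """P(φ): sum the probabilities of the atomic events in which the proposition holds."""
    return sum(p for world, p in joint.items() if holds(*world))

p_toothache = prob(lambda t, c, cav: t)
p_not_cavity_given_toothache = prob(lambda t, c, cav: t and not cav) / p_toothache
print(p_toothache, round(p_not_cavity_given_toothache, 2))   # ≈ 0.2 and 0.4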
Normalization
• Denominator can be viewed as a normalization constant α
P(Y | X)= α P(X | Y)P(Y)
P(Cavity | toothache) = α P(Cavity,toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [<0.108,0.016> + <0.012,0.064>]
= α <0.12,0.08> = <0.6,0.4>
General idea: compute distribution on query variable by fixing
evidence variables and summing over hidden variables
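A compact Python sketch of the normalization step above, starting from the four joint entries quoted on this slide; the hidden variable Catch is summed out and α rescales the result so it sums to 1:

with_catch    = {'cavity': 0.108, 'no_cavity': 0.016}   # P(Cavity, toothache, catch)
without_catch = {'cavity': 0.012, 'no_cavity': 0.064}   # P(Cavity, toothache, ¬catch)
unnorm = {k: with_catch[k] + without_catch[k] for k in with_catch}   # <0.12, 0.08>
alpha = 1.0 / sum(unnorm.values())
print({k: alpha * v for k, v in unnorm.items()})   # ≈ {'cavity': 0.6, 'no_cavity': 0.4}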
Inference by enumeration
Typical Problem:
Given evidence variables E=e, estimate the posterior joint
distribution of the query variables Y
Let the hidden variables be H,
P(Y | E = e) = α P(Y,E = e) = α Σh P(Y, E= e, H = h)
• Obvious problems:
1. Worst-case time complexity O(d^n), where d is the size of the largest domain and n is the number of variables
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for the O(d^n) entries?
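A generic Python sketch of this query; enumerate_ask is a hypothetical helper (not textbook code) that scans every row of a joint stored as {value-tuple: probability}, summing out the hidden variables and normalizing:

VARS = ('Toothache', 'Catch', 'Cavity')
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
def enumerate_ask(query, evidence):
    """Posterior distribution over `query` given `evidence`, a {variable: value} dict."""
    qi = VARS.index(query)
    unnorm = {}
    for world, p in joint.items():          # hidden variables are summed out implicitly
        if all(world[VARS.index(v)] == val for v, val in evidence.items()):
            unnorm[world[qi]] = unnorm.get(world[qi], 0.0) + p
    alpha = 1.0 / sum(unnorm.values())
    return {y: alpha * v for y, v in unnorm.items()}

print(enumerate_ask('Cavity', {'Toothache': True}))   # ≈ {True: 0.6, False: 0.4}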
Independence
• Two events A and B are independent if and only if
P(A ∧ B) = P(A) P(B)
– In other words, P(A | B) = P(A) and P(B | A) = P(B)
– This is an important simplifying assumption for modeling,
e.g., Toothache and Weather can be assumed to be
independent
• Are two mutually exclusive events independent?
– No, but for mutually exclusive events we have
P(A ∨ B) = P(A) + P(B)
• Conditional independence: A and B are conditionally
independent given C iff P(A ∧ B | C) = P(A | C) P(B | C)
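A quick Python check of the independence test P(A ∧ B) = P(A) P(B), applied to Cavity and Toothache from the earlier table; they turn out not to be independent:

joint = {(False, False): 0.8, (False, True): 0.1, (True, False): 0.05, (True, True): 0.05}
p_cavity    = sum(p for (c, t), p in joint.items() if c)   # 0.1
p_toothache = sum(p for (c, t), p in joint.items() if t)   # 0.15
print(joint[(True, True)], p_cavity * p_toothache)         # 0.05 vs ≈ 0.015, so dependent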
Independence
• A and B are independent iff
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B)
P(Toothache, Catch, Cavity, Weather)
= P(Toothache, Catch, Cavity) P(Weather)
• 32 entries reduced to 12; for n independent biased coins,
O(2^n) → O(n)
• Absolute independence powerful but rare
• Dentistry is a large field with hundreds of variables, none of
which are independent. What to do?
Conditional independence
• If I have a cavity, the probability that the probe catches in it
doesn't depend on whether I have a toothache:
(1) P(catch | toothache, cavity) = P(catch | cavity)
• The same independence holds if I haven't got a cavity:
(2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
• Catch is conditionally independent of Toothache given Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
• Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Conditional independence
• Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
I.e., 2 + 2 + 1 = 5 independent numbers
• In most cases, the use of conditional independence reduces the
size of the representation of the joint distribution from
exponential in n to linear in n.
• Conditional independence is our most basic and robust form of
knowledge about uncertain environments.
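A Python sketch of the factored representation; the 2 + 2 + 1 = 5 parameter values below are assumptions, chosen to be consistent with the dentist joint used earlier, and they rebuild all 8 entries of P(Toothache, Catch, Cavity):

p_cavity_true     = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def joint(toothache, catch, cavity):
    p_t   = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
    p_c   = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    p_cav = p_cavity_true if cavity else 1 - p_cavity_true
    return p_t * p_c * p_cav                   # P(T | Cav) P(C | Cav) P(Cav)

print(joint(True, True, True))                 # ≈ 0.108, matching the full joint table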
Conditional independence:
Example
• Toothache: boolean variable indicating whether the patient has a toothache
• Cavity: boolean variable indicating whether the patient has a cavity
• Catch: whether the dentist’s probe catches in the cavity
• If the patient has a cavity, the probability that the probe catches in it doesn't
depend on whether he/she has a toothache
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
• Therefore, Catch is conditionally independent of Toothache given Cavity
• Likewise, Toothache is conditionally independent of Catch given Cavity
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
• Equivalent statement:
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Bayes' Rule
• Bayes' rule
P(Y|X) = P(X|Y) P(Y) / P(X) = αP(X|Y) P(Y)
P(disease | symptom) = P(symptom | disease) P(disease) / P(symptom)
Bayes' Rule
• Useful for assessing diagnostic probability from causal probability:
– P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)
– E.g., let M be meningitis, S be stiff neck:
– We know:
P(s|m)=0.5; P(m)=1/5000; P(s)=1/20;
– Query:
P(m|s) = P(s|m) P(m) / P(s) = 0.5 × 0.0002 / 0.05 = 0.002
– Note: posterior probability of meningitis still very small!
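A one-line Python check of this computation, with the numbers given above:

p_s_given_m, p_m, p_s = 0.5, 1 / 5000, 1 / 20
print(p_s_given_m * p_m / p_s)   # P(m | s) ≈ 0.002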
Bayes Rule example
• Marie is getting married tomorrow, at an outdoor
ceremony in the desert. In recent years, it has rained only
5 days each year (5/365 = 0.014). Unfortunately, the
weatherman has predicted rain for tomorrow. When it
actually rains, the weatherman correctly forecasts rain
90% of the time. When it doesn't rain, he incorrectly
forecasts rain 10% of the time. What is the probability
that it will rain on Marie's wedding?
Bayes Rule example
• Marie is getting married tomorrow, at an outdoor ceremony in the
desert. In recent years, it has rained only 5 days each year (5/365 =
0.014). Unfortunately, the weatherman has predicted rain for
tomorrow. When it actually rains, the weatherman correctly
forecasts rain 90% of the time. When it doesn't rain, he incorrectly
forecasts rain 10% of the time. What is the probability that it will
rain on Marie's wedding?
P(Rain | Predict) = P(Predict | Rain) P(Rain) / P(Predict)
= P(Predict | Rain) P(Rain) / [P(Predict | Rain) P(Rain) + P(Predict | ¬Rain) P(¬Rain)]
= (0.9 × 0.014) / (0.9 × 0.014 + 0.1 × 0.986)
≈ 0.111
Bayes rule: Another example
• 1% of women at age forty who participate in routine
screening have breast cancer. 80% of women with
breast cancer will get positive mammographies. 9.6% of women without breast
cancer will also get positive mammographies. A
woman in this age group had a positive
mammography in a routine screening. What is the
probability that she actually has breast cancer?
Bayes rule: Another example
• 1% of women at age forty who participate in routine screening
have breast cancer. 80% of women with breast cancer will get
positive mammographies. 9.6% of women without breast
cancer will also get positive mammographies. A woman in
this age group had a positive mammography in a routine
screening. What is the probability that she actually has breast
cancer?
P(Cancer | Positive) = P(Positive | Cancer) P(Cancer) / P(Positive)
= P(Positive | Cancer) P(Cancer) / [P(Positive | Cancer) P(Cancer) + P(Positive | ¬Cancer) P(¬Cancer)]
= (0.8 × 0.01) / (0.8 × 0.01 + 0.096 × 0.99)
≈ 0.0776
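A Python sketch showing that both word problems are the same computation: Bayes' rule with the denominator expanded by total probability over the two cases (H and ¬H):

def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    # P(H | evidence) via Bayes' rule; denominator is the total probability of the evidence.
    num = p_evidence_given_h * prior
    return num / (num + p_evidence_given_not_h * (1 - prior))

print(round(posterior(5 / 365, 0.9, 0.1), 3))   # rain on the wedding day: ≈ 0.111
print(round(posterior(0.01, 0.8, 0.096), 4))    # cancer given a positive test: ≈ 0.0776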
Bayes' Rule and conditional independence
P(Cavity | toothache ∧ catch)
= α P(toothache ∧ catch | Cavity) P(Cavity)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
• This is an example of a naïve Bayes model:
• P(Cause, Effect1, …, Effectn) = P(Cause) ∏i P(Effecti | Cause)
• Total number of parameters is linear in n
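A Python sketch of this naïve Bayes computation for the dentist domain; the conditional probabilities are assumed values, chosen to be consistent with the joint table used earlier:

p_cause = {True: 0.2, False: 0.8}                  # P(Cavity)
p_effect_given_cause = {
    'toothache': {True: 0.6, False: 0.1},          # P(toothache | Cavity)
    'catch':     {True: 0.9, False: 0.2},          # P(catch | Cavity)
}
def posterior_cause(observed_effects):
    unnorm = {}
    for cause, prior in p_cause.items():
        p = prior
        for effect in observed_effects:
            p *= p_effect_given_cause[effect][cause]   # one factor per observed effect
        unnorm[cause] = p
    alpha = 1.0 / sum(unnorm.values())
    return {cause: alpha * p for cause, p in unnorm.items()}

print(posterior_cause(['toothache', 'catch']))     # ≈ {True: 0.871, False: 0.129}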
Summary
• Probability is a rigorous formalism for uncertain
knowledge
• Joint probability distribution specifies probability of
every atomic event
• Queries can be answered by summing over atomic
events
• Independence and conditional independence
provide the basic tools to reduce the joint size
Exercises
• 13.8
• 13.15