
Topic 8: GRAPHICAL MODELS

STAT 37710/CAAM 37710/CMSC 35400 Machine Learning


Risi Kondor, The University of Chicago
Three types of “Probability”
1. Frequency of repeated trials: if an experiment is repeated infinitely
many times, 0 ≤ p(A) ≤ 1 is the fraction of times that the outcome will
be A . Typical example: number of times that a coin comes up heads.
→ Frequentist probability.
2. Degree of belief: A quantity obeying the same laws as the above,
describing how likely we think a (possibly deterministic) event is. Typical
example: the probability that the Earth will warm by more than 5°F by
2100. → Bayesian probability.
3. Subjective probability: “I’m 110% sure that I’ll go out to dinner with you
tonight.”

Mixing these three notions is a source of lots of trouble. We will start with the
frequentist interpretation and then discuss the Bayesian one.

Why do we need probability for ML?

Two distinct reasons:


1. To analyze, understand and predict the performance of learning
algorithms (Statistical Learning Theory, PAC model, etc.)
2. To build flexible and intuitive probabilistic models.

Probabilistic vs. Algorithmic learning
• Algorithmic ML (e.g., SVMs):
◦ Strictly focus on the task at hand → discriminative
◦ Black box
◦ Algorithms often motivated directly by optimization methods → fast
◦ Examples: the perceptron, SVM, etc.
◦ “Frequentist”
• Probabilistic ML (e.g., graphical models):
◦ Everything in the world is a random variable → generative
◦ Flexible modeling framework for incorporating prior knowledge
◦ Models are often expressed with graphs → efficient message passing
algorithms
◦ Example: k –means clustering
◦ “Bayesian”
[Breiman: Statistical modeling: the two cultures]

Joint probabilities and
independence
Machine learning applications often involve a large number of variables
(features) X1 , . . . , Xn .
• The conditional probability of X_i given X_j is

    p(x_i | x_j) = P(X_i = x_i | X_j = x_j),    p(x_i, x_j) = p(x_i | x_j) p(x_j).

• X_i and X_j are independent (denoted X_i ⊥⊥ X_j) if

    p(x_i | x_j) does not depend on x_j  ⟺  p(x_i, x_j) = p(x_i) p(x_j).

• X_i is conditionally independent of X_j given X_k (denoted X_i ⊥⊥ X_j | X_k) if

    p(x_i, x_j | x_k) = p(x_i | x_k) p(x_j | x_k).

IDEA: When faced with a large number of features, use our prior knowledge
of independencies to make learning easier.
Directed graphical models
Also called Bayes nets or Belief Networks. Each vertex v ∈ V corresponds
to a random variable. Graph must be acyclic but not necessarily a tree.

The general form of the joint distribution of all the variables is

    p(x) = ∏_{v∈V} p(x_v | x_{pa(v)}),

where pa(v) denotes the set of parents of v in the graph.
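For concreteness, here is a minimal sketch (not from the lecture) of how this factorization is evaluated in code, for a hypothetical three-node chain A → B → C with made-up CPT numbers:

```python
# Minimal sketch (illustrative): evaluating the Bayes-net factorization
# p(x) = prod_v p(x_v | x_pa(v)) for a small hypothetical network A -> B -> C.
# The CPT numbers below are made up.

parents = {"A": [], "B": ["A"], "C": ["B"]}

# cpt[v] maps a tuple of parent values to the distribution over v's values (0/1).
cpt = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.3, 0.7]},
    "C": {(0,): [0.8, 0.2], (1,): [0.5, 0.5]},
}

def joint_prob(x):
    """p(x) = product over vertices of p(x_v | x_pa(v))."""
    p = 1.0
    for v, pa in parents.items():
        parent_vals = tuple(x[u] for u in pa)
        p *= cpt[v][parent_vals][x[v]]
    return p

print(joint_prob({"A": 1, "B": 0, "C": 1}))   # 0.4 * 0.3 * 0.2 = 0.024
```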

Directed graphical models
Assuming that X_1, ..., X_6 are binary random variables, how many
numbers are needed to describe their joint distribution? 2^6 − 1 = 63.

Now what if we know that they conform to this Bayes net?

Each p(x_i | x_j) corresponds to a 2 × 2 table, but its rows sum to 1, so only 2
numbers are required. p(x_6 | x_2, x_5) requires 4 numbers.
Total: 1 + 2 + 2 + 2 + 2 + 4 = 13. Quite a saving!

Example: Markov chains
• If x1 , x2 , . . . is a series of (discrete or continuous) random variables
corresponding to a process evolving in time, then xt should only depend
on what happened in the past:

p(xt |x1 , . . . , xt−1 , xt+1 , . . .) = p(xt |x1 , . . . , xt−1 ).

• The sequence x1 , x2 , . . . is said to be a k ’th order Markov chain if

p(xt |x1 , . . . , xt−1 , xt+1 , . . .) = p(xt |xt−1 , . . . , xt−k ).

• A (first order) Markov chain is said to be stationary if the transition
probabilities p(x_t | x_{t−1}) are independent of t,

    p(x_t | x_{t−1}) = M_{x_t, x_{t−1}}.
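As an illustration (my own sketch, not from the slides), sampling from a stationary first-order chain amounts to repeatedly drawing the next state from the column of M selected by the current state:

```python
# Minimal sketch (illustrative): sampling a stationary first-order Markov chain
# with transition matrix M, where M[j, i] = p(x_t = j | x_{t-1} = i),
# matching the M_{x_t, x_{t-1}} convention above.
import numpy as np

rng = np.random.default_rng(0)
M = np.array([[0.9, 0.2],    # p(next=0 | prev=0), p(next=0 | prev=1)
              [0.1, 0.8]])   # p(next=1 | prev=0), p(next=1 | prev=1)

def sample_chain(T, x0=0):
    xs = [x0]
    for _ in range(T - 1):
        xs.append(rng.choice(2, p=M[:, xs[-1]]))
    return xs

print(sample_chain(20))
```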

Hidden Markov Models (HMM)

An HMM is a Markov chain of unobserved random variables x_1, x_2, ...,
each of which is related to an observed random variable y_1, y_2, ....

Examples: tracking, part of speech tagging, phonemes, physiological states of babies, ...
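A minimal generative sketch (with made-up transition and emission matrices) makes the structure explicit: the hidden chain x_t evolves on its own, and each y_t is drawn from a distribution that depends only on x_t.

```python
# Minimal sketch (illustrative): generating from an HMM. All matrices are
# made up, with A[j, i] = p(x_t = j | x_{t-1} = i) and B[k, i] = p(y_t = k | x_t = i).
import numpy as np

rng = np.random.default_rng(1)
pi = np.array([0.5, 0.5])                    # initial state distribution
A = np.array([[0.95, 0.10], [0.05, 0.90]])   # hidden-state transitions
B = np.array([[0.8, 0.3], [0.2, 0.7]])       # emission probabilities

def sample_hmm(T):
    x = [rng.choice(2, p=pi)]
    y = [rng.choice(2, p=B[:, x[0]])]
    for _ in range(T - 1):
        x.append(rng.choice(2, p=A[:, x[-1]]))
        y.append(rng.choice(2, p=B[:, x[-1]]))
    return x, y   # x is hidden, y is what we would actually observe

states, obs = sample_hmm(10)
print(obs)
```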

Applications of HMMs

HMMs and related state space models are widely applied in


• speech recognition (which phoneme/word/etc.?)
• part of speech tagging (is it an NP, VP, etc.?)
• biological sequence analysis (intron or exon?)
• time series analysis (finance, climate, etc.)
• robotics (what is the actual location of the robot?)
• tracking

Undirected graphical models
Also called Markov Random Fields. Graph can be any undirected graph.
Common example used for image segmentation:

The general form of the joint distribution over all the variables is

    p(x) = (1/Z) ∏_{c∈Cliques(G)} ϕ_c(x_c),

where each ϕ_c is a potentially different clique potential (just a positive
function) and Z is the normalizing factor Z = Σ_x ∏_{c∈Cliques(G)} ϕ_c(x_c).
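A brute-force sketch (illustrative only, with a made-up pairwise potential on a 3-cycle) shows why Z is the expensive part: it sums the product of clique potentials over every configuration.

```python
# Minimal sketch (illustrative): computing p(x) = (1/Z) * prod_c phi_c(x_c) by
# brute force for a tiny binary MRF on a 3-cycle, with one made-up pairwise
# potential per edge. Only feasible for very small graphs, since Z sums over
# all 2^n configurations.
from itertools import product

edges = [(0, 1), (1, 2), (0, 2)]
phi = lambda a, b: 2.0 if a == b else 1.0   # favors agreeing neighbors

def unnormalized(x):
    w = 1.0
    for i, j in edges:
        w *= phi(x[i], x[j])
    return w

Z = sum(unnormalized(x) for x in product([0, 1], repeat=3))
print(unnormalized((0, 0, 0)) / Z)   # probability of the all-zeros configuration
```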
Example: the Ising model

Imagine an infinite grid of {−1, +1}-valued random variables in which
neighboring variables are connected by the potential

    ϕ(x_i, x_j) = e^{−(β/2)(x_i − x_j)^2}.

Simple model of ferromagnetism. Exhibits a phase transition.
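A minimal Gibbs-sampling sketch on a small finite grid (an assumption; the slide's model is an infinite grid) illustrates the potential: since x ∈ {−1, +1}, exp(−(β/2)(x_i − x_j)^2) is proportional to exp(β x_i x_j), so each spin's conditional distribution given its neighbors is a logistic function of the neighboring sum.

```python
# Minimal sketch (illustrative): Gibbs-sampling sweeps over a small finite-grid
# version of the Ising model.
import numpy as np

rng = np.random.default_rng(0)
n, beta, sweeps = 20, 0.6, 50
x = rng.choice([-1, 1], size=(n, n))

for _ in range(sweeps):
    for i in range(n):
        for j in range(n):
            # sum of the neighboring spins (free boundary conditions)
            s = sum(x[a, b] for a, b in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                    if 0 <= a < n and 0 <= b < n)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))  # p(x_ij = +1 | rest)
            x[i, j] = 1 if rng.random() < p_plus else -1

print(x.mean())   # magnetization; grows in magnitude for larger beta
```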

Example: MRFs for segmentation

Purpose of graphical models

In ML we often have a large number of variables related in complicated ways.

Graphical models
• capture prior knowledge about relationships between variables
• provide a compact representation of distributions over many variables
• define a specific hypothesis class
• help with figuring out causality
• the variables can be either discrete (e.g., “airbag yes/no”), continuous (e.g.,
“value”), or a mixture of both types

Tasks for graphical models

• Model selection (i.e., learn the graph itself from data)


• Learn the parameters of the model from data (i.e., the individual
conditionals or clique potentials)
• Deduce conditional independence relations
• Infer marginals and conditional distributions

Inference
Partition V , the set of nodes, into three sets:
1. the set O of observed nodes
2. the set Q of query nodes
3. the set L of latent nodes

We are interested in

    p(x_Q | x_O) = Σ_{x_L} p(x_Q, x_L, x_O) / Σ_{x_L, x_Q} p(x_Q, x_L, x_O).

Essential for both


• Training, when we are trying to learn the distribution of some of the
nodes from data.
• Prediction, when we are trying to predict the values of some nodes (the
output) given the values of some other nodes (the input)
Question: How can we do this in less than m^{|Q|+|L|} time?
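A brute-force sketch of this computation (assuming binary variables and a joint_prob function like the hypothetical one earlier) makes the cost explicit: we sum the joint over all configurations of the latent and query variables.

```python
# Minimal sketch (illustrative): brute-force inference p(x_Q | x_O), assuming
# all variables are binary and a joint_prob(dict) function is available
# (e.g. the hypothetical one from the earlier Bayes-net sketch).
from itertools import product

def conditional(joint_prob, variables, query, observed):
    """query and observed are dicts {variable name: value}."""
    latent = [v for v in variables if v not in query and v not in observed]

    def summed_over_latent(q):
        s = 0.0
        for vals in product([0, 1], repeat=len(latent)):
            s += joint_prob({**q, **observed, **dict(zip(latent, vals))})
        return s

    numerator = summed_over_latent(query)
    denominator = sum(summed_over_latent(dict(zip(query, qv)))
                      for qv in product([0, 1], repeat=len(query)))
    return numerator / denominator

# e.g. with the earlier A -> B -> C sketch: p(C = 1 | A = 1)
# print(conditional(joint_prob, ["A", "B", "C"], {"C": 1}, {"A": 1}))
```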

Directed graphical models (Bayes nets)
Common cause

XA ̸⊥⊥ XB but XA ⊥⊥ XB | XC
Therefore, if C is observed, then A and B become independent.

Example: Lung cancer ⊥⊥ Yellow teeth | Smoking

Explaining away

XA ⊥⊥ XB but XA ̸⊥⊥ XB | XC
Therefore, if C is not observed (and neither are any of its descendants),
then A and B remain independent.

Example: Burglary ̸⊥⊥ Earthquake | Alarm

D–separation

Is X independent of Y given the set of nodes S ?

An undirected path from X to Y is said to be blocked if


1. it includes at least one node Z from S such that the arrows along the
path at Z meet head to tail or tail to tail; or
2. it includes at least one node W such that the arrows along the path at W
meet head to head, and neither W nor any of its descendants are in S .

Theorem
X ⊥⊥ Y | S if and only if all paths from X to Y are blocked.
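A sketch of an equivalent check (using the standard moralization criterion rather than the path-blocking procedure verbatim): X and Y are d-separated by S iff S separates them in the moralized ancestral graph of X ∪ Y ∪ S. The graph encoding and node names below are hypothetical.

```python
# Minimal sketch (illustrative): d-separation via the moralized ancestral graph,
# which is equivalent to the path-blocking definition above.
# The DAG is given as a dict: node -> list of parents.

def d_separated(parents, x, y, s):
    # 1. ancestral set of {x, y} ∪ S
    relevant, stack = set(), [x, y, *s]
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents.get(v, []))
    # 2. moralize: connect co-parents, drop edge directions
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in parents.get(v, []) if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. check whether x can reach y without passing through S
    seen, stack = set(s), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return True

# Explaining away: Burglary -> Alarm <- Earthquake
g = {"Alarm": ["Burglary", "Earthquake"]}
print(d_separated(g, "Burglary", "Earthquake", set()))        # True
print(d_separated(g, "Burglary", "Earthquake", {"Alarm"}))    # False
```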

Learning parameters in Bayes nets

Recall the general form of a discrete Bayes net:

    p(x) = ∏_{v∈V} p(x_v | x_{pa(v)}),    x_v ∈ {1, 2, ..., k_v}.

Assuming for now that every vertex has two parents, (x_{m(v)}, x_{f(v)}), the
conditional distributions can be parametrized by 3D arrays (θ_v)_{v∈V}:

    p(x_v | x_{m(v)}, x_{f(v)}) = [θ_v]_{x_{m(v)}, x_{f(v)}, x_v}.

To ensure normalization, Σ_{x_v} [θ_v]_{x_{m(v)}, x_{f(v)}, x_v} = 1 for all x_{m(v)}, x_{f(v)}.

Given data D = (x1 , . . . , xT ) , what is the MLE setting of (θv )v∈V ?

Simpson’s paradox: word of caution
You are trying to determine whether a particular treatment for a serious
disease is beneficial. Given the following observations would you
recommend it?

                Survived   Did not survive   Survival rate
Treatment           20            20              50%
No treatment        16            24              40%

Now what if you discovered that the breakdown by gender was this?

Males           Survived   Did not survive   Survival rate
Treatment           18            12              60%
No treatment         7             3              70%

Females         Survived   Did not survive   Survival rate
Treatment            2             8              20%
No treatment         9            21              30%
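A quick check of the arithmetic in these tables:

```python
# Minimal sketch: verifying the rates above. The treatment looks better in
# aggregate (50% vs 40%) yet worse within each gender group.
groups = {
    "males":   {"treatment": (18, 12), "no treatment": (7, 3)},
    "females": {"treatment": (2, 8),   "no treatment": (9, 21)},
}

def rate(survived, died):
    return survived / (survived + died)

for arm in ["treatment", "no treatment"]:
    total_s = sum(groups[g][arm][0] for g in groups)
    total_d = sum(groups[g][arm][1] for g in groups)
    print(arm, "overall:", rate(total_s, total_d),
          "by group:", {g: rate(*groups[g][arm]) for g in groups})
```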

Simpson’s paradox

• A graphical model can never capture all the variables that might possibly
be relevant. In the first case we ignored gender. This can affect what
interpretation the model suggests.
• The fact that there is an arrow from A (treatment) to B (outcome) does
not imply that A causes B. In our case we had a hidden common
cause, gender, whose effect on B went in the opposite direction.
• To tease out causal structure we need more sophisticated tools than just
ordinary graphical models: we need to introduce interventions.
• Observational studies are not sufficient. The gold standard in medicine is
randomized controlled trials (RCTs).

Learning parameters in Bayes nets
p(x_v | x_{m(v)}, x_{f(v)}) = [θ_v]_{x_{m(v)}, x_{f(v)}, x_v}.

    ℓ(θ|D) = ∏_{t=1}^{T} ∏_{v∈V} [θ_v]_{x^t_{m(v)}, x^t_{f(v)}, x^t_v} = ∏_{v∈V} ℓ_v(θ_v|D)

    ℓ_v(θ_v|D) = ∏_{t=1}^{T} [θ_v]_{x^t_m, x^t_f, x^t_v}

or

    ∏_a ∏_b [ N_{a,b}! / (N_{a,b,1}! N_{a,b,2}! ... N_{a,b,k_v}!) ] [θ_v]_{a,b,1}^{N_{a,b,1}} [θ_v]_{a,b,2}^{N_{a,b,2}} ... [θ_v]_{a,b,k_v}^{N_{a,b,k_v}}

where N_{a,b,c} = |{ t : x^t_m = a, x^t_f = b, x^t_v = c }| and N_{a,b} = Σ_c N_{a,b,c}.

Learning parameters in Bayes nets
Each

    ℓ_{v,a,b}(θ_v|D) = [ N_{a,b}! / (N_{a,b,1}! N_{a,b,2}! ... N_{a,b,k_v}!) ] [θ_v]_{a,b,1}^{N_{a,b,1}} ... [θ_v]_{a,b,k_v}^{N_{a,b,k_v}}

is just a multinomial like in Naive Bayes, so we know the MLE is

    [θ̂_v]_{a,b,c} = N_{a,b,c} / Σ_{c′} N_{a,b,c′}.

As before, can also use the biased estimator

    [θ̂_v]_{a,b,c} = (N_{a,b,c} + γ) / Σ_{c′} (N_{a,b,c′} + γ).

Inference in Bayes nets: example

Is there a general algorithm that allows us to find factorizations like this?


→ Message passing algorithms
Undirected graphical models
Not every type of conditional independence structure can be represented by a
Bayes net. Example:

    X_A ⊥⊥ X_C | {X_B, X_D},    X_B ⊥⊥ X_D | {X_A, X_C}.

Exercise: Give an example of a structure that cannot be represented by a directed model either.

Examples of undirected models

• Social network
• Protein interaction network
• Grid model (e.g., Ising)

Ordinary separation

Recall the general form of the undirected models:

    p(x) = (1/Z) ∏_{c∈Cliques(G)} ϕ_c(x_c)

Is X independent of Y given the set of nodes S?

Theorem
X ⊥⊥ Y | S if and only if all paths from X to Y contain at least one node in S.
This is simpler than in the directed case.
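A minimal sketch of this check (illustrative adjacency list): remove S and see whether X can still reach Y.

```python
# Minimal sketch (illustrative): ordinary graph separation. X ⊥⊥ Y | S iff
# removing S disconnects X from Y. The graph is an adjacency dict.
def separated(adj, x, y, s):
    seen, stack = set(s), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(n for n in adj[v] if n not in seen)
    return True

# 4-cycle A - B - C - D - A:
adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
print(separated(adj, "A", "C", {"B", "D"}))   # True: A ⊥⊥ C | {B, D}
```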

Parameter estimation and
inference

In undirected models
• Parameter estimation: Not as easy as in the directed case!
• Inference: message passing algorithms.

Bayesians vs. Frequentists
Joint and conditional probability

Joint:
P(A, B) = P(B, A)

Conditional:

    P(A|B) = P(A, B) / P(B)
    P(B|A) = P(A, B) / P(A)

AI is all about conditional probabilities.

Mammography
Sensitivity of screening mammogram p(+|cancer) ≈ 90%

Specificity of screening mammogram p(−|no cancer) ≈ 91%

Probability that a woman age 40 has breast cancer ≈ 1%

If a previously unscreened 40 year old woman’s mammogram is positive,


what is the probability that she has breast cancer?

    P(cancer|+) = P(cancer, +) / P(+) = P(+|cancer) P(cancer) / P(+)
                ≈ (0.01 × 0.9) / (0.01 × 0.9 + 0.99 × 0.09) ≈ 0.009 / (0.009 + 0.09) ≈ 0.009 / 0.1 ≈ 9%

Message: P(A|B) ̸= P(B|A) .
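The same calculation as a quick numeric check:

```python
# Minimal sketch: checking the mammography calculation with Bayes' rule.
sens, spec, prior = 0.90, 0.91, 0.01

p_pos = sens * prior + (1 - spec) * (1 - prior)   # P(+)
p_cancer_given_pos = sens * prior / p_pos         # P(cancer | +)
print(round(p_cancer_given_pos, 3))               # ~0.092, i.e. about 9%
```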

Bayes’ rule

    p(B|A) = p(A|B) p(B) / p(A)

Rev. Thomas Bayes (1701–1761)

Prosecutor’s fallacy: Sally Clark
Two kids died with no explanation.

Sir Roy Meadow testified that the chance of this happening due to SIDS is
(1/8500)^2 ≈ (73 × 10^6)^{−1}.

Sally Clark found guilty and imprisoned.

Later verdict overturned and Meadow struck


off medical register.
Sally Clark (1964–2007)

Fallacy: P(SIDS|2 deaths) ̸= P(SIDS, 2 deaths)

P(guilty|+) = 1 − P(not guilty|+) ̸= 1 − P(+|not guilty)

Convict if ...

P(innocence) < reasonable doubt


or

P(innocence) < shadow of a doubt

Statistical estimation
The fundamental problem of
statistics

Probability:

    θ (parameter)  →  p_θ(x) (the model)  → IID sampling →  S = {X_1, X_2, ..., X_m} (the sample)

Statistics:

    S = {X_1, X_2, ..., X_m}  → estimation →  θ̂

The fundamental problem of
statistics
The problem of inferring θ from X1 , X2 , . . . , Xm is inherently ill defined:
• pθ (x) as a function of x is a probability — OK
• pθ (x) as a function of θ (called the likelihood ℓ(θ) = pθ (x) ) is not a
probability — Panic!

Two religions:
• Turn pθ (x) into a probability by putting a distribution on θ (called the
prior) → Bayesian statistics
• Just come up with a guess θ̂ for θ and then show that it is unlikely to get
a sample that will lead to a θ̂ which is far off. → Frequentist statistics

Bayesian statistics in ML

    posterior:  p(θ|X) = p(X|θ) p(θ) / p(X) = p(X|θ) p(θ) / ∫ p(X|θ) p(θ) dθ

    (likelihood: p(X|θ);  prior: p(θ);  evidence: p(X) = ∫ p(X|θ) p(θ) dθ)

Example: HMM for tracking

Assumptions are steep, but at least we are honest about them.

Frequentist statistics in ML

• Come up with an algorithm to get an estimator


• Try and justify later.

Example: perceptron

Justification itself involves frequentist statistics, because it requires


estimating the probability of error! → Error analysis of Bayesian methods
also requires frequentist statistics.

Frequentist vs. Bayesian estimators

In the more classical setting of parametric models, how do we get an actual


estimator for θ ?

Frequentist:
• Use the maximum likelihood estimator θ̂_MLE = arg max_θ ℓ(θ)
• Just one of many options
Bayesian:
• A true Bayesian always reports the full posterior p(θ|X).
• When pressed, might give the maximum a posteriori (MAP) estimator
  θ̂_MAP = arg max_θ p(θ|X)
• or the posterior mean θ̂ = ∫ θ p(θ|X) dθ.

Naive Bayes
A generative model for documents

Let xi be the number of times that word i occurs in a document D .

Generative model for docs from author A: p_A(x)

Generative model for docs from author B: p_B(x)

Given a new document with vector x′, attribute it to A iff

    p_A(x′) > p_B(x′)

What form should p take?

The multinomial model

Multinomial (Naive Bayes) model for word counts:

    p(x) = [ n! / (x_1! x_2! ... x_k!) ] θ_1^{x_1} θ_2^{x_2} ... θ_k^{x_k},    Σ_{i=1}^{k} θ_i = 1,

where n = Σ_i x_i is the total number of words in the document.

How do we find the parameters θ1 , θ2 , . . . , θk ?

Naive Bayes — the Frequentist way
    p(x) = [ n! / (x_1! x_2! ... x_k!) ] θ_1^{x_1} θ_2^{x_2} ... θ_k^{x_k},    Σ_{i=1}^{k} θ_i = 1.

MLE: maximize

    log ℓ(θ_1, θ_2, ..., θ_k) = constant + Σ_{i=1}^{k} x_i log θ_i    s.t.  Σ_{i=1}^{k} θ_i = 1.

Introduce the Lagrange multiplier λ:

    ∂/∂θ_i [ log ℓ(θ_1, ..., θ_k) − λ ( Σ_{i=1}^{k} θ_i − 1 ) ] = 0

    x_i / θ̂_i = λ    →    θ̂_i = x_i / Σ_{j=1}^{k} x_j

Naive Bayes — the Frequentist way

The MLE θ̂_i = x_i / Σ_{j=1}^{k} x_j makes perfect sense but gives p(x′) = 0
whenever D′ contains a word not seen in training!

Idea: bias the estimator:

    θ̂_i = (x_i + γ) / ( kγ + Σ_{j=1}^{k} x_j ),

where γ is a “pseudocount”.
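A minimal sketch (with made-up word counts) of the smoothed estimator and the resulting author decision; comparing log-probabilities is enough since the multinomial coefficient is the same for both authors:

```python
# Minimal sketch (illustrative): pseudocount-smoothed word probabilities and
# the resulting author decision. All counts are made up.
import numpy as np

def smoothed_theta(counts, gamma=1.0):
    counts = np.asarray(counts, dtype=float)
    return (counts + gamma) / (counts.sum() + gamma * len(counts))

counts_A = [40, 10, 0, 5]      # word counts over a 4-word vocabulary, author A
counts_B = [10, 30, 10, 5]     # author B
theta_A, theta_B = smoothed_theta(counts_A), smoothed_theta(counts_B)

x_new = np.array([3, 1, 1, 0])  # word counts of the new document
# The multinomial coefficient n!/(x_1!...x_k!) is the same for both authors,
# so it is enough to compare sum_i x_i log theta_i.
score_A = (x_new * np.log(theta_A)).sum()
score_B = (x_new * np.log(theta_B)).sum()
print("attribute to A" if score_A > score_B else "attribute to B")
```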

Naive Bayes — the Bayesian way
1. Take the prior p(θ1 , θ2 , . . . , θk ) to be

    Dirichlet(α_1, α_2, ..., α_k) = [ Γ(α_1 + ... + α_k) / (Γ(α_1) ... Γ(α_k)) ] θ_1^{α_1 − 1} ... θ_k^{α_k − 1}

where Γ is the Gamma function, obeying Γ(n) = (n − 1)! for any n ∈ N.

2. Apply Bayes’ rule

    p(θ|x) = p(x|θ) p(θ) / ∫ p(x|θ) p(θ) dθ

3. The posterior becomes

p(θ1 , θ2 , . . . , θk |X) = Dirichlet(α1 + x1 , . . . , αk + xk ).

Naive Bayes — the Bayesian way

The maximum a posteriori (MAP) estimate is given by

    θ̂_i = (x_i + α_i − 1) / Σ_{j=1}^{k} (x_j + α_j − 1)

Equivalent to the frequentist estimator with pseudocounts of α_i − 1!
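A quick numeric check of this equivalence (illustrative counts; symmetric Dirichlet assumed):

```python
# Minimal sketch: the MAP estimate under a symmetric Dirichlet(alpha) prior
# coincides with the pseudocount estimator with gamma = alpha - 1.
import numpy as np

x = np.array([40.0, 10.0, 0.0, 5.0])
alpha = 2.0                       # symmetric Dirichlet; pseudocount gamma = 1

map_estimate = (x + alpha - 1) / (x + alpha - 1).sum()
pseudocount  = (x + (alpha - 1)) / (x.sum() + (alpha - 1) * len(x))
print(np.allclose(map_estimate, pseudocount))   # True
```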

Learning the parameters of Bayes nets
Parameter Learning
Recall that the joint distribution of a Bayes net is of the form

    p(x) = ∏_{v∈V} p_v(x_v | x_{pa(v)}).

Up to now, we have assumed that each pv is fully specified, and asked


questions about the distribution of certain variables given others.

Now the question is:


• Assuming that each p_v(x_v | x_{pa(v)}) is parametrized by some set of
parameters Θ_v, given a training set x^1, x^2, ..., x^m of all the
variables together, how should we set the Θ_v parameters?

This type of task is central to using graphical models in practice.

A simple model
The simplest case is a model with just two variables, x = (x_1, x_2):

    p(x) = p(x_2 | x_1) p(x_1)        (graph: X_1 → X_2)

Assume that:
• x1 can take on k1 different values {1, 2, . . . , k1 } ,
• x2 can take on k2 different values {1, 2, . . . , k2 } .

In this case p(x_2 | x_1) is described by the matrix Θ ∈ [0, 1]^{k_1 × k_2} of parameters

    θ_{i,j} = p(X_2 = j | X_1 = i).

How do we learn this matrix from the training data {(x_1^u, x_2^u)}_{u=1}^{m}?

The Maximum Likelihood Principle
Assume that we are given
• a parametric family of distributions p_Θ(x), parametrized by some set of parameters Θ,
• a sample {x^1, x^2, ..., x^m} from p_Θ.

The likelihood of {x^1, x^2, ..., x^m} under p_Θ is

    ℓ(Θ) = ∏_{u=1}^{m} p_Θ(x^u).

The Maximum Likelihood Estimator (MLE) Θ̂ of Θ is then the setting of the parameters that maximizes this expression.

Finding (estimating) parameters this way is a general strategy in statistics, not just for Bayes nets. Often it is slightly more convenient to maximize the log-likelihood

    log ℓ(Θ) = Σ_{u=1}^{m} log p_Θ(x^u).
Maximum Likelihood

In our case, recalling that p(X_2 = j | X_1 = i) = θ_{i,j},

    ℓ(Θ) = ∏_{u=1}^{m} p(x_2^u | x_1^u) = ∏_{u=1}^{m} θ_{x_1^u, x_2^u} = ∏_{i=1}^{k_1} ∏_{j=1}^{k_2} θ_{i,j}^{N_i(j)},

where N_i(j) is the number of samples in which X_1 = i and X_2 = j.

• The likelihood only depends on how many times we have observed each
combination of (x_1, x_2) together. This makes sense.
• For each possible i, the conditional p(X_2 = j | X_1 = i) = θ_{i,j} is a
probability distribution as a function of j. → We can learn each row of Θ
separately, with the constraint that Σ_{j=1}^{k_2} θ_{i,j} = 1.

Maximum Likelihood
To learn (θ_{i,1}, θ_{i,2}, ..., θ_{i,k_2}) we only need the likelihood over those training
examples in which x_1 = i:

    ℓ_i(θ_{i,1}, ..., θ_{i,k_2}) = ∏_{j=1}^{k_2} θ_{i,j}^{N_i(j)}.

In logarithmic form, to get the MLE (θ̂_{i,1}, θ̂_{i,2}, ..., θ̂_{i,k_2}), solve

    maximize  Σ_{j=1}^{k_2} N_i(j) log θ_{i,j}    subject to  Σ_{j=1}^{k_2} θ_{i,j} = 1.

This is a constrained optimization problem. → Use Lagrange multipliers.

Zero counts
One problem with the MLE

    θ̂_{i,j} = N_i(j) / Σ_{j′=1}^{k_2} N_i(j′)

is that if one (i, j) pair never occurs in the training data, then the
corresponding θ̂_{i,j} will be zero. → The learned model will not be able to
deal with any example in which X_1 = i and X_2 = j.

Standard solution: add a small “pseudocount” γ to each N_i(j), e.g., γ = 0.1:

    θ̂_{i,j} = (N_i(j) + γ) / Σ_{j′=1}^{k_2} (N_i(j′) + γ).

This is a form of regularization. We will see other interpretations later.
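A minimal sketch (illustrative data, 0-indexed values) of estimating Θ with this pseudocount rule:

```python
# Minimal sketch (illustrative): estimating Theta from pairs (x1, x2) with a
# pseudocount gamma, as in the smoothed estimator above. Values are 0-indexed.
import numpy as np

def estimate_theta(pairs, k1, k2, gamma=0.1):
    counts = np.zeros((k1, k2))
    for x1, x2 in pairs:
        counts[x1, x2] += 1
    counts += gamma
    return counts / counts.sum(axis=1, keepdims=True)   # each row sums to 1

data = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0)]
print(estimate_theta(data, k1=2, k2=2))
```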

Multiple parents
What if the model is

    p(x) = p(x_{r+1} | x_1, x_2, ..., x_r)?

Now Θ is a k_1 × ... × k_r × k_{r+1} array (tensor) with

    θ_{i_1,...,i_r,j} = p(X_{r+1} = j | X_1 = i_1, ..., X_r = i_r).

However, fixing any (i_1, ..., i_r) we can solve for the corresponding θ̂_{...,j}'s
just as before and get (using pseudocounts)

    θ̂_{i_1,...,i_r,j} = (N_{i_1,...,i_r}(j) + γ) / Σ_{j′=1}^{k_{r+1}} (N_{i_1,...,i_r}(j′) + γ).

MLE for the whole Bayes Net
Recall that the joint distribution of the whole Bayes net is

    p(x) = ∏_{v∈V} p_v(x_v | x_{pa(v)}),

so we need to learn a separate Θ_v array for each of these factors.

Assuming that each of the variables is discrete, this is not so hard. For each
v, assuming that the parents of v are {p_1, ..., p_r}:
1. Form the corresponding likelihood

    ℓ(Θ_v) = ∏_{u=1}^{m} p(x_v^u | x_{p_1}^u, ..., x_{p_r}^u).

2. Use the same steps as before to find the corresponding MLE solution

    [θ̂_v]_{i_1,...,i_r,j} = (N_{i_1,...,i_r}(j) + γ) / Σ_{j′=1}^{k_{r+1}} (N_{i_1,...,i_r}(j′) + γ).
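A minimal sketch of the whole procedure (hypothetical data format: each row is a dict of variable values, and the graph is a dict of parent lists), counting and smoothing each CPT separately:

```python
# Minimal sketch (illustrative): learning every CPT of a discrete Bayes net by
# counting, with pseudocounts. Data rows and the parent structure are made up.
from collections import defaultdict

def learn_cpts(data, parents, num_values, gamma=0.1):
    cpts = {}
    for v, pa in parents.items():
        counts = defaultdict(lambda: [0.0] * num_values[v])
        for row in data:
            counts[tuple(row[p] for p in pa)][row[v]] += 1
        cpts[v] = {}
        for cfg, c in counts.items():
            smoothed = [ci + gamma for ci in c]
            total = sum(smoothed)
            cpts[v][cfg] = [s / total for s in smoothed]
    return cpts   # cpts[v][parent values] = distribution over v's values

data = [{"A": 0, "B": 1}, {"A": 0, "B": 0}, {"A": 1, "B": 1}]
print(learn_cpts(data, {"A": [], "B": ["A"]}, {"A": 2, "B": 2}))
```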

Expectation maximization

What about when some of the xi ’s are not observed? Use the EM strategy
and iterate until convergence:
1. E-step: Compute the expected log-likelihood (w.r.t. the hidden variables) under Θ̂_old:

    L_{Θ̂_old}(Θ).

2. M-step: Maximize this to get the new estimate for Θ̂:

    Θ̂ = arg max_Θ L_{Θ̂_old}(Θ).

Just like in probabilistic k –means. Whether or not this is viable for a


complicated model is not obvious.
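As an illustration of the recipe (not the general Bayes-net case), here is EM for the simplest latent-variable model, a two-component 1-D Gaussian mixture, the "probabilistic k-means" alluded to above. All numbers are synthetic.

```python
# Minimal sketch (illustrative): EM for a two-component 1-D Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: posterior responsibility of component 1 for each point
    d0 = np.exp(-0.5 * ((data - mu[0]) / sigma[0]) ** 2) / sigma[0]
    d1 = np.exp(-0.5 * ((data - mu[1]) / sigma[1]) ** 2) / sigma[1]
    r1 = pi * d1 / ((1 - pi) * d0 + pi * d1)
    # M-step: maximize the expected complete-data log-likelihood
    pi = r1.mean()
    mu = np.array([((1 - r1) * data).sum() / (1 - r1).sum(),
                   (r1 * data).sum() / r1.sum()])
    sigma = np.array([
        np.sqrt(((1 - r1) * (data - mu[0]) ** 2).sum() / (1 - r1).sum()),
        np.sqrt((r1 * (data - mu[1]) ** 2).sum() / r1.sum()),
    ])

print(mu)   # should end up close to the true means -2 and 3
```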

FURTHER READING

• David Barber: Bayesian Reasoning and Machine Learning (online)


• Daphne Koller and Nir Friedman: Probabilistic Graphical Models
• Tutorial by Sam Roweis:
http://videolectures.net/mlss06tw_roweis_mlpgm/
• Coursera course “Probabilistic Graphical Models” by Daphne Koller

