0% found this document useful (0 votes)

37 views79 pages

PCFGs for Linguistics Students

Uploaded by

purid9991

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

37 views79 pages

PCFGs for Linguistics Students

Uploaded by

purid9991

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 79

Tutorial on

Probabilistic Context‐Free
Grammars
Raphael Hoffmann
590AI, Winter 2009
Outline
• PCFGs: Inference and Learning
• Parsing English
• Discriminative CFGs
• Grammar Induction
Image Search for “pcfg”

Live Search
Outline
• PCFGs: Inference and Learning
• Parsing English
• Discriminative CFGs
• Grammar Induction
The velocity of the seismic waves rises to …

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
A CFG consists of
• Terminals w1, w2, . . . , wV

• Nonterminals N 1, N 2, . . . , N n

• Start symbol N1

• Rules N i −→ ζ j
where ζ j is a sequence of
terminals and nonterminals

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
A (generative) PCFG consists of
• Terminals w1, w2, . . . , wV

• Nonterminals N 1, N 2, . . . , N n

• Start symbol N1

• Rules N i −→ ζ j
where ζ j is a sequence of
terminals and nonterminals

• Rule probabilities such that

P i j
∀i j P (N −→ ζ )=1

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Notation
sentence: sequence of words w1w2 . . . wm
wab : the subsequence wa . . . wb
i
Nab : nonterminal N i dominates wa . . . wb
Ni

N i =⇒∗ ζ : repeated derivation from N i gives ζ

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Probability of a Sentence

X
P (w1n) = P (w1n, t)
t

where t a parse tree of w1n

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Example

• Terminals with, saw, astronomers, ears, stars,

telescopes
• Nonterminals S, PP, P, NP, VP, V
• Start symbol S

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
astronomers saw stars with ears

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Probabilities

P (t1 ) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4

×0.18 × 1.0 × 1.0 × 0.18
= 0.0009072
P (t2 ) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0
×0.18 × 1.0 × 1.0 × 0.18
= 0.0006804
P (w15) = P (t1 ) + P (t2 ) = 0.0015876

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Assumptions of PCFGs
1. Place invariance (like time invariance in HMMs)
j
∀k P (Nk(k+c) −→ ζ) is the same

2. Context free
j j
P (Nkl −→ ζ| words outside wk . . . wl ) = P (Nkl −→ ζ)

3. Ancestor free
j j j
P (Nkl −→ ζ| ancestor nodes of Nkl ) = P (Nkl −→ ζ)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Some Features of PCFGs
• Partial solution for grammar ambiguity
• Can be learned from positive data alone
(but grammar induction difficult)
• Robustness
(admit everything with low probability)
• Gives a probabilistic language model
• Predictive power better than that for a HMM

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Some Features of PCFGs
• Not lexicalized (probabilities do not factor in
lexical co‐occurrence)
• PCFG is a worse language model for English
than n‐gram models
• Certain biases: smaller trees more probable
(average WSJ sentence 23 words)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Inconsistent Distributions

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Questions
Let w1m be a sentence, G a grammar, t a parse tree.

• What is the probability of a sentence?

P (w1m|G)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Questions
Let w1m be a sentence, G a grammar, t a parse tree.

• What is the probability of a sentence?

P (w1m|G)
• What is the most likely parse of sentence?
arg maxt P (t|w1m, G)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Questions
Let w1m be a sentence, G a grammar, t a parse tree.

• What is the probability of a sentence?

P (w1m|G)
• What is the most likely parse of sentence?
arg maxt P (t|w1m, G)

• What rule probs. maximize probs. of sentences?

Find G that maximizes P (w1m|G)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Chomsky Normal Form
• Any CFG grammar can be represented in CNF
where all rules take the form
N i −→ N j N k

N i −→ wj

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
HMMs and PCFGs
• HMMs: distribution over strings of certain length
P
∀n w1n P (w1n) = 1
• PCFGs: distribution over strings of language L
P
w∈L P (w) = 1

high probability in HMM, low probability in PCFG

• Probabilistic Regular Grammar
N i −→ wj N k
N i −→ wj
Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
HMMs and PCFGs

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Inside and Outside Probabilities
• For HMMs we have

Forwards αi (t) = P (w1(t−1), Xt = i)

Backwards βi (t) = P (wtT |Xt = i)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Inside and Outside Probabilities
• For HMMs we have

Forwards αi (t) = P (w1(t−1), Xt = i)

Backwards βi (t) = P (wtT |Xt = i)

• For PCFGs we have

j
Outside αj (p, q) = P (w1(p−1), Npq , w(q+1)m |G)
j
Inside βj (p, q) = P (wpq |Npq , G)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Inside and Outside Probabilities

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Probability of a sentence
j
Outside αj (p, q) = P (w1(p−1), Npq , w(q+1)m |G)
j
Inside βj (p, q) = P (wpq |Npq , G)

P (w1m |G) = β1 (1, m)

X
P (w1m|G) = αj (k, k)P (N j −→ wk )
j
Inside Probabilities j
βj (p, q) = P (wpq |Npq , G)

• Base case βj (k, k) = j

P (wkk |Nkk , G)
= P (N j −→ wk |G)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Inside Probabilities j
βj (p, q) = P (wpq |Npq , G)

• Base case βj (k, k) = j

P (wkk |Nkk , G)
= P (N j −→ wk |G)

• Induction
Want to find βj (p, q) for p < q

Since we assume Chomsky Normal Form,

the first rule must be of the form N j −→ N r N s

So we can divide the sentence in two in

various places and sum the result

q−1
XX
βj (p, q) = P (N j −→ N r N s )βr (p, d)βs (d + 1, q)
r,s d=p

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
CYK Algorithm

astronomers saw stars with ears

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
CYK Algorithm

βNP = 0.1 βV = 1.0 βNP = 0.18 βP = 1.0 βNP = 0.18

? ?
βNP = 0.04 ?
astronomers saw stars with ears

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
CYK Algorithm

βV P = 0.126 βP P = 0.18
? ? ? ?
βNP = 0.1 βV = 1.0 βNP = 0.18 βP = 1.0 βNP = 0.18
βNP = 0.04
astronomers saw stars with ears

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
CYK Algorithm

βS = 0.015876

βV P = 0.015876

βS = 0.0126 βNP = 0.01296

?
βV P = 0.126 βP P = 0.18

βNP = 0.1 βV = 1.0 βNP = 0.18 βP = 1.0 βNP = 0.18

βNP = 0.04
astronomers saw stars with ears

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
CYK Algorithm
Worst case: O(m3r)
m = length of sentence
βS = 0.015876 r = number of rules in grammar
n = number of non‐terminals
If we consider all possible CNF rules: O(m3n3)
βV P = 0.015876

βS = 0.0126 βNP = 0.01296

βV P = 0.126 βP P = 0.18

βNP = 0.1 βV = 1.0 βNP = 0.18 βP = 1.0 βNP = 0.18

βNP = 0.04
astronomers saw stars with ears

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Outside Probabilities
• Compute top‐down (after inside probabilities)
• Base case
α1 (1, m) = 1

αj (1, m) = 0, for j 6= 1

• Induction ⎛ ⎞
m
X X
αj (p, q) = ⎝ αf (p, e)P (N f −→ N j N g )βg (q + 1, e)⎠
f,g e=q+1
⎛ ⎞
X p−1
X
+⎝ αf (e, q)P (N f −→ N g N j )βg (e, p − 1)⎠
f,g e=1

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Probability of a node existing
• As with a HMM, we can form a product of the
inside and outside probabilities.
j
αj (p, q)βj (p, q) = P (w1m , Npq |G)

• Therefore, X
P (w1m , Npq |G) = αj (p, q)βj (p, q)
j

• Just in the cases of the root node and the

preterminals, we know there will be some
such constituent.
Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Training
• If have data Æ count
j C(N j −→ ζ)
P̂ (N −→ ζ) = P j
γ C(N −→ γ)

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Training
• If have data Æ count
j C(N j −→ ζ)
P̂ (N −→ ζ) = P j
γ C(N −→ γ)

• else use EM (Inside‐Outside‐Algorithm)

repeat
compute αj ‘s and βj ‘s
compute P̂ ‘s
P̂ (N j −→ N r N s ) = . . .
P̂ (N j −→ wk ) = . . .
end
two really long formulas with αj ‘s and βj ‘s

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
EM Problems
• Slow: O(m3n3) for each sentence and each
iteration
• Local maxima (Charniak: 300 trials led to 300
different max.)
• In practice, need >3 times more non‐terminals
than are theoretically needed
• No guarantee that learned non‐terminals
correspond to NP, VP, …

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Bracketing helps
Pereira/Schabes ’92:
• Train on sentences:
37% of predicted brackets correct
• Train on sentences + brackets:
90% of predicted brackets correct

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Grammar Induction
• Rules typically selected by linguist
• Automatic induction is very difficult for
context‐free languages
• It is easy to find some form of structure, but
little resemblance to that of linguistics/NLP

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Outline
• PCFGs: Inference and Learning
• Parsing English
• Discriminative CFGs
• Grammar Induction
Parsing for Disambiguation
The post office will hold out discounts and
service concessions as incentives.

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Parsing for Disambiguation
• There are typically many syntactically
possible parses
• Want to find the most likely parses

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Treebanks
• If grammar induction does not work, why not
count expansions in many parse trees?
• Penn Treebank

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
PCFG weaknesses
• No Context
– (immediate prior context, speaker, …)
• No Lexicalization
– “VP NP NP” more likely if verb is “hand” or “tell”
– fail to capture lexical dependencies (n‐grams do)
• No Structural Context
– How NP expands depends on position

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
PCFG weaknesses
Expansion % as Subj % as Obj
NP −→ PRP 13.7% 2.1%
NP −→ NNP 3.5% 0.9%
NP −→ DT NN 5.6% 4.6%
NP −→ NN 1.4% 2.8%
NP −→ NP SBAR 0.5% 2.6%
NP −→ NP PP 5.6% 14.1%
Expansion % as 1st Obj % as 2nd Obj
NP −→ NNS 7.5% 0.2%
NP −→ PRP 13.4% 0.9%
NP −→ NP PP 12.2% 14.4%
NP −→ DT NN 10.4% 13.3%
NP −→ NNP 4.5% 5.9%
NP −→ NN 3.9% 9.2%
NP −→ JJ NN 1.1% 10.4%
NP −→ NP SBAR 0.3% 5.1%
Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Other Approaches
• Challenge: use lexical and structural context,
without too many parameters, sparse data
• Other Grammars
– Probabilistic Left‐Corner Grammars
– Phrase Structure Grammars
– Dependency Grammars
– Probabilistic Tree Substitution Grammars
– History‐based Grammars

Slide based on “Foundations of Statistical Natural Language Processing” by Christopher Manning and Hinrich Schütze
Outline
• PCFGs: Inference and Learning
• Parsing English
• Discriminative CFGs
• Grammar Induction
Generative vs Discriminative
• An HMM (or PCFG) is a generative model
P (y, w)

• Often sufficient is a discriminative model

P (y|w)

• Easier, because does not contain P (w)

• Cannot model dependent features in HMM,
so one only picks one feature: word’s identity
Generative and Discriminative Models
Sequence General Graphs

HMMs Generative
Naïve Bayes
directed models

Conditional Conditional Conditional

Sequence General Graphs

Logistic Regression Linear‐chain CRFs General CRFs

Slide based on “An introduction to Conditional Random Fields for Relational Learning” by Charles Sutton and Andrew McCallum
Generative and Discriminative Models
General
Sequence Tree Graphs

HMMs PCFGs Generative

Naïve Bayes
directed models

Conditional Conditional Conditional

General
Sequence Graphs

Logistic Regression Linear‐chain CRFs General CRFs

Generative and Discriminative Models
General
Sequence Tree Graphs

HMMs PCFGs Generative

Naïve Bayes
directed models

Conditional Conditional Conditional Conditional

General
Sequence Tree Graphs

Logistic Regression Linear‐chain CRFs

?
General CRFs
Discriminative
Context‐Free Grammars
• Terminals w1, w2, . . . , wV

• Nonterminals N 1, N 2, . . . , N n

• Start symbol N1

• Rules N i −→ ζ j
where ζ j is a sequence of
terminals and nonterminals

• Rule scores
F
X
S(N i −→ ζ j , p, q) = λk (N i −→ ζ j )fk (w1 w2 . . . wm , p, q, N i −→ ζ j )
k=1

Slide based on “Learning to Extract Information from Semi‐Structured Text using a Discriminative Context Free Grammar” by Paul Viola and Mukund Narasimhan
Features
F
X
S(N i −→ ζ j , p, q) = λk (N i −→ ζ j )fk (w1 w2 . . . wm , p, q, N i −→ ζ j )
k=1

• Features can depend on all tokens + span

• Consider feature “AllOnTheSameLine”
Mavis Wood Mavis Wood Products
Products

[compare to linear CRF fk (st , tt−1, w1 w2 . . . wm , t) ]

• No independence between features necessary

• Can create features based on words,
dictionaries, digits, capitalization, …
• Can still do efficient Viterbi inference in
O(m3r)

Slide based on “Learning to Extract Information from Semi‐Structured Text using a Discriminative Context Free Grammar” by Paul Viola and Mukund Narasimhan
Example

BizContact −→ BizName Address BizPhone

PersonalContact −→ BizName Address HomePhone

Slide based on “Learning to Extract Information from Semi‐Structured Text using a Discriminative Context Free Grammar” by Paul Viola and Mukund Narasimhan
Example

Slide based on “Learning to Extract Information from Semi‐Structured Text using a Discriminative Context Free Grammar” by Paul Viola and Mukund Narasimhan
Training
• Train feature weight vector for each rule
• Have labels, but not parse trees;
efficiently create trees by ignoring leaves

Slide based on “Learning to Extract Information from Semi‐Structured Text using a Discriminative Context Free Grammar” by Paul Viola and Mukund Narasimhan
Collins’ Averaged Perceptron

Slide based on “Learning to Extract Information from Semi‐Structured Text using a Discriminative Context Free Grammar” by Paul Viola and Mukund Narasimhan
Results

Linear CRF Discriminative CFG Improvement

Word Error Rate 11.57% 6.29% 45.63%

Record Error Rate 54.71% 27.13% 50.41%

Slide based on “Learning to Extract Information from Semi‐Structured Text using a Discriminative Context Free Grammar” by Paul Viola and Mukund Narasimhan
Outline
• PCFGs: Inference and Learning
• Parsing English
• Discriminative CFGs
• Grammar Induction
Gold’s Theorem ‘67
“Any formal language which has hierarchical
structure capable of infinite recursion is
unlearnable from positive evidence alone.”

Slide based on Wikipedia

Empirical Problems
• Even finite search spaces can be too big
• Noise
• Insufficient data
• Many local optima

Slide based on “Unsupervised grammar induction with Minimum Description Length” by Roni Katzir
Common Approach
• Minimize total description length
• Simulated Annealing

Slide based on “Unsupervised grammar induction with Minimum Description Length” by Roni Katzir
random_neighbor(G)
• Insert:

• Delete

• New Rule

• Split

• Substitute

Slide based on “Unsupervised grammar induction with Minimum Description Length” by Roni Katzir
Energy

Define binary representation for G, code(D|G)

Slide based on “Unsupervised grammar induction with Minimum Description Length” by Roni Katzir
Experiment 1
• Word segmentation by 8‐month old infants
• Vocabulary: pabiku, golatu, daropi, tibudo
• Saffran ’96: use speech synthesizer, no word
breaks, 2 minutes = 180 words
• Infants can distinguish words from non‐words
• Now try grammar induction (60 words)

Slide based on “Unsupervised grammar induction with Minimum Description Length” by Roni Katzir
Experiment 1

Slide based on “Unsupervised grammar induction with Minimum Description Length” by Roni Katzir
Experiment 2

• Accurate segmentation
• Inaccurate structural learning
Slide based on “Unsupervised grammar induction with Minimum Description Length” by Roni Katzir
Prototype‐Driven Grammar Induction
• Semi‐supervised approach
• Give only a few dozen prototypical examples
(for NP e.g. determiner noun, pronouns, …)
• On English Penn Treebank: F1 = 65.1
(52% reduction over naïve PCFG induction)
Aria Haghighi and Dan Klein.
Prototype-Driven Grammar Induction.
ACL 2006
Dan Klein and Chris Manning.
A Generative Constituent-Context Model for Improved Grammar Induction.
ACL 2001
That’s it!

PCFG
No ratings yet
PCFG
79 pages
PCFG
No ratings yet
PCFG
31 pages
Statistical Constituency Pars-Ing: C.1 Probabilistic Context-Free Grammars
No ratings yet
Statistical Constituency Pars-Ing: C.1 Probabilistic Context-Free Grammars
21 pages
SCFG PCFG LCFG
No ratings yet
SCFG PCFG LCFG
25 pages
CFG & PCFG
No ratings yet
CFG & PCFG
15 pages
Parsing and Ambiguity in NLP
No ratings yet
Parsing and Ambiguity in NLP
18 pages
Probabilistic Parsing Techniques
No ratings yet
Probabilistic Parsing Techniques
46 pages
Lecture 07
No ratings yet
Lecture 07
35 pages
Constituency Parsing in NLP
No ratings yet
Constituency Parsing in NLP
33 pages
NLP Unit-4
No ratings yet
NLP Unit-4
6 pages
Advanced NLP: CFG Parsing Guide
No ratings yet
Advanced NLP: CFG Parsing Guide
28 pages
NLP Unit-5
No ratings yet
NLP Unit-5
43 pages
Lec15 CL1-f11
No ratings yet
Lec15 CL1-f11
5 pages
NLP Unit 3
No ratings yet
NLP Unit 3
17 pages
NLP Unit-Iii
No ratings yet
NLP Unit-Iii
26 pages
Slp14 Handout s17hw
No ratings yet
Slp14 Handout s17hw
71 pages
Week 3 - Probablistic Context Free Grammars
No ratings yet
Week 3 - Probablistic Context Free Grammars
18 pages
Chapter 9 V 2
No ratings yet
Chapter 9 V 2
18 pages
Introduction To Computational Linguistics: Eugene Charniak and Mark Johnson
No ratings yet
Introduction To Computational Linguistics: Eugene Charniak and Mark Johnson
148 pages
CFG and PCFG
No ratings yet
CFG and PCFG
7 pages
Unit 3
No ratings yet
Unit 3
4 pages
Formal Languages, Automata and Computability
No ratings yet
Formal Languages, Automata and Computability
29 pages
14 Syntax 1
No ratings yet
14 Syntax 1
22 pages
Parameter Estimation For PCFGS: Julia Hockenmaier
No ratings yet
Parameter Estimation For PCFGS: Julia Hockenmaier
90 pages
NLP 2 Internal
No ratings yet
NLP 2 Internal
39 pages
14 Ai Cse551 NLP 2 PDF
No ratings yet
14 Ai Cse551 NLP 2 PDF
39 pages
Probabilistic Context-Free Grammar
No ratings yet
Probabilistic Context-Free Grammar
13 pages
CFG and Probabilistic Parsing Guide
No ratings yet
CFG and Probabilistic Parsing Guide
45 pages
19 Parsing
No ratings yet
19 Parsing
122 pages
Lexicalized PCFGs for Parsing
No ratings yet
Lexicalized PCFGs for Parsing
22 pages
Lexicalized Probabilistic Context-Free Grammars: Michael Collins
No ratings yet
Lexicalized Probabilistic Context-Free Grammars: Michael Collins
22 pages
Ambiguity A4
No ratings yet
Ambiguity A4
24 pages
NLP Parsing Techniques
No ratings yet
NLP Parsing Techniques
54 pages
Natural Language Processing UNIT 2
No ratings yet
Natural Language Processing UNIT 2
32 pages
Unit 3
No ratings yet
Unit 3
19 pages
Parsing Probabilistic
No ratings yet
Parsing Probabilistic
59 pages
Inducing Tree-Substitution Grammars: Trevor Cohn
No ratings yet
Inducing Tree-Substitution Grammars: Trevor Cohn
44 pages
Adaptor Nips
No ratings yet
Adaptor Nips
8 pages
4.chapter5 - Syntactic and Semantic Representations
No ratings yet
4.chapter5 - Syntactic and Semantic Representations
47 pages
NLP Question
No ratings yet
NLP Question
4 pages
Context Free Grammars
No ratings yet
Context Free Grammars
40 pages
NLP Unit 3 (Part 1)
No ratings yet
NLP Unit 3 (Part 1)
7 pages
The Expectation Maximization (EM) Algorithm: Continued!
No ratings yet
The Expectation Maximization (EM) Algorithm: Continued!
67 pages
CS242 - Module 5
No ratings yet
CS242 - Module 5
42 pages
Final Practice
No ratings yet
Final Practice
12 pages
A Probabilistic Earley Parser As A Psycholinguistic Model 2001 N01-1021
No ratings yet
A Probabilistic Earley Parser As A Psycholinguistic Model 2001 N01-1021
8 pages
Context-Free Languages & Grammars Explained
No ratings yet
Context-Free Languages & Grammars Explained
40 pages
Longsem2024-25 Cse3015 Eth Ap2024256000125 Reference-material-III
No ratings yet
Longsem2024-25 Cse3015 Eth Ap2024256000125 Reference-material-III
89 pages
Pract Q
No ratings yet
Pract Q
6 pages
Module - 2 - Test Portion
No ratings yet
Module - 2 - Test Portion
33 pages
PCFG Example
No ratings yet
PCFG Example
4 pages
NLP Sem 3 Unit
No ratings yet
NLP Sem 3 Unit
12 pages
Thuật toán NLP
No ratings yet
Thuật toán NLP
57 pages
NLP Parsing Techniques Explained
No ratings yet
NLP Parsing Techniques Explained
11 pages
CFG for Binary Palindromes
No ratings yet
CFG for Binary Palindromes
24 pages
mockExamWS21 With Solution
No ratings yet
mockExamWS21 With Solution
35 pages
13-Shallow Parsing-05-09-2024
No ratings yet
13-Shallow Parsing-05-09-2024
62 pages
UPDATED Kanishq Conference 1
No ratings yet
UPDATED Kanishq Conference 1
8 pages
General Studies I
No ratings yet
General Studies I
4 pages
Essay
No ratings yet
Essay
2 pages
CYK Algorithm
No ratings yet
CYK Algorithm
6 pages
N-Gram Models in NLP Explained
100% (1)
N-Gram Models in NLP Explained
4 pages
Informacion PLM III
No ratings yet
Informacion PLM III
6 pages
POS Solutions for Tech Developers
No ratings yet
POS Solutions for Tech Developers
14 pages
CX Designer
No ratings yet
CX Designer
116 pages
Kirka - Io ? Play On CrazyGames 4
No ratings yet
Kirka - Io ? Play On CrazyGames 4
1 page
Ame - Met 306 - M 2 - V1
No ratings yet
Ame - Met 306 - M 2 - V1
88 pages
Introduction To Communication Lab Manual Using Multisim
No ratings yet
Introduction To Communication Lab Manual Using Multisim
40 pages
Chapter 08
No ratings yet
Chapter 08
22 pages
Gsk983t-h Version Tv3.0ab-8 PLC User Manual
No ratings yet
Gsk983t-h Version Tv3.0ab-8 PLC User Manual
28 pages
MGE Galaxy 300
No ratings yet
MGE Galaxy 300
4 pages
Enterprise IT Asset Management Guide
No ratings yet
Enterprise IT Asset Management Guide
8 pages
CNC GRBL Settings Tuning Guide
No ratings yet
CNC GRBL Settings Tuning Guide
15 pages
Guide To Listening Comprehension: Part A: Dialogs (Short Conversation) Questions Types
No ratings yet
Guide To Listening Comprehension: Part A: Dialogs (Short Conversation) Questions Types
2 pages
Sorenson - Recursive Fading Memory Filtering (1971)
No ratings yet
Sorenson - Recursive Fading Memory Filtering (1971)
19 pages
The Effect of Computer Technology On Academic Achievement Author
No ratings yet
The Effect of Computer Technology On Academic Achievement Author
5 pages
Guidelines in HCI
No ratings yet
Guidelines in HCI
60 pages
Transistor as a Switch Experiment
100% (1)
Transistor as a Switch Experiment
5 pages
651a36d211b4ed7aad6955a7 46184252463
No ratings yet
651a36d211b4ed7aad6955a7 46184252463
2 pages
Difference Between 2-D and 3-D Animation - GeeksforGeeks
No ratings yet
Difference Between 2-D and 3-D Animation - GeeksforGeeks
1 page
Asco LV Ats & PCS
No ratings yet
Asco LV Ats & PCS
59 pages
Manual TCG220
No ratings yet
Manual TCG220
61 pages
XXXXXXXXXXXXXXXXXX: This Manual Was Downloaded From BAMA's Boatanchor Site
No ratings yet
XXXXXXXXXXXXXXXXXX: This Manual Was Downloaded From BAMA's Boatanchor Site
67 pages
Key Questions in Computer Organization
No ratings yet
Key Questions in Computer Organization
3 pages
Networks (Second Edition) Mark Newman Digital Download
No ratings yet
Networks (Second Edition) Mark Newman Digital Download
104 pages
Employment 12th Class Part Notes by Tariq Sir
No ratings yet
Employment 12th Class Part Notes by Tariq Sir
28 pages
Communication Process Explained
No ratings yet
Communication Process Explained
2 pages
Arabic Origins of Cryptology Vol. 1
100% (7)
Arabic Origins of Cryptology Vol. 1
206 pages
Robot: Safety Guide
No ratings yet
Robot: Safety Guide
22 pages
Sample Paper Questions - NLP (Part 1)
No ratings yet
Sample Paper Questions - NLP (Part 1)
7 pages
Calculation Cover Sheet: File Calc No Project Title Client Proj No Phase/CTR
No ratings yet
Calculation Cover Sheet: File Calc No Project Title Client Proj No Phase/CTR
26 pages
9788advancing Into Analytics From Excel To Python and R 1st Edition Mount George Download
100% (2)
9788advancing Into Analytics From Excel To Python and R 1st Edition Mount George Download
62 pages

PCFGs for Linguistics Students

Uploaded by

PCFGs for Linguistics Students

Uploaded by

Tutorial on

• Rule probabilities such that

N i =⇒∗ ζ : repeated derivation from N i gives ζ

where t a parse tree of w1n

• Terminals with, saw, astronomers, ears, stars,

P (t1 ) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4

• What is the probability of a sentence?

• What is the probability of a sentence?

• What is the probability of a sentence?

• What rule probs. maximize probs. of sentences?

high probability in HMM, low probability in PCFG

high probability in HMM, low probability in PCFG

Forwards αi (t) = P (w1(t−1), Xt = i)

Forwards αi (t) = P (w1(t−1), Xt = i)

• For PCFGs we have

P (w1m |G) = β1 (1, m)

• Base case βj (k, k) = j

• Base case βj (k, k) = j

Since we assume Chomsky Normal Form,

So we can divide the sentence in two in

astronomers saw stars with ears

βNP = 0.1 βV = 1.0 βNP = 0.18 βP = 1.0 βNP = 0.18

βS = 0.0126 βNP = 0.01296

βNP = 0.1 βV = 1.0 βNP = 0.18 βP = 1.0 βNP = 0.18

βS = 0.0126 βNP = 0.01296

βNP = 0.1 βV = 1.0 βNP = 0.18 βP = 1.0 βNP = 0.18

• Just in the cases of the root node and the

• else use EM (Inside‐Outside‐Algorithm)

• Often sufficient is a discriminative model

• Easier, because does not contain P (w)

Conditional Conditional Conditional

Sequence General Graphs

Logistic Regression Linear‐chain CRFs General CRFs

HMMs PCFGs Generative

Conditional Conditional Conditional

Logistic Regression Linear‐chain CRFs General CRFs

HMMs PCFGs Generative

Conditional Conditional Conditional Conditional

Logistic Regression Linear‐chain CRFs

• Features can depend on all tokens + span

[compare to linear CRF fk (st , tt−1, w1 w2 . . . wm , t) ]

• No independence between features necessary

BizContact −→ BizName Address BizPhone

Linear CRF Discriminative CFG Improvement

Word Error Rate 11.57% 6.29% 45.63%

Record Error Rate 54.71% 27.13% 50.41%

Slide based on Wikipedia

Define binary representation for G, code(D|G)

You might also like