
Unit 3 - Syntax Level Analysis

The document discusses part-of-speech (POS) tagging, outlining the classification of words into categories such as nouns, verbs, and adjectives, and the processes involved in tagging them. It covers various tagging approaches, including rule-based, stochastic, and transformation-based methods, as well as challenges like ambiguity and unknown words. Additionally, it touches on context-free grammar and parsing techniques used in syntactic analysis.


SYNTAX ANALYSIS

- Dr. Kiran Bhowmick


Unit 2 - Syllabus
Part Of Speech Tagging

• Words are traditionally grouped into equivalence classes called parts of speech (POS; Latin pars orationis), also known as word classes, morphological classes, or lexical tags.


• In traditional grammars there were generally only a few parts of speech (noun, verb, adjective, preposition, adverb, conjunction, etc.).


• Annotate each word in a sentence with a part-of-speech marker.

• Lowest level of syntactic analysis.

• Useful for subsequent syntactic parsing and word sense disambiguation.

The boy put the keys on the table.


English Parts of Speech
• Noun (person, place or thing)
  – Singular (NN): dog, fork
  – Plural (NNS): dogs, forks
  – Proper (NNP, NNPS): John, Springfields
  – Personal pronoun (PRP): I, you, he, she, it
  – Wh-pronoun (WP): who, what
• Verb (actions and processes)
  – Base, infinitive (VB): eat
  – Past tense (VBD): ate
  – Gerund (VBG): eating
  – Past participle (VBN): eaten
  – Non-3rd person singular present tense (VBP): eat
  – 3rd person singular present tense (VBZ): eats
  – Modal (MD): should, can
  – To (TO): to (to eat)
• Adjective (modify nouns)
  – Basic (JJ): red, tall
  – Comparative (JJR): redder, taller
  – Superlative (JJS): reddest, tallest
• Adverb (modify verbs)
  – Basic (RB): quickly
  – Comparative (RBR): quicker
  – Superlative (RBS): quickest
• Preposition (IN): on, in, by, to, with
• Determiner:
  – Basic (DT): a, an, the
  – Wh-determiner (WDT): which, that
• Coordinating Conjunction (CC): and, but, or
• Particle (RP): off (took off), up (put up)

John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD to/TO take/VB it/PRP to/IN the/DT table/NN
Closed vs. Open Class

• Closed class categories are composed of a small, fixed set of grammatical function words for a
given language.
• Pronouns, Prepositions, Modals, Determiners, Particles, Conjunctions

• Open class categories have large number of words and new ones are easily invented.
• Nouns (Googler, textlish), Verbs (Google), Adjectives (geeky), Adverbs (automagically)

Textlish is the use of text-based acronyms such as LOL ("laugh out loud") and OMG ("oh my God") in spoken conversation. Automagically: automatically, and in a way that seems ingenious, inexplicable, or magical.
Choosing a tagset
• To do POS tagging, a standard tagset needs to be chosen
• Could pick a very coarse tagset: N, V, Adj, Adv
• Could pick a fine-grained tagset: NN, NNS, VBD, MD, …

English POS tagsets

• The original Brown corpus used a large set of 87 POS tags.
• Most common in NLP today is the Penn Treebank set of 45 tags (the tagset used in these slides); it was reduced from the Brown set for use in the context of a parsed corpus (i.e. treebank).
• The C5 tagset used for the British National Corpus (BNC) has 61 tags.
• The "Universal Dependencies" tagset (Nivre et al. 2016).
Penn Treebank POS tags

The grand jury commented on a number of other topics.

The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
Ambiguity in POS Tagging
Ambiguous word types in the Brown Corpus
• “Like” can be a verb or a preposition

• I like candy → like/VBP
• Time flies like an arrow. → like/IN

• “Around” can be a preposition, particle, or adverb

• I bought it at the shop around/IN the corner.

• I never got around/RP to getting a car.

• A new Prius costs around/RB $25K.


POS Tagging Process
• Usually assume a separate initial tokenization process that separates and/or disambiguates punctuation,

including detecting sentence boundaries.


• Degree of ambiguity in English (based on Brown corpus)

• 11.5% of word types are ambiguous.

• 40% of word tokens are ambiguous.

• Average POS tagging disagreement amongst expert human judges for the Penn Treebank was 3.5%

• Based on correcting the output of an initial automated tagger, which was deemed to be more accurate than tagging from scratch.

• Baseline: Picking the most frequent tag for each specific word type gives about 90% accuracy

• 93.7% if a model for unknown words is used (Penn Treebank tagset).
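The most-frequent-tag baseline described above can be sketched in a few lines; the tiny tagged corpus below is invented for illustration, not real Brown data.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, model, default="NN"):
    """Tag each word with its most frequent training tag; unknowns get NN."""
    return [(w, model.get(w, default)) for w in words]

# Tiny illustrative corpus (not real Brown data).
corpus = [
    [("the", "DT"), ("can", "NN"), ("was", "VBD"), ("rusted", "VBN")],
    [("I", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("the", "DT"), ("can", "NN"), ("fell", "VBD")],
]
model = train_baseline(corpus)
print(tag_baseline(["the", "can", "exploded"], model))
# "can" is tagged NN (2 NN vs 1 MD); "exploded" is unknown, so NN.
```

Unknown-word handling here is just a default tag; the smarter orthographic models discussed later improve on this.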
POS Tagging Approaches
• Rule-Based: human-crafted rules based on lexical and other linguistic knowledge.
• Stochastic-Based: stochastic taggers use probabilities estimated from a training corpus to pick the most likely tag for a word.
• Transformation-Based: combines both rule-based and stochastic approaches; rules are learned automatically from data.
Rule-based POS Tagging Approach
• Assign each word in the input a list of potential POS tags
• Then winnow down this list to a single tag using hand-written rules.

• Steps (e.g. for "The can was rusted"):
1. Input sentence
2. Refer dictionary / lexicon: The/DT can/MD/NN was/VBD rusted/VBD/VBN
   (here "can" can also be NN)
3. Apply handwritten rules, e.g.
   • MD → NN: DT _ (change MD to NN when the previous tag is DT)
   • VBD → VBN: VBD _ (change VBD to VBN when the previous tag is VBD)
4. Output each word with a single tag

• Example: "I want to act in a play"
  Here "play" is given a VBD / NN tag by the lexicon. The tagger then refers to its set of rules and finds one that fits this context:

  VBD → NN: DT _ (a context-sensitive rule)
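The lexicon-then-rules pipeline above can be sketched as follows; the lexicon entries and the rule encoding (from-tag, to-tag, required previous tag) are illustrative, matching the "The can was rusted" example.

```python
def rule_based_tag(words, lexicon, rules):
    """Assign candidate tags from a lexicon, then apply context rules.

    Each rule is (from_tag, to_tag, prev_tag): rewrite from_tag to to_tag
    when the word has both tags as candidates and the previous word's
    chosen tag is prev_tag.
    """
    tags = [lexicon.get(w, ["NN"])[0] for w in words]   # first candidate
    for i, w in enumerate(words):
        candidates = lexicon.get(w, ["NN"])
        for from_tag, to_tag, prev in rules:
            if (from_tag in candidates and to_tag in candidates
                    and i > 0 and tags[i - 1] == prev and tags[i] == from_tag):
                tags[i] = to_tag
    return list(zip(words, tags))

# Hypothetical lexicon plus the two rules from the slide.
lexicon = {"the": ["DT"], "can": ["MD", "NN"],
           "was": ["VBD"], "rusted": ["VBD", "VBN"]}
rules = [("MD", "NN", "DT"),     # MD -> NN when preceded by DT
         ("VBD", "VBN", "VBD")]  # VBD -> VBN when preceded by VBD
print(rule_based_tag(["the", "can", "was", "rusted"], lexicon, rules))
# [('the', 'DT'), ('can', 'NN'), ('was', 'VBD'), ('rusted', 'VBN')]
```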


Stochastic-based POS Tagging Approach

• Probability based approach

• Generative (Joint) Models

Generate the observed data from hidden stuff, i.e. put a probability over the
observations given the class: P(d, c) in terms of P(d|c)
e.g. Naïve Bayes’ classifiers, Hidden Markov Models etc.

• Discriminative (Conditional) Models

Take the data as given, and put a probability over hidden structure given the
data: P(c|d)
e.g. Logistic regression, maximum entropy models, conditional random fields
Stochastic-based POS Tagging Approach
• Problem
• We have n words in a corpus and have to find corresponding tags.
• The problem is to assign the tag with maximum probability to the given word.
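The generative (HMM) approach can be illustrated with a compact Viterbi decoder. The tag set and all probabilities below are toy numbers, chosen so that the ambiguous word "race" decodes as a verb after "to"; they are not estimates from a real corpus.

```python
def viterbi(words, tags, pi, trans, emit):
    """Find the most probable tag sequence under a bigram HMM.

    pi[t]: P(t starts the sentence); trans[(t1, t2)]: P(t2|t1);
    emit[(t, w)]: P(w|t).
    """
    V = [{t: pi.get(t, 0.0) * emit.get((t, words[0]), 0.0) for t in tags}]
    back = []
    for w in words[1:]:
        scores, ptrs = {}, {}
        for t in tags:
            best = max(tags, key=lambda p: V[-1][p] * trans.get((p, t), 0.0))
            scores[t] = (V[-1][best] * trans.get((best, t), 0.0)
                         * emit.get((t, w), 0.0))
            ptrs[t] = best
        V.append(scores)
        back.append(ptrs)
    last = max(tags, key=lambda t: V[-1][t])   # best final tag
    path = [last]
    for ptrs in reversed(back):                # follow backpointers
        path.append(ptrs[path[-1]])
    return list(reversed(path))

# Toy parameters: "race" is ambiguous between NN and VB.
tags = ["TO", "NN", "VB"]
pi = {"TO": 1.0}
trans = {("TO", "VB"): 0.8, ("TO", "NN"): 0.2}
emit = {("TO", "to"): 1.0, ("VB", "race"): 0.01, ("NN", "race"): 0.02}
print(viterbi(["to", "race"], tags, pi, trans, emit))  # ['TO', 'VB']
```

Even though "race" is more often a noun (higher emission probability here), the strong TO→VB transition wins, which is exactly the disambiguation discussed in the next section.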
Transformation-based POS Tagging Approach
• Combines both rule-based and stochastic tagging approaches

• Like the rule-based taggers, TBL is based on rules that specify what tags should be assigned to what words.

• But like the stochastic taggers, TBL is a machine learning technique, in which rules are automatically induced

from the data.


• TBL is a supervised learning technique; it assumes a pre-tagged training corpus.

• The TBL algorithm has a set of tagging rules.

➢ A corpus is first tagged using the broadest rule, i.e. the one that applies to the most cases.

➢ Then a slightly more specific rule is chosen, which changes some of the original tags.

➢ Next an even narrower rule, which changes a smaller number of tags (some of which might be previously-changed tags).
Transformation-based POS Tagging Approach
1. Secretariat is expected to race tomorrow
2. People continue to inquire the reason for the race for outer space

• In the Brown corpus, race is most likely to be a noun:

1. Secretariat/NNP is/VBZ expected/VBN to/TO race/NN tomorrow/NN

2. People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN

• Brill’s tagger learned a rule that applies exactly to this mistagging of race
Change NN to VB when the previous tag is TO
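Applying a learned Brill-style transformation can be sketched as below; the rule is the one from the slide, and the initial tagging mimics the most-frequent-tag error on "race".

```python
def apply_transformation(tagged, rules):
    """Apply Brill-style transformations: (from_tag, to_tag, prev_tag)."""
    words, tags = zip(*tagged)
    tags = list(tags)
    for from_tag, to_tag, prev in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev:
                tags[i] = to_tag   # rewrite the tag in place
    return list(zip(words, tags))

# Initial (most-frequent-tag) tagging mistakes "race" as NN after TO.
initial = [("Secretariat", "NNP"), ("is", "VBZ"), ("expected", "VBN"),
           ("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
rules = [("NN", "VB", "TO")]  # the learned rule from the slide
print(apply_transformation(initial, rules))
# "race" becomes VB; "tomorrow" keeps NN because it follows NN, not TO.
```

In real TBL the rules themselves are induced by repeatedly picking the transformation that most reduces error on the training corpus; only the application step is shown here.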
Issues in POS Tagging
1. Multiple tags:

▪ Tag indeterminacy arises when a word is ambiguous between multiple tags and it is impossible or very difficult to disambiguate.

▪ In this case, some taggers allow the use of multiple tags. E.g. in the Penn Treebank and in the British National Corpus.

▪ Common tag indeterminacies include adjective versus preterite versus past participle (JJ/VBD/VBN), and adjective versus noun as prenominal modifier

(JJ/NN).

2. Multi-part words:

▪ Another tokenization issue concerns multi-part words. The Treebank tagset assumes that tokenization of words like New York is done at whitespace.

▪ The phrase a New York City firm is tagged in Treebank notation as five separate words:

a/DT New/NNP York/NNP City/NNP firm/NN.


▪ The C5 tagset, by contrast, allows prepositions like "in terms of" to be treated as a single word by adding numbers to each tag, as in

in/II31 terms/II32 of/II33


▪ Some tagged corpora split certain words, e.g. the Penn Treebank and the British National Corpus splits contractions and the 's genitive from their

stems:
would/MD n't/RB
children/NNS 's/POS
Issues in POS Tagging
3. Unknown words
▪ All the tagging algorithms require a dictionary that lists the possible parts of speech of every word.

▪ But the largest dictionary will still not contain every possible word.

▪ Proper names and acronyms are created very often, and even new common nouns and verbs enter the language at a surprising rate.

▪ Therefore, to build a complete tagger we need some method for guessing the tag of an unknown word

▪ The most powerful unknown-word algorithms make use of information about how the word is spelled.

▪ For example,

o words that end in the letter s are likely to be plural nouns (NNS),

o while words ending with -ed tend to be past participles (VBN).

o Words starting with capital letters are likely to be nouns.

▪ Weischedel et al. (1993) used four specific kinds of orthographic features:

o 3 inflectional endings (-ed, -s, -ing),

o 32 derivational endings (such as -ion, -al, -ive, and -ly),

o 4 values of capitalization (capitalized initial, capitalized non-initial, etc.), and hyphenation.

▪ They used the following equation to compute the likelihood of an unknown word, multiplying the probabilities of its orthographic features given the tag:

P(wi | ti) = p(unknown-word | ti) · p(capital | ti) · p(endings/hyphenation | ti)
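A heuristic sketch of these spelling cues is below. This is a hard-coded simplification for illustration, not the actual Weischedel et al. probabilistic model.

```python
def guess_unknown_tag(word):
    """Guess a tag for an out-of-vocabulary word from its spelling."""
    if word[0].isupper():
        return "NNP"        # capitalized words are likely proper nouns
    if word.endswith("ed"):
        return "VBN"        # -ed suggests a past participle
    if word.endswith("ing"):
        return "VBG"        # -ing suggests a gerund
    if word.endswith("s"):
        return "NNS"        # -s suggests a plural noun
    return "NN"             # default: common noun

print(guess_unknown_tag("Googler"))    # NNP
print(guess_unknown_tag("blogged"))    # VBN
print(guess_unknown_tag("podcasts"))   # NNS
```

A real implementation would combine these features probabilistically per tag, as in the equation above, rather than firing the first matching rule.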
Issues in POS Tagging
4. Part-of-Speech Tagging for Other Languages

▪ Part-of-speech tagging algorithms have been applied to many other languages, such as Chinese (Chinese TreeBank 5.0) and German (NEGRA corpus).

▪ But several augmentations and changes become necessary when dealing with highly inflected or agglutinative languages (e.g. Turkish or Hungarian), which have a very large vocabulary.

▪ The large vocabulary size seems to cause a significant degradation in tagging performance when the HMM algorithm is applied directly to
agglutinative languages. However, the performance of these taggers is hugely improved by adding a dictionary which essentially gives a better
model of unknown words
▪ A second, related issue with such languages is the vast amount of information that is coded in the morphology of the word. In English, lots of
information about syntactic function of a word is represented by word order, or neighboring function words. A part-of-speech tagging output for
Turkish or Czech needs to include information about the case and gender of each word in order to be as useful as parts-of-speech without case
or gender are in English
Issues in POS Tagging
5. Combining Taggers

▪ The various part-of-speech tagging algorithms can also be combined.

▪ The most common approach to tagger combination is to run multiple taggers in parallel on the same sentence, and then combine

their output, either by voting or by training another classifier to choose which tagger to trust in each context.
▪ In general, this kind of combination is only useful if the taggers have complementary errors, and so research on combination often

begins by checking to see if the errors are indeed different from different taggers.
▪ Another option is to combine taggers in series
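The parallel-voting combination can be sketched as follows; the three tagger outputs are hypothetical, and ties are broken by whichever tag was seen first.

```python
from collections import Counter

def combine_by_voting(taggings):
    """Majority vote over the outputs of several taggers run in parallel.

    taggings: a list of tag sequences, one per tagger, all the same length.
    """
    return [Counter(tags_at_i).most_common(1)[0][0]
            for tags_at_i in zip(*taggings)]

# Three hypothetical taggers disagree on the second word.
tagger_a = ["DT", "NN", "VBD"]
tagger_b = ["DT", "MD", "VBD"]
tagger_c = ["DT", "NN", "VBD"]
print(combine_by_voting([tagger_a, tagger_b, tagger_c]))  # ['DT', 'NN', 'VBD']
```

As the slide notes, this only helps when the taggers make complementary errors; identical errors simply get ratified by the vote.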
Context-free grammar
Constituency

• The idea: Groups of words may behave as a single unit or phrase, called a constituent.
• E.g. Noun Phrase
• Kermit the frog
• they
• December twenty-sixth
• the reason he is running for president

• Sentences have parts, some of which appear to have subparts. These groupings of words that go together we will call constituents.

• I hit the man with a cleaver


• I hit [the man with a cleaver]
• I hit [the man] with a cleaver

• You could not go to her party


• You [could not] go to her party
• You could [not go] to her party
Constituency Phrases
• For constituents, we usually name them as phrases based on the word that heads the constituent:

• Sometimes words also act as phrases. Eg:


• Joe grew potatoes.
• Joe and potatoes are both nouns and noun phrases.

• Compare with:
• The man from Amherst grew beautiful russet potatoes.
• We say Joe counts as a noun phrase because it appears in a place that a larger noun phrase could have been.
Formal definition of context-free grammar
A context-free grammar G is a 4-tuple defined by four parameters (N, Σ, R, S): a set of non-terminal symbols N, a set of terminal symbols Σ (disjoint from N), a set of rules or productions R, and a designated start symbol S.
Example of CFG

• Parsing is the process of taking a string and a grammar and returning a (or multiple) parse tree(s) for that string.
• A "derivation" is a sequence of rules applied to a string that accounts for that string.
Grammar rules for English
• Sentence-Level Constructions
• Clauses and Sentences
• The Noun Phrase
• Agreement
• The verb phrase and subcategorization
• Auxiliaries
• Coordination
Normal forms for Grammar
• It is sometimes useful to have a normal form for grammars, in which each of the productions takes a particular

form
• A context-free grammar is in Chomsky Normal Form (CNF) if it is ϵ-free and if in addition each production is

either of the form


A → B C or A → a
• The right-hand side of each rule either has two non-terminal symbols or one terminal symbol.

• Chomsky normal form grammars are binary branching, i.e. have binary trees

• Any grammar can be converted into a weakly-equivalent Chomsky normal form grammar.

• For example, a rule of the form

A → B C D

• can be converted into CNF as follows:

A → B X
X → C D
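The long-rule step of the conversion can be sketched mechanically as below; unit-rule and ε-rule removal, also needed for full CNF, are not handled, and the fresh symbol names (X1, X2, …) are arbitrary.

```python
def binarize(rules):
    """Convert rules with >2 RHS symbols into binary rules (one CNF step).

    rules: list of (lhs, rhs_tuple). Introduces fresh symbols like 'X1'.
    """
    out, fresh = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new = f"X{fresh}"
            out.append((lhs, (rhs[0], new)))   # A -> B X1
            lhs, rhs = new, rhs[1:]            # then X1 -> C D ...
        out.append((lhs, rhs))
    return out

print(binarize([("A", ("B", "C", "D"))]))
# [('A', ('B', 'X1')), ('X1', ('C', 'D'))]
```

Because each new symbol has exactly one production, the transformed grammar is weakly equivalent: it generates the same string set, though with different (binary) trees.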
Parsing with CFG

Top-down parsing, Bottom-up parsing, Basic top-down parser


Parsing
• Parsing is a process of taking string and grammar and generating all possible parse trees for that string.

• Types of parsing
• Top down parsing (Goal Oriented)
• Bottom up parsing (Data Oriented)

• A parser is sound if every parse it returns is valid/correct.


• A parser terminates if it is guaranteed not to go off into an infinite loop.
• A parser is complete if for any given grammar and sentence it is sound, produces every valid parse for that
sentence, and terminates.
Top-down parser

A top-down parser searches for a parse tree by trying to build from the root node S down to the leaves.

Method:
Initial Trees: Identify all trees that can start with S by checking grammar rules with S on the left hand side.
Grow Trees: Expand trees downward using these rules until they reach the POS categories at the bottom.
Match Input: Reject trees whose leaves do not match the input words.

Sentence - Book that flight

(The slides show the trees built in four successive iterations of the top-down search; the diagrams are not reproduced here.)
Top down parser – example 2

“The dog saw a man in the park”


Use the following grammar rules to create the parse tree:

S → NP VP
NP → Det N | Det N PP | PNoun | N
VP → Verb NP | Verb NP PP
PP → Prep NP
Verb → saw | ate | walked
PNoun → Ramesh | Raj | Anita
Det → a | an | the
N → man | dog | cat | telescope | park
Prep → in | on | by | with
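A recursive-descent (top-down) recognizer for this grammar can be sketched as follows. It tries every rule for a symbol and backtracks via generators; only recognition (does S cover the whole input?) is shown, with no tree construction.

```python
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N", "PP"], ["Det", "N"], ["PNoun"], ["N"]],
    "VP": [["Verb", "NP", "PP"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "Verb": {"saw", "ate", "walked"}, "PNoun": {"Ramesh", "Raj", "Anita"},
    "Det": {"a", "an", "the"}, "N": {"man", "dog", "cat", "telescope", "park"},
    "Prep": {"in", "on", "by", "with"},
}

def parse(symbol, words):
    """Yield every length i such that `symbol` derives words[:i]."""
    if symbol in LEXICON:                    # preterminal: match one word
        if words and words[0] in LEXICON[symbol]:
            yield 1
        return
    for rhs in GRAMMAR[symbol]:              # try each rule for the symbol
        yield from expand(rhs, words)

def expand(rhs, words):
    if not rhs:
        yield 0
        return
    for i in parse(rhs[0], words):            # match the first symbol...
        for j in expand(rhs[1:], words[i:]):  # ...then the rest of the RHS
            yield i + j

sent = "the dog saw a man in the park".split()
print(any(i == len(sent) for i in parse("S", sent)))  # True: sentence parses
```

This is exactly the goal-driven behavior described above: the parser commits to a rule, expands downward, and abandons the attempt when the leaves fail to match the input.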
Bottom up parser (data directed)
• The parser starts with the input words and builds trees upward.
• Method:
• Start with words: Begin with the words of the input
• Apply Rules: Build trees by applying grammar rules one at a time.
• Fit Rules: Look for places in the current parse where the RHS of a rule can fit.

Bottom up parser – example 1
Sentence - Book that flight

(The slides show the trees built in four successive iterations of the bottom-up search; the diagrams are not reproduced here.)
Bottom up parser – example 2
Sentence - “Rahul is eating an apple”
Use the following grammar rules to
create the parse tree:

S → NP VP
NP → Det N | N
VP → Aux VP
VP → V NP
Aux → is
V→ eating
N → Rahul | apple
Det → an
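A shift-reduce recognizer for this grammar can be sketched as below. Note this greedy version (reduce whenever any RHS matches the top of the stack, no lookahead) can fail on other grammars; it happens to succeed for this sentence.

```python
RULES = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("N",)),
    ("VP", ("Aux", "VP")), ("VP", ("V", "NP")),
    ("Aux", ("is",)), ("V", ("eating",)),
    ("N", ("Rahul",)), ("N", ("apple",)), ("Det", ("an",)),
]

def shift_reduce(words):
    """Shift a word, then reduce while any rule's RHS matches the stack top."""
    stack = []
    for w in words:
        stack.append(w)                          # shift
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                n = len(rhs)
                if tuple(stack[-n:]) == rhs:
                    stack[-n:] = [lhs]           # reduce RHS to LHS
                    reduced = True
                    break
    return stack

print(shift_reduce("Rahul is eating an apple".split()))  # ['S']
```

The run mirrors the table in the next slide: success is reaching a stack containing only "S" after all input is consumed; anything else left on the stack means no more rewrites were possible.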
Top-down/Bottom-up Parsing

Top-down (recursive descent parser):
• Starts from: S (the goal)
• Algorithm (parallel):
  a. Pick non-terminals
  b. Pick rules from the grammar to expand the non-terminals
• Termination:
  Success: when the leaves of a tree match the input
  Failure: no more non-terminals to expand in any of the trees
• Pro: goal-driven, starts with S
• Con: constructs trees that may not match the input

Bottom-up (shift-reduce parser):
• Starts from: the words (input)
• Algorithm:
  a. Match a sequence of input symbols with the RHS of some rule
  b. Replace the sequence by the LHS of the matching rule
• Termination:
  Success: when S is reached
  Failure: no more rewrites possible
• Pro: constrained by the input string
• Con: constructs constituents that may not lead to the goal S

• Control strategy -- how to explore search space?


• Pursuing all parses in parallel or backtrack or …?
• Which rule to apply next?
• Which node to expand next?
• Look at how the Top-down and Bottom-up parsing works on the
board for “Book that flight”
Top-down, Depth First, Left-to-Right parser

• Systematic, incremental expansion of the search space.
• In contrast to a parallel parser
• Start State: (• S, 0)
• End State: (•, n), where n is the length of the input to be parsed
• Next State Rules
  • (• wj+1 β, j) → (• β, j+1)
  • (• B β, j) → (• γ β, j) if B → γ (note B is the left-most non-terminal)
• Agenda: a data structure to keep track of the states to be expanded.
• Depth-first expansion, if the Agenda is a stack.
Top-down, Depth First, Left-to-Right parser
• Expands the search space incrementally by systematically exploring one state
at a time.
• Always chooses a state that is most recently generated
• On reaching a tree inconsistent with the input, the search returns to most
recently generated unexplored tree.
• Which rule to apply next? -> first applicable rule from grammar
• Which node to expand next? -> leftmost unexpanded node
• This algorithm maintains an agenda of search-states.
• Each search-state consists of partial trees together with a pointer to the next input word in the
sentence.
Top-down, Depth First, Left-to-Right parser
• The main loop of the parser takes a state from the front of the agenda and
produces a new set of states by applying all the applicable grammar rules to
the left-most unexpanded node of the tree associated with that state.
• This set of new states is then added to the front of the agenda in accordance
with the textual order of the grammar rules that were used to generate them.
• This process continues until either a successful parse tree is found or the
agenda is exhausted indicating that the input can not be parsed.
(In the parse traces on the slides, the node currently being expanded is shown in a box, while the current input word is bracketed.)
Adding bottom-up filters

• The current input word must serve as the first word in the derivation of the unexpanded node that the parser is currently processing

• The first word along the left edge of a derivation is called the left corner of the tree

• E.g. for A → B α, B is the left corner of A

• For the input sentence "Does this flight include a meal?" and the following grammar, the only viable option is the S → Aux NP VP rule
Left Corners
• Can we help top-down parsers with some bottom-up information?
  – Unnecessary states are created if there are many B → γ rules.
  – If after successive expansions B ⇒* w δ, and w does not match the input, then the series of expansions is wasted.
• Build a table with the left corners of all non-terminals in the grammar and consult it before applying a rule

Category | Left Corners
S        | Det, PropN, Aux, V
NP       | Det, PropN
Nom      | N
VP       | V

• At a given point in state expansion (• B β, j):
  – Pick the rule B → C γ only if the left corner of C matches the input wj+1
Limitation of Top-down Parsing:
1. Left Recursion

• A grammar is left-recursive if it contains a non-terminal category that has a derivation that includes itself anywhere along its leftmost branch.
• Left recursive rules: A → A β
• e.g.
  • NP → NP PP
  • VP → VP NP
  • S → NP VP

• Depth-first search will never terminate if the grammar is left recursive

• Dealing with left recursion
  ✓ Rewrite the grammar
  ✓ Explicitly manage the depth of the parse tree
• Rewrite the grammar to a weakly equivalent one which is not left-recursive:
  A → A β | α
  rewrite to:
  A → α A'
  A' → β A' | ε

• However, this transformation changes left recursion to right recursion

• Alternatively, fix the depth of search explicitly
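The rewrite above can be applied mechanically; the sketch below handles only immediate left recursion (A → A β), as on the slide, and represents ε as the empty tuple.

```python
def remove_left_recursion(nonterminal, rules):
    """Rewrite A -> A b | a  as  A -> a A' ;  A' -> b A' | eps.

    rules: the RHS alternatives for `nonterminal`, as tuples of symbols.
    """
    recursive = [r[1:] for r in rules if r and r[0] == nonterminal]
    others = [r for r in rules if not r or r[0] != nonterminal]
    if not recursive:
        return {nonterminal: rules}          # nothing to do
    prime = nonterminal + "'"
    return {
        nonterminal: [r + (prime,) for r in others],
        prime: [r + (prime,) for r in recursive] + [()],   # () is epsilon
    }

print(remove_left_recursion("NP", [("NP", "PP"), ("Det", "N")]))
# {'NP': [('Det', 'N', "NP'")], "NP'": [('PP', "NP'"), ()]}
```

As the slide notes, the result is only weakly equivalent: the same strings are generated, but the tree for "NP PP PP" now branches to the right.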
Limitation of Top-down Parsing:
2. Ambiguity
• Lexical ambiguity (words which may have more than one part of speech)
• and disambiguation (choosing the correct part of speech for a word).
• Structural ambiguity. Structural ambiguity occurs when the grammar assigns more than one possible parse to a sentence
• Types of structural ambiguity include attachment ambiguity, coordination ambiguity, and noun-phrase bracketing ambiguity.
• Attachment ambiguity: A sentence has an attachment ambiguity if a particular constituent can be attached to the parse
tree at more than one place.
• coordination ambiguity, in which there are different sets of phrases that can be conjoined by a conjunction like and.

• For example, the phrase "old men and women" can be bracketed [old [men and women]], referring to old men and old women, or as [[old men] and [women]], in which case it is only the men who are old.

• local ambiguity.

• Even if a sentence isn’t ambiguous, it can be inefficient to parse due to Local ambiguity.

• It occurs when some part of a sentence is ambiguous, i.e. has more than one parse, even if the whole sentence is not ambiguous.

• For example the sentence Book that flight is unambiguous, but when the parser sees the first word Book, it cannot know if it is a verb or a noun until later.

• Thus it must use backtracking or parallelism to consider both possible parses.


• Repeated Parsing of Subtrees
• The ambiguity problem is related to another inefficiency of the top-down parser.
• The parser often builds valid trees for portions of the input, then discards them during backtracking, only to find that it has to rebuild them again.
• For example: "a flight from Indianapolis to Houston on TWA"

Because of the way the rules are consulted in our top-down, depth-first, left-to-right approach, the parser is led first to small parse trees that fail because they do not cover all of the input.

These successive failures trigger backtracking events which lead to parses that incrementally cover more and more of the input.
Earley Parser

• Top-down parsing
• Uses a table of length n+1, where n is the number of words (the table is called the chart)
• For each word position, chart contains set of states representing all partial parse trees
generated to date.
• E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence
• Chart entries represent three types of information:
• subtree corresponding to a single grammar rule,
• Information about the progress made in completing this subtree, and
• the position of the subtree with respect to the input.
• Entries are referred to as states that include structures called Dotted Rules
• A • (dot) on right hand of rule indicates progress made.
• A state’s position with respect to the input will be represented by two numbers
indicating where the state begins and where its dot lies.
Earley Parser

• Example states for the input "0 Book 1 that 2 flight 3":

S → • VP, [0,0] (predicting a VP)
NP → Det • Nom, [1,2] (finding an NP)
VP → V NP •, [0,3] (found a VP)

• In the first state, the first 0 indicates that the constituent predicted by this state should begin at the start of the input; the second 0 reflects the fact that the dot lies at the beginning as well.
• The second state, created at a later stage in the processing of this sentence, indicates that an NP begins at position 1, that a Det has been successfully parsed and that a Nominal is expected next.
• The third state, with its dot to the right of both its constituents, represents the successful discovery of a tree corresponding to a VP that spans the entire input.
• These states can also be represented graphically, in which the states of the parse are edges, or arcs, and the chart as a whole is a directed acyclic graph.
Earley Parser: Parse Success

• Final answer is found by looking at last entry in chart


Earley Parsing Steps
• Start State: (0, S' → • S, 0)
• End State: (0, S → α •, n), where n is the input size
• Next State Rules
  • Scanner: read input
    (i, A → α • wj+1 β, j) → (i, A → α wj+1 • β, j+1)
  • Predictor: add top-down predictions
    (i, A → α • B β, j) → (j, B → • γ, j) if B → γ (note B is the left-most non-terminal)
  • Completer: move the dot to the right when a new constituent is found
    (i, B → α • A β, k) + (k, A → γ •, j) → (i, B → α A • β, j)
• No backtracking and no states removed: keep the complete history of the parse
• Why is this useful?
Earley Parser Steps

Scanner:
• When does it apply: when a terminal is to the right of the dot, e.g. VP → • Verb NP, [0,0]
• What chart cell is affected: new states are added to the next cell
• What contents in the chart cell: move the dot over the terminal, e.g. VP → Verb • NP, [0,1]

Predictor:
• When does it apply: when a non-terminal is to the right of the dot, e.g. S → • VP, [0,0]
• What chart cell is affected: new states are added to the current cell
• What contents in the chart cell: one new state for each expansion of the non-terminal in the grammar, e.g. VP → • V, [0,0] and VP → • V NP, [0,0]

Completer:
• When does it apply: when the dot reaches the end of a rule, e.g. NP → Det Nom •, [1,3]
• What chart cell is affected: new states are added to the current cell
• What contents in the chart cell: one state for each rule "waiting" for the constituent, e.g. VP → V • NP, [0,1] advances to VP → V NP •, [0,3]
CFG for Fragment of English

S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
NP → PropN
Nom → N
Nom → N Nom
Nom → Nom PP
VP → V
VP → V NP
PP → Prep NP

Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA
Book that flight (Chart[0])
• Seed the chart with top-down predictions for S from the grammar

γ → • S [0,0] Dummy start state
S → • NP VP [0,0] Predictor
S → • Aux NP VP [0,0] Predictor
S → • VP [0,0] Predictor
NP → • Det Nom [0,0] Predictor
NP → • PropN [0,0] Predictor
VP → • V [0,0] Predictor
VP → • V NP [0,0] Predictor
Chart[1]

V → book • [0,1] Scanner
VP → V • [0,1] Completer
VP → V • NP [0,1] Completer
S → VP • [0,1] Completer
NP → • Det Nom [1,1] Predictor
NP → • PropN [1,1] Predictor

V → book • is passed to the Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1], moving their dots to the right
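The three Earley operations can be sketched as one chart-filling loop. The grammar and lexicon below are the "CFG for Fragment of English" from these slides; states are simplified to (LHS, RHS, dot position, start) tuples, and the backpointers needed to retrieve actual parse trees are omitted.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"], ["Nom", "PP"]],
    "VP": [["V"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {"book": ["N", "V"], "that": ["Det"], "flight": ["N"],
           "does": ["Aux"], "include": ["V"], "Houston": ["PropN"]}

def earley(words):
    """Fill an Earley chart; a state is (lhs, rhs, dot, start)."""
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("gamma", ("S",), 0, 0))                # dummy start state
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # Predictor
                for prod in GRAMMAR[rhs[dot]]:
                    s = (rhs[dot], tuple(prod), 0, i)
                    if s not in chart[i]:
                        chart[i].add(s); agenda.append(s)
            elif dot < len(rhs):                         # Scanner
                if i < len(words) and rhs[dot] in LEXICON.get(words[i], []):
                    chart[i + 1].add((lhs, rhs, dot + 1, start))
            else:                                        # Completer
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        s = (l2, r2, d2 + 1, s2)
                        if s not in chart[i]:
                            chart[i].add(s); agenda.append(s)
    return chart

chart = earley("book that flight".split())
# Parse success: the dummy start rule is complete over the whole input.
print(("gamma", ("S",), 1, 0) in chart[-1])  # True
```

Because completed states are never removed, the chart keeps the full history of the parse, which is what makes the "look at the last entry in the chart" success test on the next slide possible.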
Retrieving the parses
• Augment the Completer to add pointer to prior states it advances as a field in the
current state
• i.e. what states combined to arrive here?
• Read the pointers back from the final state
• What if the final cell does not have the final state? – Error handling.
• Is it a total loss? No...
• Chart contains every constituent and combination of constituents possible for
the input given the grammar
• Useful for partial parsing or shallow parsing used in information extraction
Alternative Control Strategies
• Change Earley top-down strategy to bottom-up or ...
• Change to best-first strategy based on the probabilities of constituents
• Compute and store probabilities of constituents in the chart as you parse
• Then instead of expanding states in fixed order, allow probabilities to control order of expansion
Probabilistic CFGs
• A probabilistic CFG is a 5-tuple grammar G = (N, ∑, P, S, D)
• Where
1. a set of nonterminal symbols (or 'variables') N
2. a set of terminal symbols ∑ (disjoint from N)
3. a set of productions P of the form A → β [p]
4. a designated start symbol S
5. D is a function assigning a probability to each rule in P. This function expresses the probability p that the given nonterminal A will be expanded to the sequence β; formally, it is the conditional probability of the given expansion given the left-hand-side nonterminal A.
PCFG example

For all the possible expansions of a nonterminal, the sum of their probabilities
must be 1
How to use the probabilities

• The probability of a particular parse T is defined as the product of the probabilities of all the rules r used to expand each node n in the parse tree:

P(T, S) = ∏ n∈T P(r(n))

The resulting probability P(T, S) is both the joint probability of the parse and the sentence, and also the probability of the parse P(T).

How can this be true?

First, by the definition of joint probability:

P(T, S) = P(T) · P(S|T)

But since a parse tree includes all the words of the sentence, P(S|T) is 1. Thus:

P(T, S) = P(T) · P(S|T) = P(T)

The probability of each parse tree is computed, and the algorithm picks the best tree for a sentence S out of the set of parse trees for S.
We want the parse tree T which is most likely given the sentence S:

T̂(S) = argmax T P(T|S)

• By definition, the probability P(T|S) = P(T, S)/P(S), thus

T̂(S) = argmax T P(T, S)/P(S)

• For the same sentence P(S) is constant, therefore

T̂(S) = argmax T P(T, S)

• Furthermore, since we showed above that P(T, S) = P(T), the final equation for choosing the most likely parse simplifies to choosing the parse with the highest probability:

T̂(S) = argmax T P(T)
• We can see that the right tree in Figure 12.2(b) has a higher probability.
• Thus this parse would correctly be chosen by a disambiguation
algorithm which selects the parse with the highest PCFG probability.
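Computing P(T) as a product over rule probabilities can be sketched as below. The rule probabilities are toy numbers chosen to match the CYK worked example later in these slides ("the price includes a facemask"); trees are (label, children) pairs with plain strings as words.

```python
from math import prod

# Toy PCFG: probability of each rule that appears in the example tree.
RULE_PROB = {
    ("S", ("NP", "VP")): 0.8,
    ("NP", ("Det", "N")): 0.3,
    ("VP", ("V", "NP")): 0.2,
    ("Det", ("the",)): 0.40,
    ("Det", ("a",)): 0.40,
    ("N", ("price",)): 0.01,
    ("N", ("facemask",)): 0.02,
    ("V", ("includes",)): 0.05,
}

def tree_prob(tree):
    """P(T) = product over nodes of the probability of the rule used there."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    return p * prod(tree_prob(c) for c in children if not isinstance(c, str))

tree = ("S", [("NP", [("Det", ["the"]), ("N", ["price"])]),
              ("VP", [("V", ["includes"]),
                      ("NP", [("Det", ["a"]), ("N", ["facemask"])])])])
print(tree_prob(tree))  # ≈ 2.304e-08, the same value the CYK table derives
```

A disambiguation algorithm would evaluate this product for every candidate tree and keep the argmax, per the equations above.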
Cocke-Younger-Kasami Parser
• Bottom-up parser with top-down filtering
• Start State(s): (A, i, i+1) for each A → wi+1
• End State: (S, 0, n), where n is the input size
• Next State Rules
  • (B, i, k) + (C, k, j) → (A, i, j) if A → B C
The price includes a facemask
1-the 2-price 3-includes 4-a 5-facemask

• We begin by finding a single-phrase match in the grammar rules: we look up each word of the sentence one at a time and fill the corresponding table entry:

  Row 0 (the): Det
  Row 1 (price): N
  Row 2 (includes): V
  Row 3 (a): Det
  Row 4 (facemask): N

• Having filled the single-word entries, we move on to the two-word spans, e.g. "the price": we look for a rule with Det N on the right-hand side, find NP → Det N, and note NP in cell (0,1).
• We then look for an N V match ("price includes") and a V Det match ("includes a"), but neither matches any rule, so those cells are marked X.
• "a facemask" is again Det N, so NP is noted in its cell.
• Three-word spans: "the price includes" has no viable split, so its cell is X. For "includes a facemask", the split V + NP matches VP → V NP, so VP is noted.
• Four-word spans ("the price includes a", "price includes a facemask"): all candidate splits are impossible, so both cells are X.
• For the entire input, the split NP ("the price") + VP ("includes a facemask") matches S → NP VP, so S is placed in the top-right cell and the parse succeeds.

Final table, with the rule and lexical probabilities multiplied in:

  Row 0: Det (0.40) | NP (0.3 × 0.40 × 0.01 = 0.0012) | X | X | S (0.8 × 0.0012 × 0.000024 = 2.304 × 10⁻⁸)
  Row 1: N (0.01) | X | X | X
  Row 2: V (0.05) | X | VP (0.2 × 0.05 × 0.0024 = 0.000024)
  Row 3: Det (0.40) | NP (0.3 × 0.40 × 0.02 = 0.0024)
  Row 4: N (0.02)

For the entire input we get a match (S), and its probability is the product of the rule probability and the probabilities of its two sub-constituents.
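The probabilistic CYK fill can be sketched as follows, using the same toy rule and lexical probabilities as this example; only the best probability per (span, label) is kept, with no backpointers for recovering the tree.

```python
RULES = [  # (lhs, rhs, probability) — toy grammar from the worked example
    ("S", ("NP", "VP"), 0.8),
    ("NP", ("Det", "N"), 0.3),
    ("VP", ("V", "NP"), 0.2),
]
LEX = {"the": [("Det", 0.40)], "price": [("N", 0.01)],
       "includes": [("V", 0.05)], "a": [("Det", 0.40)],
       "facemask": [("N", 0.02)]}

def cyk(words):
    """table[i][j] maps label -> best probability for the span words[i:j]."""
    n = len(words)
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # length-1 spans
        for label, p in LEX[w]:
            table[i][i + 1][label] = p
    for span in range(2, n + 1):                        # longer spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # every split point
                for lhs, (b, c), p in RULES:
                    if b in table[i][k] and c in table[k][j]:
                        prob = p * table[i][k][b] * table[k][j][c]
                        if prob > table[i][j].get(lhs, 0.0):
                            table[i][j][lhs] = prob     # keep the best
    return table

t = cyk("the price includes a facemask".split())
print(t[0][5])  # S with probability ≈ 2.304e-08, matching the table above
```

Because the grammar is in CNF (binary rules plus lexical entries), each cell only ever combines two smaller cells, which is what makes the O(n³) fill possible.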
a pilot likes flying planes

Grammar:
S → NP VP
VP → VBG NNS
VP → VBZ VP
VP → VBZ NP
NP → DT NN
NP → JJ NNS
DT → a
NN → pilot
VBZ → likes
VBG → flying
JJ → flying
NNS → planes

CYK chart (rows by start position):
  Row 0 (a): DT | NP | X | X | S1, S2
  Row 1 (pilot): NN | X | X | X
  Row 2 (likes): VBZ | X | VP1, VP2
  Row 3 (flying): VBG, JJ | VP, NP
  Row 4 (planes): NNS

The word "flying" is ambiguous (VBG or JJ), so "flying planes" is both a VP (VBG NNS) and an NP (JJ NNS); this propagates upward to two VPs for "likes flying planes" (VBZ VP and VBZ NP) and two full parses S1 and S2 for the sentence.