Module 3
Module 3 discusses transformational grammar, emphasizing how sentences can be altered while maintaining meaning, and addresses syntactic ambiguities through parsing techniques. It introduces context-free grammar (CFG) and the Cocke-Kasami-Younger (CKY) algorithm for parsing, highlighting the importance of dependency parsing and its advantages in handling languages with flexible word orders. Additionally, the module covers shallow parsing, chunking, and the use of Conditional Random Fields (CRFs) for sequence labeling in natural language processing.

MODULE 3

Transformational Grammar

• Transformational grammar examines how a sentence can be transformed or altered while still conveying the same meaning.

• It focuses on the underlying transformations that can be applied to a sentence while preserving its semantics.
Ambiguities in syntax analysis

• Syntactic analysis examines the grammatical structure of a sentence in order to understand its meaning.

• Example: "I saw the man with the telescope" is syntactically ambiguous.

• It could mean either "I used a telescope to see the man" or "I saw the man who had a telescope."
Parsing
• Parsing is the process of taking a string and a
grammar and returning parse tree(s) for that
string

Rules: Noun Phrase
Rules: Verb Phrase

Rules: Adjective Phrase
AdjP -> Deg + Adj + PP
Example: "quite happy with the result"

Deg: degree adverb (optional)
Adj: adjective
PP: prepositional phrase (optional)
Rules: Adverb Phrase
AdvP -> Deg + Adv

Rules: Prepositional Phrase
PP -> (Deg) + P + NP
Syntactic structure
Parsing
• Parsing with CFGs refers to the task of
assigning proper trees to input strings
• Proper: a tree that covers all and only the
elements of the input and has an S at the top

Search Framework
• Think about parsing as a form of search…
– A search through the space of possible trees
given an input sentence and grammar

How to parse
• Top-down: Start at the top of the tree with an S
node, and work your way down to the words.

• Bottom-up: Look for small pieces that you know how to assemble, and work your way up to larger pieces.
Top-Down Search
• Builds from the root S node to the leaves
• Expectation-based
• Common top-down search strategy
– Top-down, left-to-right, with
backtracking
– Try first rule s.t. LHS is S
– Next expand all constituents on RHS
– Iterate until all leaves are POS
– Backtrack when candidate POS does
not match POS of current word in
input string
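As a concrete illustration of this strategy, here is a minimal sketch using NLTK's RecursiveDescentParser, a top-down, left-to-right, backtracking parser. The toy grammar is my own illustrative assumption; the sentence is the module's later example "The child ate a cake":

# A minimal sketch of top-down parsing with backtracking using NLTK.
# The grammar is an illustrative assumption, not the one from the slides.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'child' | 'cake'
V -> 'ate'
""")

# RecursiveDescentParser expands rules top-down from S and backtracks when a
# predicted POS does not match the next input word. (It cannot handle
# left-recursive rules, a known limitation of naive top-down parsing.)
parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse("the child ate a cake".split()):
    tree.pretty_print()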

Bottom-Up Parsing
• Of course, we also want trees that cover the
input words. So we might also start with trees
that link up with the words in the right way.
• Then work your way up from there to larger
and larger trees.

Context Free Grammar
• A context-free grammar consists of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.

[Source: Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition by Daniel Jurafsky and James H. Martin]

Context Free Grammar
Example: The child ate a cake
Syntactic Ambiguity
Syntactic Ambiguity / Structural Ambiguity:

• Attachment Ambiguity:
• A particular constituent can be attached at more than one place in a parse tree.
Eg: PP-attachment ambiguity

• Coordination Ambiguity:
• Phrases conjoined by conjunction (eg: and) cause ambiguity
• Eg: Old men and women ran down the theatre aisle.
• [old [men and women]] OR [old men] and [women]
Examples:
• I shot an elephant in my pajamas.
• We saw the Eiffel Tower flying to Paris.

Ambiguity in CFG
Produce a CFG for: "The spy saw the cop with the binoculars"
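One way to make the ambiguity concrete is a small grammar in which the PP can attach either to the VP or to the object NP. The following NLTK sketch (the grammar is my own construction, not taken from the slides) yields two parse trees for the sentence above:

# A sketch of PP-attachment ambiguity in a CFG; the grammar is an illustrative assumption.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'spy' | 'cop' | 'binoculars'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "the spy saw the cop with the binoculars".split()
for tree in parser.parse(sentence):
    # One tree attaches the PP to the VP (the spy used the binoculars),
    # the other attaches it to the NP (the cop had the binoculars).
    tree.pretty_print()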
CKY Parsing: A Dynamic Programming Approach
• Dynamic programming provides a powerful framework for addressing the problems caused by ambiguity in grammar.

• A classic dynamic programming parser is the Cocke-Kasami-Younger (CKY) algorithm.

• The CKY algorithm requires grammars to first be in Chomsky Normal Form (CNF).
CKY Parsing: Conversion to CNF
• Grammars in CNF are restricted to rules of the form:
• the right-hand side of each rule expands either to two non-terminals: A → B C
• or to a single terminal: A → w
• Example: a CFG and its CNF-converted equivalent

CKY Parsing
CKY Parsing: Conversion to CNF
CKY Algorithm
[CKY parse table over word positions 0–5 of the input]
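To make the algorithm concrete, here is a minimal CKY recognizer sketch in Python. It parses the module's earlier example "The child ate a cake"; the CNF grammar itself is an illustrative assumption, not the slide's example:

# A minimal CKY recognizer sketch for a grammar already in CNF.
from collections import defaultdict

# CNF rules: binary rules A -> B C and lexical rules A -> w (illustrative assumptions)
binary_rules = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("VP", "V", "NP"),
]
lexical_rules = {
    "the": {"Det"}, "a": {"Det"},
    "child": {"N"}, "cake": {"N"},
    "ate": {"V"},
}

def cky_recognize(words):
    n = len(words)
    # table[(i, j)] = set of non-terminals spanning words[i:j]
    table = defaultdict(set)
    for i, w in enumerate(words):
        table[(i, i + 1)] = set(lexical_rules.get(w, set()))
    for span in range(2, n + 1):            # span length
        for i in range(0, n - span + 1):    # span start
            j = i + span
            for k in range(i + 1, j):       # split point
                for a, b, c in binary_rules:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(a)
    return "S" in table[(0, n)]

print(cky_recognize("the child ate a cake".split()))  # True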
Span-Based Neural Constituency Parsing / Neural CKY
How to get the best parse tree?
Span score computation for the span 'the flight' with the label NP

Neural CKY
• Split each encoder output y_t into two halves, y_t^(1) and y_t^(2)
• Span vector
• Span vector score
• Individual parse tree
• Individual parse tree score
• Best parse tree
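The formulas for these quantities did not survive extraction; a sketch of the standard span-based scoring (following Jurafsky & Martin and Stern et al. 2017, so the exact notation here is an assumption) is:

% Span vector for span (i, j), built from the split encoder outputs
v(i,j) = \left[\, y_j^{(1)} - y_i^{(1)};\; y_{i+1}^{(2)} - y_{j+1}^{(2)} \,\right]

% Span score for label l: a feedforward classifier applied to the span vector
s(i,j,l) = \mathrm{MLP}_l\big(v(i,j)\big)

% Score of an individual parse tree T: sum of the scores of its labeled spans
s(T) = \sum_{(i,j,l)\,\in\,T} s(i,j,l)

% Best parse tree: found with a CKY-style dynamic program
\hat{T} = \operatorname*{argmax}_{T} s(T)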
Parser Evaluation
Labelled Precision:
# of correct constituents in the hypothesis parse / total # of constituents in the hypothesis parse
Labelled Recall:
# of correct constituents in the hypothesis parse / total # of constituents in the reference parse
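Written out, and combined into an F-measure (the F1 line is my addition, reflecting common practice rather than the slide):

\text{LP} = \frac{\#\ \text{correct constituents in hypothesis}}{\#\ \text{constituents in hypothesis}}
\qquad
\text{LR} = \frac{\#\ \text{correct constituents in hypothesis}}{\#\ \text{constituents in reference}}
\qquad
F_1 = \frac{2 \cdot \text{LP} \cdot \text{LR}}{\text{LP} + \text{LR}}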
Shallow parsing
• Many language processing tasks do not require complex, complete parse trees for all inputs.

• For these tasks, a partial parse, or shallow parse, of input sentences may be sufficient.

• For example, information extraction systems generally do not extract all the possible information from a text: they simply identify and classify the segments in a text that are likely to contain valuable information.
Shallow parsing
• One kind of partial parsing is known as chunking.

• Chunking is the process of identifying and classifying the flat, non-overlapping segments of a sentence that constitute the basic non-recursive phrases corresponding to the major content-word parts-of-speech: noun phrases, verb phrases, adjective phrases, and prepositional phrases.

• The task of finding all the base noun phrases in a text is particularly common.
Shallow parsing
[NP The morning flight] [PP from] [NP Denver] [VP has arrived.]

• Bracketing notation: segmenting (finding the non-overlapping extents of the chunks) and labeling (assigning the correct tag to the discovered chunks).

• Some input words may not be part of any chunk, particularly in tasks like base-NP chunking:

• [NP The morning flight] from [NP Denver] has arrived.
Shallow parsing - Chunking Algorithms
Chunking is generally done via supervised learning, training a BIO sequence labeler:
B - beginning of each chunk type
I - inside of each chunk type
O - tokens outside any chunk
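For the example sentence bracketed earlier, the corresponding BIO encoding looks like this (a small sketch; the token/tag pairing is derived by me from the bracketing above):

# BIO encoding of "[NP The morning flight] [PP from] [NP Denver] [VP has arrived]"
bio_tags = [
    ("The", "B-NP"), ("morning", "I-NP"), ("flight", "I-NP"),
    ("from", "B-PP"),
    ("Denver", "B-NP"),
    ("has", "B-VP"), ("arrived", "I-VP"),
]
for token, tag in bio_tags:
    print(f"{token}\t{tag}")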
Conditional Random Field
• While Hidden Markov Models (HMMs) are
powerful for sequence modeling, incorporating
arbitrary features, such as capitalization,
morphology, or contextual information, proves
challenging due to the inherent limitations of
generative models.

• Challenges with arbitrary features (e.g., capitalization, or new proper nouns).
Conditional Random Field
• A Conditional Random Field (CRF) is a type of probabilistic graphical model used in machine learning, particularly for sequence labeling and structured prediction tasks.

• It models the conditional probability distribution over a set of random variables, where each variable represents an element in a sequence, given the values of other variables in the sequence.

• CRFs are often employed in natural language processing tasks like named entity recognition, part-of-speech tagging, and syntactic parsing, where the labeling of each element depends on the context of the entire sequence.
Conditional Random Field
• Let X and Y be the input and output sequences.

• A CRF is a log-linear model that assigns a probability to an entire output (tag) sequence Y, out of all possible sequences Y', given the entire input (word) sequence X:

p(Y|X) = \frac{\exp\left(\sum_{k=1}^{K} w_k F_k(X,Y)\right)}{\sum_{Y'} \exp\left(\sum_{k=1}^{K} w_k F_k(X,Y')\right)}

where the K functions F_k(X,Y) are global features, each with an associated weight w_k.
Conditional Random Field

• We compute the global features by decomposing them into a sum of local features for each position i in Y:

F_k(X,Y) = \sum_{i=1}^{n} f_k(y_{i-1}, y_i, X, i)

• Each local feature f_k in a linear-chain CRF is allowed to make use of the current output token y_i, the previous output token y_{i-1}, the entire input string X (or any subpart of it), and the current position i.
Features in a CRF POS Tagger

• We'll assume all CRF features take on the value 1 or 0.
• We use 1{x} to mean "1 if x is true, and 0 otherwise".
• The features to be used can be decided by the system designer.
• Example feature template:
Features in a CRF POS Tagger

Feature Template:

Janet/NNP will/MD back/VB the/DT bill/NN

When x_i is the word "back", the following features would be generated and have the value 1.
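The slide's concrete feature list was not preserved, so the sketch below uses a few plausible templates of my own choosing (current word and tag, previous tag, next word, word suffix) to show how such 0/1 features are instantiated when x_i is "back":

# A hedged sketch of template-based CRF feature extraction.
# The specific templates below are illustrative assumptions, not the slide's exact ones.
words = ["Janet", "will", "back", "the", "bill"]
tags  = ["NNP", "MD", "VB", "DT", "NN"]

def features(i, words, tags):
    """Binary (0/1) features for position i, each encoded as a string name.
    Boundary handling (i = 0 or i = len-1) is omitted for brevity."""
    feats = set()
    feats.add(f"1{{x_i={words[i]}, y_i={tags[i]}}}")              # current word + current tag
    feats.add(f"1{{y_i-1={tags[i-1]}, y_i={tags[i]}}}")           # previous tag + current tag
    feats.add(f"1{{x_i+1={words[i+1]}, y_i={tags[i]}}}")          # next word + current tag
    feats.add(f"1{{suffix(x_i)={words[i][-3:]}, y_i={tags[i]}}}") # suffix + current tag
    return feats

for f in sorted(features(2, words, tags)):   # i = 2 is the word "back"
    print(f)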
Features in a CRF POS Tagger – word shape
• Useful for unknown words.
• Word shape features represent the abstract letter pattern of the word by mapping lower-case letters to 'x', upper-case to 'X', numbers to 'd', and retaining punctuation.
• Ex: I.M.F would map to X.X.X.
  DC10-30 would map to XXdd-dd
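A minimal sketch of this mapping (the function name is my own):

# Word-shape feature, following the mapping described above.
def word_shape(word):
    shape = []
    for ch in word:
        if ch.islower():
            shape.append("x")
        elif ch.isupper():
            shape.append("X")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)   # retain punctuation and other symbols
    return "".join(shape)

print(word_shape("I.M.F"))    # X.X.X
print(word_shape("DC10-30"))  # XXdd-dd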
Features for CRF Named Entity Recognizers

• One feature that is especially useful for locations is a gazetteer, a list of place names, often providing millions of entries for locations with detailed geographical and political information.
• This can be implemented as a binary feature indicating whether a phrase appears in the list.
• Other related resources like name-lists, for example from the United States Census Bureau, can be used, as can other entity dictionaries like lists of corporations or products, although they may not be as helpful as a gazetteer.
Features for CRF Named Entity Recognizers
The sample named entity token L'Occitane would generate the following nonzero-valued features (assuming that L'Occitane is neither in the gazetteer nor the census).
Inference and Training for CRFs
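In brief, following the standard treatment in Jurafsky & Martin: inference for a linear-chain CRF (finding the best tag sequence) is done with the Viterbi algorithm, and training maximizes the conditional log-likelihood of the training data with gradient-based optimization, using the forward-backward algorithm to compute the required expectations.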
Dependency Parsing
• In dependency grammar, phrasal constituents and phrase-structure rules do not play a direct role.

• Instead, the syntactic structure of a sentence is described solely in terms of the words (or lemmas) in a sentence and an associated set of directed binary grammatical relations that hold among the words.

Dependency Parsing
Typed dependency structure: the labels are drawn from a fixed inventory of grammatical relations.
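A quick way to see typed dependency relations in practice is with an off-the-shelf parser. The sketch below uses spaCy (assuming the en_core_web_sm model has been downloaded); note that its label inventory may differ slightly from the relations listed later:

# A hedged sketch: printing a typed dependency structure with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The baby is cute")

for token in doc:
    # token.dep_ is the relation label; token.head is the word it depends on
    print(f"{token.dep_}({token.head.text}, {token.text})")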
Dependency Parsing
• A major advantage of dependency grammars is
their ability to deal with languages that are
morphologically rich and have a relatively free
word order.
• For example, word order in Czech can be much
more flexible than in English; a grammatical
object might occur before or after a location
adverbial. A phrase-structure grammar would
need a separate rule for each possible place in
the parse tree where such an adverbial phrase
could occur.
Dependency Parsing
• head-dependent relations provide an
approximation to the semantic relationship
between predicates and their arguments that
makes them directly useful for many
applications such as coreference resolution,
question answering and information extraction
Dependency Parsing – Dependency relations
• The arguments to these relations consist of a head and a dependent.

• The head word of a constituent is the central organizing word of a larger constituent (e.g., the verb in a verb phrase).

• The remaining words in the constituent are either direct or indirect dependents of their head.
• In dependency-based approaches, the head-dependent relationship is made explicit by directly linking heads to the words that are immediately dependent on them, bypassing the need for constituent structures.
Dependency Parsing – Dependency relations
nsubj: nominal subject
A nominal subject is a noun phrase which is the syntactic subject of a clause.
Eg: "The baby is cute"  nsubj(cute, baby)

dobj: direct object
The direct object of a VP is the noun phrase which is the (accusative) object of the verb.
Eg: "She gave me a raise"  dobj(gave, raise)

iobj: indirect object
The indirect object of a VP is the noun phrase which is the (dative) object of the verb.
Eg: "She gave me a raise"  iobj(gave, me)
ccomp: clausal complement
A clausal complement of a verb or adjective is a dependent clause which is a core argument. That is, it functions like an object of the verb or adjective.
Eg: "He says that you like to swim"  ccomp(says, like)

xcomp: open clausal complement
An open clausal complement (xcomp) of a verb or an adjective is a predicative or clausal complement without its own subject.
Eg: "He says that you like to swim"  xcomp(like, swim)

conj: conjunct
A conjunct is the relation between two elements connected by a coordinating conjunction, such as "and", "or", etc. We treat conjunctions asymmetrically: the head of the relation is the first conjunct and the other conjuncts depend on it via the conj relation.
Eg: "Bill is big and honest"  conj(big, honest)
cc: coordination
A coordination is the relation between an element of a conjunct and the coordinating conjunction word of the conjunct. (Note: different dependency grammars have different treatments of coordination. We take one conjunct of a conjunction (normally the first) as the head of the conjunction.) A conjunction may also appear at the beginning of a sentence. This is also called a cc, and it depends on the root predicate of the sentence.
Eg: "Bill is big and honest"  cc(big, and)

aux: auxiliary
An auxiliary of a clause is a non-main verb of the clause, e.g., a modal auxiliary, or a form of "be", "do" or "have" in a periphrastic tense.
Eg: "Reagan has died"  aux(died, has)

nmod: nominal modifier
The nmod relation is used for nominal modifiers. They depend either on another noun (group "noun dependents") or on a predicate (group "non-core dependents of clausal predicates").

amod: adjectival modifier
An adjectival modifier of a noun (or pronoun) is any adjectival phrase that serves to modify the noun (or pronoun). The relation applies whether the meaning of the noun is modified in a compositional way (e.g., large house) or an idiomatic way (hot dogs).

nummod: numeric modifier
A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun with a quantity.
Eg: 3 sheep

appos: appositional modifier
An appositional modifier of an NP is an NP immediately to the right of the first NP that serves to define or modify that NP. It includes parenthesized examples, as well as defining abbreviations in one of these structures.
case: case marking
The case relation is used for any case-marking
element which is treated as a separate syntactic
word (including prepositions, postpositions, and
clitic case markers).
A dependency tree is a directed graph that satisfies the following constraints:
1. There is a single designated root node that has no incoming arcs.
2. With the exception of the root node, each vertex has exactly one incoming arc.
3. There is a unique path from the root node to each vertex in V.
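A small sketch that checks these constraints on a heads-array representation (the representation, convention, and example are my own illustrative choices):

# A parse is represented as a list `heads`, where heads[i] is the index of the
# head of word i+1, and 0 denotes the artificial ROOT node.
def is_valid_dependency_tree(heads):
    n = len(heads)                       # words are numbered 1..n, ROOT is 0
    # Constraints 1 and 2: ROOT has no incoming arc and every word has exactly
    # one incoming arc by construction of heads[]; we additionally require that
    # exactly one word attaches to ROOT.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Constraint 3: every word must be reachable from ROOT (i.e., no cycles).
    for i in range(1, n + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:             # cycle -> not reachable from ROOT
                return False
            seen.add(node)
            node = heads[node - 1]       # follow the incoming arc upward
    return True

# "The baby is cute": heads for [The, baby, is, cute] under one plausible analysis
print(is_valid_dependency_tree([2, 4, 4, 0]))  # True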
Projectivity
• A dependency tree is projective if it can be
drawn with no crossing edges.
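A corresponding sketch of a projectivity check, using the same heads-array representation and assuming the tree constraints above already hold:

# An arc (head h, dependent d) is projective if h is an ancestor of every word
# strictly between h and d; a tree is projective if all of its arcs are.
def is_projective(heads):
    n = len(heads)
    def ancestors(i):
        chain = set()
        while i != 0:
            i = heads[i - 1]
            chain.add(i)
        return chain
    for d in range(1, n + 1):
        h = heads[d - 1]
        for w in range(min(h, d) + 1, max(h, d)):
            if h not in ancestors(w):
                return False
    return True

print(is_projective([2, 4, 4, 0]))  # True: this tree can be drawn with no crossing edges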
Dependency tree
Four possible transitions from a configuration (in transition-based dependency parsing)