Discourse Structure Parsing
Coherence Relations
• Take the following two examples:
• Jane took a train from Paris to Istanbul. She likes spinach.
• Jane took a train from Paris to Istanbul. She had to attend a conference.
• The second example is more coherent: the reader can form a connection
between the two sentences, in which the second sentence provides a potential REASON for
the first sentence.
• This link is harder to form for the first example. These connections between text spans in a
discourse can be specified as a set of coherence relations.
• There are two commonly used models of coherence relations and associated corpora:
• Rhetorical Structure Theory (RST)
• The Penn Discourse TreeBank (PDTB)
Rhetorical Structure Theory
• The most commonly used model of discourse organization is Rhetorical Structure
Theory (RST) (Mann and Thompson, 1987).
• In RST, relations are defined between two spans of text, generally a nucleus and a
satellite.
• The nucleus is the unit that is more central to the writer’s purpose and that
is interpretable independently;
• the satellite is less central and generally is only interpretable with respect to the
nucleus. Some symmetric relations, however, hold between two nuclei.
• Below are a few examples of RST coherence relations, with definitions adapted from
the RST Treebank Manual; a small code sketch of these relations follows the examples.
• Reason: The nucleus is an action carried out by an animate agent and the
satellite is the reason for the nucleus.
• [NUC Jane took a train from Paris to Istanbul.] [SAT She had to attend a
conference.]
• Elaboration: The satellite gives additional information or detail about the
situation presented in the nucleus.
• [NUC Dorothy was from Kansas.] [SAT She lived in the midst of the great Kansas
prairies.]
• Evidence: The satellite gives additional information or detail about the situation
presented in the nucleus. The information is presented with the goal of convincing
the reader to accept the information presented in the nucleus.
• [NUC Kevin must be here.] [SAT His car is parked outside.]
• Attribution: The satellite gives the source of attribution for an instance of
reported speech in the nucleus.
• [SAT Analysts estimated] [NUC that sales at U.S. stores declined in the quarter,
too]
• List: In this multinuclear relation, a series of nuclei is given, without contrast
or explicit comparison:
• [NUC Billy Bones was the mate; ] [NUC Long John, he was quartermaster]
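To make the nucleus/satellite distinction concrete, the following Python sketch shows one way such relations could be represented as data. This is a minimal illustration, not part of any RST toolkit; the Span and RSTRelation classes and their field names are invented for this example.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    """A contiguous span of text."""
    text: str

@dataclass
class RSTRelation:
    """A coherence relation between two text spans.

    For asymmetric relations, nucleus is the more central span and
    satellite the less central one; for multinuclear relations such
    as List, both spans are nuclei and satellite is None.
    """
    label: str                             # e.g., "Reason", "Elaboration", "List"
    nucleus: Span
    satellite: Optional[Span] = None
    second_nucleus: Optional[Span] = None  # used only for multinuclear relations

# The Reason example from above:
reason = RSTRelation(
    label="Reason",
    nucleus=Span("Jane took a train from Paris to Istanbul."),
    satellite=Span("She had to attend a conference."),
)

# The multinuclear List example from above:
mates = RSTRelation(
    label="List",
    nucleus=Span("Billy Bones was the mate;"),
    second_nucleus=Span("Long John, he was quartermaster"),
)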
Graphical representation of RST relations
• RST relations are traditionally represented graphically;
• the asymmetric nucleus-satellite relation is represented with an
arrow from the satellite to the nucleus.
Complete Discourse example
• We can also consider the coherence of a larger text by examining the hierarchical
structure between coherence relations.
• The figure below shows the rhetorical structure of a paragraph from Marcu (2000a)
for a text from Scientific American magazine.
Example Paragraph:
With its distant orbit–50 percent farther from the sun than Earth–and slim atmospheric blanket, Mars experiences frigid
weather conditions. Surface temperatures typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the
equator and can dip to -123 degrees C near the poles. Only the midday sun at tropical latitudes is warm enough to thaw
ice on occasion, but any liquid water formed in this way would evaporate almost instantly because of the low
atmospheric pressure.
• The leaves in the tree in the figure above correspond to text spans of a sentence, clause,
or phrase; these are called elementary discourse units or EDUs in RST, and these
units can also be referred to as discourse segments.
• Because these units may correspond to arbitrary spans of text, determining the
boundaries of an EDU is an important task for extracting coherence relations.
• Roughly speaking, one can think of discourse segments as being analogous to
constituents in sentence syntax.
• There are corpora for many discourse coherence models; the RST Discourse
TreeBank (Carlson et al., 2001) is the largest available discourse corpus.
• It consists of 385 English language documents selected from the Penn
Treebank, with full RST parses for each one, using a large set of 78 distinct
relations, grouped into 16 classes.
• RST treebanks exist also for Spanish, German, Basque, Dutch and Brazilian
Portuguese
Penn Discourse TreeBank (PDTB)
• The Penn Discourse TreeBank (PDTB) is a second commonly used
dataset that embodies another model of coherence relations
• PDTB labeling is lexically grounded.
• Instead of asking annotators to directly tag the coherence relation
between text spans, they were given a list of discourse connectives,
words that signal discourse relations, like because, although, when,
since, or as a result.
• In a part of a text where these words marked a coherence relation between two
text spans, the connective and the spans were then annotated, as shown below
• Jewelry displays in department stores were often cluttered and uninspired. And
the merchandise was, well, fake. As a result, marketers of faux gems steadily
lost space in department stores to more fashionable rivals—cosmetics makers.
• In July, the Environmental Protection Agency imposed a gradual ban on virtually
all uses of asbestos. (implicit=as a result) By 1997, almost all remaining uses of
cancer-causing asbestos will be outlawed.
• Here the phrase as a result signals a causal relationship between what PDTB
calls Arg1 (the first two sentences) and Arg2 (the third sentence).
• Not all coherence relations are marked by an explicit discourse connective,
so the PDTB also annotates pairs of neighboring sentences with no
explicit signal, like the second example above.
• In such cases, the annotator first chooses the word or phrase that could have been
the signal (in this case as a result), and then labels its sense.
• For example, for the ambiguous discourse connective since, annotators marked
whether it is being used in a CAUSAL or a TEMPORAL sense. A sketch of such an
annotation record in code follows.
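As a concrete rendering of a PDTB-style annotation, the following Python sketch defines an illustrative record type; the PDTBAnnotation class and its field names are invented for this example, and the dotted sense string follows PDTB-style notation.

from dataclasses import dataclass

@dataclass
class PDTBAnnotation:
    arg1: str          # first argument span (Arg1)
    arg2: str          # second argument span (Arg2)
    connective: str    # the signaling word or phrase, e.g., "as a result"
    explicit: bool     # True if the connective actually appears in the text
    sense: str         # the annotated sense label

# The asbestos example above, where "as a result" is implicit:
ann = PDTBAnnotation(
    arg1="In July, the Environmental Protection Agency imposed a gradual "
         "ban on virtually all uses of asbestos.",
    arg2="By 1997, almost all remaining uses of cancer-causing asbestos "
         "will be outlawed.",
    connective="as a result",
    explicit=False,
    sense="Contingency.Cause.Result",
)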
The four high-level semantic distinctions in the PDTB sense hierarchy
• The top level of the PDTB sense hierarchy distinguishes four classes of senses:
TEMPORAL, CONTINGENCY, COMPARISON, and EXPANSION.
Discourse Structure Parsing
• Given a sequence of sentences, how can we automatically determine
the coherence relations between them?
• This task is often called discourse parsing (even though for PDTB we are
only assigning labels to spans of text rather than building a full parse
tree as we do for RST).
EDU segmentation for RST parsing
• RST parsing is generally done in two stages. The first stage, EDU
segmentation, extracts the start and end of each EDU. The output of
this stage would be a labeling like the following:
• [Mr. Rambo says]e1 [that a 3.2-acre property]e2 [overlooking the San Fernando
Valley]e3 [is priced at $4 million]e4 [because the late actor Erroll Flynn once
lived there.]e5
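One common way to operationalize EDU segmentation, sketched below in Python, is as token-level sequence labeling: tag each token B if it begins a new EDU and I otherwise. The tags here are hand-assigned to reproduce the labeling above; an actual segmenter would predict them with a trained classifier.

# Hand-assigned B/I tags for the example sentence (illustration only).
tokens = ["Mr.", "Rambo", "says", "that", "a", "3.2-acre", "property",
          "overlooking", "the", "San", "Fernando", "Valley",
          "is", "priced", "at", "$4", "million",
          "because", "the", "late", "actor", "Erroll", "Flynn",
          "once", "lived", "there", "."]
tags = ["B", "I", "I",
        "B", "I", "I", "I",
        "B", "I", "I", "I", "I",
        "B", "I", "I", "I", "I",
        "B", "I", "I", "I", "I", "I", "I", "I", "I", "I"]

def tags_to_edus(tokens, tags):
    """Group tokens into EDUs, starting a new unit at each B tag."""
    edus, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B" and current:
            edus.append(" ".join(current))
            current = []
        current.append(tok)
    if current:
        edus.append(" ".join(current))
    return edus

for i, edu in enumerate(tags_to_edus(tokens, tags), 1):
    print(f"e{i}: {edu}")   # prints the five EDUs e1..e5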
RST Parsing
• Tools for building RST coherence structure for a discourse have long been
based on syntactic parsing algorithms like shift-reduce parsing (Marcu, 1999).
• Many modern RST parsers are built on the neural syntactic parsers
we have seen in previous chapters,
• using representation learning to build representations for each span,
and training a parser to choose the correct shift and reduce actions
based on the gold parses in the training set.
• The parser state consists of a stack and a queue, and produces this
structure by taking a series of actions on the states. Actions include:
• shift: pushes the first EDU in the queue onto the stack creating a single-
node subtree.
• reduce(l,d): merges the top two subtrees on the stack, where l is the
coherence relation label, and d is the nuclearity direction, d ∈ {NN,NS,SN}.
• There is also a pop root operation to remove the final tree from the stack.
A minimal sketch of this parser appears below.
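The following Python sketch implements this state machine under illustrative assumptions: the class and method names are invented, the parser receives its actions as input, and a real system would instead score candidate actions with a learned model over span representations.

from dataclasses import dataclass
from collections import deque
from typing import Union

@dataclass
class Node:
    """An internal tree node joining two subtrees, with a coherence
    relation label and a nuclearity direction d in {"NN", "NS", "SN"}."""
    label: str
    direction: str
    left: "Union[Node, str]"
    right: "Union[Node, str]"

class ShiftReduceParser:
    def __init__(self, edus):
        self.stack = []            # partial subtrees built so far
        self.queue = deque(edus)   # EDUs not yet consumed

    def shift(self):
        # Push the first EDU in the queue onto the stack
        # as a single-node subtree.
        self.stack.append(self.queue.popleft())

    def reduce(self, label, direction):
        # Merge the top two subtrees on the stack under a node
        # carrying the relation label and nuclearity direction.
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(Node(label, direction, left, right))

    def pop_root(self):
        # Remove the finished tree from the stack.
        assert len(self.stack) == 1 and not self.queue
        return self.stack.pop()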
Example RST discourse tree showing four EDUs
• The figure below shows the actions the parser takes to build the
structure in the figure above.
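As an illustration of such an action sequence, the following hypothetical trace (reusing the ShiftReduceParser sketch above, with relation labels and nuclearity directions invented for illustration) shows how a series of actions assembles a tree over four EDUs:

parser = ShiftReduceParser(["e1", "e2", "e3", "e4"])
parser.shift()                       # stack: e1
parser.shift()                       # stack: e1 e2
parser.reduce("attribution", "SN")   # stack: (e1 e2)
parser.shift()                       # stack: (e1 e2) e3
parser.shift()                       # stack: (e1 e2) e3 e4
parser.reduce("elaboration", "NS")   # stack: (e1 e2) (e3 e4)
parser.reduce("background", "NS")    # stack: one complete tree
tree = parser.pop_root()             # final discourse tree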
Centering Theory
• At any point in the discourse, one of the entities in the discourse
model is salient (being “centered” on)
• Discourses in which adjacent sentences continue to maintain the
same salient entity are more coherent than those which shift back and
forth between multiple entities
Centering Theory: Intuition
• The following two texts from Grosz et al. (1995), which have exactly the same
propositional content but different saliences, can help in understanding the main
Centering intuition.
• a. John went to his favorite music store to buy a piano.
• b. He had frequented the store for many years.
• c. He was excited that he could finally buy a piano.
• d. He arrived just as the store was closing for the day.
• a. John went to his favorite music store to buy a piano.
• b. It was a store John had frequented for many years.
• c. He was excited that he could finally buy a piano.
• d. It was closing just as John arrived.
Centering Theory: Intuition
• While these two texts differ only in how the two entities (John and
the store) are realized in the sentences, the discourse in the first example
is intuitively more coherent than the one in the second, as the rough
sketch below illustrates.
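As a rough quantitative rendering of this intuition (not the actual Centering algorithm), the Python sketch below scores a discourse by how often adjacent sentences keep the same salient entity; the salient-entity lists are assigned by hand for the two example discourses.

def continuity_score(salient_entities):
    """Fraction of adjacent sentence pairs keeping the same salient entity."""
    pairs = list(zip(salient_entities, salient_entities[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

# First discourse: every sentence stays centered on John.
print(continuity_score(["John", "John", "John", "John"]))    # 1.0
# Second discourse: salience alternates between John and the store.
print(continuity_score(["John", "store", "John", "store"]))  # 0.0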
How does Centering Theory realize this intuition?