10-Text Summarization

Uploaded by

thatsarra
CS463 – Natural Language Processing

Text Summarization
• Problem of Information Overload
• Text Summarization & Types of Summaries
• Automatic Text Summarization (ATS)
– Categories
– Stages
– Approaches
• Summarization System Evaluation Techniques
NLP Pipeline
[Figure: NLP pipeline diagram. Inputs: speech (Phonetic Analysis) and text (OCR/Tokenization). Analysis stages: Morphological analysis, Syntactic analysis, Semantic Interpretation, Discourse Processing. Meaning labels: Word Meaning (Lexical Semantics), Sentence Meaning Representation (Semantic Analysis). Applications: Information Extraction, Text Summarization, Text Categorization, Text Clustering.]
Problem of Information Overload
• Information overload – is the state of being overwhelmed by the
amount of data presented for one’s attention or processing.
– Too much information from a tremendous variety of sources
– Hundreds of billions of URLs indexed by Google
– Hundreds of petabytes of data
– Large volumes of training datasets
– Causes increased stress, lower productivity and poor decision-making

• Possible solution approaches
– Information retrieval
– Information extraction
– Document categorization and clustering
– Text summarization
Text Summarization & Types of Summaries
• Text summarization - is the process of creating a shorter
version of a text document that contains the most important
information from the original document.
• Text summarization can serve three types of summaries:
– Indicative summary – provides a general overview of the text
• Identifies the main points and overall message
– Informative summary – provides more detailed information about the text
• Goes beyond main points to provide more context and explanation
– Critical summary – provides the author’s perspective on the text and
gives the reader the chance to engage with it critically
• Helps the reader to think more critically about the text and develop their own
understanding of it
Text Summarization & Types of Summaries
• Example – An article:
A new study finds that eating chocolate can help you lose weight
• Indicative summary - This article reports on a new study that has found that
eating chocolate can help you lose weight. The study found that participants who
ate chocolate on a regular basis lost more weight and had smaller waistlines than
those who did not eat chocolate.

• Informative summary – The study involved 102 participants who were
randomly assigned to either a group that ate chocolate on a regular basis or a
group that did not eat chocolate. The participants in the chocolate group ate 70
grams of dark chocolate per day for 12 weeks. The participants in the control
group did not eat chocolate for the 12-week period. At the end of the study, the
participants in the chocolate group had lost an average of 5 pounds and had
smaller waistlines than the participants in the control group.

• Critical summary – Overall, this is a good article that provides valuable
information about the potential benefits of eating chocolate for weight loss.
However, it is important to note that more research is needed to confirm these
findings.
Automatic Text Summarization (ATS) - Categories
• Automatic text summarization (ATS) approaches fall into two categories:
– Extractive summarization approaches – select the most important
sentences from the input text and concatenate them to form the
summary.
• Typically used for indicative summaries
– Abstractive summarization approaches – generate the summary from
scratch using their own language model.
• Generate new sentences describing the content of the text
• Are more complex than extractive summarization approaches, but they can
produce more informative and coherent summaries.
• Typically used for informative and critical summaries
• Dimensions:
– Single document vs multi-document
• Context:
– Query-specific vs query-independent
Automatic Text Summarization (ATS) - Categories
• Mani and Maybury (1999) proposed a taxonomy that divides text
summarization genres into four categories:
– Generic summarization – generating summaries for general-purpose text
documents.
• e.g. Biographies, Abridgments, Movie summaries, News articles, Contracts, ..
– Query-focused summarization - generating summaries of text documents
that are relevant to a specific query.
• e.g. Headlines, Movie summaries, News articles, Research papers, Contracts, ..
– Update summarization - generating summaries of text documents that
contain new information relevant to a previously generated summary.
• e.g. Minutes, TV Series summaries, News articles, ..
– Abstractive summarization - generating summaries that are faithful to
the meaning of the original document, but which may not contain any of
the original sentences.
• e.g. Headlines, Minutes, Biographies, Movie/TV Series, Chronologies, ..
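To make the difference between generic and query-focused summarization concrete, here is a minimal Python sketch of query-focused sentence selection. It assumes the simplest possible relevance signal (how many query words a sentence shares with the query); the function names `tokens` and `query_focused_pick` are hypothetical and do not come from Mani and Maybury.

```python
import re


def tokens(text):
    """Lowercased set of alphabetic word tokens (a deliberately naive tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def query_focused_pick(sentences, query, k=1):
    """Return the k sentences sharing the most words with the query, in document order."""
    query_terms = tokens(query)
    # Rank sentence indices by overlap with the query; ties go to the earlier sentence.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-len(tokens(sentences[i]) & query_terms), i),
    )
    # Restore document order so the output reads naturally.
    return [sentences[i] for i in sorted(ranked[:k])]


sentences = [
    "The study had 102 participants.",
    "Participants ate dark chocolate daily.",
    "They lost five pounds on average.",
]
print(query_focused_pick(sentences, "dark chocolate"))
```

A generic summarizer would rank the same sentences without looking at any query, for example by overall word frequency.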

Automatic Text Summarization (ATS) - Stages
• ATS involves three stages:
– Content identification – involves identifying important
information in the input text.
• i.e. Extracting keywords and phrases, named entities, main topic
– Conceptual organization - involves organizing the
identified content into a coherent structure.
• i.e. Finding relations between pieces of information and grouping
related ones
– Realization - involves generating the summary text based on
the conceptual organization.
• i.e. Selecting existing sentences and/or generating new ones
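The three stages can be sketched as a tiny extractive summarizer in Python. This is only an illustration under strong assumptions: word frequency stands in for content identification, sentence ranking stands in for conceptual organization, and realization simply reuses existing sentences. The stop-word list and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "for", "on", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    sentences = split_sentences(text)

    # Stage 1, content identification: frequent content words act as keywords.
    freq = Counter(w for s in sentences for w in content_words(s))

    # Stage 2, conceptual organization (crude stand-in): score each sentence by the
    # average frequency of its content words and keep the best-scoring ones.
    scored = []
    for idx, sentence in enumerate(sentences):
        words = content_words(sentence)
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, idx))
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]

    # Stage 3, realization: emit the selected sentences in their original order.
    chosen = sorted(idx for _, idx in top)
    return " ".join(sentences[i] for i in chosen)


text = ("Chocolate may help weight loss. The weather was nice. "
        "Chocolate eaters lost more weight.")
print(summarize(text, max_sentences=2))
```

An abstractive system would replace the last stage with text generation instead of sentence selection.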

Automatic Text Summarization (ATS) - Approaches
• Human summarization and abstracting
– What professional abstractors do
– Ashworth (1973):
• “To take an original article, understand it and pack it
neatly into a nutshell without loss of substance or clarity
presents a challenge which many have felt worth taking
up for the joys of achievement alone. These are the
characteristics of an art form”.

Automatic Text Summarization (ATS) - Approaches
• Automatic abstracting is a method of generating summaries of
text using computer algorithms.
• Borko and Bernier (1975) outlined six uses of abstracts where
automatic abstracting can help:
1. Current awareness – Automatic abstracting provides a way for researchers to stay
up-to-date on the latest research in their field.
2. Saving reading time – Automatic abstracts are much shorter than the full
text of an article, so they can save researchers a lot of time.
3. Selection – Automatic abstracting can help researchers to select relevant articles
for review or citation.
4. Literature searches – Automatic abstracting can be used to generate summaries
of articles that are relevant to a researcher's search query.
5. Indexing efficiency – Automatic abstracting can be used to improve the
efficiency of indexing by keywords.
6. Review preparation – Automatic abstracting can be used to help researchers
prepare reviews of articles.
Summarization System Evaluation Techniques
• Compression Ratio = size of summary / size of original
$$\text{Compression Ratio} = \frac{|S|}{|D|}$$
• Example:
• Summary (S) = 1 MB, original document (D) = 10 MB
• Compression Ratio = 1/10 = 0.1 (i.e. 1 : 10)

• Compression ratio can be a useful evaluation measure for
information content
– depending on the application
• A search engine might prefer highly compressed summaries (small |S| / |D|)
so that they can be displayed more quickly.
• On the other hand, a human user might prefer less compressed summaries
(larger |S| / |D|) if they are more informative and readable.
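A minimal Python sketch of the compression ratio follows. The slides do not fix the unit of "size", so this example assumes sizes are counted in whitespace-separated words; characters or bytes would work the same way. The function name `compression_ratio` is hypothetical.

```python
def compression_ratio(summary, document):
    """Return |S| / |D|, with sizes measured in whitespace-separated words (an assumption)."""
    doc_len = len(document.split())
    if doc_len == 0:
        raise ValueError("document must contain at least one word")
    return len(summary.split()) / doc_len


document = "word " * 100   # a 100-word stand-in document
summary = "word " * 10     # a 10-word stand-in summary
print(compression_ratio(summary, document))  # 0.1
```

With this definition a smaller value means a more heavily compressed summary.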
Summarization System Evaluation Techniques
• We use Extrinsic and/or Intrinsic techniques to evaluate
automatic text summarization
– Extrinsic techniques (task-based) - focus on assessing the quality of a
summary based on its performance on a specific task.
• Answering the question: Can a person make the same decision or have the
same knowledge with the summary as with the entire document?
• e.g. answering questions or identifying relevant information.

– Intrinsic techniques - focus on assessing the quality of a summary
compared to a human-generated reference summary (also called the
ideal summary or gold standard).
Summarization System Evaluation Techniques
• Co-selection: Precision, Recall and F-score
• Precision metric – is about asking the question: Out of the
sentences the machine has selected, how many of them were
also selected by the human?
– Precision (P) is the number of sentences occurring in both candidate
and reference summaries divided by the number of sentences in the
candidate summary
• Recall metric – is about asking the question: Out of the sentences
the human has selected, how many of them were selected by the
machine as well?
– Recall (R) is the number of matched sentences in both candidate and
reference summaries divided by the number of sentences in the
reference summary
• F-score metric – is the harmonic mean of Precision and Recall
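The co-selection metrics can be sketched in Python as follows. The sketch assumes each summary is represented by the set of sentence indices selected from the source document and uses the balanced F-score, F = 2PR / (P + R); the function name `co_selection_scores` is hypothetical.

```python
def co_selection_scores(candidate, reference):
    """Return (precision, recall, f_score) for two collections of selected sentence ids."""
    cand, ref = set(candidate), set(reference)
    matched = len(cand & ref)  # sentences chosen by both machine and human

    precision = matched / len(cand) if cand else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


candidate = [1, 2, 5]     # sentence indices picked by the system
reference = [1, 2, 3, 4]  # sentence indices picked by the human
p, r, f = co_selection_scores(candidate, reference)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Here two of the three machine-selected sentences match the human selection (P = 2/3), and two of the four human-selected sentences were found (R = 1/2).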
