
SALAÜN Mathilde

KARUNATHASAN Nilany
JEGATHEESWARAN Janany
SAMBATH Sïndoumady

TEXT SUMMARIZATION
STRATEGIES
Theme Analysis and Evolution of NLP Techniques
OVERVIEW

About Us

Context

List of techniques

References
ABOUT US

Mathilde Salaün, Data Developer
https://www.linkedin.com/in/mathilde-salaun-13378b252/

Janany Jegatheeswaran, Big Data Engineer
https://www.linkedin.com/in/janany-jegatheeswaran-a729661ba/

Nilany Karunathasan, Data Scientist
https://www.linkedin.com/in/nilany-karunathasan-7b49691ba/

Sïndoumady Sambath, Software Engineer
https://www.linkedin.com/in/s%C3%AFndoumady-sambath-a7519a209/
CONTEXT

Issue: Information overload due to Internet growth
Purpose: Simplifying abundant material for accessibility
Demand: Need for complex and powerful summarization tools
Objective: Machine-generated summaries aligned with human-created ones
Analysis: Summarization concepts, techniques, metrics, and future scopes
TECHNIQUES

Text Summarization

Extractive Summarization
Abstractive Summarization
Hybrid Summarization
PARADIGM I : EXTRACTIVE
SUMMARIZATION

An approach that involves selecting and combining crucial sentences or phrases directly from the original text to construct a summary.

Focuses on identifying and extracting the most pertinent information while preserving the exact wording from the source material.

TEXT INPUT → KEY INFORMATION IDENTIFICATION → SENTENCE SELECTION → COMBINATION → ORIGINAL WORDING PRESERVATION → SUMMARY OUTPUT
SPECIFIC METHOD : TF-IDF WEIGHTING
OF MULTI-WORD TERMS
Multi-word Terms
Classic TF-IDF extended from single-word terms
Introduction of a maximal word limit
Recognize document-specific phrases

Preprocessing
Utilize the Python nltk library
Text splitting, tokenization, and symbol removal
Custom stopword list

Creating the TF-IDF Matrix
Define Maximal Term Length (TL)
Generate multi-word terms
Calculate TF and IDF

Most Important Sequence
Find candidate sequences (up to 1000 words)
Calculate TF-IDF scores
Rank sequences
Select the highest-ranking sequence as the summary
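The steps above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: it assumes simple regex tokenization instead of the nltk pipeline with a custom stopword list, and a small word window instead of the 1000-word sequences.

```python
import math
import re
from collections import Counter

def terms(text, max_len=3):
    """Generate all multi-word terms (n-grams) up to max_len words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def tfidf_summary(corpus, doc, max_len=3, window=25):
    """Pick the window-word sequence of doc with the highest summed TF-IDF."""
    n_docs = len(corpus)
    df = Counter()                      # document frequency of each term
    for d in corpus:
        df.update(set(terms(d, max_len)))
    tf = Counter(terms(doc, max_len))   # term frequency within the document
    total = sum(tf.values())
    def score(term):
        return (tf[term] / total) * math.log(n_docs / (1 + df[term]))
    words = doc.split()
    candidates = [" ".join(words[i:i + window])
                  for i in range(max(1, len(words) - window + 1))]
    # Rank candidate sequences by the TF-IDF of their distinct terms.
    return max(candidates, key=lambda c: sum(score(t) for t in set(terms(c, max_len))))
```

The `window` parameter plays the role of the sequence-length limit; the best-scored sequence is returned verbatim, which is what makes the method extractive.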
SPECIFIC METHOD : TF-IDF WEIGHTING
OF MULTI-WORD TERMS
Pipeline of the Approach

DOCUMENT CORPUS → PREPROCESSING → MULTI-WORD TERMS → COMPUTE TF-IDF → GENERATE CANDIDATE SEQUENCES → TF-IDF SCORES FOR SEQUENCES → BEST SCORED SUMMARY
PARADIGM II : ABSTRACTIVE
SUMMARIZATION

Uses natural language techniques to interpret and understand the important aspects of a text and generate a more "human"-friendly summary.
Needs a deeper analysis of the text.
Ability to generate new sentences.
Abstractive methods are classified into two categories: the structure-based approach and the semantic-based approach.

Techniques of Abstractive Text Summarization
Structure-based approach: Tree based, Template based, Ontology based, Rule based, Graph based
Semantic-based approach: Semantic graph based, Information item based, Multimodal semantic model
EXAMPLE : PEGASUS
PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models)
a sequence-to-sequence model designed specifically for abstractive summarization
uses deep learning in combination with natural language processing (NLP)
built on the Transformer architecture

Architecture Schema
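PEGASUS' gap-sentence generation (GSG) pre-training objective can be illustrated with a small sketch: score each sentence by its word overlap with the rest of the document, mask the highest-scoring ones, and use them as the generation target. The `<mask_1>` token and the overlap scorer below are simplified stand-ins for the paper's special mask token and its ROUGE-based sentence selection.

```python
import re
from collections import Counter

MASK = "<mask_1>"  # stand-in for PEGASUS' gap-sentence mask token

def gap_sentence_mask(text, ratio=0.3):
    """Build a GSG-style (input, target) pair: mask the most 'important'
    sentences and return the masked document plus the sentences to generate."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    doc_counts = Counter(w for s in sents for w in s.lower().split())
    def overlap(s):
        # Word overlap between this sentence and the rest of the document,
        # a crude proxy for the ROUGE-based scoring used in the paper.
        words = Counter(s.lower().split())
        rest = doc_counts - words
        return sum(min(c, rest[w]) for w, c in words.items())
    k = max(1, int(len(sents) * ratio))
    picked = set(sorted(range(len(sents)),
                        key=lambda i: overlap(sents[i]), reverse=True)[:k])
    masked = " ".join(MASK if i in picked else s for i, s in enumerate(sents))
    target = " ".join(sents[i] for i in sorted(picked))
    return masked, target
```

During pre-training, the Transformer learns to generate `target` from `masked`, which is why the objective transfers so well to abstractive summarization.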
EXAMPLE : PEGASUS
The rows of the table represent the different models evaluated, while the columns give the ROUGE metrics for each dataset.

PEGASUS is highlighted in two configurations, PEGASUS_LARGE (C4) and PEGASUS_LARGE (HugeNews), which indicate variants of the model pre-trained on different corpora.

ROUGE scores are generally higher for PEGASUS than for the other models, suggesting that PEGASUS performs better on the automatic text summarization task for these specific datasets. This is likely due to PEGASUS' specialized pre-training objective, which is optimized for the summarization task.

Models Performance
ALTERNATIVE APPROACH :
HYBRID SUMMARIZATION
Hybrid text summarization methods combine elements of both extractive and abstractive approaches. The aim is to leverage the factual accuracy of extractive techniques and the flexibility of abstractive methods.

Typically, a hybrid model first selects important sentences or phrases from the source text using extractive techniques, then generates a concise and coherent summary by paraphrasing and rephrasing the extracted content in an abstractive manner.

Example of Hybrid Summarization :

Link :
SPECIFIC METHOD : GRAPH BASED
SUMMARIZATION

Text Pre-processing
Input Processing
Word Tokenization
POS Tagging
Lemmatization

Graph Generation
Node => Sentence
Weighted Edge => Similarity Measure
Semantic inclusion using doc2vec
Example
SPECIFIC METHOD : GRAPH BASED
SUMMARIZATION

Processing
Ranking: TextRank Algorithm, Vertex Voting, Score per sentence
Clustering & Selection: Generate Clusters, Topic per Cluster, Cluster-based Rank Calculation

Output / Post-Processing
Generate Summary
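The ranking step can be sketched with a dependency-free TextRank. To keep the example self-contained, edge weights here use word-overlap similarity (normalised by log sentence length, as in Mihalcea's paper) rather than the doc2vec similarity mentioned above, and the clustering step is omitted.

```python
import math
import re

def textrank_summary(text, k=2, d=0.85, iters=50):
    """TextRank over a sentence graph: nodes are sentences, weighted edges are
    word-overlap similarity, and scores come from PageRank-style vertex voting."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    bags = [set(re.findall(r"[a-z]+", s.lower())) for s in sents]
    n = len(sents)
    def sim(i, j):
        # +1 inside the logs avoids log(1) = 0 for one-word sentences.
        denom = math.log(len(bags[i]) + 1) + math.log(len(bags[j]) + 1)
        return len(bags[i] & bags[j]) / denom if denom else 0.0
    w = [[sim(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):  # iterative vertex voting until (roughly) converged
        scores = [(1 - d) + d * sum(w[j][i] * scores[j] / (sum(w[j]) or 1.0)
                                    for j in range(n) if w[j][i] > 0)
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sents[i] for i in sorted(top))
```

A sentence with no similar neighbours receives only the damping baseline (1 - d), so off-topic sentences are naturally excluded from the summary.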
SPECIFIC METHOD : GRAPH BASED
SUMMARIZATION

Results on English and Persian documents based on ROUGE score


SPECIFIC METHOD : NEURAL NETWORK
BASED SUMMARIZATION
Neural network-based summarization methods use artificial neural networks to automatically generate concise and coherent summaries of text. These methods can follow either the extractive or the abstractive paradigm.

Extractive summarization with neural networks involves training a model to select and rank important sentences or phrases directly from the input text. Here's a basic outline of how a neural network for extractive summarization can be structured:
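As a toy illustration of that outline, the sketch below scores each sentence with a tiny feed-forward network over hand-crafted features. Both the feature set and the hand-set weights are illustrative assumptions; a real system would learn the weights from sentences labelled as belonging (or not) to reference summaries.

```python
import math
import re

def features(sentence, position, n_sents, keywords):
    """Per-sentence features: relative position, normalised length, keyword coverage."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [1.0 - position / max(1, n_sents - 1),          # earlier often matters more
            min(len(words) / 20.0, 1.0),                   # normalised length
            len(set(words) & keywords) / max(1, len(keywords))]  # keyword overlap

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(x, W1, b1, w2, b2):
    """One hidden layer: sigmoid(w2 . relu(W1 @ x + b1) + b2)."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sigmoid(sum(wi * hi for wi, hi in zip(w2, h)) + b2)

def neural_extract(text, keywords, k=1):
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    # Hand-set illustrative weights; a trained system learns these from labels.
    W1, b1 = [[1.0, 0.5, 2.0], [0.5, 1.0, 1.0]], [0.0, 0.0]
    w2, b2 = [1.0, 0.5], -1.0
    scores = [score(features(s, i, len(sents), keywords), W1, b1, w2, b2)
              for i, s in enumerate(sents)]
    top = sorted(range(len(sents)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sents[i] for i in sorted(top))
```

Production systems replace the hand-crafted features with learned sentence embeddings (e.g. from a pre-trained encoder), but the select-and-rank structure is the same.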

Example of Extractive Summarization using DL


SPECIFIC METHOD : NEURAL NETWORK
BASED SUMMARIZATION
Overview of Abstractive Summarization using Deep Learning

RNN, LSTM, GRU (formerly used): took the preceding input into account, but had difficulties handling long-term dependencies and forgot information from the beginning of the document.

Attention Mechanisms: address these limitations by allowing the model to focus on different parts of the input text while generating each word of the summary.

Transformer Models: the self-attention mechanism allows considering the entire context of the input text, facilitating better capture of long-range dependencies.
SPECIFIC METHOD : NEURAL NETWORK
BASED SUMMARIZATION
Pre-trained Models (BERT, GPT...)
Fine-tuned for summarization tasks, they have shown impressive performance in NLP applications, including abstractive summarization.

Pointer-Generator Networks
Handle out-of-vocabulary words by incorporating a mechanism to copy words directly from the source document into the summary.

Metrics
Summaries are hard to evaluate due to subjectiveness. The most common metrics are:
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
BLEU (Bilingual Evaluation Understudy)
METEOR
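Of these metrics, ROUGE-1 is simple enough to sketch directly: it counts unigram overlap between a candidate summary and a human reference.

```python
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1: unigram overlap between candidate and reference summaries.
    Returns (recall, precision, f1)."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # clipped count of shared unigrams
    recall = overlap / sum(r.values()) if r else 0.0
    precision = overlap / sum(c.values()) if c else 0.0
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f1
```

ROUGE-2 and ROUGE-L follow the same pattern with bigrams and longest common subsequences; all remain surface-level measures, which is why they only partially capture the subjective quality of a summary.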
COMPARISON / PROS AND CONS

EXTRACTIVE
Pros: Respect for grammar; Preservation of information; Interpretability; Reduced risk of information loss; Language fluency
Cons: Limited creativity; Redundancy; Difficulty with incoherent texts; Dependency on sentence importance metrics

ABSTRACTIVE
Pros: Adaptability; Human-like summary; Ability to grasp the context and its subtleties; Handling ambiguity; Non-structural information processing; Customization and flexibility
Cons: Increased complexity; Training data challenges; Computational resources; Evaluation challenges; Risk of redundancy; Unclear interpretability

HYBRID
Pros: Preservation of information; Reduced redundancy; Improved coherence; Domain adaptability
Cons: Costly in terms of time and equipment; Information loss risk; Technical complexity; Potential biases
FUTURE CHALLENGES

Handling Multiple Document Summarization
Real-time Summarization
Domain-specific Summarization
REFERENCES
General Overview
Yadav, D., Desai, J., & Yadav, A. K. (2022). Automatic Text Summarization Methods: A Comprehensive Review. https://arxiv.org/ftp/arxiv/papers/2204/2204.01849.pdf

On Extractive Summarization
Krimberg, S., Vanetik, N., & Litvak, M. (2021). Summarization of financial documents with TF-IDF weighting of multi-word terms. FNP, Computer Science, Business. https://doi.org/10.1016/j.mlwa.2022.100324

On Abstractive Summarization
Zhang, J., Zhao, Y., Saleh, M., & Liu, P. J. (2020). PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Rawat, P., Ganpatrao, N. G., & Gupta, D. (2017). Text Summarization Using Abstractive Methods. Journal of Network Communications
and Emerging Technologies (JNCET)

On Hybrid Summarization
Elsaid, A., Mohammed, A., Fattouh, L., & Sakre, M. (2020). A Hybrid Arabic Text Summarization Approach Based on Seq-to-Seq and
Transformer

On Graph based Summarization


Mihalcea, R. (2004, July 1). TextRank: Bringing order into text. ACL Anthology. https://aclanthology.org/W04-3252/
Bichi, A. A., Samsudin, R., Hassan, R., Hasan, L., & Rogo, A. A. (2023). Graph-based Extractive Text summarization Method for Hausa
Text. PLOS ONE, 18(5), e0285376. https://doi.org/10.1371/journal.pone.0285376
