SALAÜN Mathilde
KARUNATHASAN Nilany
JEGATHEESWARAN Janany
SAMBATH Sïndoumady
TEXT SUMMARIZATION
STRATEGIES
Theme Analysis and Evolution of NLP Techniques
OVERVIEW
About Us
Context
List of techniques
References
ABOUT US
Mathilde Salaün, Data Developer
https://www.linkedin.com/in/mathilde-salaun-13378b252/

Janany Jegatheeswaran, Big Data Engineer
https://www.linkedin.com/in/janany-jegatheeswaran-a729661ba/

Nilany Karunathasan, Data Scientist
https://www.linkedin.com/in/nilany-karunathasan-7b49691ba/

Sïndoumady Sambath, Software Engineer
https://www.linkedin.com/in/s%C3%AFndoumady-sambath-a7519a209/
CONTEXT
Issue: Information overload due to Internet growth
Purpose: Simplifying abundant material for accessibility
Demand: Need for complex and powerful summarization tools
Objective: Machine-generated summaries aligned with human-created ones
Analysis: Summarization concepts, techniques, metrics, and future scopes
TECHNIQUES
Text Summarization
- Extractive Summarization
- Abstractive Summarization
- Hybrid Summarization
PARADIGM I : EXTRACTIVE
SUMMARIZATION
An approach that involves selecting and combining crucial sentences or phrases directly from the original text to construct a summary.
Focuses on identifying and extracting the most pertinent information while preserving the exact wording from the source material.
TEXT INPUT → KEY INFORMATION IDENTIFICATION → SENTENCE SELECTION → COMBINATION → ORIGINAL WORDING PRESERVATION → SUMMARY OUTPUT
SPECIFIC METHOD : TF-IDF WEIGHTING
OF MULTI-WORD TERMS
Multi-word Terms
- Extends classic TF-IDF beyond single-word terms
- Introduces a maximal term length
- Recognizes document-specific phrases

Preprocessing
- Uses the Python nltk library
- Text splitting, tokenization, and symbol removal
- Custom stopword list
Creating the TF-IDF Matrix
Define Maximal Term Length (TL)
Generate Multi-word Terms
Calculate TF and IDF
Most Important Sequence
Find Sequences (up to 1000 words)
Calculate TF-IDF Scores
Rank Sequences
Select Highest-Ranking Sequence as
Summary
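The steps above can be sketched in a few lines of Python. This is a hedged toy version: the sentence-window candidate generation, the n-gram cap, and the small background corpus are simplifying assumptions, not the exact procedure of Krimberg et al.

```python
import math
import re
from collections import Counter

def terms(text, max_len=3):
    """Generate multi-word terms: all n-grams up to max_len words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def tfidf_summary(document, corpus, window=2, max_len=3):
    """Return the highest-scoring window of consecutive sentences."""
    # Document frequencies over the background corpus plus the document itself
    docs = [set(terms(d, max_len)) for d in corpus + [document]]
    idf = lambda t: math.log(len(docs) / sum(t in d for d in docs))

    tf = Counter(terms(document, max_len))   # term frequencies in the document
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())

    best, best_score = "", float("-inf")
    for i in range(len(sentences) - window + 1):
        seq = " ".join(sentences[i:i + window])  # candidate sequence
        score = sum(tf[t] * idf(t) for t in set(terms(seq, max_len)))
        if score > best_score:
            best, best_score = seq, score
    return best
```

Ranking whole windows of consecutive sentences rather than single sentences keeps local coherence, at the cost of a larger search space.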
SPECIFIC METHOD : TF-IDF WEIGHTING
OF MULTI-WORD TERMS
Pipeline of the Approach
DOCUMENT CORPUS → PREPROCESSING → MULTI-WORD TERMS → COMPUTE TF-IDF → GENERATE CANDIDATE SEQUENCES → TF-IDF SCORES FOR SEQUENCES → BEST SCORED SUMMARY
PARADIGM II : ABSTRACTIVE
SUMMARIZATION
Uses natural language techniques to interpret and understand the important aspects of a text and generate a more "human-friendly" summary.
- Needs a deeper analysis of the text
- Ability to generate new sentences
- Abstractive methods are classified into two categories: the structure-based approach and the semantic-based approach

Techniques of Abstractive Text Summarization
- Structure-based approach: tree-based, template-based, ontology-based, rule-based, and graph-based methods
- Semantic-based approach: semantic graph-based method, information item-based methods, and multimodal semantic model
EXAMPLE : PEGASUS
PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models)
- A model specially designed for abstractive summarization
- Uses deep learning in combination with natural language processing (NLP)
- Built on the Transformer architecture
Architecture Schema
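PEGASUS' pre-training objective, Gap Sentence Generation (GSG), masks the most informative sentences of a document and trains the model to regenerate them. A toy sketch of the selection step follows; scoring each sentence with a simplified set-based ROUGE-1 F1 against the rest of the document is an assumption for brevity (the paper uses proper ROUGE counting and a Transformer encoder-decoder for the generation itself).

```python
def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F1: unigram-set overlap between two strings."""
    c, r = set(candidate.lower().split()), set(reference.lower().split())
    if not c or not r:
        return 0.0
    overlap = len(c & r)
    prec, rec = overlap / len(c), overlap / len(r)
    return 2 * prec * rec / (prec + rec) if overlap else 0.0

def select_gap_sentences(sentences, ratio=0.3):
    """Score each sentence against the rest of the document; the
    top-scoring ones become the masked 'gap' sentences."""
    scores = []
    for i, s in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scores.append((rouge1_f(s, rest), i))
    k = max(1, round(len(sentences) * ratio))
    gap = sorted(i for _, i in sorted(scores, reverse=True)[:k])
    masked = ["<MASK>" if i in gap else s for i, s in enumerate(sentences)]
    return masked, [sentences[i] for i in gap]
```

The model then sees the masked document as input and the removed sentences as the target, which makes pre-training look like the summarization task itself.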
EXAMPLE : PEGASUS
The rows of the table represent the different models evaluated, while the columns represent the ROUGE metrics for each dataset.
PEGASUS is highlighted with two different configurations, PEGASUS_LARGE (C4) and PEGASUS_LARGE (HugeNews),
which likely indicate two variants of the PEGASUS model trained with different datasets or hyperparameters.
ROUGE scores are generally higher for PEGASUS compared to other models, suggesting that PEGASUS performs better
for the automatic text summarization task on these specific datasets. This may be due to PEGASUS' specialized pre-
training method that is optimized for the summary task.
Models Performance
ALTERNATIVE APPROACH :
HYBRID SUMMARIZATION
Hybrid text summarization methods combine elements of both extractive and abstractive techniques. The aim is to leverage the factual accuracy of extractive techniques and the flexibility of abstractive methods.
Typically, a hybrid model first selects important sentences or phrases from the source text using extractive techniques and
then generates a concise and coherent summary by paraphrasing and rephrasing the extracted content in an abstractive
manner.
Example of Hybrid Summarization :
Link :
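A toy illustration of the extract-then-rewrite idea (not the Seq-to-Seq/Transformer system from the references): the extractive stage scores sentences by content-word frequency, and the abstractive stage is stubbed by a trivial rewrite that only drops parenthetical asides, where a real system would paraphrase with a seq2seq model.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "that"}

def extract(sentences, k=2):
    """Extractive stage: rank sentences by content-word frequency."""
    freq = Counter(w for s in sentences
                   for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP)
    ranked = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())),
                    reverse=True)
    return [s for s in sentences if s in ranked[:k]]  # keep document order

def compress(sentence):
    """Stand-in for the abstractive stage: a real system paraphrases with
    a seq2seq model; here we merely remove parenthetical asides."""
    return re.sub(r"\s*\([^)]*\)", "", sentence)

def hybrid_summary(text, k=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(compress(s) for s in extract(sentences, k))
```

The division of labor is the point: the extractive stage guarantees the content comes from the source, and the rewriting stage is free to smooth the surface form.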
SPECIFIC METHOD : GRAPH BASED
SUMMARIZATION
Input Processing
Text Pre-processing
- Word Tokenization
- POS Tagging
- Lemmatization
Graph Generation
- Node => Sentence
- Weighted Edge => Similarity Measure
- Semantic inclusion using doc2vec
Example
SPECIFIC METHOD : GRAPH BASED
SUMMARIZATION
Processing and Post-Processing
Ranking
- TextRank Algorithm
- Vertex Voting
- Score per sentence
Clustering & Selection
- Generate Clusters
- Cluster-based Rank Calculation
- Topic per Cluster
Output
- Generate Summary
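The ranking stage can be sketched with plain power iteration. Assumed simplifications: edge weights here use Jaccard word overlap instead of TextRank's length-normalized overlap or the doc2vec similarity mentioned earlier, and the clustering step is omitted.

```python
import re

def similarity(s1, s2):
    """Jaccard word overlap between two sentences (simplified edge weight)."""
    w1 = set(re.findall(r"[a-z]+", s1.lower()))
    w2 = set(re.findall(r"[a-z]+", s2.lower()))
    return len(w1 & w2) / len(w1 | w2) if w1 | w2 else 0.0

def textrank(sentences, damping=0.85, iters=50):
    """PageRank-style vertex voting over the sentence-similarity graph."""
    n = len(sentences)
    w = [[similarity(a, b) if i != j else 0.0
          for j, b in enumerate(sentences)]
         for i, a in enumerate(sentences)]
    row_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - damping) + damping * sum(
                      w[j][i] * scores[j] / row_sum[j]
                      for j in range(n) if row_sum[j] > 0)
                  for i in range(n)]
    return scores

def summarize(text, k=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

Sentences that many other sentences "vote" for through shared vocabulary accumulate score; off-topic sentences stay near the (1 - damping) floor.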
SPECIFIC METHOD : GRAPH BASED
SUMMARIZATION
Results on English and Persian document based on ROUGE score
SPECIFIC METHOD : NEURAL NETWORK
BASED SUMMARIZATION
Neural network-based summarization methods use artificial neural networks to automatically generate concise and coherent summaries of text. These methods can fall into either the extractive or the abstractive category.
Extractive summarization using neural networks involves training a model to select and rank important sentences or phrases directly from the input text. Here is a basic outline of how a neural network for extractive summarization can be structured:
Example of Extractive Summarization using DL
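As a toy version of that outline, here is a tiny feed-forward scorer over hand-made features. The features, layer sizes, and random untrained weights are illustrative assumptions; real systems learn the weights from sentences labeled as summary-worthy, and replace the hand-made features with learned encoders such as BERT embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def featurize(sentence, position, doc_len):
    """Toy hand-crafted features: length, position, capitalization ratio."""
    words = sentence.split()
    return np.array([len(words) / 30.0,
                     position / max(doc_len - 1, 1),
                     sum(w.istitle() for w in words) / max(len(words), 1)])

class SentenceScorer:
    """Minimal feed-forward net: features -> hidden layer -> importance score."""
    def __init__(self, n_features=3, hidden=4):
        # Random weights here; in practice these are trained on labeled data.
        self.w1 = rng.normal(0.0, 0.1, (n_features, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))

    def score(self, x):
        h = np.tanh(x @ self.w1)                               # hidden activation
        return (1.0 / (1.0 + np.exp(-(h @ self.w2)))).item()   # sigmoid score

def neural_extract(sentences, k=2):
    """Score every sentence, keep the top k in document order."""
    model = SentenceScorer()
    scores = [model.score(featurize(s, i, len(sentences)))
              for i, s in enumerate(sentences)]
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

The structure is what matters: encode each sentence, score it, then select; everything else is a design choice.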
SPECIFIC METHOD : NEURAL NETWORK
BASED SUMMARIZATION
Overview of Abstractive Summarization using Deep Learning
RNN, LSTM, GRU (formerly used): took preceding input into account, but had difficulties handling long-term dependencies and forgot information from the beginning of the document.
Attention Mechanisms: address these limitations by allowing the model to focus on different parts of the input text while generating each word of the summary.
Transformer Models: the self-attention mechanism allows considering the entire context of the input text, facilitating better capture of long-range dependencies.
SPECIFIC METHOD : NEURAL NETWORK
BASED SUMMARIZATION
Pre-trained Models (BERT, GPT...)
Fine-tuned for summarization tasks, they have shown impressive performance in NLP applications, including abstractive summarization.

Pointer-Generator Networks
Handle out-of-vocabulary words by incorporating a mechanism to copy words directly from the source document into the summary.

Metrics
Summaries are hard to evaluate due to subjectiveness. The most common metrics are:
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
- BLEU (Bilingual Evaluation Understudy)
- METEOR
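Of these metrics, ROUGE is the one behind the results cited in these slides. A minimal ROUGE-N computation with clipped n-gram counts is shown below; real evaluations use an established implementation such as the rouge-score package and also report ROUGE-L.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N: n-gram overlap between a candidate and a reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())          # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Recall rewards covering the reference; precision penalizes padding the candidate; published tables usually report F1.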
COMPARISON / PROS AND CONS

EXTRACTIVE
Pros:
- Respect for grammar
- Preservation of information
- Interpretability
- Reduced risk of information loss
Cons:
- Limited creativity
- Redundancy
- Difficulty with incoherent texts
- Dependency on sentence importance metrics

ABSTRACTIVE
Pros:
- Human-like summary
- Reduced redundancy
- Ability to grasp the context and its subtleties
- Language fluency
Cons:
- Increased complexity
- Training data challenges
- Computational resources
- Evaluation challenges
- Risk of redundancy

HYBRID
Pros:
- Preservation of information
- Adaptability
- Improved coherence
- Handling ambiguity
- Domain adaptability
- Non-structural information processing
- Customization and flexibility
Cons:
- Costly in terms of time and equipment
- Information loss risk
- Technical complexity
- Potential biases
- Uncertain interpretability
FUTURE CHALLENGES
- Handling Multiple-Document Summarization
- Real-time Summarization
- Domain-specific Summarization
REFERENCES
General Overview
Yadav, D., Desai, J., & Yadav, A. K. (Year). Automatic Text Summarization Methods: A Comprehensive Review. https://arxiv.org/ftp/arxiv/papers/2204/2204.01849.pdf
On Extractive Summarization
Krimberg, S., Vanetik, N., & Litvak, M. (2021). Summarization of financial documents with TF-IDF weighting of multi-word terms. FNP. https://doi.org/10.1016/j.mlwa.2022.100324
On Abstractive Summarization
Zhang, J., Zhao, Y., Saleh, M., & Liu, P. J. (2020). PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Rawat, P., Ganpatrao, N. G., & Gupta, D. (2017). Text Summarization Using Abstractive Methods. Journal of Network Communications
and Emerging Technologies (JNCET)
On Hybrid Summarization
Elsaid, A., Mohammed, A., Fattouh, L., & Sakre, M. (2020). A Hybrid Arabic Text Summarization Approach Based on Seq-to-Seq and
Transformer
On Graph based Summarization
Mihalcea, R. (2004, July 1). TextRank: Bringing order into text. ACL Anthology. https://aclanthology.org/W04-3252/
Bichi, A. A., Samsudin, R., Hassan, R., Hasan, L., & Rogo, A. A. (2023). Graph-based Extractive Text summarization Method for Hausa
Text. PLOS ONE, 18(5), e0285376. https://doi.org/10.1371/journal.pone.0285376