Exploring The Extractive Method of Text Summarization

The document explores the extractive method of text summarization. It discusses extractive summarization, which uses a ranking algorithm to select important sentences from the original text to include in the summary. It also briefly mentions abstractive summarization but focuses on explaining extractive summarization with an example using Python code.

Uploaded by

RAPTER GAMING

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

31 views22 pages

Exploring The Extractive Method of Text Summarization

Uploaded by

RAPTER GAMING

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

10/3/23, 3:19 PM Exploring the Extractive Method of Text Summarization

Exploring the Extractive Method of Text Summarization

In this article, we will explore the two main approaches of NLP text summarization, namely extractive and abstractive.

By Shilpi Mazumdar

15 min. read

Introduction
We often don’t have enough time to read and understand lengthy documents, research papers, or news articles. Summarizing a large volume of text while retaining its essential information is therefore crucial in many fields, such as journalism, research, and business. This is where NLP text summarization comes into play: a technique that automatically generates a condensed version of a given text while preserving its essential meaning. In this article, we will explore the two main approaches of NLP text summarization, namely extractive and abstractive, and examine their applications, strengths, and weaknesses.

Learning Objectives

In this article, you will:

1. Understand the different categories of text summarization.
2. Understand the extractive and abstractive approaches through examples.
3. Learn the difference between the two summarization techniques.
4. Learn about the future aspects of text summarization.

Table of Contents
1. Types of Text Summarization
2. Extractive Summarization
3. Abstractive Summarization
4. Understanding with Code
5. Comparison of Extractive and Abstractive Text Summarization
6. Future Outlook of Text Summarization
7. Conclusion

Types of Text Summarization

Broadly, NLP text summarization can be divided into two main categories.

Extractive Approach
Abstractive Approach

Let’s dive a little deeper into each of these categories.

Extractive Summarization
So, what exactly happens in the extractive summarization method? It simply takes the important sentences or phrases out of the original text and joins them to form a summary.

Now the question is: on what basis is a sentence considered important? A ranking algorithm assigns a score to each sentence in the text based on its relevance to the overall meaning of the document. The most relevant sentences are then chosen for the summary.

Sentences can be ranked in several ways:
TF-IDF (term frequency-inverse document frequency)
Graph-based methods such as TextRank
Machine learning-based methods such as Support Vector Machines (SVM) and Random Forests

The main goal of the extractive method is to preserve the original meaning of the text. It works well when the input text is already well structured, both physically and logically, as newspaper content usually is.
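To make the graph-based option in the list above more concrete, here is a minimal sketch in the spirit of TextRank, using only the Python standard library. It is not the reference implementation: the sentence splitter is deliberately naive, the function names (split_sentences, overlap_similarity, textrank_scores, textrank_summary) are hypothetical, and the damping factor and iteration count are assumed defaults.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]


def overlap_similarity(a, b):
    """Shared-word count, normalised by the log of the two sentence lengths."""
    words_a = set(re.findall(r'\w+', a.lower()))
    words_b = set(re.findall(r'\w+', b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """PageRank-style power iteration over the sentence similarity graph."""
    count = len(sentences)
    if count == 0:
        return []
    weights = [[overlap_similarity(a, b) if i != j else 0.0
                for j, b in enumerate(sentences)]
               for i, a in enumerate(sentences)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / count] * count
    for _ in range(iterations):
        scores = [
            (1 - damping) / count + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(count) if out_sums[j] > 0)
            for i in range(count)
        ]
    return scores


def textrank_summary(text, n):
    """Return the n highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:n]
    return ' '.join(sentences[i] for i in sorted(top))
```

A sentence that shares words with many other sentences collects score from them, while an unrelated sentence keeps only the small baseline share. Calling textrank_summary(text, 2) on a short text therefore tends to return the two most "connected" sentences.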

Abstractive Summarization
Now let’s come to the abstractive summarization method. The name comes from the word “abstract”, which means an outline, a summary, or the basic idea of a larger body of text. Unlike the extractive method, it doesn’t simply pick out the important sentences. Instead, it analyses the input text and generates new phrases or sentences that capture the essence of the original and convey the same meaning more concisely and coherently.

How exactly is the summary generated in this method? In brief, the input text is analyzed by a neural network model. The model is trained on large amounts of text data, learns the relationships between words and sentences, and generates new text that conveys the same meaning as the original in a more understandable manner.

This method uses advanced NLP techniques such as natural language generation (NLG) and deep learning to understand the context and generate the summary. The resulting summaries are usually shorter and more readable than the ones generated by the extractive method, but they can sometimes contain errors or inaccuracies.

Note that in this article, we’ll only deal with the extractive text summarization method.

Understanding with Code

Here, we’ll focus on the extractive method and understand it better with an example.

But before that, let’s quickly understand it with a flowchart.

We will use a Python library called NLTK (Natural Language Toolkit) to implement the extractive method. NLTK provides a wide range of functionality for natural language processing, including sentence and word tokenization and stop word lists, which we will combine with a simple sentence-scoring step.

Let’s take a look at the following code, which demonstrates how to use NLTK to generate a summary from a given text:

Frequency-based Approach

# Import the required libraries
import nltk
nltk.download('punkt')      # Punkt models used for sentence tokenization
                            # (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('stopwords')  # list of stop words such as 'a', 'an', 'the', 'in', which are dropped

from collections import Counter  # counts the frequency of words in a text
from nltk.corpus import stopwords  # stop word lists from the NLTK corpus
# (a corpus is a large collection of text or speech data used for statistical analysis)
from nltk.tokenize import sent_tokenize, word_tokenize
# sent_tokenize splits text into sentences
# word_tokenize splits sentences into words


# This function takes two inputs: the text, and n, the number of
# sentences the summary should contain
def generate_summary(text, n):
    # Tokenize the text into individual sentences
    sentences = sent_tokenize(text)

    # Tokenize the whole text into words with word_tokenize, convert them to
    # lowercase, and drop stop words and non-alphanumeric tokens such as punctuation
    stop_words = set(stopwords.words('english'))
    words = [word.lower() for word in word_tokenize(text)
             if word.lower() not in stop_words and word.isalnum()]

    # Compute the frequency of each word
    word_freq = Counter(words)

    # Compute the score for each sentence based on the frequency of its words.
    # After this loop, sentence_scores maps each kept sentence to the sum of
    # the frequency counts of its constituent words.
    sentence_scores = {}
    for sentence in sentences:
        sentence_words = [word.lower() for word in word_tokenize(sentence)
                          if word.lower() not in stop_words and word.isalnum()]
        sentence_score = sum(word_freq[word] for word in sentence_words)
        # Keep only sentences with fewer than 20 content words (the threshold
        # can be adjusted). This filters out very long sentences, which would
        # otherwise score highly simply because they contain more words.
        if len(sentence_words) < 20:
            sentence_scores[sentence] = sentence_score

    # Select the top n sentences with the highest scores, highest first
    summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:n]
    summary = ' '.join(summary_sentences)

    return summary


Using a Sample Text From Wikipedia to Generate Summary

text = '''
Weather is the day-to-day or hour-to-hour change in the atmosphere.
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more.
Energy from the Sun affects the weather too.
Climate tells us what kinds of weather usually happen in an area at different times of the year.
Changes in weather can affect our mood and life. We wear different clothes and do different things in different weather conditions.
We choose different foods in different seasons.
Weather stations around the world measure different parts of weather.
Ways to measure weather are wind speed, wind direction, temperature and humidity.
People try to use these measurements to make weather forecasts for the future.
These people are scientists that are called meteorologists.
They use computers to build large mathematical models to follow weather trends.'''

summary = generate_summary(text, 5)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

Output

The following output is the summary we get. It contains 5 sentences.

We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of weather.
Climate tells us what kinds of weather usually happen in an area at different times of the year.
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more.
Ways to measure weather are wind speed, wind direction, temperature and humidity.

What’s happening in the above code?

The above code takes a text and a desired number of sentences for the summary as input and returns a summary generated using the extractive method. It first tokenizes the text into individual sentences and then into individual words. Stop words and punctuation are removed, and the frequency of each remaining word is computed.

The score for each sentence is then computed as the sum of the frequencies of its words, and the n sentences with the highest scores are selected. Finally, the summary is generated by joining the selected sentences together, highest score first.
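The scoring step can be traced by hand on a toy input. The sketch below repeats the same idea without NLTK so that every number is easy to check. The regular-expression tokenizer, the tiny STOP_WORDS set, and the function names are simplifying assumptions made for this illustration, not part of the code above.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; NLTK's English list is much longer)
STOP_WORDS = {'a', 'an', 'the', 'is', 'in', 'and', 'of', 'to'}


def content_words(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stop words."""
    return [w for w in re.findall(r'[a-z0-9]+', text.lower()) if w not in STOP_WORDS]


def score_sentences(sentences):
    """Score each sentence as the sum of the document-wide counts of its words."""
    word_freq = Counter(w for s in sentences for w in content_words(s))
    return {s: sum(word_freq[w] for w in content_words(s)) for s in sentences}


sentences = [
    'Rain is a kind of weather.',
    'Snow is weather and rain is weather.',
    'Cats sleep.',
]
scores = score_sentences(sentences)
for sentence, score in scores.items():
    print(score, sentence)
# 6 Rain is a kind of weather.
# 9 Snow is weather and rain is weather.
# 2 Cats sleep.
```

Here "weather" occurs three times in total and "rain" twice, so sentences that mention them score highest. The second sentence wins partly because it repeats "weather", which shows how raw frequency sums favour longer, repetitive sentences.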

In the next section, we will explore how the extractive method can be further improved using more advanced techniques such as TF-IDF.

TF-IDF Approach

# Import the required libraries

# sent_tokenize splits the text into sentences
from nltk.tokenize import sent_tokenize

# TfidfVectorizer converts a collection of raw documents to a matrix of TF-IDF features
from sklearn.feature_extraction.text import TfidfVectorizer

# cosine_similarity computes the cosine similarity between vectors
from sklearn.metrics.pairwise import cosine_similarity

# nlargest returns the n largest elements from an iterable in descending order
from heapq import nlargest


def generate_summary(text, n):
    # Tokenize the text into individual sentences
    sentences = sent_tokenize(text)

    # Create the TF-IDF matrix, one row per sentence
    vectorizer = TfidfVectorizer(stop_words='english')
    tfidf_matrix = vectorizer.fit_transform(sentences)

    # Compute the cosine similarity between the last sentence (tfidf_matrix[-1])
    # and every other sentence (tfidf_matrix[:-1]). The last sentence stands in
    # for the document here, so it can never be selected itself.
    sentence_scores = cosine_similarity(tfidf_matrix[-1], tfidf_matrix[:-1])[0]

    # Select the indices of the top n sentences with the highest scores
    summary_sentences = nlargest(n, range(len(sentence_scores)),
                                 key=sentence_scores.__getitem__)

    # Join the selected sentences in their original order
    summary_tfidf = ' '.join([sentences[i] for i in sorted(summary_sentences)])

    return summary_tfidf

Using a Sample Text to Check the Summary

summary = generate_summary(text, 5)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

The following output is the summary we get. It contains 5 sentences.

Energy from the Sun affects the weather too.
Changes in weather can affect our mood and life.
We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of weather.
People try to use these measurements to make weather forecasts for the future.

The above code generates a summary for a given text using a TF-IDF approach. The generate_summary function takes a text parameter and an n parameter (the number of sentences in the summary). It tokenizes the text into individual sentences, creates a TF-IDF matrix using the TfidfVectorizer class, and uses the cosine_similarity function to score sentences by their similarity to a reference vector. As written, that reference is the TF-IDF vector of the last sentence (tfidf_matrix[-1]) rather than a vector for the whole document, so each score measures how similar a sentence is to the closing sentence. Next, the function selects the top n sentences with the highest scores using the nlargest function from the heapq library and joins them, in their original order, into a string using the join method.
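One possible variant is to score every sentence against a vector that represents the whole document. The sketch below does this in plain Python so that the arithmetic is visible. It is an illustration under stated assumptions, not a drop-in replacement for the scikit-learn code: the tokenizer is a simple regular expression with no stop word removal, the smoothed IDF formula is chosen for this example, the document vector is simply the sum of the sentence vectors, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase and keep alphanumeric tokens (no stop word removal in this sketch)."""
    return re.findall(r'[a-z0-9]+', sentence.lower())


def tfidf_vectors(sentences):
    """Return one {word: weight} dict per sentence, treating each sentence as a document."""
    tokenized = [tokenize(s) for s in sentences]
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    total = len(sentences)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, so words present in every sentence keep a nonzero weight
        vectors.append({w: c * (math.log((1 + total) / (1 + doc_freq[w])) + 1)
                        for w, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid_summary(sentences, n):
    """Pick the n sentences closest to the sum of all sentence vectors."""
    vectors = tfidf_vectors(sentences)
    centroid = Counter()
    for vector in vectors:
        centroid.update(vector)  # adds the weights word by word
    scores = [cosine(vector, centroid) for vector in vectors]
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

With this variant every sentence, including the last one, is a candidate for the summary. Its results would not be expected to match the output shown above, since both the weighting and the reference vector differ.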

Before moving further, let’s quickly understand cosine similarity. You can jump to the next part if you are already familiar with it.

Cosine similarity considers the angle between the vectors of word frequencies for each document rather than their magnitudes. This means that documents with similar word distributions will have a smaller angle between their vectors and thus a higher cosine similarity score. Let’s understand this with a simple example.

We have two sentences.

1. “I love cats and dogs.”
2. “I love only cats.”

To calculate the similarity between these two sentences using cosine similarity, we first need to convert each sentence into a vector representation. This is the result we are aiming for:

1. “I love cats and dogs.” -> [1, 1, 1, 1, 1, 1, 0]
2. “I love only cats.” -> [1, 1, 1, 0, 0, 1, 1]

How do we get this vector representation? We need to perform the following steps.

1. Break each sentence into individual words (tokenization):
“I love cats and dogs.” -> [‘I’, ‘love’, ‘cats’, ‘and’, ‘dogs’, ‘.’]
“I love only cats.” -> [‘I’, ‘love’, ‘only’, ‘cats’, ‘.’]

2. Create a vocabulary of unique words from both sentences:
[‘I’, ‘love’, ‘cats’, ‘and’, ‘dogs’, ‘.’, ‘only’]

3. Convert each sentence into a binary vector of size equal to the vocabulary, where 1 represents the presence of the word in the sentence and 0 represents its absence.

“I love cats and dogs.” -> [1, 1, 1, 1, 1, 1, 0]
Explanation:
‘I’ is present -> 1
‘love’ is present -> 1
‘cats’ is present -> 1
‘and’ is present -> 1
‘dogs’ is present -> 1
‘.’ is present -> 1
‘only’ is absent -> 0

“I love only cats.” -> [1, 1, 1, 0, 0, 1, 1]
Explanation:
‘I’ is present -> 1
‘love’ is present -> 1
‘cats’ is present -> 1
‘and’ is absent -> 0
‘dogs’ is absent -> 0
‘.’ is present -> 1
‘only’ is present -> 1

Each vector has seven elements, one for each of the seven unique tokens in the vocabulary. Each value records whether the token is present (1) or absent (0) in the respective sentence.
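The three steps above (tokenize, build a vocabulary, mark presence or absence) can be sketched in a few lines of plain Python. The regular-expression tokenizer and the function names are assumptions made for this illustration. The tokenizer keeps punctuation marks as tokens, as the example does.

```python
import re


def tokenize(sentence):
    """Split into word tokens and punctuation marks, keeping the original case."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def build_vocabulary(sentences):
    """Collect unique tokens in order of first appearance."""
    vocabulary = []
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def binary_vector(sentence, vocabulary):
    """1 if the vocabulary token occurs in the sentence, else 0."""
    tokens = set(tokenize(sentence))
    return [1 if token in tokens else 0 for token in vocabulary]


sentences = ["I love cats and dogs.", "I love only cats."]
vocabulary = build_vocabulary(sentences)
vectors = [binary_vector(s, vocabulary) for s in sentences]
print(vocabulary)  # ['I', 'love', 'cats', 'and', 'dogs', '.', 'only']
print(vectors[0])  # [1, 1, 1, 1, 1, 1, 0]
print(vectors[1])  # [1, 1, 1, 0, 0, 1, 1]
```

The vocabulary has seven entries because the full stop counts as a token, so each sentence vector has seven elements.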

Next, a TF-IDF weighting would scale each entry by the inverse document frequency (IDF) of its word. With the plain definition IDF = log(N / df), the tokens that occur in both sentences (‘I’, ‘love’, ‘cats’, ‘.’) would get log(2 / 2) = 0, while the tokens that occur in only one sentence (‘and’, ‘dogs’, ‘only’) would get log(2 / 1) ≈ 0.69. With just two sentences, that would discard every shared word, so for simplicity let’s assume that all words have the same IDF. Scaling both vectors by the same constant does not change the angle between them, so we can work with the binary vectors directly:

“I love cats and dogs.” -> [1, 1, 1, 1, 1, 1, 0]
“I love only cats.” -> [1, 1, 1, 0, 0, 1, 1]

Finally, we compute the cosine similarity between the two vectors using the formula:

cosine_similarity = (v1 . v2) / (||v1|| * ||v2||)

where v1 and v2 are the vector representations of the sentences, ‘.’ denotes the dot product of two vectors, and ||v1|| and ||v2|| are the Euclidean norms of the two vectors.

Using the vector representations and the formula above, the calculation goes as follows.

The dot product of the vectors [1, 1, 1, 1, 1, 1, 0] and [1, 1, 1, 0, 0, 1, 1] is:
1*1 + 1*1 + 1*1 + 1*0 + 1*0 + 1*1 + 0*1 = 4

The magnitude (or Euclidean length) of the first vector [1, 1, 1, 1, 1, 1, 0] is:
sqrt(1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 0^2) = sqrt(6) ≈ 2.45

Similarly, the magnitude of the second vector [1, 1, 1, 0, 0, 1, 1] is:
sqrt(1^2 + 1^2 + 1^2 + 0^2 + 0^2 + 1^2 + 1^2) = sqrt(5) ≈ 2.24

Therefore, the cosine similarity between the two sentences is:
cosine_similarity = 4 / (sqrt(6) * sqrt(5)) = 4 / sqrt(30) ≈ 4 / 5.48 ≈ 0.73

This indicates that the two sentences are fairly similar, but not identical.
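The same calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library, and vector_cosine is a hypothetical helper name. Carrying full precision through the division gives about 0.73, while rounding the two norms before dividing can shift the second decimal place.

```python
import math


def vector_cosine(v1, v2):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # similarity is undefined for a zero vector; return 0 by convention
    return dot / (norm1 * norm2)


v1 = [1, 1, 1, 1, 1, 1, 0]  # "I love cats and dogs."
v2 = [1, 1, 1, 0, 0, 1, 1]  # "I love only cats."
print(round(vector_cosine(v1, v2), 2))  # 0.73
```

Identical vectors give a similarity of 1, and vectors with no shared words give 0.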

Evaluation Metrics

Let’s now check how well our approach is working.

I got this particular text from this link. Following is the text.

Weather is the day-to-day or hour-to-hour change in the atmosphere. Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more. Energy from the Sun affects the weather too. Climate tells us what kinds of weather usually happen in an area at different times of the year. Changes in weather can affect our mood and life. We wear different clothes and do different things in different weather conditions. We choose different foods in different seasons.

Weather stations around the world measure different parts of weather. Ways to measure weather are wind speed, wind direction, temperature and humidity. People try to use these measurements to make weather forecasts for the future. These people are scientists that are called meteorologists. They use computers to build large mathematical models to follow weather trends.

How can we check the quality of a summary generated from the above text? One way is to use human evaluation as the ground truth. In this approach, we generate summaries using each method (frequency-based, TF-IDF) and then ask human evaluators to rate the quality of each summary on criteria such as coherence, readability, and relevance to the original text. We can then calculate the average score for each method from the ratings given by the evaluators. This gives us a quantitative measure of the performance of each method.

Another approach is to use ROUGE (Recall-Oriented Understudy for Gisting Evaluation), a commonly used family of metrics for evaluating text summarization models. ROUGE measures the overlap between the generated summary and a reference summary (i.e., the ground truth).

Let’s first go with the human evaluation method.

We got the following summary (5 sentences) as the output using the frequency-based approach.

We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of weather.
Climate tells us what kinds of weather usually happen in an area at different times of the year.
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more.
Ways to measure weather are wind speed, wind direction, temperature and humidity.

We got the following summary (5 sentences) as the output using the TF-IDF approach.

Energy from the Sun affects the weather too.
Changes in weather can affect our mood and life.
We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of weather.
People try to use these measurements to make weather forecasts for the future.

On average, human evaluators rated the frequency-based approach 4/5 and the TF-IDF approach 3/5.

So, as per human evaluation, the frequency-based approach works better.

Now, let’s see how the machine evaluates, using ROUGE. The following code defines a reference summary, which is human-generated, and checks how well the automatically generated summary compares with it.

# Install the package first in case it's not on your system
# (in a notebook: !pip install rouge)

from rouge import Rouge

# evaluate_rouge takes two arguments,
# one being the reference text and the other the summary text,
# and uses the ROUGE metric to evaluate the quality of the summary text compared to the reference text.
# The function uses the rouge library to compute the ROUGE scores and returns the F1 score of the ROUGE-1 metric.
def evaluate_rouge(reference_text, summary_text):
    rouge = Rouge()
    # get_scores expects the generated summary (hypothesis) first and the
    # reference second (F1 is symmetric, so the returned value is unaffected)
    scores = rouge.get_scores(summary_text, reference_text)
    return scores[0]['rouge-1']['f']

# The following is a human-generated summary
reference_summary = '''
Weather is a gradual slow change through days and hours in the atmosphere and can vary from wind to snow.
Climate tells a lot about the weather in an area.
The livelihood of people changes according to the change in weather.
Weather stations measure different parts of weather.
People who use measurements to make weather forecasts for the future are called meteorologists, and are scientists.'''

# The sample text from Wikipedia
text = '''
Weather is the day-to-day or hour-to-hour change in the atmosphere.
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more.
Energy from the Sun affects the weather too.
Climate tells us what kinds of weather usually happen in an area at different times of the year.
Changes in weather can affect our mood and life. We wear different clothes and do different things in different weather conditions.
We choose different foods in different seasons.
Weather stations around the world measure different parts of weather.
Ways to measure weather are wind speed, wind direction, temperature and humidity.
People try to use these measurements to make weather forecasts for the future.
These people are scientists that are called meteorologists.
They use computers to build large mathematical models to follow weather trends.'''

# Generate summary using frequency-based/TF-IDF approach
summary = generate_summary(text, 5)

# Evaluate the summary using ROUGE
rouge_score = evaluate_rouge(reference_summary, summary)

print(f"ROUGE score: {rouge_score}")

# For the frequency-based approach we are getting a score of 0.336
# For the TF-IDF approach we are getting a score of 0.465

Here, a reference summary and a text are defined. Then, a summary is generated from the text, first using the frequency-based approach and then using the TF-IDF approach. Next, the ROUGE score of the generated summary is evaluated against the reference summary using the evaluate_rouge() function. The ROUGE-1 score measures the word overlap between the generated and reference summaries. The higher the ROUGE score, the more similar the two summaries are.

For the frequency-based approach, we get a score of 0.336; for the TF-IDF approach, we get a score of 0.465. So, in this evaluation method, the TF-IDF approach works better.
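To see what a ROUGE-1 score is made of, the following sketch computes unigram precision, recall, and F1 by hand. It is a simplified illustration, not the rouge package itself: the function name rouge_1, the regular-expression tokenizer, and the clipped-count overlap are assumptions, so its numbers will not necessarily match the library's.

```python
import re
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    cand = Counter(re.findall(r'\w+', candidate.lower()))
    ref = Counter(re.findall(r'\w+', reference.lower()))
    # Counter intersection keeps, per word, the smaller of the two counts
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())  # share of candidate words found in the reference
    recall = overlap / sum(ref.values())      # share of reference words found in the candidate
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = rouge_1(
    'weather stations measure wind speed',
    'weather stations measure different parts of weather',
)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.6 0.43 0.5
```

Three of the five candidate words appear in the seven-word reference, giving a precision of 3/5 and a recall of 3/7. F1 combines the two, which is why a summary can score well only if it both covers the reference and avoids unrelated words.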


Comparison of Extractive and Abstractive Text Summarization

Future Outlook of Text Summarization

Text summarization keeps moving up the technology ladder, as research and development teams explore new techniques every day. Advances in machine learning and NLP will gradually improve the quality and accuracy of the generated summaries.

The field also makes use of deep learning models, such as recurrent neural networks and transformers, which lead to a better understanding of what the text is about. Additionally, further advances in language generation techniques will lead to more sophisticated abstractive summarization methods.

Ultimately, these advanced solutions will help us save time, increase productivity, and make information more accessible and easily digestible.

Conclusion


Text summarization is a fast-growing field in natural language processing, and it has the potential to revolutionize the way we consume and process information. In this article, we covered the following points:

Extractive summarization techniques select and combine existing sentences from a text to create a summary. In contrast, abstractive techniques generate new sentences while keeping the essence of the original text intact.
Extractive summarization has some advantages over abstractive summarization, including higher accuracy, lower computational complexity, and better preservation of factual information.
Abstractive summarization has advantages over extractive summarization, including the ability to create more concise and coherent summaries and the potential to capture the overall meaning of a text.
Text summarization has many real-world applications, including journalism, finance, healthcare, and the legal industry.
As the amount of digital information grows, text summarization will become an essential tool for efficiently processing and making sense of large volumes of text.

No ratings yet
Morning Broadcast Script: Resource Links
2 pages
Astm C511 21
No ratings yet
Astm C511 21
3 pages
4.1 Forecasts and Generation Expansion Plan (PC 4) of The Grid Code - pdf-180919111727080
No ratings yet
4.1 Forecasts and Generation Expansion Plan (PC 4) of The Grid Code - pdf-180919111727080
3 pages
Exam Unit 10 y 11 Family and Friends 2
No ratings yet
Exam Unit 10 y 11 Family and Friends 2
5 pages
White Christmas Script - ACT 1
100% (2)
White Christmas Script - ACT 1
82 pages
Romeo and Juliet: Sehuencas Frog Love Story
No ratings yet
Romeo and Juliet: Sehuencas Frog Love Story
2 pages
Climate VS Weather Grades5 8
No ratings yet
Climate VS Weather Grades5 8
35 pages
Confusing Words 2 Grammar Guides Reading Comprehension Exercises Sen - 87337
No ratings yet
Confusing Words 2 Grammar Guides Reading Comprehension Exercises Sen - 87337
2 pages
Geography of Arid and Semi-Arid Areas
No ratings yet
Geography of Arid and Semi-Arid Areas
9 pages
Pursuit of The Truth #Chapter 232 - He Saw It - Read Pursuit of The Truth Chapter 232 - He Saw It
No ratings yet
Pursuit of The Truth #Chapter 232 - He Saw It - Read Pursuit of The Truth Chapter 232 - He Saw It
10 pages
Vernacular and Tropical Architecture of Cambodia
No ratings yet
Vernacular and Tropical Architecture of Cambodia
10 pages
Tunnel
No ratings yet
Tunnel
36 pages
Mastering English Verb Tenses Quiz
No ratings yet
Mastering English Verb Tenses Quiz
5 pages
Nonconformity in Historical Context
No ratings yet
Nonconformity in Historical Context
2 pages
Airport Performance Measurement
No ratings yet
Airport Performance Measurement
20 pages
2.4 TFT Arduino Weather Station With Multiple Sensors - 7 Steps - Instructables
No ratings yet
2.4 TFT Arduino Weather Station With Multiple Sensors - 7 Steps - Instructables
11 pages
Understanding Insolation and Its Impact
No ratings yet
Understanding Insolation and Its Impact
5 pages
HF Inventory Form
No ratings yet
HF Inventory Form
5 pages
MR Roy Geo 2025 Grad 11
No ratings yet
MR Roy Geo 2025 Grad 11
17 pages
The Magnus Archives: MAG091 Summary
No ratings yet
The Magnus Archives: MAG091 Summary
16 pages
Saturn
No ratings yet
Saturn
3 pages
The Indolence of The Filipino With Quiz
No ratings yet
The Indolence of The Filipino With Quiz
3 pages
PSC1 - Limitless Light (5e)
No ratings yet
PSC1 - Limitless Light (5e)
41 pages
Latest Pilot Job
No ratings yet
Latest Pilot Job
12 pages


In this article, you will:

1. Understand the different categories of text

Types of Text Summarization

Let’s dive a little deeper into each of the above-

Now, the question that comes is, exactly on what

There are various ways through which the ranking

The main motive of the extractive method is to

Again, how exactly is the summary generated in

This method uses advanced NLP techniques such

Note that, here in this article, we’ll only deal with

Understanding with Code

But, before that, let’s quickly understand it with a

Here, we will use a Python library called NLTK

Let’s take a look at the following code that

# import the required libraries

from nltk.tokenize import sent_tokenize, word_tokenize #

# this function would take 2 inputs, one being the text,

# Tokenize each sentence into individual words and remove

# them all to lowercase

# Compute the frequency of each word

# Compute the score for each sentence based on the

# empty dictionary to store the scores for each sentence

for sentence in sentences:

# checks if the length of the sentence_words list is less

# Select the top n sentences with the highest scores
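The steps named in the comments above (tokenize, drop stop words, count word frequencies, score each sentence, keep the top n) can be put together roughly as follows. This is only a minimal sketch and not the article's exact listing: to stay self-contained it swaps NLTK's sent_tokenize and word_tokenize for simple regular expressions, and the names frequency_summary, STOP_WORDS, and max_words (with its default of 20) are hypothetical choices made for illustration.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word list (an assumption; NLTK ships a much fuller one)
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "or", "the", "to", "we"}


def split_sentences(text):
    # Naive regex splitter standing in for nltk.tokenize.sent_tokenize
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase, keep alphabetic tokens, and drop stop words
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summary(text, n, max_words=20):
    sentences = split_sentences(text)
    # Frequency of every remaining word across the whole text
    word_freq = Counter(w for s in sentences for w in tokenize(s))
    scores = {}
    for idx, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Skip empty sentences and very long ones (the threshold is an assumption)
        if 0 < len(words) < max_words:
            scores[idx] = sum(word_freq[w] for w in words)
    # Pick the n best-scoring sentences, then restore their original order
    top = nlargest(n, scores, key=scores.get)
    return " ".join(sentences[i] for i in sorted(top))


sample = "Weather changes every day. Weather stations measure weather conditions. Cats sleep."
print(frequency_summary(sample, 1))
```

Sorting the selected indices before joining keeps the summary in the order of the original text, which usually reads more naturally than ordering by score.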

Using a Sample Text From Wikipedia to Generate

The following output is what we would be getting

We wear different clothes and do different things

What’s happening in the above code?

Then the score for each sentence is computed

In the next section, we will explore how the

advanced techniques such as TF-IDF.

# importing the required libraries

# importing TfidfVectorizer class to convert a collection

# importing cosine_similarity function to compute the

# importing nlargest to return the n largest elements from

def generate_summary(text, n):

# Create the TF-IDF matrix

# Compute the cosine similarity between each sentence and

# Select the top n sentences with the highest scores

summary_tfidf = ' '.join([sentences[i] for i in
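For reference, here is a minimal sketch of how a TF-IDF based selection along the lines of the comments above could look with scikit-learn. It should be read as one possible reading and not as the article's exact code: the function name tfidf_summary, the regex sentence splitter, and in particular the choice to score each sentence against the mean of all sentence vectors (used here as a stand-in for "the document") are assumptions.

```python
import re
from heapq import nlargest

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def split_sentences(text):
    # Naive regex splitter used only to keep the sketch self-contained
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_summary(text, n):
    sentences = split_sentences(text)
    if len(sentences) <= n:
        # Nothing to rank: the text is already short enough
        return " ".join(sentences)
    # One TF-IDF row per sentence, with English stop words removed
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    # Assumption: represent the whole document by the mean sentence vector
    centroid = np.asarray(tfidf.mean(axis=0)).reshape(1, -1)
    # Cosine similarity of every sentence to that document vector
    scores = cosine_similarity(tfidf, centroid).ravel()
    top = nlargest(n, range(len(sentences)), key=lambda i: scores[i])
    # Emit the selected sentences in their original order
    return " ".join(sentences[i] for i in sorted(top))
```

Calling tfidf_summary(text, 2) returns the two sentences whose TF-IDF vectors lie closest to the document vector, joined in the order in which they appear in the text.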

Using a Sample Text to Check the Summary

The following output is what we would be getting

Energy from the Sun affects the weather too.

Weather stations around the world measure

The above code generates a summary for a given

Okay, before moving further, let’s quickly

So, the cosine similarity considers the angle

We have two sentences.

1. “I love cats and dogs.”

We first need to convert each sentence into a

1. “I love cats and dogs.” -> [1, 1, 1, 1, 0, 0]

How are we getting the vector representation? We

“I love cats and dogs.” -> [‘I’, ‘love’, ‘cats’, ‘and’,

2. Now, create a vocabulary of unique words from

‘and’ is present, hence 1
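The tokenize, build a vocabulary, mark presence with 1 or 0 procedure can be sketched in a few lines of Python. The helper names build_vocabulary and binary_vector are hypothetical, and the second sentence below is a made-up example chosen only to show the mechanics, so the resulting vectors are illustrative.

```python
import re


def build_vocabulary(sentences):
    # Unique lowercase words, kept in order of first appearance
    vocab = []
    for sentence in sentences:
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word not in vocab:
                vocab.append(word)
    return vocab


def binary_vector(sentence, vocab):
    # 1 if the vocabulary word occurs in the sentence, otherwise 0
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return [1 if word in words else 0 for word in vocab]


sentences = [
    "I love cats and dogs.",
    "I love dogs but hate cats.",  # hypothetical second sentence
]
vocab = build_vocabulary(sentences)
for s in sentences:
    print(s, "->", binary_vector(s, vocab))
```

Each position in a vector corresponds to one vocabulary word, so two sentences can only be compared when they are encoded against the same vocabulary.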

Next, we compute the TF-IDF weights for each

Since each word occurs in both sentences, their

Finally, we compute the cosine similarity between

cosine_similarity = (v1 . v2) / (||v1|| * ||v2||)

where v1 and v2 are the vector representations of

Using the vector representations and the formula

The dot product of the vectors [1, 1, 1, 1, 1, 1, 0] and

1*1 + 1*1 + 1*1 + 1*0 + 1*0 + 1*1 + 0*1 = 4

The magnitude (or Euclidean length) of the first

Similarly, the magnitude for the second vector [1,

Therefore, the cosine similarity between the two

cosine_similarity = 4 / (2.44 * 2.23) => 4 / 5.4412
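The same arithmetic can be checked with a few lines of plain Python. In this sketch the first vector is the one quoted above, and the second vector is read off the right-hand factors of the dot-product line (1, 1, 1, 0, 0, 1, 1); the function name cosine is a hypothetical helper and not a library call.

```python
import math


def cosine(v1, v2):
    # Dot product divided by the product of the Euclidean lengths
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm1 * norm2)


v1 = [1, 1, 1, 1, 1, 1, 0]
v2 = [1, 1, 1, 0, 0, 1, 1]  # read off the factors in the dot-product line

print(sum(a * b for a, b in zip(v1, v2)))  # dot product
print(math.sqrt(6), math.sqrt(5))          # the two magnitudes
print(cosine(v1, v2))                      # cosine similarity
```

Without the intermediate truncation to 2.44 and 2.23, the result is 4 / sqrt(30), which is roughly 0.730, slightly below the value obtained from 4 / 5.4412.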

This indicates that the two sentences are

Let’s now check how well our approach is working.

Weather is the day-to-day or hour-to-hour change

Weather stations around the world measure

How can we check the accuracy of the above text’s
