Text Summarization In Python
With SpaCy Library
by Varsha Saini
According to a research paper by Anthony Cocciolo of the Pratt Institute, textual data
on the internet is decreasing gradually.
The paper suggests that online users are no longer as interested in long-form text:
tracking websites from the 1990s onward, it found that the amount of text content on
web pages has been falling year by year.
More and more websites are delivering content in smaller bits. These shorter pieces
of text are combined with images, videos, and infographics to convey the message in
a more compact form.
These facts emphasize the need for a process known as Text Summarization.
We will look at its definition and applications, and then build a Text Summarization
algorithm in Python with the help of the spaCy library.
What is Text Summarization?
The discussion above has probably already given you a picture of what a textual
summary is. Text Summarization is a technique for converting a long piece of content
into a shorter one without losing the actual context.
With the help of this technique, a summary of any text material can be generated.
It removes only the text that does not change the overall meaning of the content.
One practical example is the mobile application Inshorts, which provides 60-word
news summaries. Each summary contains only the headline and the important facts
rather than varied opinions.
It has an even greater scope of application in scientific research, where papers can
run to thousands of pages of important documentation. Text Summarization can help
scientists focus only on the key phrases in all that data.
Producing a summary of textual content used to be manual work. With advances in
artificial intelligence and Natural Language Processing techniques, the task has
become much easier.
In this article, we will use one such advanced Python library, spaCy. With the help
of spaCy, it becomes very easy to dig out important information from tons of text
data.
Types of Text Summarization
There are no fixed guidelines for categorizing the techniques used for it. However,
to keep things organized, they are generally divided into the following types:
1. Short Tail Summarization: Here the input content is already short and precise.
Despite its short length, it needs to be summarized in such a way that it can be
condensed further without any change in meaning.
2. Long Tail Summarization: As the name suggests, the content here may be too long
for a human being to handle alone. It could contain text data from thousands of pages
and books at once.
3. Single Entity: When the input contains elements from just one source.
4. Multiple Entities: When the input contains elements from different document
sources. This is one of the most useful applications of this technique.
5. General Purpose: In this type of Text Summarization, no information about the
domain of the input is provided. The algorithm has no sense of the domain the text
deals with; it is trained on many different kinds of textual content.
6. Domain-Specific: The summarization is performed within a specific domain each
time. For example, a Text Summarization algorithm that condenses a food recipe into
just a few words works within the domain of food recipes and knows the context behind
the data.
7. Informative Summarization: This summary keeps all the information related to
the actual content. No change in meaning is seen in the output results.
8. Headlines Generation: News channels and applications use this type of summary.
Headlines are generated according to the text content of an article.
9. Keyword Extraction: Only the most important keywords and phrases are
extracted from the whole data. For example, extracting the phrases where some verbal
conversation is going on and leaving all the narrations behind.
These were just a few of the types and for further explanation, you can check the
article here:
We can create Python algorithms for any of the types explained above with the help
of the spaCy library. There are basically two techniques for building the final Text
Summarization model with spaCy in Python.
Below is a simple explanation of both techniques:
Extractive Technique: In this technique, the important phrases from the
actual content are taken together to build a simple and short summary.
Abstractive Technique: Builds a summary with new phrases and words
but keeps the original meaning alive.
Human beings generally use the abstractive method to summarize something: we use
words we are more familiar with. But there is one problem with summaries created by
humans. They are rarely neutral, and the essence of the writer's opinion can easily
be seen in the final output.
Machines are better at this task because they have no opinion of their own. They
simply work on the patterns and training we have provided and make decisions with
the artificial intelligence they have gained.
Applications of Text Summarization
1. News: There are multiple applications of this technique in the field of news. It
includes creating introductions, generating headlines, and embedding captions on
pictures.
2. Scientific Research: Algorithms are used to dig out important information from
scientific research papers. AI is outperforming human beings at this task.
3. Social Media Posting: Content on social media is expected to be concise.
Companies use this technique to convert long blog articles into shorter pieces suited
to the audience.
4. Creating Study Notes: Many applications use this process to create study notes
from vast syllabi and course content.
5. Conversation Summary: Long conversations and meeting recordings can first be
converted into text, and then the important information can be extracted from them.
6. Movie Plots and Reviews: The whole movie plot could be converted into bullet
points through this process.
7. Deliverable Feeds: These are short pieces of information derived from complete
informative articles. They are generally delivered to people through emails or feed
delivery services.
8. Content Writing: Not from scratch, but given a topic and a few points, an outlined
summary can be generated.
Although there are hundreds of other applications, we have limited ourselves to a few
main topics.
Text Summarization in Python With spaCy
We will be building some Python algorithms for performing the basics of automated
Text Summarization.
The spaCy library is our choice here, but you could go with any other machine
learning library you prefer.
We have already written an article on the complete implementation of the spaCy
library; you can read it on our blog.
1. Importing the Library
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
Here spacy.lang.en.stop_words contains the English stop words. Similarly, you can
access the stop words for any other language through its language code, such as 'en'
for English.
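As a quick sanity check, the imported objects can be inspected directly; a minimal sketch (the sample words below are just examples) looks like this:

# Inspect the stop-word set and the punctuation string we just imported.
print(len(STOP_WORDS))                 # number of English stop words shipped with spaCy
print("the" in STOP_WORDS)             # True - common function words are stop words
print("summarization" in STOP_WORDS)   # False - content words are kept
print(punctuation)                     # the standard ASCII punctuation characters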
2. Getting data
extra_words = list(STOP_WORDS) + list(punctuation) + ['\n']
nlp = spacy.load('en_core_web_sm')
doc = """Your Text Content Here"""
docx = nlp(doc)
spacy.load() is used to load the language object for English. Recent spaCy versions
require the full model name (for example en_core_web_sm) rather than the old 'en'
shorthand. extra_words is created to hold all the stop words and punctuation.
We have used a text about NLP from Algorithmia, but you could choose any text
material you have.
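If your text lives in a file instead of a string literal, a small variation works just as well. This is only a sketch; "article.txt" is a placeholder path:

# Load the text to summarize from a local file instead of a string literal.
with open("article.txt", encoding="utf-8") as f:
    doc = f.read()

docx = nlp(doc)
print(len(docx), "tokens,", len(list(docx.sents)), "sentences")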
3. Creating Vocabulary with spaCy
All the extra words are removed, and the count of every other word is entered into a
dictionary.
all_words = [word.text for word in docx]
Freq_word = {}
for w in all_words:
    w1 = w.lower()
    if w1 not in extra_words and w1.isalpha():
        if w1 in Freq_word.keys():
            Freq_word[w1] += 1
        else:
            Freq_word[w1] = 1
Output Screen:
Creating Vocabulary
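The same frequency table can be built more compactly with collections.Counter. This is an equivalent alternative, not part of the original code; since Counter is a dict subclass, the rest of the tutorial works unchanged:

from collections import Counter

# Count every lowercase alphabetic token that is not a stop word or punctuation.
Freq_word = Counter(
    word.text.lower()
    for word in docx
    if word.text.lower() not in extra_words and word.text.lower().isalpha()
)
print(Freq_word.most_common(10))  # the ten most frequent content words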
4. Assigning a Title – Headline Generation
With the help of spaCy, we can actually find a title for the content we have entered.
In this way, it could be used for headline generation.
val = sorted(Freq_word.values())
max_freq = val[-3:]
print("Topic of document given :-")
for word, freq in Freq_word.items():
    if freq in max_freq:
        print(word, end=" ")
    else:
        continue
Output Screen:
Headline Generation
We got “NLP Human Language Text” as our title, which is quite close to our text. It
is not the best title, but this is just a basic test of the features provided by the
Python library spaCy.
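If you prefer the title words ranked by importance, a small variation (assuming the Freq_word dictionary from the previous step) sorts the counts first before printing:

# Pick the top four keywords, ordered from most to least frequent.
top_keywords = sorted(Freq_word.items(), key=lambda kv: kv[1], reverse=True)[:4]
print("Suggested title words:", " ".join(word for word, _ in top_keywords))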
5. TFIDF
Short for Term Frequency – Inverse Document Frequency, it is used to represent how
important a given word is to a document relative to a complete collection of
documents. The code below applies a simplified version of this idea, normalizing each
word's count by the count of the most frequent word:
for word in Freq_word.keys():
    Freq_word[word] = (Freq_word[word] / max_freq[-1])
After getting the strength of each individual word, we can compute the strength of
each sentence. This way we will know the importance of each sentence, so that
sentences with little importance can be left out of the summary.
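For comparison, a fuller TF-IDF weighting can be sketched with scikit-learn's TfidfVectorizer, treating every sentence as its own document so that the inverse-document-frequency part is meaningful. This is not part of the original spaCy pipeline and assumes scikit-learn is installed:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Treat every sentence as its own "document" so IDF can be computed.
sentences = [sent.text for sent in docx.sents]
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(sentences)

# Score each sentence by the sum of the TF-IDF weights of its words.
scores = np.asarray(tfidf_matrix.sum(axis=1)).ravel()
print(scores[:5])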
6. Sentence Strength
The sentence containing the most important words will itself carry the most
importance. We can compute this with the code below:
sent_strength = {}
for sent in docx.sents:
    for word in sent:
        if word.text.lower() in Freq_word.keys():
            if sent in sent_strength.keys():
                sent_strength[sent] += Freq_word[word.text.lower()]
            else:
                sent_strength[sent] = Freq_word[word.text.lower()]
        else:
            continue
Output Screen:
Sentence Strength Calculation
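As a quick check on the scores, the single strongest sentence can be printed. This is a small sketch using the sent_strength dictionary built above:

# Print the highest-scoring sentence found so far.
strongest = max(sent_strength, key=sent_strength.get)
print(strongest.text)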
7. Getting Important Sentences
We will now be sorting the sentences according to their strength and choosing only
according to the requirement. Sometimes the strength requirement is high for example
in controversial topics it is better to choose all the important factors.
The code to perform this would be:
top_sentences=(sorted(sent_strength.values())[::-1])
top30percent_sentence=int(0.3*len(top_sentences))
top_sent=top_sentences[:top30percent_sentence]
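An equivalent way to pick the top 30% of sentences is heapq.nlargest, which selects the sentence spans directly instead of matching on their strength values. This is a variation on the code above, not the article's original approach:

from heapq import nlargest

# Keep the 30% highest-scoring sentence spans directly.
n_top = int(0.3 * len(sent_strength))
top_spans = nlargest(n_top, sent_strength, key=sent_strength.get)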
8. Creating the Final Summary
Here the final summary is created using only the valuable information. All the
content with little to no importance is removed.
We will be using the following code to perform this:
summary = []
for sent, strength in sent_strength.items():
    if strength in top_sent:
        summary.append(sent)
    else:
        continue
for i in summary:
    print(i, end="")
Output Screen:
Text Summarization in Python
Here is our text summary, created in Python with the help of spaCy. If you want to
learn Python programming, do check out the free Python courses we have listed.
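Before wrapping up, here is the whole pipeline gathered into a single reusable function. This is only a sketch of the steps covered above; the model name en_core_web_sm and the 30% ratio are assumptions you can change:

from heapq import nlargest
from string import punctuation

import spacy
from spacy.lang.en.stop_words import STOP_WORDS


def summarize(text, ratio=0.3):
    """Return an extractive summary keeping roughly `ratio` of the sentences."""
    nlp = spacy.load("en_core_web_sm")
    docx = nlp(text)
    extra_words = list(STOP_WORDS) + list(punctuation) + ["\n"]

    # Word frequencies, normalised by the count of the most frequent word.
    freq = {}
    for word in docx:
        w = word.text.lower()
        if w not in extra_words and w.isalpha():
            freq[w] = freq.get(w, 0) + 1
    if not freq:
        return ""
    max_count = max(freq.values())
    freq = {w: c / max_count for w, c in freq.items()}

    # Score every sentence by the weights of the words it contains.
    sent_strength = {}
    for sent in docx.sents:
        for word in sent:
            w = word.text.lower()
            if w in freq:
                sent_strength[sent] = sent_strength.get(sent, 0) + freq[w]

    # Keep the strongest `ratio` of sentences, restored to document order.
    n_top = max(1, int(ratio * len(sent_strength)))
    top = nlargest(n_top, sent_strength, key=sent_strength.get)
    top = sorted(top, key=lambda s: s.start)
    return " ".join(sent.text.strip() for sent in top)


print(summarize("""Your Text Content Here"""))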
What are your thoughts on this summary?
It could be refined further with other spaCy filters, which we will look at in
upcoming tutorials.
For the complete code, you can check our GitHub repository.