Text Summarization using HuggingFace Model

Last Updated : 8 Apr, 2026

Text summarization using models from Hugging Face lets developers automatically generate concise summaries of long pieces of text. Pretrained transformer models make it straightforward to build applications that extract key information and present it in a shorter, meaningful form. There are two main approaches:

  • Extractive: Selects important sentences directly from the text
  • Abstractive: Generates new sentences that capture the same meaning
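
To make the distinction concrete, here is a minimal sketch of the extractive idea: score each sentence and copy the best ones out verbatim. The frequency-based scoring below is purely illustrative, not part of any library; abstractive summarization with T5 is what the rest of this article implements.

```python
from collections import Counter

def extractive_summary(text, num_sentences=1):
    # Split into sentences and count how often each word appears overall
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    word_freq = Counter(w.lower() for s in sentences for w in s.split())
    # Score each sentence by the average frequency of its words,
    # then keep the top-scoring sentences unchanged (extractive)
    scored = sorted(
        sentences,
        key=lambda s: sum(word_freq[w.lower()] for w in s.split()) / len(s.split()),
        reverse=True,
    )
    return ". ".join(scored[:num_sentences]) + "."

print(extractive_summary("AI is growing fast. AI helps doctors. Cats are cute."))
# → AI helps doctors.
```

An abstractive model, by contrast, may produce words that never appear in the input at all.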

Implementation of Text Summarization

Step 1: Set Up the Environment

  • First, install the required libraries by running the following command in your command prompt.
  • This installs the Hugging Face Transformers library, PyTorch and SentencePiece (which the T5 tokenizer depends on).

pip install transformers torch sentencepiece

Step 2: Import Required Classes

  • T5Tokenizer: Converts text into numerical tokens that the model can process
  • T5ForConditionalGeneration: The T5 model with a generation head, used to produce new text from the input
Python
from transformers import T5Tokenizer, T5ForConditionalGeneration

Step 3: Load Pre-trained Model and Tokenizer

  • T5 (Text-to-Text Transfer Transformer) treats every NLP task as a text-to-text transformation: it takes text as input and generates text as output.
  • For summarization, it reads the full content and produces a shorter version. The model downloads automatically on first use.
Python
model_name = "t5-small"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


Step 4: Prepare Input Text

T5 works in an instruction-based manner, so adding the "summarize: " prefix tells the model which task to perform. Without this prefix, the model won't know that it needs to generate a summary.

Python
text = """
Artificial Intelligence is transforming industries across the globe.
From healthcare to finance, AI systems are automating processes,
analyzing data and improving decision making.
Organizations are investing heavily in AI research,
though ethical and privacy challenges remain.
"""

input_text = "summarize: " + text
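
The "summarize: " string is one of several plain-text task prefixes T5 was trained with; the same model handles other tasks when given a different prefix. A small illustration (the translation prefix follows the original T5 convention):

```python
# T5 selects its task from a plain-text prefix on the input
text = "Artificial Intelligence is transforming industries."
prompts = {
    "summarization": "summarize: " + text,
    "translation": "translate English to German: " + text,
}
for task, prompt in prompts.items():
    print(task, "->", prompt)
```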

Step 5: Tokenize the Input

Tokenization converts the text into numeric token IDs the model can understand. Since the model processes numbers, not raw words, this step transforms the text into a format suitable for computation.

Python
inputs = tokenizer.encode(
    input_text,
    return_tensors="pt",
    max_length=512,
    truncation=True
)
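
Conceptually, the tokenizer maps pieces of text to integer IDs from a fixed vocabulary. The sketch below uses a tiny, hypothetical word-level vocabulary to show the idea; real T5 tokenization uses SentencePiece subword units, not whole words.

```python
# Toy tokenizer sketch with a made-up vocabulary (illustrative only)
vocab = {"<pad>": 0, "</s>": 1, "summarize": 2, ":": 3, "ai": 4, "is": 5, "growing": 6}

def toy_encode(text, max_length=8):
    tokens = text.lower().replace(":", " : ").split()
    ids = [vocab.get(t, len(vocab)) for t in tokens]  # unknown words share one ID
    return ids[:max_length] + [vocab["</s>"]]         # truncation + end-of-sequence marker

print(toy_encode("summarize: AI is growing"))  # → [2, 3, 4, 5, 6, 1]
```

The real tokenizer also returns the IDs as a PyTorch tensor (`return_tensors="pt"`) so they can be fed straight into the model.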

Step 6: Generate the Summary

  • max_length: Sets the maximum length of the generated summary, in tokens
  • min_length: Prevents very short summaries
  • num_beams=4: Uses beam search with 4 beams to improve output quality
  • length_penalty=2.0: Rescales candidate scores by length; values above 1.0 favor longer summaries
  • early_stopping=True: Ends beam search as soon as enough complete candidates are found
Python
summary_ids = model.generate(
    inputs,
    max_length=60,
    min_length=20,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True
)
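
Beam search keeps the `num_beams` highest-scoring partial outputs at each step instead of greedily taking the single best token, and `length_penalty` rescales each finished candidate's score by its length. The toy sketch below illustrates both on a hypothetical three-word "model"; all probabilities are invented for illustration.

```python
import math

# Made-up next-token distributions: maps the last token to its successors.
# "</s>" ends a hypothesis.
next_probs = {
    "<s>": {"AI": 0.6, "Data": 0.4},
    "AI": {"helps": 0.7, "</s>": 0.3},
    "Data": {"</s>": 1.0},
    "helps": {"</s>": 1.0},
}

def beam_search(num_beams=2, max_steps=3, length_penalty=1.0):
    beams = [(["<s>"], 0.0)]  # (tokens, sum of log-probabilities)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            for tok, p in next_probs[tokens[-1]].items():
                new = (tokens + [tok], score + math.log(p))
                (finished if tok == "</s>" else candidates).append(new)
        # Keep only the num_beams best partial hypotheses
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:num_beams]
        if not beams:
            break
    # Length-normalize finished candidates, as length_penalty does in generate()
    return max(finished, key=lambda b: b[1] / (len(b[0]) ** length_penalty))[0]

print(beam_search())  # → ['<s>', 'AI', 'helps', '</s>']
```

With normalization the longer, more informative sequence wins even though its raw log-probability sum is lower; that is the balancing role `length_penalty` plays.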

Step 7: Decode the Output

The model outputs numeric token IDs; decoding converts them back into readable text so you can see the final summary.

Python
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summary:", summary)

Output:

Summary: AI systems are automating processes, analyzing data, and improving decision-making. organizations investing heavily in AI research, though ethical and privacy challenges remain.

