Text Summarization using HuggingFace Model

Last Updated : 8 Apr, 2026

Text summarization using models from Hugging Face lets developers automatically generate concise summaries of long pieces of text. Pretrained transformer models make it straightforward to build applications that extract key information and present it in a shorter, meaningful form. There are two main approaches:

  • Extractive: Selects important sentences directly from the text
  • Abstractive: Generates new sentences that capture the same meaning
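
To make the distinction concrete, here is a minimal sketch of the extractive idea: score each sentence and copy the best ones out verbatim. The frequency-based scoring below is purely illustrative, not part of any library; abstractive summarization with T5 is what the rest of this article implements.

```python
from collections import Counter

def extractive_summary(text, num_sentences=1):
    # Split into sentences and count how often each word appears overall
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    word_freq = Counter(w.lower() for s in sentences for w in s.split())
    # Score each sentence by the average frequency of its words,
    # then keep the top-scoring sentences unchanged (extractive)
    scored = sorted(
        sentences,
        key=lambda s: sum(word_freq[w.lower()] for w in s.split()) / len(s.split()),
        reverse=True,
    )
    return ". ".join(scored[:num_sentences]) + "."

print(extractive_summary("AI is growing fast. AI helps doctors. Cats are cute."))
# → AI helps doctors.
```

An abstractive model, by contrast, may produce words that never appear in the input at all.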

Implementation of Text Summarization

Step 1: Set Up the Environment

  • First, install the required libraries by running the following command in your command prompt.
  • This installs the Hugging Face Transformers library, PyTorch and SentencePiece (which the T5 tokenizer depends on).

pip install transformers torch sentencepiece

Step 2: Import Required Classes

  • T5Tokenizer: Converts text into numerical tokens that the model can process
  • T5ForConditionalGeneration: The T5 model with a generation head, used to produce new text from the input
Python
from transformers import T5Tokenizer, T5ForConditionalGeneration

Step 3: Load Pre-trained Model and Tokenizer

  • T5 (Text-to-Text Transfer Transformer) treats every NLP task as a text-to-text transformation: it takes text as input and generates text as output.
  • For summarization, it reads the full content and produces a shorter version. The model downloads automatically on first use.
Python
model_name = "t5-small"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


Step 4: Prepare Input Text

T5 works in an instruction-based manner, so adding the "summarize: " prefix tells the model which task to perform. Without this prefix, the model won't know that it needs to generate a summary.

Python
text = """
Artificial Intelligence is transforming industries across the globe.
From healthcare to finance, AI systems are automating processes,
analyzing data and improving decision making.
Organizations are investing heavily in AI research,
though ethical and privacy challenges remain.
"""

input_text = "summarize: " + text
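
The "summarize: " string is one of several plain-text task prefixes T5 was trained with; the same model handles other tasks when given a different prefix. A small illustration (the translation prefix follows the original T5 convention):

```python
# T5 selects its task from a plain-text prefix on the input
text = "Artificial Intelligence is transforming industries."
prompts = {
    "summarization": "summarize: " + text,
    "translation": "translate English to German: " + text,
}
for task, prompt in prompts.items():
    print(task, "->", prompt)
```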

Step 5: Tokenize the Input

Tokenization converts the text into numeric token IDs the model can understand. Since the model processes numbers, not raw words, this step transforms the text into a format suitable for computation.

Python
inputs = tokenizer.encode(
    input_text,
    return_tensors="pt",
    max_length=512,
    truncation=True
)
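
Conceptually, the tokenizer maps pieces of text to integer IDs from a fixed vocabulary. The sketch below uses a tiny, hypothetical word-level vocabulary to show the idea; real T5 tokenization uses SentencePiece subword units, not whole words.

```python
# Toy tokenizer sketch with a made-up vocabulary (illustrative only)
vocab = {"<pad>": 0, "</s>": 1, "summarize": 2, ":": 3, "ai": 4, "is": 5, "growing": 6}

def toy_encode(text, max_length=8):
    tokens = text.lower().replace(":", " : ").split()
    ids = [vocab.get(t, len(vocab)) for t in tokens]  # unknown words share one ID
    return ids[:max_length] + [vocab["</s>"]]         # truncation + end-of-sequence marker

print(toy_encode("summarize: AI is growing"))  # → [2, 3, 4, 5, 6, 1]
```

The real tokenizer also returns the IDs as a PyTorch tensor (`return_tensors="pt"`) so they can be fed straight into the model.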

Step 6: Generate the Summary

  • max_length: Sets the maximum length of the generated summary, in tokens
  • min_length: Prevents very short summaries
  • num_beams=4: Uses beam search with 4 beams to improve output quality
  • length_penalty=2.0: Rescales candidate scores by length; values above 1.0 favor longer summaries
  • early_stopping=True: Ends beam search as soon as enough complete candidates are found
Python
summary_ids = model.generate(
    inputs,
    max_length=60,
    min_length=20,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True
)
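
Beam search keeps the `num_beams` highest-scoring partial outputs at each step instead of greedily taking the single best token, and `length_penalty` rescales each finished candidate's score by its length. The toy sketch below illustrates both on a hypothetical three-word "model"; all probabilities are invented for illustration.

```python
import math

# Made-up next-token distributions: maps the last token to its successors.
# "</s>" ends a hypothesis.
next_probs = {
    "<s>": {"AI": 0.6, "Data": 0.4},
    "AI": {"helps": 0.7, "</s>": 0.3},
    "Data": {"</s>": 1.0},
    "helps": {"</s>": 1.0},
}

def beam_search(num_beams=2, max_steps=3, length_penalty=1.0):
    beams = [(["<s>"], 0.0)]  # (tokens, sum of log-probabilities)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            for tok, p in next_probs[tokens[-1]].items():
                new = (tokens + [tok], score + math.log(p))
                (finished if tok == "</s>" else candidates).append(new)
        # Keep only the num_beams best partial hypotheses
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:num_beams]
        if not beams:
            break
    # Length-normalize finished candidates, as length_penalty does in generate()
    return max(finished, key=lambda b: b[1] / (len(b[0]) ** length_penalty))[0]

print(beam_search())  # → ['<s>', 'AI', 'helps', '</s>']
```

With normalization the longer, more informative sequence wins even though its raw log-probability sum is lower; that is the balancing role `length_penalty` plays.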

Step 7: Decode the Output

The model outputs numeric token IDs; decoding converts them back into readable text so you can see the final summary.

Python
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summary:", summary)

Output:

Summary: AI systems are automating processes, analyzing data, and improving decision-making. organizations investing heavily in AI research, though ethical and privacy challenges remain.

