EX. NO : 2
STEMMING & LEMMATIZATION
DATE :
AIM :
To understand and implement stemming and lemmatization using Python for
preprocessing textual data in NLP tasks.
ALGORITHM:
1. Import Required Libraries: Import nltk for stemming and lemmatization
tasks.
2. Download Necessary Resources: Download resources like wordnet for
lemmatization.
3. Define Example Text: Create a list of words or sentences to test stemming
and lemmatization.
4. Initialize Tools:
   o Use PorterStemmer for stemming.
   o Use WordNetLemmatizer for lemmatization.
5. Perform Stemming: Apply the stemmer to the words and observe how
suffixes are stripped to produce stems, which are not always valid
dictionary words (for example, "studies" becomes "studi").
6. Perform Lemmatization: Apply the lemmatizer to reduce words to their
dictionary base forms (lemmas), optionally providing part-of-speech (POS)
tags for better accuracy. Without a POS tag, WordNetLemmatizer treats
every word as a noun.
7. Compare Results: Observe the differences between stemming and
lemmatization in terms of their output and linguistic correctness.
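Step 6 mentions optional POS tags. A minimal sketch of one way to supply them is given below, assuming the tags come from a Penn Treebank style tagger (such as the default tagset of nltk.pos_tag). The helper name penn_to_wordnet is hypothetical and not part of NLTK; it only maps a tag to the single-letter POS codes that WordNetLemmatizer.lemmatize() accepts.

```python
# Hypothetical helper (not part of NLTK): convert a Penn Treebank POS tag
# into the single-letter POS code expected by WordNetLemmatizer.lemmatize().
def penn_to_wordnet(tag: str) -> str:
    if tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if tag.startswith("V"):
        return "v"  # verb (VB, VBD, VBG, ...)
    if tag.startswith("R"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"      # everything else falls back to noun, the lemmatizer default


# Show the mapping for a few common tags
for tag in ["VBG", "NNS", "JJR", "RB"]:
    print(tag, "->", penn_to_wordnet(tag))
```

The returned letter can be passed as the pos argument, for example lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)).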
PROGRAM:
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Download the WordNet data required by the lemmatizer
nltk.download('wordnet')
nltk.download('omw-1.4')

words = ["running", "runs", "easily", "better", "studies"]
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming: rule-based suffix stripping
print("Stemming Results:")
for word in words:
    print(f"{word} -> {stemmer.stem(word)}")

# Lemmatization without a POS tag: every word is treated as a noun
print("\nLemmatization Results:")
for word in words:
    print(f"{word} -> {lemmatizer.lemmatize(word)}")

# Lemmatization with a rough suffix-based POS guess:
# "v" (verb) for words ending in "ing" or "s", otherwise "a" (adjective)
print("\nLemmatization with POS tagging:")
for word in words:
    pos_tag = "v" if word.endswith(("ing", "s")) else "a"
    print(f"{word} -> {lemmatizer.lemmatize(word, pos=pos_tag)}")
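For step 7 (Compare Results), the outputs are easier to read side by side. The sketch below is one possible way to do that; compare and format_table are hypothetical helper names, and the functions accept any two callables so they do not depend on NLTK themselves.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, str]


def compare(words: Iterable[str],
            stem_fn: Callable[[str], str],
            lemma_fn: Callable[[str], str]) -> List[Row]:
    """Return one (word, stem, lemma) row per input word."""
    return [(w, stem_fn(w), lemma_fn(w)) for w in words]


def format_table(rows: List[Row]) -> str:
    """Render the rows as a plain-text table with left-aligned columns."""
    all_rows = [("word", "stem", "lemma")] + list(rows)
    # Width of each column is the longest cell in that column
    widths = [max(len(r[i]) for r in all_rows) for i in range(3)]
    lines = []
    for r in all_rows:
        cells = (cell.ljust(w) for cell, w in zip(r, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)
```

With the objects from the program, a call such as print(format_table(compare(words, stemmer.stem, lemmatizer.lemmatize))) would print the table.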
OUTPUT :
RESULT:
Thus, stemming and lemmatization have been successfully implemented
using Python, demonstrating the difference between rule-based suffix
stripping (stemming) and dictionary-based, POS-aware base form
extraction (lemmatization).
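The contrast stated in the result can also be illustrated without NLTK. The following is a toy sketch only: it is not the Porter algorithm and does not use WordNet. The suffix list and the small lemma dictionary are assumptions made up for illustration. It shows why blind suffix rules can produce non-words while a dictionary lookup returns real base forms.

```python
# Toy illustration only: NOT the Porter algorithm and NOT WordNet.
# Hypothetical suffix rules, checked in order.
SUFFIXES = ("ing", "ies", "ly", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini dictionary standing in for a real lexical database.
TOY_LEMMAS = {
    "running": "run",
    "runs": "run",
    "better": "good",
    "studies": "study",
}


def toy_lemmatize(word: str) -> str:
    """Look the word up; unknown words are returned unchanged."""
    return TOY_LEMMAS.get(word, word)


for w in ["running", "runs", "easily", "better", "studies"]:
    print(f"{w}: stem={toy_stem(w)}, lemma={toy_lemmatize(w)}")
```

Here the suffix rules turn "running" into "runn" and "studies" into "stud", whereas the lookup maps them to "run" and "study" and can also relate "better" to "good", which no suffix rule could do.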