EX.NO:01 SENTENCE AND WORD TOKENIZATION USING NLTK
AIM:
To download the NLTK package and use it to tokenize text into sentences and words.
PROCEDURE:
1. Install the NLTK library using pip install nltk.
2. Import the necessary modules from NLTK.
3. Download required datasets (punkt tokenizer).
4. Use sent_tokenize() to split text into sentences.
5. Use word_tokenize() to split text into words.
PROGRAM:
# Step 1: Install and import NLTK
import nltk
nltk.download('punkt') # Punkt tokenizer
from nltk.tokenize import sent_tokenize, word_tokenize
# Step 2: Input text
text = "Anna University offers a wide range of technical courses. NLP is one of the key
subjects."
# Step 3: Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentence Tokenization:")
print(sentences)
# Step 4: Word Tokenization
words = word_tokenize(text)
print("\nWord Tokenization:")
print(words)
OUTPUT:
Sentence Tokenization:
['Anna University offers a wide range of technical courses.', 'NLP is one of the key subjects.']
Word Tokenization:
['Anna', 'University', 'offers', 'a', 'wide', 'range', 'of', 'technical', 'courses', '.', 'NLP', 'is', 'one', 'of', 'the', 'key', 'subjects', '.']
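Note: for comparison, a plain str.split() keeps punctuation attached to the neighbouring word, which is why the Punkt-based tokenizers above are preferred. A minimal check, reusing the text and functions from the program:
# Naive whitespace split keeps punctuation attached ('courses.', 'subjects.')
print(text.split())
# Punkt-based tokenizers separate punctuation into its own tokens
print(len(sent_tokenize(text)), "sentences,", len(word_tokenize(text)), "word tokens")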
RESULT:
Thus the tokenization of text into sentences and words has been successfully implemented using the NLTK package.
EX.NO:02 STOP WORD REMOVAL INCLUDING CUSTOM STOP WORDS USING SPACY
AIM:
To include custom stop words and remove them along with the standard stop words from a given
document using the spaCy package.
PROCEDURE:
1. Install and import spaCy (or NLTK if needed).
2. Load the English language model.
3. Define a list of custom stop words.
4. Process the input text and filter out both standard and custom stop words.
PROGRAM:
# Step 1: Install and load spaCy
# pip install spacy
# python -m spacy download en_core_web_sm
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
# Step 2: Load English model
nlp = spacy.load("en_core_web_sm")
# Step 3: Define input text
text = "Machine learning is a key component of modern artificial intelligence systems."
# Step 4: Define custom stop words
custom_stop_words = {"component", "modern"}
# Step 5: Combine standard and custom stop words
all_stop_words = STOP_WORDS.union(custom_stop_words)
# Step 6: Tokenize and remove stop words
doc = nlp(text)
filtered_tokens = [token.text for token in doc if token.text.lower() not in all_stop_words]
print("Filtered Tokens:")
print(filtered_tokens)
OUTPUT:
Filtered Tokens:
['Machine', 'learning', 'key', 'artificial', 'intelligence', 'systems', '.']
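As an alternative sketch, spaCy also allows custom words to be registered on the pipeline itself so that token.is_stop reflects them; this assumes the en_core_web_sm model and the custom_stop_words set defined above:
# Register the custom words on the loaded pipeline (sketch)
for word in custom_stop_words:
    nlp.Defaults.stop_words.add(word)
    nlp.vocab[word].is_stop = True  # mark the lexeme as a stop word
doc = nlp(text)
print([token.text for token in doc if not token.is_stop])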
RESULT:
Thus the removal of custom stop words along with the standard stop words has been implemented
successfully.
EX.NO:03 IMPLEMENTATION OF STEMMING AND LEMMATIZATION USING NLTK
AIM:
To implement a stemmer and a lemmatizer program using NLTK.
PROCEDURE:
1. Install and import NLTK.
2. Download required datasets for lemmatization.
3. Define sample text input.
4. Apply PorterStemmer for stemming and WordNetLemmatizer for lemmatization.
5. Print the results.
PROGRAM:
# Step 1: Import libraries and download data
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('wordnet')
# Step 2: Define text
text = "The children are playing in the gardens and enjoying their activities."
# Step 3: Tokenize
words = word_tokenize(text)
# Step 4: Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
# Step 5: Apply and display results
print("Stemming:")
for word in words:
    print(f"{word} -> {stemmer.stem(word)}")
print("\nLemmatization:")
for word in words:
    print(f"{word} -> {lemmatizer.lemmatize(word)}")
OUTPUT:
Stemming:
The -> the
children -> children
are -> are
playing -> play
in -> in
the -> the
gardens -> garden
and -> and
enjoying -> enjoy
their -> their
activities -> activ
Lemmatization:
The -> The
children -> child
are -> are
playing -> playing
in -> in
the -> the
gardens -> garden
and -> and
enjoying -> enjoying
their -> their
activities -> activity
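Note: the lemmatizer leaves "playing" and "enjoying" unchanged because WordNetLemmatizer assumes a noun unless a part of speech is supplied; passing the pos argument yields the verb lemmas. A minimal sketch reusing the lemmatizer created above:
from nltk.corpus import wordnet
# Supplying the part of speech ('v' for verb) changes the lemma
print(lemmatizer.lemmatize("playing", pos=wordnet.VERB))   # play
print(lemmatizer.lemmatize("enjoying", pos=wordnet.VERB))  # enjoy
print(lemmatizer.lemmatize("are", pos=wordnet.VERB))       # be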
RESULT:
Thus the stemmer and lemmatizer programs have been successfully implemented using the NLTK
package.
EX.NO:04 PART-OF-SPEECH (POS) TAGGING USING NLTK
AIM:
To implement a simple Part-of-Speech (POS) tagger using NLTK.
PROCEDURE:
1. Install and import NLTK.
2. Download the necessary POS tagging resources.
3. Tokenize the input sentence into words.
4. Apply the POS tagger using nltk.pos_tag().
5. Print the tagged output.
PROGRAM:
# Step 1: Import and download NLTK resources
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Step 2: Input text
text = "The quick brown fox jumps over the lazy dog."
# Step 3: Tokenize text
words = nltk.word_tokenize(text)
# Step 4: POS tagging
pos_tags = nltk.pos_tag(words)
# Step 5: Print the result
print("Part-of-Speech Tags:")
for word, tag in pos_tags:
    print(f"{word} -> {tag}")
OUTPUT:
Part-of-Speech Tags:
The -> DT
quick -> JJ
brown -> NN
fox -> NN
jumps -> VBZ
over -> IN
the -> DT
lazy -> JJ
dog -> NN
. -> .
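The abbreviations follow the Penn Treebank tagset. As a minimal sketch, NLTK can print the definition of any tag, assuming the 'tagsets' help resource downloads as shown:
nltk.download('tagsets')
nltk.help.upenn_tagset('DT')   # determiner
nltk.help.upenn_tagset('VBZ')  # verb, 3rd person singular present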
RESULT:
Thus the simple part-of-speech tagger has been successfully implemented using NLTK.
EX.NO:05 TF-IDF AND COSINE SIMILARITY
AIM:
To write a program that calculates TF-IDF of documents and finds cosine similarity between any
two documents.
PROCEDURE:
1. Install sklearn if not already installed.
2. Define two or more documents.
3. Use TfidfVectorizer to convert the text into TF-IDF vectors.
4. Calculate cosine similarity using cosine_similarity from sklearn.
5. Display the similarity scores.
PROGRAM:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Step 1: Define documents
doc1 = "Natural Language Processing enables computers to understand human language."
doc2 = "NLP allows machines to interpret and respond to human text or speech."
# Step 2: Vectorize using TF-IDF
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform([doc1, doc2])
# Step 3: Compute Cosine Similarity
cos_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
# Step 4: Output
print(f"Cosine Similarity between documents: {cos_sim[0][0]:.4f}")
OUTPUT:
Cosine Similarity between documents: 0.5273
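For intuition, the same score can be reproduced by hand as the dot product of the two TF-IDF vectors divided by the product of their norms. A minimal sketch reusing tfidf_matrix from the program above:
import numpy as np
v1 = tfidf_matrix[0].toarray().ravel()
v2 = tfidf_matrix[1].toarray().ravel()
manual_cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Manual cosine similarity: {manual_cos:.4f}")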
RESULT:
Thus the calculation of TF-IDF and the cosine similarity between two documents has been
successfully implemented.
EX.NO:06 DEPENDENCY PARSING OF SENTENCES USING SPACY
AIM:
To implement a dependency parser for English sentences.
Note: NLTK does not provide a full-fledged dependency parser, so spaCy, which offers built-in
dependency parsing capabilities, is used instead.
PROCEDURE:
1. Install and import spaCy.
2. Load the English language model.
3. Provide an input sentence.
4. Use spaCy’s doc object to access tokens and their syntactic dependencies.
5. Print token relationships: token → head → dependency tag.
PROGRAM:
# Step 1: Import spaCy
import spacy
# Step 2: Load the English language model
nlp = spacy.load("en_core_web_sm")
# Step 3: Define input sentence
sentence = "The professor explained the topic clearly to the students."
# Step 4: Parse and display dependencies
doc = nlp(sentence)
print("Dependency Parsing Output:")
for token in doc:
    print(f"{token.text:<12} -> Head: {token.head.text:<12} | Dep: {token.dep_}")
OUTPUT:
Dependency Parsing Output:
The -> Head: professor | Dep: det
professor -> Head: explained | Dep: nsubj
explained -> Head: explained | Dep: ROOT
the -> Head: topic | Dep: det
topic -> Head: explained | Dep: dobj
clearly -> Head: explained | Dep: advmod
to -> Head: explained | Dep: prep
the -> Head: students | Dep: det
students -> Head: to | Dep: pobj
. -> Head: explained | Dep: punct
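spaCy's built-in displaCy renderer can draw the same parse as a tree; a minimal sketch (serve starts a local web page, while render is used inside notebooks):
from spacy import displacy
# Starts a local web server and renders the dependency tree in the browser
displacy.serve(doc, style="dep")
# In a Jupyter notebook, use: displacy.render(doc, style="dep", jupyter=True)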
RESULT:
Thus the dependency parsing of sentences has been successfully implemented.
EX.NO:07 SEMANTIC ANALYSIS USING WORDNET IN NLTK
AIM:
To implement a semantic language processor that uses WordNet for semantic tagging.
PROCEDURE:
1. Install and import NLTK.
2. Download WordNet and word tokenizers.
3. Use wordnet.synsets() to extract meanings (synsets).
4. Print definitions and examples to understand semantic tags.
PROGRAM:
# Step 1: Import and download WordNet
import nltk
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize
nltk.download('wordnet')
nltk.download('punkt')
# Step 2: Define input text
text = "Students learn different subjects at school."
# Step 3: Tokenize and get WordNet synsets
tokens = word_tokenize(text)
print("Semantic Tags using WordNet:\n")
for word in tokens:
    synsets = wn.synsets(word)
    if synsets:
        print(f"{word}: {synsets[0].definition()} — Example: {synsets[0].examples()}")
    else:
        print(f"{word}: No semantic tag found.")
OUTPUT:
Semantic Tags using WordNet:
Students: A learned person (especially in the humanities) — Example: ['a great scholar']
learn: Gain knowledge or skills — Example: ['She learned dancing from her sister']
different: Unlike in nature or quality or form or degree — Example: ['took different approaches to the problem']
subjects: A branch of knowledge — Example: ['in what subject are you interested?']
at: No semantic tag found.
school: An educational institution — Example: ['the school was founded in 1900']
. : No semantic tag found.
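Because only the first synset is used, ambiguous words are reduced to a single sense; listing every synset of a word makes the alternatives visible. A minimal sketch using the wordnet corpus imported above:
for syn in wn.synsets("school"):
    print(f"{syn.name()} ({syn.pos()}): {syn.definition()}")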
RESULT:
Thus the implementation of a semantic language processor that uses WordNet has been
completed successfully.
EX.NO:08 RULE-BASED CHATBOT USING NLTK TECHNIQUES
AIM:
To carry out a mini project in pairs, applying NLP concepts to data by building a POS tagger,
chatbot, summarizer, or comparison tool; here, a rule-based chatbot is implemented.
PROCEDURE:
1. Tokenize and clean the user input using NLTK or spaCy.
2. Define a set of possible input patterns and rule-based responses using regex.
3. Use a loop to continuously receive user input and generate appropriate responses.
4. Exit when the user types "bye".
PROGRAM:
import re
import random
# Step 1: Define input patterns and responses
# Word boundaries (\b) stop short patterns such as "hi" matching inside words like "this"
responses = {
    r"\b(hi|hello|hey)\b": ["Hello!", "Hi there!", "Hey! How can I help you?"],
    r"how are you": ["I'm doing well, thank you!", "Great! What can I do for you today?"],
    r"what is your name": ["I'm a simple chatbot created using Python."],
    r"\bbye\b": ["Goodbye!", "See you later!"],
}
def chatbot_response(user_input):
    user_input = user_input.lower()
    for pattern, replies in responses.items():
        if re.search(pattern, user_input):
            return random.choice(replies)
    return "Sorry, I don't understand that."
# Step 2: Chat loop
print("Chatbot: Hello! Type 'bye' to exit.")
while True:
    user_input = input("You: ")
    if user_input.lower() == "bye":
        print("Chatbot: Goodbye!")
        break
    response = chatbot_response(user_input)
    print(f"Chatbot: {response}")
OUTPUT:
Chatbot: Hello! Type 'bye' to exit.
You: Hello
Chatbot: Hi there!
You: What is your name?
Chatbot: I'm a simple chatbot created using Python.
You: Bye
Chatbot: Goodbye!
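Procedure step 1 mentions tokenizing and cleaning the input with NLTK or spaCy; a minimal sketch of such a normalization step (assuming NLTK and its punkt data from the earlier experiments) that could be applied before pattern matching:
from nltk.tokenize import word_tokenize

def normalize(user_input):
    # Lowercase, tokenize, and drop punctuation tokens before matching
    tokens = word_tokenize(user_input.lower())
    return " ".join(t for t in tokens if t.isalnum())

print(normalize("What is your name?"))  # what is your name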
RESULT:
Thus the rule-based chatbot has been implemented successfully using NLP concepts.
EX.NO:09 SENTIMENT ANALYSIS OF PRODUCT REVIEWS USING TEXTBLOB
AIM:
To perform sentiment analysis on product reviews using a pretrained NLP model.
PROCEDURE:
1. Install TextBlob or use VADER for sentiment analysis.
2. Prepare or input a few product reviews.
3. Analyze the sentiment polarity and subjectivity for each.
4. Display if each review is positive, negative, or neutral.
PROGRAM:
# Step 1: Install textblob
# pip install textblob
from textblob import TextBlob
# Step 2: Define sample product reviews
reviews = [
    "This phone has excellent battery life and a beautiful display.",
    "The product stopped working after two days. Very disappointed.",
    "It's okay. Nothing great but not bad either."
]
# Step 3: Analyze sentiment
print("Sentiment Analysis Results:\n")
for review in reviews:
    analysis = TextBlob(review)
    polarity = analysis.sentiment.polarity
    sentiment = "Positive" if polarity > 0 else "Negative" if polarity < 0 else "Neutral"
    print(f"Review: {review}\nPolarity: {polarity:.2f} | Sentiment: {sentiment}\n")
OUTPUT:
Sentiment Analysis Results:
Review: This phone has excellent battery life and a beautiful display.
Polarity: 0.80 | Sentiment: Positive
Review: The product stopped working after two days. Very disappointed.
Polarity: -0.60 | Sentiment: Negative
Review: It's okay. Nothing great but not bad either.
Polarity: 0.05 | Sentiment: Positive
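Procedure step 3 also mentions subjectivity; TextBlob exposes it alongside polarity, so the loop can report both. A minimal extension of the program above:
for review in reviews:
    sentiment = TextBlob(review).sentiment
    print(f"Polarity: {sentiment.polarity:.2f} | Subjectivity: {sentiment.subjectivity:.2f}")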
RESULT:
Thus the sentiment analysis on product reviews has been implemented successfully.
EX.NO:10 LANGUAGE IDENTIFIER USING NLP TECHNIQUES
AIM:
To implement a language identifier that detects the language of a given text using the
langdetect package.
PROCEDURE:
1. Install the langdetect library.
2. Input a set of sentences in different languages.
3. Use the detect() function from langdetect to identify the language.
4. Print out the language code for each sentence.
5. Optionally, map language codes to full language names.
PROGRAM:
# Step 1: Install langdetect
# pip install langdetect
from langdetect import detect
# Step 2: Define a list of sentences in different languages
sentences = [
    "Hello, how are you?",
    "Bonjour, comment ça va?",
    "Hola, ¿cómo estás?",
    "Guten Tag, wie geht es dir?",
    "नमस्ते, आप कैसे हैं?",
    "你好,你怎么样?"
]
# Step 3: Detect and display language
print("Language Detection Results:\n")
for sentence in sentences:
    lang = detect(sentence)
    print(f"Text: {sentence}\nDetected Language Code: {lang}\n")
OUTPUT:
Language Detection Results:
Text: Hello, how are you?
Detected Language Code: en
Text: Bonjour, comment ça va?
Detected Language Code: fr
Text: Hola, ¿cómo estás?
Detected Language Code: es
Text: Guten Tag, wie geht es dir?
Detected Language Code: de
Text: नमस्ते, आप कैसे हैं?
Detected Language Code: hi
Text: 你好,你怎么样?
Detected Language Code: zh-cn
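Procedure step 5 suggests mapping the codes to full language names; a small hand-written dictionary covers the codes used here (illustrative only). Fixing langdetect's random seed also makes the probabilistic detector repeatable:
from langdetect import DetectorFactory, detect
DetectorFactory.seed = 0  # langdetect is probabilistic; a fixed seed gives repeatable results

# Hand-written mapping for the codes in this exercise (illustrative, not exhaustive)
lang_names = {"en": "English", "fr": "French", "es": "Spanish",
              "de": "German", "hi": "Hindi", "zh-cn": "Chinese (Simplified)"}

code = detect("Bonjour, comment ça va?")
print(f"{code} -> {lang_names.get(code, 'Unknown')}")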
RESULT:
Thus the language detection using NLP techniques has been implemented successfully.