NLP Lab Manual

The document outlines various NLP tasks implemented using Python libraries such as NLTK and spaCy, including tokenization, stop word removal, stemming, lemmatization, POS tagging, TF-IDF, dependency parsing, semantic analysis, chatbot creation, sentiment analysis, and language identification. Each task includes an aim, procedure, program code, output, and result indicating successful implementation. The document serves as a comprehensive guide for applying NLP techniques in practical scenarios.


EX.NO:01 SENTENCE AND WORD TOKENIZATION USING NLTK

AIM:

To download the NLTK package and use it to tokenize text into sentences and words.

PROCEDURE:

1. Install the NLTK library using pip install nltk.

2. Import the necessary modules from NLTK.

3. Download required datasets (punkt tokenizer).

4. Use sent_tokenize() to split text into sentences.

5. Use word_tokenize() to split text into words.

PROGRAM:

# Step 1: Install and import NLTK


import nltk
nltk.download('punkt') # Punkt tokenizer

from nltk.tokenize import sent_tokenize, word_tokenize

# Step 2: Input text


text = "Anna University offers a wide range of technical courses. NLP is one of the key
subjects."

# Step 3: Sentence Tokenization


sentences = sent_tokenize(text)
print("Sentence Tokenization:")
print(sentences)

# Step 4: Word Tokenization


words = word_tokenize(text)
print("\nWord Tokenization:")
print(words)

OUTPUT:

Sentence Tokenization:
['Anna University offers a wide range of technical courses.', 'NLP is one of the key subjects.']
Word Tokenization:
['Anna', 'University', 'offers', 'a', 'wide', 'range', 'of', 'technical', 'courses', '.', 'NLP', 'is', 'one', 'of',
'the', 'key', 'subjects', '.']
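
Note: As an optional extension (not part of the original exercise), the two tokenizers can be combined to list the words of each sentence separately; a minimal sketch, assuming the same input text and that the punkt resource has already been downloaded:

# Optional sketch: word-tokenize each sentence separately
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Anna University offers a wide range of technical courses. NLP is one of the key subjects."
for i, sentence in enumerate(sent_tokenize(text), start=1):
    print(f"Sentence {i}: {word_tokenize(sentence)}")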

RESULT:

Thus the tokenization of text into sentences and words has been implemented successfully using the NLTK package.

EX.NO:02 STOP WORD REMOVAL INCLUDING CUSTOM STOP WORDS USING SPACY
AIM:

To include custom stop words and remove them along with standard stop words from a given
document using the NLTK or spaCy package.

PROCEDURE:

1. Install and import spaCy (or NLTK if needed).

2. Load the English language model.

3. Define a list of custom stop words.

4. Process the input text and filter out both standard and custom stop words.

PROGRAM:

# Step 1: Install and load spaCy


# pip install spacy
import spacy
from spacy.lang.en.stop_words import STOP_WORDS

# Step 2: Load English model


nlp = spacy.load("en_core_web_sm")

# Step 3: Define input text


text = "Machine learning is a key component of modern artificial intelligence systems."

# Step 4: Define custom stop words


custom_stop_words = {"component", "modern"}

# Step 5: Combine standard and custom stop words


all_stop_words = STOP_WORDS.union(custom_stop_words)

# Step 6: Tokenize and remove stop words


doc = nlp(text)
filtered_tokens = [token.text for token in doc if token.text.lower() not in all_stop_words]

print("Filtered Tokens:")
print(filtered_tokens)

OUTPUT:
Filtered Tokens:
['Machine', 'learning', 'key', 'artificial', 'intelligence', 'systems', '.']
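
Note: An alternative sketch (an illustration, not part of the original program) uses spaCy's built-in token.is_stop flag for the standard stop words and checks the custom set separately:

# Optional sketch: filter with token.is_stop plus a custom stop word set
import spacy

nlp = spacy.load("en_core_web_sm")           # assumes the model is installed
custom_stop_words = {"component", "modern"}  # same custom list as above

doc = nlp("Machine learning is a key component of modern artificial intelligence systems.")
filtered = [t.text for t in doc if not t.is_stop and t.text.lower() not in custom_stop_words]
print(filtered)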

RESULT:

Thus the removal of both standard and custom stop words has been implemented successfully.

EX.NO:03 IMPLEMENTATION OF STEMMING AND LEMMATIZATION USING NLTK


AIM:

To implement a stemmer and a lemmatizer program using NLTK.

PROCEDURE:

1. Install and import NLTK.

2. Download required datasets for lemmatization.

3. Define sample text input.

4. Apply PorterStemmer for stemming and WordNetLemmatizer for lemmatization.

5. Print the results.

PROGRAM:

# Step 1: Import libraries and download data


import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('wordnet')

# Step 2: Define text


text = "The children are playing in the gardens and enjoying their activities."

# Step 3: Tokenize
words = word_tokenize(text)

# Step 4: Initialize stemmer and lemmatizer


stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Step 5: Apply and display results


print("Stemming:")
for word in words:
print(f"{word} -> {stemmer.stem(word)}")

print("\nLemmatization:")
for word in words:
print(f"{word} -> {lemmatizer.lemmatize(word)}")

OUTPUT:

Stemming:
The -> the
children -> children
are -> are
playing -> play
in -> in
the -> the
gardens -> garden
and -> and
enjoying -> enjoy
their -> their
activities -> activ

Lemmatization:
The -> The
children -> child
are -> are
playing -> playing
in -> in
the -> the
gardens -> garden
and -> and
enjoying -> enjoying
their -> their
activities -> activity
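
Note: WordNetLemmatizer treats every word as a noun unless told otherwise, which is why "playing" and "enjoying" are unchanged above. A minimal sketch passing a verb hint (pos='v'), assuming the wordnet resource is already downloaded:

# Optional sketch: lemmatize with an explicit part-of-speech hint
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
for word in ["playing", "enjoying", "are"]:
    print(f"{word} -> {lemmatizer.lemmatize(word, pos='v')}")  # play, enjoy, be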

RESULT:

Thus the stemmer and lemmatizer programs have been implemented successfully using the NLTK package.

EX.NO:04 PART-OF-SPEECH (POS) TAGGING USING NLTK


AIM:

To implement a simple Part-of-Speech (POS) tagger using NLTK.

PROCEDURE:

1. Install and import NLTK.

2. Download the necessary POS tagging resources.

3. Tokenize the input sentence into words.

4. Apply the POS tagger using nltk.pos_tag().

5. Print the tagged output.

PROGRAM:

# Step 1: Import and download NLTK resources


import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Step 2: Input text


text = "The quick brown fox jumps over the lazy dog."

# Step 3: Tokenize text


words = nltk.word_tokenize(text)

# Step 4: POS tagging


pos_tags = nltk.pos_tag(words)

# Step 5: Print the result


print("Part-of-Speech Tags:")
for word, tag in pos_tags:
print(f"{word} -> {tag}")

OUTPUT:
Part-of-Speech Tags:
The -> DT
quick -> JJ
brown -> NN
fox -> NN
jumps -> VBZ
over -> IN
the -> DT
lazy -> JJ
dog -> NN
. -> .
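
Note: For coarser, easier-to-read labels, the same tagger can map Penn Treebank tags to the Universal tagset; a minimal sketch, assuming the extra NLTK resources are downloaded:

# Optional sketch: coarse-grained tags via the Universal tagset
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

words = nltk.word_tokenize("The quick brown fox jumps over the lazy dog.")
print(nltk.pos_tag(words, tagset='universal'))  # e.g. [('The', 'DET'), ('quick', 'ADJ'), ...]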

RESULT:

Thus the simple part-of-speech tagger has been implemented successfully.

EX.NO:05 TF-IDF AND COSINE SIMILARITY


AIM:

To write a program that calculates TF-IDF of documents and finds cosine similarity between any
two documents.

PROCEDURE:

1. Install scikit-learn (pip install scikit-learn) if not already installed.

2. Define two or more documents.

3. Use TfidfVectorizer to convert the text into TF-IDF vectors.

4. Calculate cosine similarity using cosine_similarity from sklearn.

5. Display the similarity scores.

PROGRAM:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: Define documents


doc1 = "Natural Language Processing enables computers to understand human language."
doc2 = "NLP allows machines to interpret and respond to human text or speech."

# Step 2: Vectorize using TF-IDF


vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform([doc1, doc2])

# Step 3: Compute Cosine Similarity


cos_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])

# Step 4: Output


print(f"Cosine Similarity between documents: {cos_sim[0][0]:.4f}")

OUTPUT:

Cosine Similarity between documents: 0.5273
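
Note: The score is the cosine of the angle between the two TF-IDF vectors, cos(theta) = (A . B) / (||A|| ||B||). A minimal sketch verifying it by hand with NumPy (an illustration, not part of the original program):

# Optional sketch: recompute the cosine similarity manually
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

doc1 = "Natural Language Processing enables computers to understand human language."
doc2 = "NLP allows machines to interpret and respond to human text or speech."
a, b = TfidfVectorizer().fit_transform([doc1, doc2]).toarray()
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # should match the sklearn value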

RESULT:

Thus the calculation of TF-IDF and the cosine similarity between two documents has been implemented successfully.

EX.NO:06 DEPENDENCY PARSING OF SENTENCES USING SPACY


AIM:

To implement a dependency parser for English sentences.

Note: NLTK does not provide a full-fledged dependency parser, so spaCy, which offers built-in dependency parsing, is used instead.

PROCEDURE:

1. Install and import spaCy.

2. Load the English language model.

3. Provide an input sentence.

4. Use spaCy’s doc object to access tokens and their syntactic dependencies.

5. Print token relationships: token → head → dependency tag.

PROGRAM:

# Step 1: Import spaCy


import spacy

# Step 2: Load the English language model


nlp = spacy.load("en_core_web_sm")

# Step 3: Define input sentence


sentence = "The professor explained the topic clearly to the students."

# Step 4: Parse and display dependencies


doc = nlp(sentence)

print("Dependency Parsing Output:")


for token in doc:
    print(f"{token.text:<12} -> Head: {token.head.text:<12} | Dep: {token.dep_}")

OUTPUT:
Dependency Parsing Output:
The -> Head: professor | Dep: det
professor -> Head: explained | Dep: nsubj
explained -> Head: explained | Dep: ROOT
the -> Head: topic | Dep: det
topic -> Head: explained | Dep: dobj
clearly -> Head: explained | Dep: advmod
to -> Head: explained | Dep: prep
the -> Head: students | Dep: det
students -> Head: to | Dep: pobj
. -> Head: explained | Dep: punct
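
Note: As an optional visual check (not part of the original exercise), spaCy's displacy module can draw the dependency tree; a minimal sketch for a Jupyter notebook:

# Optional sketch: visualize the dependency tree with displacy
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The professor explained the topic clearly to the students.")
displacy.render(doc, style="dep", jupyter=True)  # use displacy.serve(doc, style="dep") from a plain script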

RESULT:

Thus the dependency parsing of sentences has been successfully implemented.

EX.NO:07 SEMANTIC ANALYSIS USING WORDNET IN NLTK


AIM:

To implement a semantic language processor that uses WordNet for semantic tagging.

PROCEDURE:

1. Install and import NLTK.

2. Download WordNet and word tokenizers.

3. Use wordnet.synsets() to extract meanings (synsets).

4. Print definitions and examples to understand semantic tags.

PROGRAM:

# Step 1: Import and download WordNet


import nltk
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

nltk.download('wordnet')
nltk.download('punkt')

# Step 2: Define input text


text = "Students learn different subjects at school."

# Step 3: Tokenize and get WordNet synsets


tokens = word_tokenize(text)

print("Semantic Tags using WordNet:\n")


for word in tokens:
    synsets = wn.synsets(word)
    if synsets:
        print(f"{word}: {synsets[0].definition()} — Example: {synsets[0].examples()}")
    else:
        print(f"{word}: No semantic tag found.")

OUTPUT:
Semantic Tags using WordNet:

Students: A learned person (especially in the humanities) — Example: ['a great scholar']
learn: Gain knowledge or skills — Example: ['She learned dancing from her sister']
different: Unlike in nature or quality or form or degree — Example: ['took different approaches to
the problem']
subjects: A branch of knowledge — Example: ['in what subject are you interested?']
at: No semantic tag found.
school: An educational institution — Example: ['the school was founded in 1900']
. : No semantic tag found.
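
Note: WordNet exposes more than definitions and examples; a minimal sketch listing synonyms and hypernyms (more general concepts) for "school", assuming the wordnet resource is downloaded:

# Optional sketch: synonyms and hypernyms from WordNet
from nltk.corpus import wordnet as wn

syn = wn.synsets("school")[0]
print("Synonyms:", syn.lemma_names())
print("Hypernyms:", [h.name() for h in syn.hypernyms()])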

RESULT:

Thus the semantic language processor that uses WordNet has been implemented successfully.

EX.NO:08 RULE-BASED CHATBOT USING NLTK TECHNIQUES


AIM:

To carry out a project applying NLP concepts to data, such as building a POS tagger, chatbot, summarizer, or comparison tool; here, a rule-based chatbot is implemented.

PROCEDURE:

1. Tokenize and clean the user input using NLTK or spaCy.

2. Define a set of possible input patterns and rule-based responses using regex.

3. Use a loop to continuously receive user input and generate appropriate responses.

4. Exit when the user types "bye".

PROGRAM:

import re
import random

# Step 1: Define input patterns and responses


responses = {
    r"hi|hello|hey": ["Hello!", "Hi there!", "Hey! How can I help you?"],
    r"how are you": ["I'm doing well, thank you!", "Great! What can I do for you today?"],
    r"what is your name": ["I'm a simple chatbot created using Python."],
    r"bye": ["Goodbye!", "See you later!"],
}

def chatbot_response(user_input):
    user_input = user_input.lower()
    for pattern, replies in responses.items():
        if re.search(pattern, user_input):
            return random.choice(replies)
    return "Sorry, I don't understand that."

# Step 2: Chat loop


print("Chatbot: Hello! Type 'bye' to exit.")
while True:
    user_input = input("You: ")
    if user_input.lower() == "bye":
        print("Chatbot: Goodbye!")
        break
    response = chatbot_response(user_input)
    print(f"Chatbot: {response}")
OUTPUT:

Chatbot: Hello! Type 'bye' to exit.
You: Hello
Chatbot: Hi there!
You: What is your name?
Chatbot: I'm a simple chatbot created using Python.
You: Bye
Chatbot: Goodbye!
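
Note: The chatbot above matches raw lowercased input with regular expressions. A small optional sketch (an assumption, not in the original code) normalizes the input with NLTK's word_tokenize first, which ties the chatbot back to the NLTK techniques named in the title:

# Optional sketch: normalize user input with NLTK before pattern matching
import re
import nltk
nltk.download('punkt')

def normalize(user_input):
    return " ".join(nltk.word_tokenize(user_input.lower()))

print(bool(re.search(r"what is your name", normalize("What is your name?"))))  # True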

RESULT:

Thus a rule-based chatbot has been implemented successfully using NLP concepts.

EX.NO:09 SENTIMENT ANALYSIS OF PRODUCT REVIEWS USING TEXTBLOB


AIM:

To perform sentiment analysis on product reviews using a pretrained NLP model.

PROCEDURE:

1. Install TextBlob or use VADER for sentiment analysis.

2. Prepare or input a few product reviews.

3. Analyze the sentiment polarity and subjectivity for each.

4. Display if each review is positive, negative, or neutral.

PROGRAM:

# Step 1: Install textblob


# pip install textblob

from textblob import TextBlob

# Step 2: Define sample product reviews


reviews = [
    "This phone has excellent battery life and a beautiful display.",
    "The product stopped working after two days. Very disappointed.",
    "It's okay. Nothing great but not bad either."
]

# Step 3: Analyze sentiment


print("Sentiment Analysis Results:\n")
for review in reviews:
    analysis = TextBlob(review)
    polarity = analysis.sentiment.polarity
    sentiment = "Positive" if polarity > 0 else "Negative" if polarity < 0 else "Neutral"
    print(f"Review: {review}\nPolarity: {polarity:.2f} | Sentiment: {sentiment}\n")

OUTPUT:
Sentiment Analysis Results:

Review: This phone has excellent battery life and a beautiful display.
Polarity: 0.80 | Sentiment: Positive

Review: The product stopped working after two days. Very disappointed.
Polarity: -0.60 | Sentiment: Negative

Review: It's okay. Nothing great but not bad either.
Polarity: 0.05 | Sentiment: Positive
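
Note: Step 1 of the procedure also mentions VADER; a minimal sketch of the same kind of analysis with NLTK's VADER analyzer (assuming the vader_lexicon resource is downloaded):

# Optional sketch: sentiment scoring with NLTK's VADER
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("This phone has excellent battery life and a beautiful display.")
print(scores)  # the 'compound' value is positive for positive reviews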

RESULT:

Thus the sentiment analysis on product reviews has been implemented successfully.

EX.NO:10 LANGUAGE IDENTIFIER USING NLP TECHNIQUES


AIM:

To implement a language identifier that detects the language of a given text using the
langdetect package.

PROCEDURE:

1. Install the langdetect library.

2. Input a set of sentences in different languages.

3. Use the detect() function from langdetect to identify the language.

4. Print out the language code for each sentence.

5. Optionally, map language codes to full language names.

PROGRAM:

# Step 1: Install langdetect


# pip install langdetect

from langdetect import detect

# Step 2: Define a list of sentences in different languages


sentences = [
    "Hello, how are you?",
    "Bonjour, comment ça va?",
    "Hola, ¿cómo estás?",
    "Guten Tag, wie geht es dir?",
    "नमस्ते, आप कैसे हैं?",
    "你好,你怎么样?"
]

# Step 3: Detect and display language


print("Language Detection Results:\n")
for sentence in sentences:
    lang = detect(sentence)
    print(f"Text: {sentence}\nDetected Language Code: {lang}\n")

OUTPUT:
Language Detection Results:

Text: Hello, how are you?
Detected Language Code: en

Text: Bonjour, comment ça va?
Detected Language Code: fr

Text: Hola, ¿cómo estás?
Detected Language Code: es

Text: Guten Tag, wie geht es dir?
Detected Language Code: de

Text: नमस्ते, आप कैसे हैं?
Detected Language Code: hi

Text: 你好,你怎么样?
Detected Language Code: zh-cn
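
Note: Step 5 of the procedure suggests mapping codes to language names; a minimal sketch using a small illustrative lookup table together with detect_langs for probabilities:

# Optional sketch: map language codes to names and show detection probabilities
from langdetect import detect, detect_langs

code_to_name = {"en": "English", "fr": "French", "es": "Spanish",
                "de": "German", "hi": "Hindi", "zh-cn": "Chinese (Simplified)"}  # illustrative subset

text = "Bonjour, comment ça va?"
code = detect(text)
print(code_to_name.get(code, code))  # French
print(detect_langs(text))            # e.g. [fr:0.99...]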

RESULT:

Thus the language detection using NLP techniques has been implemented successfully.
