EX.NO:01 SENTENCE AND WORD TOKENIZATION USING NLTK
AIM:
To download the NLTK package and use it to tokenize text into sentences and words.
PROCEDURE:
1. Install the NLTK library using pip install nltk.
2. Import the necessary modules from NLTK.
3. Download required datasets (punkt tokenizer).
4. Use sent_tokenize() to split text into sentences.
5. Use word_tokenize() to split text into words.
PROGRAM:
# Step 1: Install and import NLTK
import nltk
nltk.download('punkt') # Punkt tokenizer
from nltk.tokenize import sent_tokenize, word_tokenize
# Step 2: Input text
text = "Anna University offers a wide range of technical courses. NLP is one of the key
subjects."
# Step 3: Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentence Tokenization:")
print(sentences)
# Step 4: Word Tokenization
words = word_tokenize(text)
print("\nWord Tokenization:")
print(words)
OUTPUT:
Sentence Tokenization:
['Anna University offers a wide range of technical courses.', 'NLP is one of the key subjects.']
Word Tokenization:
['Anna', 'University', 'offers', 'a', 'wide', 'range', 'of', 'technical', 'courses', '.', 'NLP', 'is', 'one', 'of', 'the', 'key', 'subjects', '.']
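Note: for comparison, a plain str.split() keeps punctuation attached to the neighbouring word, which is why the Punkt-based tokenizers above are preferred. A minimal check, reusing the text and functions from the program:
# Naive whitespace split keeps punctuation attached ('courses.', 'subjects.')
print(text.split())
# Punkt-based tokenizers separate punctuation into its own tokens
print(len(sent_tokenize(text)), "sentences,", len(word_tokenize(text)), "word tokens")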
RESULT:
Thus the tokenization of text into sentences and words has been successfully implemented using the NLTK package.
EX.NO:02 STOP WORD REMOVAL INCLUDING CUSTOM STOP WORDS USING SPACY
AIM:
To include custom stop words and remove them along with the standard stop words from a given
document using the spaCy package.
PROCEDURE:
1. Install and import spaCy (or NLTK if needed).
2. Load the English language model.
3. Define a list of custom stop words.
4. Process the input text and filter out both standard and custom stop words.
PROGRAM:
# Step 1: Install and load spaCy
# pip install spacy
# python -m spacy download en_core_web_sm
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
# Step 2: Load English model
nlp = spacy.load("en_core_web_sm")
# Step 3: Define input text
text = "Machine learning is a key component of modern artificial intelligence systems."
# Step 4: Define custom stop words
custom_stop_words = {"component", "modern"}
# Step 5: Combine standard and custom stop words
all_stop_words = STOP_WORDS.union(custom_stop_words)
# Step 6: Tokenize and remove stop words
doc = nlp(text)
filtered_tokens = [token.text for token in doc if token.text.lower() not in all_stop_words]
print("Filtered Tokens:")
print(filtered_tokens)
OUTPUT:
Filtered Tokens:
['Machine', 'learning', 'key', 'artificial', 'intelligence', 'systems', '.']
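As an alternative sketch, spaCy also allows custom words to be registered on the pipeline itself so that token.is_stop reflects them; this assumes the en_core_web_sm model and the custom_stop_words set defined above:
# Register the custom words on the loaded pipeline (sketch)
for word in custom_stop_words:
    nlp.Defaults.stop_words.add(word)
    nlp.vocab[word].is_stop = True  # mark the lexeme as a stop word
doc = nlp(text)
print([token.text for token in doc if not token.is_stop])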
RESULT:
Thus the removal of custom stop words along with the standard stop words has been implemented
successfully.
EX.NO:03 IMPLEMENTATION OF STEMMING AND LEMMATIZATION USING NLTK
AIM:
To implement a stemmer and a lemmatizer program using NLTK.
PROCEDURE:
1. Install and import NLTK.
2. Download required datasets for lemmatization.
3. Define sample text input.
4. Apply PorterStemmer for stemming and WordNetLemmatizer for lemmatization.
5. Print the results.
PROGRAM:
# Step 1: Import libraries and download data
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('wordnet')
# Step 2: Define text
text = "The children are playing in the gardens and enjoying their activities."
# Step 3: Tokenize
words = word_tokenize(text)
# Step 4: Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
# Step 5: Apply and display results
print("Stemming:")
for word in words:
    print(f"{word} -> {stemmer.stem(word)}")
print("\nLemmatization:")
for word in words:
    print(f"{word} -> {lemmatizer.lemmatize(word)}")
OUTPUT:
Stemming:
The -> the
children -> children
are -> are
playing -> play
in -> in
the -> the
gardens -> garden
and -> and
enjoying -> enjoy
their -> their
activities -> activ
Lemmatization:
The -> The
children -> child
are -> are
playing -> playing
in -> in
the -> the
gardens -> garden
and -> and
enjoying -> enjoying
their -> their
activities -> activity
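Note: the lemmatizer leaves "playing" and "enjoying" unchanged because WordNetLemmatizer assumes a noun unless a part of speech is supplied; passing the pos argument yields the verb lemmas. A minimal sketch reusing the lemmatizer created above:
from nltk.corpus import wordnet
# Supplying the part of speech ('v' for verb) changes the lemma
print(lemmatizer.lemmatize("playing", pos=wordnet.VERB))   # play
print(lemmatizer.lemmatize("enjoying", pos=wordnet.VERB))  # enjoy
print(lemmatizer.lemmatize("are", pos=wordnet.VERB))       # be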
RESULT:
Thus the stemmer and lemmatizer programs have been successfully implemented using the NLTK
package.
EX.NO:04 PART-OF-SPEECH (POS) TAGGING USING NLTK
AIM:
To implement a simple Part-of-Speech (POS) tagger using NLTK.
PROCEDURE:
1. Install and import NLTK.
2. Download the necessary POS tagging resources.
3. Tokenize the input sentence into words.
4. Apply the POS tagger using nltk.pos_tag().
5. Print the tagged output.
PROGRAM:
# Step 1: Import and download NLTK resources
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Step 2: Input text
text = "The quick brown fox jumps over the lazy dog."
# Step 3: Tokenize text
words = nltk.word_tokenize(text)
# Step 4: POS tagging
pos_tags = nltk.pos_tag(words)
# Step 5: Print the result
print("Part-of-Speech Tags:")
for word, tag in pos_tags:
    print(f"{word} -> {tag}")
OUTPUT:
Part-of-Speech Tags:
The -> DT
quick -> JJ
brown -> NN
fox -> NN
jumps -> VBZ
over -> IN
the -> DT
lazy -> JJ
dog -> NN
. -> .
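The abbreviations follow the Penn Treebank tagset. As a minimal sketch, NLTK can print the definition of any tag, assuming the 'tagsets' help resource downloads as shown:
nltk.download('tagsets')
nltk.help.upenn_tagset('DT')   # determiner
nltk.help.upenn_tagset('VBZ')  # verb, 3rd person singular present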
RESULT:
Thus the simple part-of-speech tagger has been successfully implemented using NLTK.
EX.NO:05 TF-IDF AND COSINE SIMILARITY
AIM:
To write a program that calculates TF-IDF of documents and finds cosine similarity between any
two documents.
PROCEDURE:
1. Install sklearn if not already installed.
2. Define two or more documents.
3. Use TfidfVectorizer to convert the text into TF-IDF vectors.
4. Calculate cosine similarity using cosine_similarity from sklearn.
5. Display the similarity scores.
PROGRAM:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Step 1: Define documents
doc1 = "Natural Language Processing enables computers to understand human language."
doc2 = "NLP allows machines to interpret and respond to human text or speech."
# Step 2: Vectorize using TF-IDF
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform([doc1, doc2])
# Step 3: Compute Cosine Similarity
cos_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
# Step 4: Output
print(f"Cosine Similarity between documents: {cos_sim[0][0]:.4f}")
OUTPUT:
Cosine Similarity between documents: 0.5273
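For intuition, the same score can be reproduced by hand as the dot product of the two TF-IDF vectors divided by the product of their norms. A minimal sketch reusing tfidf_matrix from the program above:
import numpy as np
v1 = tfidf_matrix[0].toarray().ravel()
v2 = tfidf_matrix[1].toarray().ravel()
manual_cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Manual cosine similarity: {manual_cos:.4f}")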
RESULT:
Thus the calculation of TF-IDF and the cosine similarity between two documents has been
successfully implemented.
EX.NO:06 DEPENDENCY PARSING OF SENTENCES USING SPACY
AIM:
To implement a dependency parser for English sentences.
Note: NLTK does not provide a full-fledged dependency parser, so spaCy, which offers built-in
dependency parsing capabilities, is used instead.
PROCEDURE:
1. Install and import spaCy.
2. Load the English language model.
3. Provide an input sentence.
4. Use spaCy’s doc object to access tokens and their syntactic dependencies.
5. Print token relationships: token → head → dependency tag.
PROGRAM:
# Step 1: Import spaCy
import spacy
# Step 2: Load the English language model
nlp = spacy.load("en_core_web_sm")
# Step 3: Define input sentence
sentence = "The professor explained the topic clearly to the students."
# Step 4: Parse and display dependencies
doc = nlp(sentence)
print("Dependency Parsing Output:")
for token in doc:
    print(f"{token.text:<12} -> Head: {token.head.text:<12} | Dep: {token.dep_}")
OUTPUT:
Dependency Parsing Output:
The -> Head: professor | Dep: det
professor -> Head: explained | Dep: nsubj
explained -> Head: explained | Dep: ROOT
the -> Head: topic | Dep: det
topic -> Head: explained | Dep: dobj
clearly -> Head: explained | Dep: advmod
to -> Head: explained | Dep: prep
the -> Head: students | Dep: det
students -> Head: to | Dep: pobj
. -> Head: explained | Dep: punct
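spaCy's built-in displaCy renderer can draw the same parse as a tree; a minimal sketch (serve starts a local web page, while render is used inside notebooks):
from spacy import displacy
# Starts a local web server and renders the dependency tree in the browser
displacy.serve(doc, style="dep")
# In a Jupyter notebook, use: displacy.render(doc, style="dep", jupyter=True)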
RESULT:
Thus the dependency parsing of sentences has been successfully implemented.
EX.NO:07 SEMANTIC ANALYSIS USING WORDNET IN NLTK
AIM:
To implement a semantic language processor that uses WordNet for semantic tagging.
PROCEDURE:
1. Install and import NLTK.
2. Download WordNet and word tokenizers.
3. Use wordnet.synsets() to extract meanings (synsets).
4. Print definitions and examples to understand semantic tags.
PROGRAM:
# Step 1: Import and download WordNet
import nltk
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize
nltk.download('wordnet')
nltk.download('punkt')
# Step 2: Define input text
text = "Students learn different subjects at school."
# Step 3: Tokenize and get WordNet synsets
tokens = word_tokenize(text)
print("Semantic Tags using WordNet:\n")
for word in tokens:
    synsets = wn.synsets(word)
    if synsets:
        print(f"{word}: {synsets[0].definition()} — Example: {synsets[0].examples()}")
    else:
        print(f"{word}: No semantic tag found.")
OUTPUT:
Semantic Tags using WordNet:
Students: A learned person (especially in the humanities) — Example: ['a great scholar']
learn: Gain knowledge or skills — Example: ['She learned dancing from her sister']
different: Unlike in nature or quality or form or degree — Example: ['took different approaches to the problem']
subjects: A branch of knowledge — Example: ['in what subject are you interested?']
at: No semantic tag found.
school: An educational institution — Example: ['the school was founded in 1900']
. : No semantic tag found.
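Because only the first synset is used, ambiguous words are reduced to a single sense; listing every synset of a word makes the alternatives visible. A minimal sketch using the wordnet corpus imported above:
for syn in wn.synsets("school"):
    print(f"{syn.name()} ({syn.pos()}): {syn.definition()}")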
RESULT:
Thus the implementation of a semantic language processor that uses WordNet has been
completed successfully.
EX.NO:08 RULE-BASED CHATBOT USING NLTK TECHNIQUES
AIM:
To carry out a mini project in pairs, applying NLP concepts to data by building a POS tagger,
chatbot, summarizer, or comparison tool; here, a rule-based chatbot is implemented.
PROCEDURE:
1. Tokenize and clean the user input using NLTK or spaCy.
2. Define a set of possible input patterns and rule-based responses using regex.
3. Use a loop to continuously receive user input and generate appropriate responses.
4. Exit when the user types "bye".
PROGRAM:
import re
import random
# Step 1: Define input patterns and responses
# Word boundaries (\b) stop short patterns such as "hi" matching inside words like "this"
responses = {
    r"\b(hi|hello|hey)\b": ["Hello!", "Hi there!", "Hey! How can I help you?"],
    r"how are you": ["I'm doing well, thank you!", "Great! What can I do for you today?"],
    r"what is your name": ["I'm a simple chatbot created using Python."],
    r"\bbye\b": ["Goodbye!", "See you later!"],
}
def chatbot_response(user_input):
    user_input = user_input.lower()
    for pattern, replies in responses.items():
        if re.search(pattern, user_input):
            return random.choice(replies)
    return "Sorry, I don't understand that."
# Step 2: Chat loop
print("Chatbot: Hello! Type 'bye' to exit.")
while True:
    user_input = input("You: ")
    if user_input.lower() == "bye":
        print("Chatbot: Goodbye!")
        break
    response = chatbot_response(user_input)
    print(f"Chatbot: {response}")
OUTPUT:
Chatbot: Hello! Type 'bye' to exit.
You: Hello
Chatbot: Hi there!
You: What is your name?
Chatbot: I'm a simple chatbot created using Python.
You: Bye
Chatbot: Goodbye!
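Procedure step 1 mentions tokenizing and cleaning the input with NLTK or spaCy; a minimal sketch of such a normalization step (assuming NLTK and its punkt data from the earlier experiments) that could be applied before pattern matching:
from nltk.tokenize import word_tokenize

def normalize(user_input):
    # Lowercase, tokenize, and drop punctuation tokens before matching
    tokens = word_tokenize(user_input.lower())
    return " ".join(t for t in tokens if t.isalnum())

print(normalize("What is your name?"))  # what is your name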
RESULT:
Thus the rule-based chatbot has been implemented successfully using NLP concepts.
EX.NO:09 SENTIMENT ANALYSIS OF PRODUCT REVIEWS USING TEXTBLOB
AIM:
To perform sentiment analysis on product reviews using a pretrained NLP model.
PROCEDURE:
1. Install TextBlob or use VADER for sentiment analysis.
2. Prepare or input a few product reviews.
3. Analyze the sentiment polarity and subjectivity for each.
4. Display if each review is positive, negative, or neutral.
PROGRAM:
# Step 1: Install textblob
# pip install textblob
from textblob import TextBlob
# Step 2: Define sample product reviews
reviews = [
    "This phone has excellent battery life and a beautiful display.",
    "The product stopped working after two days. Very disappointed.",
    "It's okay. Nothing great but not bad either."
]
# Step 3: Analyze sentiment
print("Sentiment Analysis Results:\n")
for review in reviews:
    analysis = TextBlob(review)
    polarity = analysis.sentiment.polarity
    sentiment = "Positive" if polarity > 0 else "Negative" if polarity < 0 else "Neutral"
    print(f"Review: {review}\nPolarity: {polarity:.2f} | Sentiment: {sentiment}\n")
OUTPUT:
Sentiment Analysis Results:
Review: This phone has excellent battery life and a beautiful display.
Polarity: 0.80 | Sentiment: Positive
Review: The product stopped working after two days. Very disappointed.
Polarity: -0.60 | Sentiment: Negative
Review: It's okay. Nothing great but not bad either.
Polarity: 0.05 | Sentiment: Positive
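Procedure step 3 also mentions subjectivity; TextBlob exposes it alongside polarity, so the loop can report both. A minimal extension of the program above:
for review in reviews:
    sentiment = TextBlob(review).sentiment
    print(f"Polarity: {sentiment.polarity:.2f} | Subjectivity: {sentiment.subjectivity:.2f}")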
RESULT:
Thus the sentiment analysis on product reviews has been implemented successfully.
EX.NO:10 LANGUAGE IDENTIFIER USING NLP TECHNIQUES
AIM:
To implement a language identifier that detects the language of a given text using the
langdetect package.
PROCEDURE:
1. Install the langdetect library.
2. Input a set of sentences in different languages.
3. Use the detect() function from langdetect to identify the language.
4. Print out the language code for each sentence.
5. Optionally, map language codes to full language names.
PROGRAM:
# Step 1: Install langdetect
# pip install langdetect
from langdetect import detect
# Step 2: Define a list of sentences in different languages
sentences = [
    "Hello, how are you?",
    "Bonjour, comment ça va?",
    "Hola, ¿cómo estás?",
    "Guten Tag, wie geht es dir?",
    "नमस्ते, आप कैसे हैं?",
    "你好,你怎么样?"
]
# Step 3: Detect and display language
print("Language Detection Results:\n")
for sentence in sentences:
    lang = detect(sentence)
    print(f"Text: {sentence}\nDetected Language Code: {lang}\n")
OUTPUT:
Language Detection Results:
Text: Hello, how are you?
Detected Language Code: en
Text: Bonjour, comment ça va?
Detected Language Code: fr
Text: Hola, ¿cómo estás?
Detected Language Code: es
Text: Guten Tag, wie geht es dir?
Detected Language Code: de
Text: नमस्ते, आप कैसे हैं?
Detected Language Code: hi
Text: 你好,你怎么样?
Detected Language Code: zh-cn
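Procedure step 5 suggests mapping the codes to full language names; a small hand-written dictionary covers the codes used here (illustrative only). Fixing langdetect's random seed also makes the probabilistic detector repeatable:
from langdetect import DetectorFactory, detect
DetectorFactory.seed = 0  # langdetect is probabilistic; a fixed seed gives repeatable results

# Hand-written mapping for the codes in this exercise (illustrative, not exhaustive)
lang_names = {"en": "English", "fr": "French", "es": "Spanish",
              "de": "German", "hi": "Hindi", "zh-cn": "Chinese (Simplified)"}

code = detect("Bonjour, comment ça va?")
print(f"{code} -> {lang_names.get(code, 'Unknown')}")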
RESULT:
Thus the language detection using NLP techniques has been implemented successfully.