
DELHI TECHNOLOGICAL UNIVERSITY
SE-316
NATURAL LANGUAGE PROCESSING

Submitted by
Bharat Mishra
Roll Number: 2K21/SE/54

Batch: SE-A1

Submitted to: Geetanjali Garg

Department of Software Engineering


Delhi Technological University
Bawana Road, Delhi-110042
INDEX

S. No.  Experiment                                                                                            Date
1.      Import nltk and download the ‘stopwords’ and ‘punkt’ packages                                        13-01-2024
2.      Import spacy and load the language model.                                                            19-01-2024
3.      WAP in python to tokenize a given text.                                                              09-02-2024
4.      WAP in python to get the sentences of a text document.                                               09-02-2024
5.      WAP in python to tokenize text with stopwords as delimiters.                                         23-02-2024
6.      WAP in python to add custom stop words in spaCy.                                                     05-03-2024
7.      WAP to remove punctuations, perform stemming, lemmatize given text and extract usernames from emails 19-03-2024
8.      WAP to do spell correction, extract all nouns, pronouns and verbs in a given text                    26-03-2024
9.      WAP to find similarity between two words and classify a text as positive/negative sentiment          02-04-2024
EXPERIMENT-1
AIM : Import nltk and download the ‘stopwords’ and ‘punkt’
packages

CODE :
import nltk
nltk.download('stopwords')
nltk.download('punkt')
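As a quick check that both resources are usable, they can be exercised immediately (a minimal sketch, not part of the submitted code; recent NLTK releases may additionally require the ‘punkt_tab’ resource for word_tokenize):

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Confirms the 'stopwords' corpus and 'punkt' tokenizer models are installed
print(stopwords.words('english')[:10])
print(word_tokenize("NLTK is ready."))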

OUTPUT :
EXPERIMENT-2

AIM : Import spacy and load the language model

CODE :
import spacy
nlp_eng = spacy.load('en_core_web_sm')
nlp_multi = spacy.load('xx_ent_wiki_sm')
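spacy.load raises an OSError when a model is not installed, so the two packages must be fetched once beforehand; a minimal sketch of the setup and a smoke test:

# Install the models once from the command line before loading them:
#   python -m spacy download en_core_web_sm
#   python -m spacy download xx_ent_wiki_sm
doc = nlp_eng("spaCy loaded correctly.")
print([token.text for token in doc])   # quick check that the pipeline runs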

OUTPUT :
EXPERIMENT-3

AIM : WAP in python to tokenize a given text

CODE :
from nltk import word_tokenize

text = ("Last week, the University of Cambridge shared its own research that shows if "
        "everyone wears a mask outside home, the dreaded ‘second wave’ of the pandemic can be avoided.")
tokens = word_tokenize(text)
for t in tokens:
    print(t)
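spaCy offers an equivalent tokenizer; a minimal sketch, assuming the en_core_web_sm model from Experiment 2 is installed:

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(text)                        # same sentence as above
print([token.text for token in doc])   # spaCy's tokens, for comparison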

OUTPUT :
EXPERIMENT-4
AIM : WAP in python to get the sentences of a text document.

CODE :
with open('/content/demo.text') as file:
    input_text = file.read()

sentences = input_text.split('.')
for sentence in sentences:
    print(sentence, '\n')
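Splitting on '.' breaks on abbreviations and decimal numbers; NLTK's sent_tokenize, backed by the ‘punkt’ models downloaded in Experiment 1, handles those cases. A minimal sketch under the same file-path assumption:

from nltk.tokenize import sent_tokenize

with open('/content/demo.text') as file:
    input_text = file.read()

for sentence in sent_tokenize(input_text):
    print(sentence, '\n')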

OUTPUT :
EXPERIMENT-5

AIM : WAP in python to tokenize text with stopwords as delimiters.

CODE :
text = "Walter was feeling anxious. He was diagnosed today. He probably is the best
person I know."

stop_words_and_delims = ['was', 'is', 'the', '.', ',', '-', '!', '?']


for r in stop_words_and_delims:
text = text.replace(r, 'DELIM')

words = [t.strip() for t in text.split('DELIM')]


words_filtered = list(filter(lambda a: a not in [''], words))
for word in words_filtered:
print(word)
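Because str.replace matches raw substrings, a stop word inside a longer word (for example 'the' inside 'they') would also be replaced. A regex split on word boundaries avoids this; a minimal sketch:

import re

text = "Walter was feeling anxious. He was diagnosed today. He probably is the best person I know."
# \b limits the stop words to whole-word matches; the character class covers the delimiters
pattern = r'\b(?:was|is|the)\b|[.,!?-]'
chunks = [c.strip() for c in re.split(pattern, text)]
print([c for c in chunks if c])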

OUTPUT :
EXPERIMENT-6

AIM : WAP in python to add custom stop words in spaCy.

CODE :
import spacy

nlp = spacy.load('en_core_web_sm')

custom_stop_words = ['was', 'is', 'the', 'JUNK', 'NIL', 'of', 'more',
                     '.', ',', '-', '!', '?', 'a']
for word in custom_stop_words:
    nlp.vocab[word].is_stop = True

doc = nlp("Jonas was a JUNK great guy NIL Adam was evil NIL Martha JUNK was more of a fool")
for token in doc:
    if not token.is_stop:
        print(token.text, end=" ")
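spaCy also exposes the pipeline's default stop-word set directly; extending it is an equivalent route (a minimal sketch; the per-lexeme flag is set as well so the two stay consistent):

# Equivalent route: extend the default stop-word set of the loaded pipeline
for word in ['JUNK', 'NIL']:
    nlp.Defaults.stop_words.add(word)
    nlp.vocab[word].is_stop = True   # keep the lexeme flag in sync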

OUTPUT :
EXPERIMENT-7
AIM : WAP to remove punctuations, perform stemming,
lemmatize given text and extract usernames from emails

CODE :
punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''
string = "Jonas!!! great \\guy <> Adam --evil [Martha] ;;fool() ."
ans = ""
for char in string:
    if char not in punctuations:
        ans += char

print(ans)
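The standard library covers the same task: string.punctuation lists the ASCII punctuation characters and str.translate removes them in one pass. A minimal sketch (the module is aliased because the variable name string is already taken above):

import string as string_module

s = "Jonas!!! great \\guy <> Adam --evil [Martha] ;;fool() ."
print(s.translate(str.maketrans('', '', string_module.punctuation)))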

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

text = ("Dancing is an art. Students should be taught dance as a subject in schools. "
        "I danced in many of my school functions. Some people are always hesitating to dance.")
stemmer = PorterStemmer()
ans = ""
for token in word_tokenize(text):
    ans += stemmer.stem(token) + " "
print(ans)

import nltk
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer

nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
text = ("Dancing is an art. Students should be taught dance as a subject in schools. "
        "I danced in many of my school functions. Some people are always hesitating to dance.")
ans = ""
for token in word_tokenize(text):
    ans += lemmatizer.lemmatize(token, wordnet.VERB) + " "
print(ans)
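Lemmatizing every token as a verb distorts nouns and adjectives; the Penn Treebank tag from pos_tag can be mapped to the matching WordNet part of speech instead. A minimal sketch; the get_wordnet_pos helper is illustrative, not part of the original code:

from nltk import pos_tag

nltk.download('averaged_perceptron_tagger')   # needed once for pos_tag

def get_wordnet_pos(tag):
    # Illustrative helper: map a Penn Treebank tag prefix to a WordNet POS
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN

for token, tag in pos_tag(word_tokenize(text)):
    print(lemmatizer.lemmatize(token, get_wordnet_pos(tag)), end=" ")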

from nltk.tokenize import word_tokenize

text = ("The new registrations are [email protected], [email protected]. If you "
        "find any disruptions, kindly contact [email protected] or [email protected].")

tokens = word_tokenize(text)
usernames = []
for i in range(len(tokens)):
    if tokens[i] == "@":
        usernames.append(tokens[i - 1])
print(usernames)
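The positional lookup above depends on the tokenizer emitting '@' as a separate token; a regular expression that captures the part before '@' is more robust. A minimal sketch with illustrative example.com addresses:

import re

sample = "Contact alice@example.com or bob.smith@example.org for details."   # illustrative addresses
usernames = re.findall(r'([\w.+-]+)@[\w-]+\.[\w.]+', sample)
print(usernames)   # ['alice', 'bob.smith']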

OUTPUT :
EXPERIMENT – 8
AIM : WAP to do spell correction, extract all nouns, pronouns and verbs in a
given text.

CODE :
from textblob import TextBlob
text="He is a gret person. He beleives in bod"
textb = TextBlob(text)
correct_text = textb.correct()
print(correct_text)

import nltk
from nltk import word_tokenize, pos_tag

nltk.download('averaged_perceptron_tagger')   # needed once for pos_tag

text = "James works at Microsoft. She lives in manchester and likes to play the flute"
tokens = word_tokenize(text)
parts_of_speech = pos_tag(tokens)
nouns = list(filter(lambda x: x[1] == "NN" or x[1] == "NNP", parts_of_speech))
for noun in nouns:
    print(noun[0])

from nltk import pos_tag, word_tokenize

text = ("I may bake a cake for my birthday. "
        "The talk will introduce reader about Use of baking")

words = word_tokenize(text)
tagged = pos_tag(words)   # tag the sentence once instead of re-tagging inside the loop

verb_phrases = []
for i in range(1, len(words)):
    if tagged[i][1] == 'VB':   # base-form verbs
        verb_phrases.append(words[i - 1] + ' ' + words[i])

for phrase in verb_phrases:
    print(phrase)
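The aim also asks for pronouns, which the two blocks above do not extract; in the Penn Treebank tag set they carry the PRP (personal) and PRP$ (possessive) tags. A minimal sketch reusing parts_of_speech from the noun example:

# Pronouns are tagged PRP (personal) and PRP$ (possessive)
pronouns = [word for word, tag in parts_of_speech if tag in ("PRP", "PRP$")]
print(pronouns)   # ['She'] for the sentence tagged above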
OUTPUT :
EXPERIMENT - 9
AIM : WAP to find similarity between two words and classify a text
as positive/negative sentiment

CODE :
import spacy

nlp = spacy.load('en_core_web_md')   # the md model ships word vectors; the sm model does not
words = "amazing terrible excellent"
tokens = nlp(words)

token1, token2, token3 = tokens[0], tokens[1], tokens[2]

print(f"Similarity between {token1} and {token2}: ", token1.similarity(token2))
print(f"Similarity between {token1} and {token3}: ", token1.similarity(token3))

from textblob import TextBlob


text = "It was a very pleasant day"
print(TextBlob(text).sentiment)
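TextBlob's sentiment is a (polarity, subjectivity) pair with polarity ranging from -1 to 1; a common convention, assumed here rather than prescribed by TextBlob, labels positive polarity as positive sentiment:

polarity = TextBlob(text).sentiment.polarity
# Sign-based rule (an assumption, not a TextBlob API): > 0 positive, otherwise negative
label = "positive" if polarity > 0 else "negative"
print(label)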

OUTPUT :
