NLP Lab1

This document provides code examples for common natural language processing tasks, including tokenization, stemming, part-of-speech tagging, n-gram modeling, tagger training, shallow parsing, and named entity recognition.
1. Read the paragraph and obtain the frequency of words.

code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

# Sample paragraph
paragraph = ("Sukumar is good at coding and practicing lots of problems in "
             "leetcode. Sukumar is a very nice guy")

# Tokenize the paragraph into words
words = word_tokenize(paragraph)

# Count how often each token occurs
fdist = FreqDist(words)
for word, frequency in fdist.items():
    print(f"{word}: {frequency}")

2. Write a program to split the sentences in a document.


code:
import nltk
from nltk.tokenize import sent_tokenize

# Sample document
document = "Sukumar is a good boy. Sukumar is in vitap."

# Tokenize the document into sentences
sentences = sent_tokenize(document)

# Print each sentence
for sentence in sentences:
    print(sentence)

3. Perform tokenizing and stemming by reading the input string.


code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

# Sample input string
input_string = "i am running"

# Tokenize the input string into words
words = word_tokenize(input_string)

# Initialize the PorterStemmer
stemmer = PorterStemmer()

# Perform stemming on each word
stemmed_words = [stemmer.stem(word) for word in words]

# Print the original words and their stemmed forms
for original, stemmed in zip(words, stemmed_words):
    print(f"{original} -> {stemmed}")

4. Remove the stopwords and rare words from the document.


code:
import nltk
nltk.download('stopwords')
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.probability import FreqDist

# Sample document
document = ("running in the forest is more dangerous than anything in the world of "
            "humans. sukumar sukumar hero model model run")

# Tokenize the document into words
words = word_tokenize(document)

# Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]

# Calculate the frequency distribution of words
fdist = FreqDist(filtered_words)

# Define a threshold for rare words (e.g., words that occur less than 2 times)
rare_words = [word for word, frequency in fdist.items() if frequency < 2]

# Remove rare words from the filtered words
filtered_words = [word for word in filtered_words if word not in rare_words]

# Join the filtered words back into a document
filtered_document = ' '.join(filtered_words)

print(filtered_words)
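
Since the threshold used above is 2, the rare words are exactly the hapaxes (words that occur once), which FreqDist exposes directly; a small sketch reusing fdist from the code above:

# hapaxes() returns the words with frequency 1, i.e. the rare words under the < 2 threshold
rare_words = set(fdist.hapaxes())
filtered_words = [word for word in filtered_words if word not in rare_words]
print(filtered_words)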

5. Find the parts of speech in the document.


code:
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Sample document
document = "NLTK is a leading platform for building Python programs. It provides
easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet."

# Tokenize the document into words
words = word_tokenize(document)

# Perform part-of-speech tagging
pos_tags = pos_tag(words)

# Print the part-of-speech tags
for word, tag in pos_tags:
    print(f"{word}: {tag}")

6. Write a program to read the words from a string variable/text and perform
tokenizing and Lancaster stemming on the input string.
code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import LancasterStemmer

# Sample input string
input_string = "NLTK is a leading platform for building Python programs."

# Tokenize the input string into words
words = word_tokenize(input_string)

# Initialize the LancasterStemmer
stemmer = LancasterStemmer()

# Perform stemming on each word
stemmed_words = [stemmer.stem(word) for word in words]

# Print the original words and their stemmed forms
for original, stemmed in zip(words, stemmed_words):
    print(f"{original} -> {stemmed}")

7. N-GRAMS:
CODE:

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.util import ngrams
import re

s = """Natural language processing is the ability of a computer program to understand
human language as it is spoken and written, referred to as natural language. It is a
component of Artificial intelligence."""

# Lowercase the text and strip punctuation
s = s.lower()
s = re.sub(r'[^a-zA-Z0-9\s]', ' ', s)

# Split on whitespace and drop empty tokens
tokens = [token for token in s.split() if token != ""]

# Build 5-grams from the token list
output = list(ngrams(tokens, 5))
print(output)
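
For the common bigram and trigram cases, nltk.bigrams and nltk.trigrams are convenient shortcuts equivalent to ngrams(tokens, 2) and ngrams(tokens, 3); a short sketch reusing the tokens list above:

from nltk import bigrams, trigrams

# First few bigrams and trigrams from the same token list
print(list(bigrams(tokens))[:5])
print(list(trigrams(tokens))[:5])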

8. UNIGRAM BIGRAM TRIGRAM TAGGERS


CODE:
import nltk
from nltk.corpus import treebank
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

# Download the Treebank corpus if not already downloaded
nltk.download('treebank')

# Get tagged sentences from the Treebank corpus
tagged_sentences = treebank.tagged_sents()

# Split the tagged sentences into train and test sets
train_size = int(0.8 * len(tagged_sentences))
train_sents = tagged_sentences[:train_size]
test_sents = tagged_sentences[train_size:]

# Train Unigram, Bigram, and Trigram taggers with backoff
unigram_tagger = UnigramTagger(train_sents)
bigram_tagger = BigramTagger(train_sents, backoff=unigram_tagger)
trigram_tagger = TrigramTagger(train_sents, backoff=bigram_tagger)

# Evaluate the taggers on the test set
print(f"Unigram tagger accuracy: {unigram_tagger.evaluate(test_sents)}")
print(f"Bigram tagger accuracy: {bigram_tagger.evaluate(test_sents)}")
print(f"Trigram tagger accuracy: {trigram_tagger.evaluate(test_sents)}")

# Tag a sample sentence
sentence = "Barack Obama was born in Hawaii."
words = nltk.word_tokenize(sentence)
tags = trigram_tagger.tag(words)
print(tags)
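
A common extension is to put a DefaultTagger at the bottom of the backoff chain so that words unseen in training still receive some tag; a minimal sketch, assuming 'NN' as the fallback tag:

from nltk.tag import DefaultTagger

# Every word the other taggers cannot handle falls back to 'NN'
default_tagger = DefaultTagger('NN')
unigram_backoff = UnigramTagger(train_sents, backoff=default_tagger)
bigram_backoff = BigramTagger(train_sents, backoff=unigram_backoff)
print(f"Bigram tagger with default backoff accuracy: {bigram_backoff.evaluate(test_sents)}")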

11. Affix Tagger
code:
import nltk
from nltk.corpus import treebank
from nltk.tag import AffixTagger

# Download the Treebank corpus if not already downloaded
nltk.download('treebank')

# Get tagged sentences from the Treebank corpus
tagged_sentences = treebank.tagged_sents()

# Split the tagged sentences into train and test sets
train_size = int(0.8 * len(tagged_sentences))
train_sents = tagged_sentences[:train_size]
test_sents = tagged_sentences[train_size:]

# Specify the affix tagger parameters
prefix_length = 3
min_stem_length = 2

# Train an affix tagger on 3-character prefixes
affix_tagger = AffixTagger(train_sents, affix_length=prefix_length,
                           min_stem_length=min_stem_length)

# Tag a sample sentence
sentence = "Barack Obama was born in Hawaii."
words = nltk.word_tokenize(sentence)
tags = affix_tagger.tag(words)
print(tags)
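
In AffixTagger a negative affix_length takes the affix from the end of the word, so a suffix-based variant is a one-parameter change; a small sketch reusing the same train/test split:

# affix_length=-3 trains on 3-character suffixes instead of prefixes
suffix_tagger = AffixTagger(train_sents, affix_length=-3, min_stem_length=2)
print(f"Suffix-based affix tagger accuracy: {suffix_tagger.evaluate(test_sents)}")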

12. Dependency parser


code:
import nltk

# Define a simple context-free grammar for parsing
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP | V
Det -> 'the'
N -> 'dog' | 'cat' | 'man' | 'ball'
V -> 'chased' | 'saw' | 'caught'
""")

# Input sentences
input_sentences = ['the dog chased the cat', 'the man saw the ball']

# Create a chart parser
parser = nltk.ChartParser(grammar)

# Iterate over input sentences
for sent in input_sentences:
    # Tokenize the sentence
    tokens = nltk.word_tokenize(sent)
    # Parse the sentence with the CFG chart parser
    for tree in parser.parse(tokens):
        # Print the original sentence and its parse tree
        print("Input Sentence:", sent)
        print("Parse Tree:")
        print(tree)
        print()
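
Note that ChartParser with a CFG yields constituency trees. For a true dependency parse, NLTK provides DependencyGrammar together with ProjectiveDependencyParser; a minimal sketch with a hand-written dependency grammar over the same toy vocabulary (the head -> dependent rules below are illustrative assumptions):

# Head -> dependent rules for the toy sentences above
dep_grammar = nltk.DependencyGrammar.fromstring("""
'chased' -> 'dog' | 'cat'
'saw' -> 'man' | 'ball'
'dog' -> 'the'
'cat' -> 'the'
'man' -> 'the'
'ball' -> 'the'
""")

dep_parser = nltk.ProjectiveDependencyParser(dep_grammar)

for sent in ['the dog chased the cat', 'the man saw the ball']:
    # Each parse is a tree with the head word at the root
    for tree in dep_parser.parse(sent.split()):
        print(tree)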

13. Shallow parsing
import nltk
nltk.download('averaged_perceptron_tagger')
text = "The quick brown fox jumps over the lazy dog"

tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)

chunk_grammar = r"""
NP: {<DT|JJ|NN.*>+} # Chunk sequences of DT, JJ, NN
PP: {<IN><NP>} # Chunk prepositions followed by NP
VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>} # Chunk NP followed by VP
"""

chunk_parser = nltk.RegexpParser(chunk_grammar)

chunks = chunk_parser.parse(pos_tags)

print(chunks)
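
The value returned by RegexpParser.parse is an nltk.Tree, so individual chunks can be extracted by label; a short sketch listing just the NP chunks found above:

# Walk the chunk tree and print the text of every NP chunk
for subtree in chunks.subtrees(filter=lambda t: t.label() == 'NP'):
    print(' '.join(word for word, tag in subtree.leaves()))
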
14. NER:
code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.chunk import ne_chunk

nltk.download("punkt")
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

doc = ("Harry Potter, the young wizard with a lightning-shaped scar, attended "
       "Hogwarts School, faced challenges, and triumphed over the dark wizard Voldemort, "
       "bringing an end to the magical conflict.")

# Tokenize and POS-tag the document
words = word_tokenize(doc)
pos_tags = pos_tag(words)

# Run the named entity chunker
ne_tags = ne_chunk(pos_tags)

# Print each named entity with its label
for chunk in ne_tags:
    if hasattr(chunk, 'label'):
        print(chunk.label(), ':', ' '.join(c[0] for c in chunk))
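
ne_chunk also accepts binary=True, which labels every detected entity simply as NE instead of distinguishing PERSON, GPE, ORGANIZATION, and so on; a one-parameter variant reusing pos_tags from above:

# With binary=True every detected entity is labelled simply 'NE'
ne_tags_binary = ne_chunk(pos_tags, binary=True)
for chunk in ne_tags_binary:
    if hasattr(chunk, 'label'):
        print(chunk.label(), ':', ' '.join(c[0] for c in chunk))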
