Natural Language Processing

[IPCC LAB]

Soumya M S
Asst. Professor
Dept of AIML
GEC, Challakere
Natural Language Processing (BAI601)

Sl. No.  Experiments
1  Write a Python program for the following preprocessing of text in NLP:
   ● Tokenization ● Filtration ● Script Validation ● Stop Word Removal ● Stemming
2  Demonstrate N-gram modeling to analyze and establish the probability distribution across sentences, and explore the use of unigrams, bigrams, and trigrams in diverse English sentences to illustrate the impact of varying n-gram orders on the calculated probabilities.
3  Investigate the Minimum Edit Distance (MED) algorithm and its application in string comparison. The goal is to understand how the algorithm efficiently computes the minimum number of edit operations required to transform one string into another.
   ● Test the algorithm on strings with different types of variations (e.g., typos, substitutions, insertions, deletions)
   ● Evaluate its adaptability to different types of input variations
4  Write a program to implement top-down and bottom-up parsers using an appropriate context-free grammar.
5  Given the following short movie reviews, each labeled with a genre, either comedy or action:
   ● fun, couple, love, love (comedy)
   ● fast, furious, shoot (action)
   ● couple, fly, fast, fun, fun (comedy)
   ● furious, shoot, shoot, fun (action)
   ● fly, fast, shoot, love (action)
   and a new document D: fast, couple, shoot, fly. Compute the most likely class for D. Assume a Naive Bayes classifier and use add-1 smoothing for the likelihoods.
6  Demonstrate the following using an appropriate programming tool which illustrates the use of information retrieval in NLP:
   ● Study the various corpora (Brown, Inaugural, Reuters, UDHR) with methods such as fileids, raw, words, sents, categories
   ● Create and use your own corpora (plaintext, categorical)
   ● Study conditional frequency distributions
   ● Study tagged corpora with methods such as tagged_sents, tagged_words
   ● Write a program to find the most frequent noun tags
   ● Map words to properties using Python dictionaries
   ● Study the rule-based tagger and the Unigram tagger
   ● Find different words in a given plain text without any spaces by comparing this text with a given corpus of words. Also find the score of the words.
7  Write a Python program to find synonyms and antonyms of the word "active" using WordNet.
8  Implement the machine translation application of NLP, where a machine translation model needs to be trained for a language with limited parallel corpora. Investigate and incorporate techniques to improve performance in low-resource scenarios.

Natural Language Toolkit (NLTK) is a leading platform for building Python programs to work with human
language data (Natural Language Processing).
It is accompanied by a book that explains the underlying concepts behind the language processing tasks
supported by the toolkit.
NLTK is intended to support research and teaching in NLP or closely related areas, including empirical
linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning.
For installation instructions on your local machine, please refer to: http://www.nltk.org/install.html and http://www.nltk.org/data.html
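A quick way to verify the installation (a minimal sketch; the exact resource names depend on the NLTK version, and newer releases may ask for 'punkt_tab', which the programs below download):

import nltk
nltk.download('punkt')          # tokenizer models used by word_tokenize
from nltk.tokenize import word_tokenize
print(word_tokenize("NLTK is installed and working."))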

Program 1.
Write a Python program for the following preprocessing of text in NLP:
● Tokenization
● Filtration
● Script Validation
● Stop Word Removal
● Stemming

import nltk
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
# Download necessary NLTK resources
nltk.download('punkt_tab')
nltk.download('stopwords')
def preprocess_text(text):
    # Step 1: Tokenization
    tokens = word_tokenize(text)
    print("Tokens:", tokens)
    # Step 2: Filtration (remove special characters, numbers, etc.)
    filtered_tokens = [word for word in tokens if re.match(r'^[a-zA-Z]+$', word)]
    print("Filtered Tokens:", filtered_tokens)
    # Step 3: Script Validation (ensure all tokens are in English script)
    # Assuming the text is already in English, no further action is needed.
    # If not, you can use a language detection library like `langdetect`.
    # Step 4: Stop Word Removal
    stop_words = set(stopwords.words('english'))
    tokens_without_stopwords = [word for word in filtered_tokens if word.lower() not in stop_words]
    print("Tokens without Stopwords:", tokens_without_stopwords)
    # Step 5: Stemming
    stemmer = PorterStemmer()
    stemmed_tokens = [stemmer.stem(word) for word in tokens_without_stopwords]

    print("Stemmed Tokens:", stemmed_tokens)
    return stemmed_tokens
# Example Usage
text = "This is an example text! It includes different words, numbers like 123, and punctuation."
processed_text = preprocess_text(text)
print("Processed Tokens:", processed_text)
Output:
Tokens: ['This', 'is', 'an', 'example', 'text', '!', 'It', 'includes', 'different', 'words', ',', 'numbers', 'like',
'123', ',', 'and', 'punctuation', '.']
Filtered Tokens: ['This', 'is', 'an', 'example', 'text', 'It', 'includes', 'different', 'words', 'numbers',
'like', 'and', 'punctuation']
Tokens without Stopwords: ['example', 'text', 'includes', 'different', 'words', 'numbers', 'like',
'punctuation']
Stemmed Tokens: ['exampl', 'text', 'includ', 'differ', 'word', 'number', 'like', 'punctuat']
Processed Tokens: ['exampl', 'text', 'includ', 'differ', 'word', 'number', 'like', 'punctuat']
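Step 3 (script validation) is a no-op in the program above. A minimal sketch of an explicit check, assuming "valid" means tokens written only in basic Latin (ASCII) letters, could look like this (the helper is an illustrative addition, not part of the original program):

# Hypothetical script-validation helper: keep only tokens made of basic Latin letters
def validate_script(tokens):
    latin_tokens = [t for t in tokens if t.isascii() and t.isalpha()]
    rejected = [t for t in tokens if not (t.isascii() and t.isalpha())]
    return latin_tokens, rejected

latin, rejected = validate_script(["example", "texte", "पाठ", "текст"])
print("Latin-script tokens:", latin)   # ['example', 'texte']
print("Rejected tokens:", rejected)    # ['पाठ', 'текст']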

Program 2:
Demonstrate N-gram modeling to analyze and establish the probability distribution across sentences, and explore the use of unigrams, bigrams, and trigrams in diverse English sentences to illustrate the impact of varying n-gram orders on the calculated probabilities.
Program:
import nltk
from nltk.util import ngrams
from collections import Counter
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
# Download necessary NLTK resources
nltk.download('punkt_tab')
# Sample sentences
sentences = [
"The quick brown fox jumps over the lazy dog.",
"A quick brown fox jumps over the lazy dog.",
"The lazy dog is jumped over by the quick brown fox."
]
# Function to generate N-grams and calculate probabilities
def ngram_probability(sentences, n):
    # Tokenize sentences and generate N-grams
    tokens = []
    for sentence in sentences:
        tokens.extend(word_tokenize(sentence.lower()))
    # Generate N-grams
    n_grams = list(ngrams(tokens, n))
    # Calculate frequency distribution
    freq_dist = FreqDist(n_grams)
    # Calculate probabilities
    total_ngrams = len(n_grams)
    probabilities = {gram: count / total_ngrams for gram, count in freq_dist.items()}

    return probabilities
# Unigrams (n=1)
unigram_probs = ngram_probability(sentences, 1)
print("Unigram Probabilities:")
for gram, prob in unigram_probs.items():
    print(f"{gram}: {prob:.4f}")

# Bigrams (n=2)
bigram_probs = ngram_probability(sentences, 2)
print("\nBigram Probabilities:")
for gram, prob in bigram_probs.items():
    print(f"{gram}: {prob:.4f}")

# Trigrams (n=3)
trigram_probs = ngram_probability(sentences, 3)
print("\nTrigram Probabilities:")
for gram, prob in trigram_probs.items():
    print(f"{gram}: {prob:.4f}")

Output:
Unigram Probabilities:
('the',): 0.1562
('quick',): 0.0938
('brown',): 0.0938
('fox',): 0.0938
('jumps',): 0.0625
('over',): 0.0938
('lazy',): 0.0938
('dog',): 0.0938
('.',): 0.0938
('a',): 0.0312
('is',): 0.0312

('jumped',): 0.0312
('by',): 0.0312

Bigram Probabilities:
('the', 'quick'): 0.0645
('quick', 'brown'): 0.0968
('brown', 'fox'): 0.0968
('fox', 'jumps'): 0.0645
('jumps', 'over'): 0.0645
('over', 'the'): 0.0645
('the', 'lazy'): 0.0968
('lazy', 'dog'): 0.0968
('dog', '.'): 0.0645
('.', 'a'): 0.0323
('a', 'quick'): 0.0323
('.', 'the'): 0.0323
('dog', 'is'): 0.0323
('is', 'jumped'): 0.0323
('jumped', 'over'): 0.0323
('over', 'by'): 0.0323
('by', 'the'): 0.0323
('fox', '.'): 0.0323

Trigram Probabilities:
('the', 'quick', 'brown'): 0.0667
('quick', 'brown', 'fox'): 0.1000
('brown', 'fox', 'jumps'): 0.0667
('fox', 'jumps', 'over'): 0.0667
('jumps', 'over', 'the'): 0.0667
('over', 'the', 'lazy'): 0.0667
('the', 'lazy', 'dog'): 0.1000

('lazy', 'dog', '.'): 0.0667
('dog', '.', 'a'): 0.0333
('.', 'a', 'quick'): 0.0333
('a', 'quick', 'brown'): 0.0333
('dog', '.', 'the'): 0.0333
('.', 'the', 'lazy'): 0.0333
('lazy', 'dog', 'is'): 0.0333
('dog', 'is', 'jumped'): 0.0333
('is', 'jumped', 'over'): 0.0333
('jumped', 'over', 'by'): 0.0333
('over', 'by', 'the'): 0.0333
('by', 'the', 'quick'): 0.0333
('brown', 'fox', '.'): 0.0333
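The probabilities above are relative frequencies of each n-gram over the pooled corpus. To score a whole sentence, a bigram language model instead multiplies conditional probabilities P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}). A minimal sketch that reuses the sentences list from the program above (unsmoothed, so any unseen bigram gives probability 0):

def sentence_bigram_probability(sentence, corpus_sentences):
    # Build unigram and bigram counts from the corpus
    tokens = []
    for s in corpus_sentences:
        tokens.extend(word_tokenize(s.lower()))
    unigram_counts = FreqDist(tokens)
    bigram_counts = FreqDist(ngrams(tokens, 2))
    # Multiply P(w_i | w_{i-1}) over the sentence
    words = word_tokenize(sentence.lower())
    prob = 1.0
    for w1, w2 in ngrams(words, 2):
        if bigram_counts[(w1, w2)] == 0:
            return 0.0  # unseen bigram; smoothing would be needed in practice
        prob *= bigram_counts[(w1, w2)] / unigram_counts[w1]
    return prob

print(sentence_bigram_probability("the quick brown fox jumps over the lazy dog .", sentences))

Comparing the unigram, bigram, and trigram tables above also shows why the order matters: higher-order n-grams are sparser, so more of them occur only once and their estimated probabilities become less reliable without smoothing.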

Program 3:
Investigate the Minimum Edit Distance (MED) algorithm and its application in string comparison. The goal is to understand how the algorithm efficiently computes the minimum number of edit operations required to transform one string into another.
● Test the algorithm on strings with different types of variations (e.g., typos, substitutions, insertions, deletions)
● Evaluate its adaptability to different types of input variations

def min_edit_distance(str1, str2):
    m = len(str1)
    n = len(str2)

    # Create a DP table to store results of subproblems
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Initialize the base cases
    for i in range(m + 1):
        dp[i][0] = i  # Deletion cost
    for j in range(n + 1):
        dp[0][j] = j  # Insertion cost

    # Fill the DP table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # No operation needed
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],      # Deletion
                    dp[i][j - 1],      # Insertion
                    dp[i - 1][j - 1]   # Substitution
                )

    # The final result is in dp[m][n]

    return dp[m][n]

# Test cases
test_cases = [
("kitten", "sitting"), # Substitutions and insertions
("intention", "execution"), # Substitutions and deletions
("flaw", "lawn"), # Substitutions
("apple", "aple"), # Deletion
("book", "books"), # Insertion
("abc", "def"), # All substitutions
("", "abc"), # All insertions
("abc", "") # All deletions
]

# Evaluate MED for each test case
for str1, str2 in test_cases:
    distance = min_edit_distance(str1, str2)
    print(f"MED between '{str1}' and '{str2}': {distance}")

Output:
MED between 'kitten' and 'sitting': 3
MED between 'intention' and 'execution': 5
MED between 'flaw' and 'lawn': 2
MED between 'apple' and 'aple': 1
MED between 'book' and 'books': 1
MED between 'abc' and 'def': 3
MED between '' and 'abc': 3
MED between 'abc' and '': 3
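To see which operations account for the distance (useful when judging how the algorithm adapts to typos, insertions, and deletions), the DP table can be walked backwards. A minimal sketch that rebuilds the same table as min_edit_distance and recovers one optimal edit sequence (this helper is an addition, not part of the original program):

def min_edit_operations(str1, str2):
    # Rebuild the DP table exactly as in min_edit_distance
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    # Walk back from dp[m][n] to dp[0][0], recording one optimal operation per step
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and str1[i - 1] == str2[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1                      # characters match, no edit
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"substitute '{str1[i - 1]}' -> '{str2[j - 1]}'")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete '{str1[i - 1]}'")
            i -= 1
        else:
            ops.append(f"insert '{str2[j - 1]}'")
            j -= 1
    return list(reversed(ops))

print(min_edit_operations("kitten", "sitting"))
# e.g. ["substitute 'k' -> 's'", "substitute 'e' -> 'i'", "insert 'g'"]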

Program 4:
Write a program to implement top-down and bottom-up parsers using an appropriate context-free grammar.
import nltk
from nltk import CFG
# Define a simple Context-Free Grammar (CFG)
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP | V
Det -> 'the' | 'a'
N -> 'cat' | 'dog'
V -> 'chased' | 'barked'
""")
# Create Top-Down (Recursive Descent) and Bottom-Up (Chart) parsers
top_down_parser = nltk.RecursiveDescentParser(grammar)
bottom_up_parser = nltk.ChartParser(grammar)
# Input sentence
sentence = "the cat chased a dog".split()
# Top-Down Parsing
print("Top-Down Parsing Results:")
for tree in top_down_parser.parse(sentence):
    print(tree)

# Bottom-Up Parsing
print("\nBottom-Up Parsing Results:")
for tree in bottom_up_parser.parse(sentence):
    print(tree)
Output:
Top-Down Parsing Results:
(S (NP (Det the) (N cat)) (VP (V chased) (NP (Det a) (N dog))))

Bottom-Up Parsing Results:
(S (NP (Det the) (N cat)) (VP (V chased) (NP (Det a) (N dog))))
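For a more readable view of the structure, the nltk.Tree objects returned by either parser can also be drawn as ASCII art; a small optional addition (not part of the original program):

# Optional: draw the parse tree instead of printing bracketed notation
for tree in bottom_up_parser.parse(sentence):
    tree.pretty_print()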

Program 5:
Given the following short movie reviews, each labeled with a genre, either comedy or action:
● fun, couple, love, love (comedy)
● fast, furious, shoot (action)
● couple, fly, fast, fun, fun (comedy)
● furious, shoot, shoot, fun (action)
● fly, fast, shoot, love (action)
and a new document D: fast, couple, shoot, fly. Compute the most likely class for D.
Assume a Naive Bayes classifier and use add-1 smoothing for the likelihoods.

Program:
import nltk
from collections import defaultdict
from nltk.tokenize import word_tokenize
# Define dataset: Labeled movie reviews
documents = [
(["fun", "couple", "love", "love"], "Comedy"),
(["fast", "furious", "shoot"], "Action"),
(["couple", "fly", "fast", "fun", "fun"], "Comedy"),
(["furious", "shoot", "shoot", "fun"], "Action"),
(["fly", "fast", "shoot", "love"], "Action"),
]
# The new document to classify
D = ["fast", "couple", "shoot", "fly"]
# Count class occurrences
class_counts = defaultdict(int)
word_counts = defaultdict(lambda: defaultdict(int))
vocabulary = set()
# Process dataset
for words, label in documents:
    class_counts[label] += 1
    for word in words:
        word_counts[label][word] += 1
        vocabulary.add(word)

# Compute total words per class
total_words = {label: sum(word_counts[label].values()) for label in class_counts}
V = len(vocabulary) # Vocabulary size

# Compute prior probabilities
total_docs = sum(class_counts.values())
priors = {label: class_counts[label] / total_docs for label in class_counts}

# Function to compute likelihood with Add-1 (Laplace) smoothing
def compute_likelihood(word, label):
    return (word_counts[label][word] + 1) / (total_words[label] + V)

# Compute posterior probabilities for D
posterior_probs = {}
for label in class_counts:
    posterior_probs[label] = priors[label]  # Start with prior probability
    for word in D:
        posterior_probs[label] *= compute_likelihood(word, label)  # Multiply likelihoods

# Determine the most likely class
predicted_class = max(posterior_probs, key=posterior_probs.get)

# Output results
print("Posterior Probabilities:")
for label, prob in posterior_probs.items():
    print(f"P({label} | D) = {prob:.6f}")
print(f"\nThe new document D is classified as: *{predicted_class}*")
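As a hand check of what the program should print: the training set gives P(Comedy) = 2/5 and P(Action) = 3/5, with 9 word tokens in the Comedy class, 11 in the Action class, and a vocabulary of V = 7. With add-1 smoothing, P(D | Comedy) = (2/16)(3/16)(1/16)(2/16) and P(D | Action) = (3/18)(1/18)(5/18)(2/18), so the unnormalized posterior scores are approximately 0.000073 for Comedy and 0.000171 for Action, and D should be classified as Action.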

Program 6
Demonstrate the following using an appropriate programming tool which illustrates the use of information retrieval in NLP:
● Study the various corpora (Brown, Inaugural, Reuters, UDHR) with methods such as fileids, raw, words, sents, categories
● Create and use your own corpora (plaintext, categorical)
● Study conditional frequency distributions
● Study tagged corpora with methods like tagged_sents, tagged_words
● Write a program to find the most frequent noun tags
● Map words to properties using Python dictionaries
● Study the rule-based tagger and the Unigram tagger
● Find different words in a given plain text without any spaces by comparing this text with a given corpus of words. Also find the score of the words.

from nltk import download
from nltk.corpus import brown, inaugural, reuters, udhr, wordnet
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag, DefaultTagger, UnigramTagger
from nltk.probability import ConditionalFreqDist
from collections import Counter

# Download necessary corpora
download('brown')
download('inaugural')
download('reuters')
download('udhr')
download('punkt')
download('averaged_perceptron_tagger')
download('wordnet')
# 1. STUDY VARIOUS CORPORA
print("\n--- Studying Standard Corpora ---\n")
print("Brown Corpus Categories:", brown.categories())
print("Brown Corpus Sample Words:", brown.words(categories='news')[:20])
print("\nInaugural Speech Words:", inaugural.words(fileids='2009-Obama.txt')[:20])
print("\nReuters Corpus Categories:", reuters.categories()[:5])
print("Reuters Corpus Sample Words:", reuters.words(categories='trade')[-20:])
print("\nUDHR Available Languages:", udhr.fileids()[:5])
print("UDHR English Sample:", udhr.words(fileids='English-Latin1')[:20])


# 2. CREATE & USE CUSTOM CORPUS
print("\n--- Creating and Using Custom Corpus ---\n")
custom_text = "Natural Language Processing is amazing. NLP helps human language."
custom_tokens = word_tokenize(custom_text)
print("Tokenized Custom Corpus:", custom_tokens)

# 3. CONDITIONAL FREQUENCY DISTRIBUTION
print("\n--- Conditional Frequency Distribution ---\n")
cfd = ConditionalFreqDist((genre, word)
                          for genre in brown.categories()
                          for word in brown.words(categories=genre))
print("Most common words in 'news':", cfd['news'].most_common(10))

# 4. STUDY TAGGED CORPORA
print("\n--- Studying Tagged Corpora ---\n")
brown_tagged_sents = brown.tagged_sents(categories='news')
brown_tagged_words = brown.tagged_words(categories='news')
print("Tagged Sentence Sample:", brown_tagged_sents[0])
print("Tagged Words Sample:", brown_tagged_words[:10])

# 5. FIND MOST FREQUENT NOUN TAGS
print("\n--- Most Frequent Noun Tags ---\n")
noun_tags = [tag for (word, tag) in brown_tagged_words if tag.startswith('NN')]
freq_nouns = Counter(noun_tags)
print("Most Frequent Noun Tags:", freq_nouns.most_common(5))

# 6. RULE-BASED AND UNIGRAM TAGGER
print("\n--- Rule-Based and Unigram Tagger ---\n")
default_tagger = DefaultTagger('NN')
print("Default Tagger Example:", default_tagger.tag(custom_tokens))

unigram_tagger = UnigramTagger(brown_tagged_sents[:5000])
print("Unigram Tagger Example:", unigram_tagger.tag(custom_tokens))

# 7. SPLITTING TEXT WITHOUT SPACES
print("\n--- Finding Words in a Plain Text Without Spaces ---\n")
corpus_words = set(brown.words())
text_without_spaces = "thecatinthehat"

def split_text(text, corpus):
    possible_words = [text[i:j] for i in range(len(text))
                      for j in range(i + 1, len(text) + 1)
                      if text[i:j] in corpus]
    return possible_words

valid_words = split_text(text_without_spaces, corpus_words)
print("Identified Words:", valid_words)
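Two checklist items, mapping words to properties with Python dictionaries and scoring the recovered words, are not covered by the code above. A minimal sketch of both follows; the property values and the frequency-based score are illustrative assumptions:

# 8. MAP WORDS TO PROPERTIES USING PYTHON DICTIONARIES
word_properties = {
    "language": {"pos": "noun", "length": 8, "topic": "linguistics"},
    "processing": {"pos": "noun", "length": 10, "topic": "computation"},
}
for word, props in word_properties.items():
    print(word, "->", props)

# 9. SCORE THE WORDS FOUND IN THE TEXT WITHOUT SPACES (Brown corpus frequency as the score)
from nltk.probability import FreqDist
brown_freq = FreqDist(w.lower() for w in brown.words())
word_scores = {w: brown_freq[w.lower()] for w in set(valid_words)}
print("Word Scores:", word_scores)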

Program 7
Write a Python program to find synonyms and antonyms of the word "active" using WordNet

import nltk
from nltk.corpus import wordnet
# Download WordNet
nltk.download('wordnet')
# Function to find synonyms and antonyms
def get_synonyms_antonyms(word):
    synonyms = set()
    antonyms = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            if lemma.antonyms():
                antonyms.add(lemma.antonyms()[0].name())
    return synonyms, antonyms
#Get synonyms and antonyms
synonyms, antonyms = get_synonyms_antonyms("active")
#Results
print("Synonyms of 'active':", synonyms)
print("Antonyms of 'active':", antonyms)
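WordNet also stores a gloss for every synset; a small optional extension (not part of the original program) prints the sense definitions of "active":

# Optional: show the definition of the first few senses of "active"
for synset in wordnet.synsets("active")[:3]:
    print(synset.name(), "-", synset.definition())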


Program 8
Implement the machine translation application of NLP, where a machine translation model needs to be trained for a language with limited parallel corpora. Investigate and incorporate techniques to improve performance in low-resource scenarios.
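The manual does not include reference code for this experiment. As one possible starting point rather than a prescribed solution, a common low-resource strategy is transfer learning: fine-tune a pretrained multilingual translation model on the small parallel corpus. The sketch below assumes the Hugging Face transformers and datasets libraries are installed; the model name, the parallel.tsv file, its src/tgt column names, and all hyperparameters are illustrative assumptions.

# Transfer-learning sketch for low-resource machine translation (see assumptions above)
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

model_name = "Helsinki-NLP/opus-mt-en-hi"          # illustrative pretrained MT model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Small parallel corpus: tab-separated pairs with 'src' and 'tgt' columns (hypothetical file)
dataset = load_dataset("csv", data_files="parallel.tsv", delimiter="\t")

def preprocess(batch):
    # Tokenize source sentences as inputs and target sentences as labels
    model_inputs = tokenizer(batch["src"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["tgt"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset["train"].map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="mt_low_resource",
    per_device_train_batch_size=8,
    num_train_epochs=5,                            # small corpus: few epochs to limit overfitting
    learning_rate=2e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

Other techniques commonly investigated in low-resource settings include back-translation of monolingual target-language text to create synthetic parallel data, subword tokenization (BPE or SentencePiece) to handle rare words, and transfer from a related high-resource language pair.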
