NLP Lab Programs

This document accompanies a course on Natural Language Processing (NLP) and works through practical exercises in word analysis, word generation, morphology, N-grams and smoothing, POS tagging with Hidden Markov Models and Viterbi decoding, and chunking. Each exercise states its aim, program, and result, demonstrating key NLP concepts and techniques. By the end of the course, students will be able to implement NLP applications and analyze language features.

AL3501 - NATURAL LANGUAGE PROCESSING

PRACTICAL EXERCISES:

1. Word Analysis
2. Word Generation
3. Morphology
4. N-Grams
5. N-Grams Smoothing
6. POS Tagging: Hidden Markov Model
7. POS Tagging: Viterbi Decoding
8. Building POS Tagger
9. Chunking
10. Building Chunker

COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: tag a given text with basic language features.
CO2: implement a rule-based system to tackle the morphology/syntax of a language.
CO3: design a tag set to be used for statistical processing in real-time applications.
CO4: compare and contrast the use of different statistical approaches for different types of NLP applications.
CO5: use tools to process natural language and design innovative NLP applications.


EXP NO: 1 WORD ANALYSIS

AIM:
The aim of this program is to perform basic word analysis using Natural Language Processing (NLP) techniques: tokenization, POS tagging, stemming, and lemmatization.

PROGRAM:

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

# Download necessary NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

# Sample text for word analysis
text = "The quick brown fox jumps over the lazy dog."

# Step 1: Tokenization - splitting the text into individual words
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Step 2: Part-of-Speech (POS) tagging - assigning a POS tag to each token
pos_tags = pos_tag(tokens)
print("\nPOS Tags:", pos_tags)

# Step 3: Stemming - reducing words to their root form
stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in tokens]
print("\nStems:", stems)

# Step 4: Lemmatization - reducing words to their base or dictionary form
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(word, pos='v') for word in tokens]  # pos='v' treats words as verbs
print("\nLemmas:", lemmas)

OUTPUT:

Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']

POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the',
'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]

Stems: ['the', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazi', 'dog', '.']

Lemmas: ['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog', '.']


RESULTS:
Running the program on the sample sentence "The quick brown fox jumps over the lazy dog." produces the tokens, POS tags, stems, and lemmas shown above. Note that stemming can yield non-dictionary forms (e.g., "lazi"), while lemmatization returns valid base forms.


EXP NO: 2 WORD GENERATION

AIM:
The aim of this program is to generate new words by sampling characters at random from a given alphabet, as a first step toward character-level n-gram generation (see the bigram sketch after the results).

PROGRAM:

import random

# Sample alphabet of characters to draw from
corpus = "abcdefghijklmnopqrstuvwxyz"

# Function to generate a new word of the given length
def generate_word(length):
    word = "".join(random.choice(corpus) for _ in range(length))
    return word

# Generate a word of length 6
new_word = generate_word(6)
print("Generated Word:", new_word)

OUTPUT:

Generated Word: tnwaey

RESULTS:
This simple program demonstrates the starting point of word generation: sampling characters uniformly at random. Real generators condition each character on its context, as the sketch below shows.
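
For contrast, a character-bigram generator conditions each character on the previous one, which is the simplest n-gram version of this idea. A minimal sketch (the training string and start letter are illustrative choices, not part of the original exercise):

import random
from collections import defaultdict

# Record which characters follow each character in a tiny training corpus
training_text = "banana bandana cabana"
followers = defaultdict(list)
for a, b in zip(training_text, training_text[1:]):
    followers[a].append(b)

def generate_bigram_word(start, length):
    word = start
    for _ in range(length - 1):
        options = followers.get(word[-1])
        if not options:  # no observed follower: stop early
            break
        word += random.choice(options)  # frequent followers are picked more often
    return word

print("Generated Word:", generate_bigram_word("b", 6))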


EXP NO:3 MORPHOLOGY

AIM:
The aim of this program is to demonstrate morphological analysis in Natural Language Processing
(NLP). Specifically, the program will perform stemming and lemmatization, two common techniques in
morphology, to analyze the structure of words and reduce them to their base forms.

PROGRAM:

import nltk
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

# Download necessary NLTK data
nltk.download('wordnet')

# Sample list of words for morphological analysis
words = ["running", "jumps", "easily", "fairly", "happier"]

# Initialize the stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Perform stemming and lemmatization on each word
print(f"{'Word':<10} {'Stem':<10} {'Lemma':<10}")
for word in words:
    stem = stemmer.stem(word)
    lemma = lemmatizer.lemmatize(word, pos='v')  # 'v' treats each word as a verb
    print(f"{word:<10} {stem:<10} {lemma:<10}")

OUTPUT:

Word       Stem       Lemma
running    run        run
jumps      jump       jump
easily     easili     easily
fairly     fairli     fairly
happier    happier    happier

RESULT:
This simple program illustrates the basic concepts of morphology in NLP, focusing on stemming and lemmatization.
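
Note that "happier" survives both the stemmer and the verb lemmatizer because it is an adjective, not a verb. Supplying the right part of speech changes the result; a short sketch using WordNet's POS codes ('a' adjective, 'v' verb, 'n' noun):

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# The same lemmatizer gives different results depending on the POS hint
print(lemmatizer.lemmatize("happier", pos='a'))  # happy
print(lemmatizer.lemmatize("running", pos='v'))  # run
print(lemmatizer.lemmatize("running", pos='n'))  # running (kept as a noun)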


EXP NO: 4 N-GRAMS

AIM:
The aim of this program is to demonstrate the concept of N-grams in Natural Language Processing
(NLP). The program will generate and display unigrams, bigrams, and trigrams from a given text. N-grams
are sequences of n words used to analyze and predict language patterns.

PROGRAM:

import nltk
from nltk import ngrams
from nltk.tokenize import word_tokenize

# Download necessary NLTK data
nltk.download('punkt')

# Sample text
text = "Natural Language Processing is fascinating."

# Tokenize the text into words
tokens = word_tokenize(text)

# Generate unigrams (n=1)
unigrams = list(ngrams(tokens, 1))
print("Unigrams:", unigrams)

# Generate bigrams (n=2)
bigrams = list(ngrams(tokens, 2))
print("Bigrams:", bigrams)

# Generate trigrams (n=3)
trigrams = list(ngrams(tokens, 3))
print("Trigrams:", trigrams)

OUTPUT:

Unigrams: [('Natural',), ('Language',), ('Processing',), ('is',), ('fascinating',), ('.',)]

Bigrams: [('Natural', 'Language'), ('Language', 'Processing'), ('Processing', 'is'), ('is', 'fascinating'), ('fascinating', '.')]

Trigrams: [('Natural', 'Language', 'Processing'), ('Language', 'Processing', 'is'), ('Processing', 'is', 'fascinating'),
('is', 'fascinating', '.')]

RESULTS:
This program provides a basic understanding of N-grams in NLP. Unigrams capture individual words, bigrams represent pairs of adjacent words, and trigrams capture word triplets. N-grams are fundamental in language modeling, text prediction, and many other NLP tasks.
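
As a small illustration of the text-prediction use mentioned above, a conditional frequency distribution built from the bigrams can propose the most likely next word (trivially here, since each bigram occurs only once in the sample sentence):

import nltk
from nltk import ngrams
from nltk.tokenize import word_tokenize

tokens = word_tokenize("Natural Language Processing is fascinating.")

# Condition on the first word of each bigram; count the words that follow it
cfd = nltk.ConditionalFreqDist(ngrams(tokens, 2))

# Most frequent word observed after 'Language'
print(cfd['Language'].max())  # Processing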


EXP NO:5 N-GRAM SMOOTHING

AIM:
The aim of this program is to demonstrate the concept of N-grams smoothing in Natural Language
Processing (NLP). Smoothing is used to handle the issue of zero probabilities in language models by
assigning small probabilities to unseen N-grams. This program will use Laplace (Add-One) Smoothing for
bigram generation.
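
Concretely, the smoothed probability of a bigram (w1, w2) is P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V), where V is the vocabulary size; adding 1 to every count guarantees that no bigram, seen or unseen, receives zero probability.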

PROGRAM:

from collections import Counter

# Sample text
text = "I love natural language processing. I love learning about NLP."

# Lowercase, strip periods, and split into word tokens
tokens = text.lower().replace(".", "").split()

# Generate bigrams
bigrams = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

# Count bigrams and unigrams
bigram_counts = Counter(bigrams)
unigram_counts = Counter(tokens)

# Vocabulary size
vocab_size = len(unigram_counts)

# Bigram probability with Laplace (add-one) smoothing
def bigram_probability(bigram):
    unigram = bigram[0]
    return (bigram_counts[bigram] + 1) / (unigram_counts[unigram] + vocab_size)

# Example bigrams and their smoothed probabilities
example_bigrams = [('i', 'love'), ('love', 'natural'), ('about', 'nlp'), ('nlp', 'is')]

# Print results; ('nlp', 'is') never occurs yet still gets a nonzero probability
print(f"{'Bigram':<20} {'Probability':<10}")
for bigram in example_bigrams:
    prob = bigram_probability(bigram)
    print(f"{str(bigram):<20} {prob:.4f}")

OUTPUT:
Bigram               Probability
('i', 'love')        0.3000
('love', 'natural')  0.2000
('about', 'nlp')     0.2222
('nlp', 'is')        0.1111


RESULTS:
This program demonstrates how to apply Laplace smoothing when calculating bigram probabilities in a simple text; note that the unseen bigram ('nlp', 'is') still receives a small nonzero probability.


EXP NO:6 POS TAGGING: HIDDEN MARKOV MODEL

AIM:
The aim of this program is to implement Part-of-Speech (POS) tagging using a Hidden Markov Model
(HMM) in Natural Language Processing (NLP).

PROGRAM:

import nltk
from nltk.tag import hmm
from nltk.corpus import treebank

# Ensure that the necessary NLTK data is downloaded
nltk.download('treebank')
nltk.download('universal_tagset')

# Load the Treebank corpus with the simplified universal tagset
train_data = treebank.tagged_sents(tagset='universal')

# Train a Hidden Markov Model POS tagger
trainer = hmm.HiddenMarkovModelTrainer()
hmm_tagger = trainer.train_supervised(train_data)

# Test sentence
test_sentence = "I love natural language processing".split()

# Perform POS tagging
tagged_sentence = hmm_tagger.tag(test_sentence)

print("Test Sentence:", test_sentence)
print("Tagged Sentence:", tagged_sentence)

OUTPUT:
Test Sentence: ['I', 'love', 'natural', 'language', 'processing']
Tagged Sentence: [('I', 'PRON'), ('love', 'VERB'), ('natural', 'ADJ'), ('language', 'NOUN'), ('processing',
'NOUN')]

RESULT:
This program demonstrates how a Hidden Markov Model can be used for POS tagging in NLP. The
model is trained on a dataset of labeled sentences and is capable of predicting POS tags for new sentences
based on statistical probabilities, showing the effectiveness of HMMs in sequence labeling tasks.
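
To see how well the trained tagger generalizes, part of the corpus can be held out for testing. A sketch (the 90/10 split is an arbitrary illustrative choice; `.accuracy()` is the method name in recent NLTK releases, which replaced the older `.evaluate()`):

import nltk
from nltk.tag import hmm
from nltk.corpus import treebank

nltk.download('treebank')
nltk.download('universal_tagset')

sents = treebank.tagged_sents(tagset='universal')
split = int(len(sents) * 0.9)
train_data, test_data = sents[:split], sents[split:]

trainer = hmm.HiddenMarkovModelTrainer()
hmm_tagger = trainer.train_supervised(train_data)

# Fraction of held-out tokens whose predicted tag matches the gold tag
print("Accuracy:", hmm_tagger.accuracy(test_data))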


EXP NO:7 POS TAGGING WITH VITERBI DECODING

AIM: The aim of this program is to demonstrate Part-of-Speech (POS) tagging using Viterbi decoding in a Hidden Markov Model (HMM).

PROGRAM:

import nltk
from nltk.tag import hmm
from nltk.corpus import treebank

# Ensure that the necessary NLTK data is downloaded
nltk.download('treebank')
nltk.download('universal_tagset')

# Load the Treebank corpus with the simplified universal tagset
train_data = treebank.tagged_sents(tagset='universal')

# Train a Hidden Markov Model POS tagger
trainer = hmm.HiddenMarkovModelTrainer()
hmm_tagger = trainer.train_supervised(train_data)

# Test sentence
test_sentence = "I enjoy learning about natural language processing".split()

# Perform POS tagging; tag() decodes the best tag path with the Viterbi algorithm
tagged_sentence = hmm_tagger.tag(test_sentence)

print("Test Sentence:", test_sentence)
print("Tagged Sentence:", tagged_sentence)

OUTPUT:
Test Sentence: ['I', 'enjoy', 'learning', 'about', 'natural', 'language', 'processing']
Tagged Sentence: [('I', 'PRON'), ('enjoy', 'VERB'), ('learning', 'VERB'), ('about', 'ADP'), ('natural', 'ADJ'),
('language', 'NOUN'), ('processing', 'NOUN')]

RESULTS:
This program demonstrates the use of Viterbi decoding within an HMM for POS tagging in NLP: the tagger returns the single most probable tag sequence for the whole sentence, rather than choosing each word's tag independently.
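
NLTK runs the decoding internally, so the algorithm itself stays hidden. The following toy implementation makes the dynamic program explicit; the two-tag model and all probabilities are invented purely for illustration:

# Toy Viterbi decoder: find the most probable tag sequence for a sentence
# under hand-specified (illustrative, not trained) HMM probabilities.
states = ['NOUN', 'VERB']
start_p = {'NOUN': 0.6, 'VERB': 0.4}
trans_p = {'NOUN': {'NOUN': 0.3, 'VERB': 0.7},
           'VERB': {'NOUN': 0.8, 'VERB': 0.2}}
emit_p = {'NOUN': {'dogs': 0.5, 'bark': 0.1},
          'VERB': {'dogs': 0.1, 'bark': 0.6}}

def viterbi(observations):
    # V[t][s]: probability of the best tag path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s].get(observations[0], 1e-6) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s].get(observations[t], 1e-6), p)
                for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # Follow back-pointers from the best final state to recover the path
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(['dogs', 'bark']))  # ['NOUN', 'VERB']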


EXP NO: 8 BUILDING POS TAGGER

AIM: To build a simple Part-of-Speech (POS) tagger using the Natural Language Toolkit (NLTK) library in
Python. The tagger will assign POS tags to each word in a given sentence.

PROGRAM:

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import treebank
from nltk.tag import UnigramTagger

# Download required resources
nltk.download('punkt')
nltk.download('treebank')

# Load training data
train_sents = treebank.tagged_sents()

# Initialize and train the unigram POS tagger
tagger = UnigramTagger(train_sents)

# Define a sentence for tagging
sentence = "The quick brown fox jumps over the lazy dog."

# Tokenize the sentence
tokens = word_tokenize(sentence)

# Tag the tokens
tags = tagger.tag(tokens)

# Display the results
print("Sentence:", sentence)
print("Tokens:", tokens)
print("POS Tags:", tags)

OUTPUT:
Sentence: The quick brown fox jumps over the lazy dog.
Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]

RESULT:
Thus, each token is paired with the POS tag the unigram tagger learned from the Treebank training data. A note on unseen words and backoff follows below.
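
One limitation worth knowing: a plain UnigramTagger returns None for any word it never saw during training. A common remedy is a backoff chain; a sketch (the NN default and the particular chain below are conventional choices, not the only ones):

import nltk
from nltk.corpus import treebank
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger

nltk.download('treebank')

train_sents = treebank.tagged_sents()

# Each tagger consults the next one in the chain when it has no answer
default = DefaultTagger('NN')                        # last resort: guess noun
unigram = UnigramTagger(train_sents, backoff=default)
bigram = BigramTagger(train_sents, backoff=unigram)

# 'zyxt' is unseen, so the default tagger supplies NN instead of None
print(bigram.tag("The zyxt jumps".split()))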


EXP NO: 9 CHUNKING

AIM:
The aim of chunking in Natural Language Processing (NLP) is to divide a sentence into syntactically
correlated parts, such as noun phrases (NP), verb phrases (VP), and prepositional phrases (PP). This helps in
better understanding the structure of the sentence and is often used in information extraction and syntactic
analysis.

ALGORITHM:
1. Tokenization: Split the input sentence into tokens (words).
2. POS Tagging: Assign Part-of-Speech (POS) tags to each token.
3. Chunking: Identify and extract phrases from the tagged sentence based on predefined patterns using
regular expressions.

PROGRAM:

import nltk
from nltk.chunk import RegexpParser
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Download necessary NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog."

# Tokenize the sentence
tokens = word_tokenize(sentence)

# POS tagging
tagged_tokens = pos_tag(tokens)

# Define the chunking grammar: each rule is applied as a separate stage
chunk_pattern = """
NP: {<DT>?<JJ>*<NN.*>}   # Noun Phrases
VP: {<VB.*>}             # Verb Phrases
PP: {<IN><NP>}           # Prepositional Phrases (preposition + NP chunk)
"""

# Create a chunk parser
chunk_parser = RegexpParser(chunk_pattern)

# Parse the tagged sentence
chunked = chunk_parser.parse(tagged_tokens)

# Display the chunk tree in a window (requires a GUI; comment out if headless)
chunked.draw()

# Output the chunked structure as text
print(chunked)


OUTPUT:
(S
  (NP The/DT quick/JJ brown/JJ fox/NN)
  (VP jumps/VBZ)
  (PP over/IN
    (NP the/DT lazy/JJ dog/NN)))

RESULT:
 Noun Phrase (NP): "The quick brown fox"
 Verb Phrase (VP): "jumps"
 Prepositional Phrase (PP): "over the lazy dog"
This chunking process helps to identify meaningful syntactic units in the sentence and is widely used in
applications like information extraction, question answering, and machine translation.
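
Continuing from the program above (it reuses the `chunked` tree), this walks the parse tree and joins the words of every NP subtree:

# Extract the text of each NP chunk from the tree built above
for subtree in chunked.subtrees(filter=lambda t: t.label() == 'NP'):
    phrase = " ".join(word for word, tag in subtree.leaves())
    print("NP:", phrase)
# NP: The quick brown fox
# NP: the lazy dog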


EXP NO: 10 BUILDING CHUNKER

AIM:
The aim of building a Chunker in NLP is to identify and group words in a sentence into predefined chunks
such as noun phrases, verb phrases, etc., to analyze sentence structure and extract meaningful information. A
chunker uses Part-of-Speech (PoS) tagging information to create these chunks.

ALGORITHM:
1. Tokenize the sentence: Break the sentence into individual words.
2. PoS tagging: Assign Part-of-Speech tags (such as Noun, Verb, Adjective, etc.) to each word in the
sentence.
3. Define Chunking Rules: Specify patterns to group words based on their PoS tags (for example, NP
for Noun Phrase, VP for Verb Phrase).
4. Apply Chunking Rules: Use regular expressions or a ChunkParser to apply the rules and create the
chunks.
5. Output the Chunks: Output the chunks formed in the sentence.

PROGRAM:

import nltk
from nltk.chunk import RegexpParser
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Download necessary NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Define a simple sentence
sentence = "The quick brown fox jumps over the lazy dog"

# Step 1: Tokenize the sentence
tokens = word_tokenize(sentence)

# Step 2: PoS tagging
tagged = pos_tag(tokens)

# Step 3: Define chunking rules
chunk_grammar = """
NP: {<DT>?<JJ>*<NN>}   # Noun Phrase
VP: {<VB.*>}           # Verb Phrase
"""

# Step 4: Apply the chunking rules
chunk_parser = RegexpParser(chunk_grammar)
chunked = chunk_parser.parse(tagged)

# Step 5: Output the chunks
print(chunked)

OUTPUT:
(S
  (NP The/DT quick/JJ brown/JJ fox/NN)
  (VP jumps/VBZ)
  over/IN
  (NP the/DT lazy/JJ dog/NN))

RESULT:
The sentence "The quick brown fox jumps over the lazy dog" is chunked as:
- Noun Phrase (NP): "The quick brown fox"
- Verb Phrase (VP): "jumps"
- "over" is left unchunked, since this grammar defines no prepositional-phrase rule
- Noun Phrase (NP): "the lazy dog"
The program identifies noun phrases (NP) and verb phrases (VP) based on the chunking rules defined using regular expressions. Such a rule-based chunker can also be scored against gold-standard data, as sketched below.
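
A sketch of that evaluation using NLTK's CoNLL-2000 chunking data and the `evaluate()` method from the NLTK book (newer releases may also expose this as `accuracy()`); the grammar is the NP rule from the program above, scored on NP chunks only:

import nltk
from nltk.chunk import RegexpParser
from nltk.corpus import conll2000

nltk.download('conll2000')

# Gold-standard NP-chunked sentences from the CoNLL-2000 test split
test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])

chunk_parser = RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# Reports chunk precision, recall, and F-measure against the gold chunks
print(chunk_parser.evaluate(test_sents))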
