N-grams
N-grams are contiguous sequences of n items, typically words in the context of NLP. These items can be characters, words, or even syllables, depending on the granularity desired. The value of n determines the order of the N-gram.
N-grams Examples
Unigrams (1-grams): Single words, e.g., “cat,” “dog.”
Bigrams (2-grams): Pairs of consecutive words, e.g., “natural language,” “deep learning.”
Trigrams (3-grams): Triplets of consecutive words, e.g., “machine learning model,” “data science approach.”
4-grams, 5-grams, etc.: Sequences of four, five, or more consecutive words.
N-grams Formula
P(w_1, w_2, ..., w_n) ≈ ∏_{i=1}^{n} P(w_i | w_{i−(n−1)}, ..., w_{i−1})
w_i: the current word at position i
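For example, under a bigram model (n = 2), the sentence "The cat eats fish" factors as:

P(The, cat, eats, fish) ≈ P(The) · P(cat | The) · P(eats | cat) · P(fish | eats)

Each word is conditioned on only the single preceding word.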
The n in n-gram specifies the number of items to consider: unigram for n = 1, bigram for n = 2, trigram for n = 3, and so on. N-gram models are a specific type of language model that rely on how frequently sequences of tokens (words or characters) occur in a text.
Bigrams
For the sentence "The cat eats fish.", the pairs of consecutive words are:
(The, cat)
(cat, eats)
(eats, fish)
(fish, .)
These are all 2-token sequences, hence bigrams.
Trigrams
In the trigrams example, the 3-token sequences from the same sentence are:
(The, cat, eats)
(cat, eats, fish)
(eats, fish, .)
These are all 3-token sequences, hence trigrams.
N-gram Model
This formula estimates the probability of a word w_n given all the previous words in the sequence. But since it is impractical to condition on all previous words (especially for long sequences), n-gram models approximate this probability using only the last few words, specifically the last n−1 words:

P(w_i | w_1, ..., w_{i−1}) ≈ P(w_i | w_{i−(n−1)}, ..., w_{i−1})
N-gram Model Example Program
import nltk
nltk.download('punkt')
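# Newer NLTK versions may also need: nltk.download('punkt_tab')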
from nltk import ngrams
from nltk.tokenize import word_tokenize
# Example sentence
sentence = "N-grams enhance language processing tasks."
# Tokenize the sentence
tokens = word_tokenize(sentence)
# Generate bigrams
bigrams = list(ngrams(tokens, 2))
# Generate trigrams
trigrams = list(ngrams(tokens, 3))
# Print the results
print("Bigrams:", bigrams)
print("Trigrams:", trigrams)
Output:
Bigrams: [('N-grams', 'enhance'), ('enhance', 'language'), ('language',
'processing'), ('processing', 'tasks'), ('tasks', '.')]
Trigrams: [('N-grams', 'enhance', 'language'), ('enhance', 'language',
'processing'), ('language', 'processing', 'tasks'), ('processing', 'tasks', '.')]
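Counting such n-grams is the basis of an n-gram model. Below is a minimal sketch, not part of the original program, of how bigram counts over a hypothetical toy corpus yield a conditional probability estimate:

from collections import Counter

# Hypothetical toy corpus, already tokenized (assumption for illustration)
corpus = "the cat eats fish . the cat sleeps . the dog eats fish .".split()

# Count unigrams and bigrams
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

# Relative-frequency estimate: P(eats | cat) = count(cat, eats) / count(cat)
p_eats_given_cat = bigram_counts[("cat", "eats")] / unigram_counts["cat"]
print("P(eats | cat) =", p_eats_given_cat)  # 1 / 2 = 0.5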
Language modeling
Language modeling (LM) is the use of various statistical and probabilistic
techniques to determine the probability of a given sequence of words occurring
in a sentence.
Example: The sentence "I am going to school" is more probable than "School
going I to am".
Given "I am going to", the model might assign:
"school" → 0.7
"market" → 0.2
"banana" → 0.0
Parameter estimation
Parameter estimation is the process of finding the best values for a model's "knobs" (parameters) based on the data the model is trained on. The goal is to adjust the parameters so that the model can accurately perform the desired NLP task (e.g., predicting the next word, classifying text, translating languages).
Example: Parameter Estimation
Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) is a key method in statistical modeling, used to estimate parameters by finding the best fit to the observed data. MLE finds the values of model parameters that make the observed data most probable. It works by maximizing a likelihood function, which tells us how likely it is to observe the data given different parameter values.
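Formally, MLE chooses the parameter value that maximizes the likelihood of the observed data:

θ̂_MLE = argmax_θ L(θ) = argmax_θ P(data | θ)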
Likelihood Estimation
Likelihood estimation is a statistical method used to estimate the parameters of a probability distribution or a statistical model based on observed data. Unlike traditional estimation methods that focus directly on finding the "best-fitting" parameters, likelihood estimation frames the problem in terms of the likelihood function.
Example: Maximum Likelihood Estimation
Example: Estimating the Probability of a Coin Toss
Suppose you have a coin, and you don't know whether it's
fair (i.e., the probability of heads, P(H), might not be 0.5).
You toss the coin 10 times, and you observe:
7 Heads
3 Tails
Your goal is to estimate the probability of heads (θ) using
Maximum Likelihood Estimation.
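Working this example out, the likelihood of 7 heads and 3 tails is:

L(θ) = θ^7 (1 − θ)^3 (up to a constant binomial coefficient)

Maximizing the log-likelihood:

log L(θ) = 7 log θ + 3 log(1 − θ)
d/dθ log L(θ) = 7/θ − 3/(1 − θ) = 0 ⇒ θ̂ = 7/10 = 0.7

The MLE estimate is simply the observed fraction of heads.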
Example 2: Parameter Estimation
Bayesian Estimation is a method of statistical inference in which we estimate unknown parameters by combining:
Prior beliefs (what we assume or know before seeing the data), and
Observed data (evidence),
using Bayes' Theorem to calculate a posterior probability distribution for the parameter.
Bayesian Formula
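Bayes' Theorem for a parameter θ and observed data D:

P(θ | D) = P(D | θ) · P(θ) / P(D)

where P(θ) is the prior, P(D | θ) is the likelihood, P(D) is the evidence (a normalizing constant), and P(θ | D) is the posterior. In the coin example, a prior belief that the coin is roughly fair would pull the posterior estimate of θ toward 0.5, away from the pure MLE value of 0.7.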