N-Gram Model in NLP
Dr Vivek K Verma
Introduction to N-Gram Models
An N-Gram model is a probabilistic language model used in Natural Language
Processing (NLP) to predict the next word in a sequence based on the previous
N − 1 words. The N-Gram model is based on the Markov assumption, which
simplifies the computation by assuming that the probability of a word depends
only on the previous N − 1 words, rather than the entire sequence.
The model is called an N-Gram because it breaks down a sequence of words
into contiguous sequences of N words.
Why N-Gram Models?
• Efficient: The N-Gram model simplifies language modeling by considering only local context.
• Scalable: It can be applied to large corpora and to various tasks such as speech recognition, machine translation, and text prediction.
• Flexible: The choice of N determines the level of context captured. For example, a 1-Gram (Unigram) only considers individual words, while a 2-Gram (Bigram) considers word pairs.
N-Gram Probability Model
The probability of a word sequence W = w1 , w2 , . . . , wn can be computed using
the chain rule of probability:
P(W) = P(w1) · P(w2|w1) · P(w3|w1, w2) · · · P(wn|w1, w2, . . . , wn−1)
However, this becomes computationally expensive for large sequences. The
N-Gram model simplifies this by considering only the previous N − 1 words:
P(W) ≈ ∏_{i=1}^{n} P(wi | wi−(N−1), . . . , wi−1)
For example, in a Bigram model (2-Gram), we have:
P(W) = P(w1) · P(w2|w1) · P(w3|w2) · · · P(wn|wn−1)
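In practice, these conditional probabilities are usually estimated from counts in a training corpus. The following Python sketch shows one way the maximum-likelihood estimate P(wi|wi−1) = count(wi−1, wi)/count(wi−1) could be computed for a Bigram model; the tiny corpus below is an illustrative assumption, not data from the text.

# Minimal sketch: maximum-likelihood Bigram probabilities from counts.
# The corpus is an illustrative assumption.
from collections import Counter

corpus = [["I", "love", "NLP"], ["I", "love", "python"], ["I", "like", "NLP"]]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

def bigram_prob(prev, word):
    """MLE estimate of P(word | prev); 0.0 if the context was never seen."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("I", "love"))    # count(I, love)/count(I) = 2/3
print(bigram_prob("love", "NLP"))  # count(love, NLP)/count(love) = 1/2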
Types of N-Gram Models
• Unigram Model (1-Gram): Each word is treated as independent of all other words in the sequence.
P(W) = P(w1) · P(w2) · P(w3) · · · P(wn)
• Bigram Model (2-Gram): The probability of a word depends on the
previous word.
P(W) = P(w1) · P(w2|w1) · P(w3|w2) · · · P(wn|wn−1)
• Trigram Model (3-Gram): The probability of a word depends on the
two preceding words.
P(W) = P(w1) · P(w2|w1) · P(w3|w1, w2) · · · P(wn|wn−2, wn−1)
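To make the different orders concrete, here is a small Python sketch (an illustration, not tied to any particular corpus) that extracts the unigrams, bigrams, and trigrams of a tokenized sentence.

# Sketch: extracting N-grams of different orders from a tokenized sentence.
def ngrams(tokens, n):
    """Return all contiguous n-word tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["I", "love", "learning", "NLP"]
print(ngrams(tokens, 1))  # unigrams: ('I',), ('love',), ('learning',), ('NLP',)
print(ngrams(tokens, 2))  # bigrams:  ('I', 'love'), ('love', 'learning'), ('learning', 'NLP')
print(ngrams(tokens, 3))  # trigrams: ('I', 'love', 'learning'), ('love', 'learning', 'NLP')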
Example: Bigram Model
Let’s walk through an example using a Bigram model (2-Gram) to calculate the
probability of a given sentence.
Consider the sentence: “I love NLP.”
We want to calculate the probability of this sentence using the Bigram model.
Step-by-Step Calculation
1. Break the sentence into word pairs:
“I love NLP” ⇒ (I, love), (love, NLP)
2. Calculate the probability of each word pair: Using a trained Bigram model (from a corpus), let’s assume the following probabilities:
P(love|I) = 0.4, P(NLP|love) = 0.3
The probability P(love|I) represents how often “love” follows “I” in the corpus. The probability P(NLP|love) represents how often “NLP” follows “love” in the corpus.
3. Compute the sentence probability:
P(“I love NLP”) = P(I) · P(love|I) · P(NLP|love)
Assuming P(I) = 0.1 (the unigram probability of “I”):
P(“I love NLP”) = 0.1 · 0.4 · 0.3 = 0.012
Therefore, the probability of the sentence “I love NLP” under this Bigram
model is 0.012.
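The same calculation can be reproduced in a few lines of Python; the probability values below are the assumed numbers from the example above, not estimates from a real corpus.

# Reproducing the Bigram calculation with the assumed probabilities.
p_I = 0.1               # P(I), the unigram probability of the first word
p_love_given_I = 0.4    # P(love | I)
p_NLP_given_love = 0.3  # P(NLP | love)

p_sentence = p_I * p_love_given_I * p_NLP_given_love
print(p_sentence)  # ≈ 0.012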
Applications of N-Gram Models
N-Gram models are widely used in several NLP applications, including:
• Text Prediction: Predicting the next word in a sequence based on the previous words.
• Speech Recognition: Recognizing spoken words from phoneme sequences.
• Machine Translation: Translating text from one language to another using N-Gram probabilities.
Example: Trigram Model
To better understand the Trigram Model, let’s walk through an example where
we calculate the probability of a sentence based on the two preceding words.
Consider the sentence: “I love learning NLP.”
We will calculate the probability of this sentence using a Trigram model.
Step-by-Step Calculation
1. Break the sentence into word triplets:
“I love learning NLP” ⇒ (I, love, learning), (love, learning, NLP)
2. Calculate the probability of each word triplet: Using a trained Trigram model, let’s assume the following probabilities:
P(learning|I, love) = 0.25, P(NLP|love, learning) = 0.4
3. Calculate the unigram and bigram probabilities needed for the first two words. Assume:
P(I) = 0.1, P(love|I) = 0.3
4. Compute the sentence probability using the Trigram model:
P(“I love learning NLP”) = P(I) · P(love|I) · P(learning|I, love) · P(NLP|love, learning)
Substituting the values:
P(“I love learning NLP”) = 0.1 · 0.3 · 0.25 · 0.4 = 0.003
Therefore, the probability of the sentence “I love learning NLP” under
this Trigram model is 0.003.
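As with the Bigram case, this calculation can be reproduced with a short Python sketch; the probability table holds the assumed values from the example, not corpus estimates.

# Reproducing the Trigram calculation with the assumed probabilities.
p = {
    ("I",): 0.1,                       # P(I)
    ("I", "love"): 0.3,                # P(love | I)
    ("I", "love", "learning"): 0.25,   # P(learning | I, love)
    ("love", "learning", "NLP"): 0.4,  # P(NLP | love, learning)
}

sentence = ["I", "love", "learning", "NLP"]
prob = 1.0
for i, word in enumerate(sentence):
    context = tuple(sentence[max(0, i - 2):i])  # up to two preceding words
    prob *= p[context + (word,)]
print(prob)  # 0.1 * 0.3 * 0.25 * 0.4 ≈ 0.003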
Comparison with Bigram Model
The Trigram model provides more context than the Bigram model by considering an additional preceding word, which helps capture more linguistic structure. For instance, phrases like “I love learning” may be common and hence carry different probabilities than “I love” followed by other words.
Advantages of Trigram Models
Trigram models can capture more nuances of language by considering a larger
context, which helps in applications where phrase structure and specific word
sequences are important, such as:
• Text Prediction: Better prediction accuracy due to more context.
• Machine Translation: Captures common three-word phrases that improve translation quality.
• Speech Recognition: Recognizes context within phrases, improving accuracy.
Trigram models, by considering two preceding words, offer a richer context compared to Bigram models. This allows for improved predictions in tasks requiring greater understanding of phrase structures and contextual relationships between words.
The N-Gram model is a simple yet powerful probabilistic model used in
NLP for a variety of tasks. By considering the local context of words, N-Gram
models can capture linguistic patterns and are widely used in applications such
as speech recognition and machine translation. However, the choice of N and the handling of unseen N-grams through smoothing techniques are crucial for effective performance.
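As one illustration of such a smoothing technique, the following Python sketch shows add-one (Laplace) smoothing for Bigram probabilities, which assigns a small nonzero probability to word pairs never seen in training; the tiny corpus and vocabulary are illustrative assumptions.

# Minimal sketch of add-one (Laplace) smoothing for Bigram probabilities.
# The corpus is an illustrative assumption; vocabulary size here is 4.
from collections import Counter

corpus = [["I", "love", "NLP"], ["I", "like", "NLP"]]
vocab = {w for sent in corpus for w in sent}
unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

def smoothed_bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing: never zero, even for unseen pairs."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + len(vocab))

print(smoothed_bigram_prob("I", "love"))     # seen pair:   (1 + 1) / (2 + 4)
print(smoothed_bigram_prob("love", "like"))  # unseen pair: (0 + 1) / (1 + 4)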