
N-Gram Model in NLP

Dr Vivek K Verma

Introduction to N-Gram Models


An N-Gram model is a probabilistic language model used in Natural Language
Processing (NLP) to predict the next word in a sequence based on the previous
N − 1 words. The N-Gram model is based on the Markov assumption, which
simplifies the computation by assuming that the probability of a word depends
only on the previous N − 1 words, rather than the entire sequence.
The model is called an N-Gram because it breaks down a sequence of words
into contiguous sequences of N words.
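A minimal Python sketch of this idea, with the function name and example sentence assumed purely for illustration, shows how a tokenized sentence breaks into its N-grams:

def extract_ngrams(tokens, n):
    # Slide a window of length n over the token list and collect each window.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I love learning NLP".split()
print(extract_ngrams(tokens, 1))  # unigrams: ('I',), ('love',), ('learning',), ('NLP',)
print(extract_ngrams(tokens, 2))  # bigrams:  ('I', 'love'), ('love', 'learning'), ('learning', 'NLP')
print(extract_ngrams(tokens, 3))  # trigrams: ('I', 'love', 'learning'), ('love', 'learning', 'NLP')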

Why N-Gram Models?


- Efficient: The N-Gram model simplifies language modeling by considering only
local context.
- Scalable: It can be applied to large corpora and various tasks such as
speech recognition, machine translation, and text prediction.
- Flexible: The choice of N determines the level of context captured. For
example, a 1-Gram (Unigram) only considers individual words, while a 2-Gram
(Bigram) considers word pairs.

N-Gram Probability Model


The probability of a word sequence W = w1, w2, . . . , wn can be computed using
the chain rule of probability:

P(W) = P(w1) · P(w2 | w1) · P(w3 | w1, w2) · · · P(wn | w1, w2, . . . , wn−1)

However, this becomes computationally expensive for large sequences. The
N-Gram model simplifies this by considering only the previous N − 1 words:

P(W) ≈ ∏_{i=1}^{n} P(wi | wi−(N−1), . . . , wi−1)

For example, in a Bigram model (2-Gram), we have:

P(W) = P(w1) · P(w2 | w1) · P(w3 | w2) · · · P(wn | wn−1)
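In practice, these conditional probabilities are estimated from counts in a corpus. The following sketch assumes a tiny toy corpus and estimates bigram probabilities by maximum likelihood, P(wi | wi−1) = count(wi−1, wi) / count(wi−1); the corpus and function names are assumptions made for this example only:

from collections import Counter

def train_bigram(sentences):
    # Maximum-likelihood estimate: P(w | prev) = count(prev, w) / count(prev).
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {(prev, word): count / unigram_counts[prev]
            for (prev, word), count in bigram_counts.items()}

corpus = ["I love NLP", "I love learning", "I study NLP"]  # toy corpus, assumed
probs = train_bigram(corpus)
print(probs[("I", "love")])    # 2/3 -- "love" follows "I" in two of three occurrences of "I"
print(probs[("love", "NLP")])  # 1/2 -- "NLP" follows "love" in one of its two occurrences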

Types of N-Gram Models
• Unigram Model (1-Gram): The probability of a word does not depend on
any preceding words.

P(W) = P(w1) · P(w2) · P(w3) · · · P(wn)

• Bigram Model (2-Gram): The probability of a word depends on the
previous word.

P(W) = P(w1) · P(w2 | w1) · P(w3 | w2) · · · P(wn | wn−1)

• Trigram Model (3-Gram): The probability of a word depends on the
two preceding words.

P(W) = P(w1) · P(w2 | w1) · P(w3 | w1, w2) · · · P(wn | wn−2, wn−1)
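The three factorizations differ only in how much preceding context each conditional probability uses. A generic sketch, with names and the probability table assumed for illustration (the bigram values match the worked example below), evaluates P(W) for any N:

def sentence_probability(tokens, cond_probs, n):
    # cond_probs maps (context_tuple, word) -> probability, where the context
    # is the (up to) n-1 preceding words; unseen entries default to 0.0.
    prob = 1.0
    for i, word in enumerate(tokens):
        context = tuple(tokens[max(0, i - (n - 1)):i])
        prob *= cond_probs.get((context, word), 0.0)
    return prob

# Assumed bigram probabilities, matching the worked example below.
bigram_probs = {((), "I"): 0.1, (("I",), "love"): 0.4, (("love",), "NLP"): 0.3}
print(sentence_probability("I love NLP".split(), bigram_probs, n=2))  # 0.012 (up to rounding)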

Example: Bigram Model


Let’s walk through an example using a Bigram model (2-Gram).
Consider the sentence: “I love NLP.”
We want to calculate the probability of this sentence under the Bigram model.

Step-by-Step Calculation
1. Break the sentence into word pairs:

“I love NLP” ⇒ (I, love), (love, NLP)

2. Calculate the probability of each word pair: Using a Bigram model trained
on a corpus, let’s assume the following probabilities:

P(love | I) = 0.4,   P(NLP | love) = 0.3

The probability P(love | I) represents how often “love” follows “I” in the
corpus, and P(NLP | love) represents how often “NLP” follows “love” in the corpus.
3. Compute the sentence probability:

P(“I love NLP”) = P(I) · P(love | I) · P(NLP | love)

Assuming P(I) = 0.1 (the unigram probability of “I”):

P(“I love NLP”) = 0.1 · 0.4 · 0.3 = 0.012

Therefore, the probability of the sentence “I love NLP” under this Bigram
model is 0.012.
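The same calculation written as a small Python sketch, using the probabilities assumed above:

p_i = 0.1               # assumed unigram probability P(I)
p_love_given_i = 0.4    # assumed P(love | I)
p_nlp_given_love = 0.3  # assumed P(NLP | love)

# P("I love NLP") = P(I) · P(love | I) · P(NLP | love)
p_sentence = p_i * p_love_given_i * p_nlp_given_love
print(p_sentence)  # 0.012 (up to floating-point rounding)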

Applications of N-Gram Models
N-Gram models are widely used in several NLP applications, including:
• Text Prediction: Predicting the next word in a sequence based on the
previous words.

• Speech Recognition: Recognizing words in speech based on phoneme
sequences.

• Machine Translation: Translating text from one language to another
using N-Gram probabilities.

Example: Trigram Model


To better understand the Trigram Model, let’s walk through an example in which
the probability of each word is conditioned on the two preceding words.
Consider the sentence: “I love learning NLP.”
We will calculate the probability of this sentence using a Trigram model.

Step-by-Step Calculation
1. Break the sentence into word triplets:

“I love learning NLP” ⇒ (I, love, learning), (love, learning, NLP)

2. Calculate the probability of each word triplet: Using a trained Trigram
model, let’s assume the following probabilities:

P(learning | I, love) = 0.25,   P(NLP | love, learning) = 0.4

3. Calculate the unigram and bigram probabilities needed for the first
two words. Assume:

P(I) = 0.1,   P(love | I) = 0.3

4. Compute the sentence probability using the Trigram model:

P(“I love learning NLP”) = P(I) · P(love | I) · P(learning | I, love) · P(NLP | love, learning)

Substituting the values:

P(“I love learning NLP”) = 0.1 · 0.3 · 0.25 · 0.4 = 0.003

Therefore, the probability of the sentence “I love learning NLP” under this
Trigram model is 0.003.
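The corresponding Trigram calculation as a sketch, again using the probabilities assumed in the steps above:

p_i = 0.1                        # assumed P(I)
p_love_given_i = 0.3             # assumed P(love | I)
p_learning_given_i_love = 0.25   # assumed P(learning | I, love)
p_nlp_given_love_learning = 0.4  # assumed P(NLP | love, learning)

# P("I love learning NLP") = P(I) · P(love|I) · P(learning|I, love) · P(NLP|love, learning)
p_sentence = p_i * p_love_given_i * p_learning_given_i_love * p_nlp_given_love_learning
print(p_sentence)  # 0.003 (up to floating-point rounding)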

Comparison with Bigram Model
The Trigram model provides more context than the Bigram model by considering
an additional preceding word, which helps capture more linguistic structure.
For instance, phrases like “I love learning” may be common and hence carry
different probabilities than “I love” followed by other words.

Advantages of Trigram Models


Trigram models can capture more nuances of language by considering a larger
context, which helps in applications where phrase structure and specific word
sequences are important, such as:

• Text Prediction: Better prediction accuracy due to more context.

• Machine Translation: Captures common three-word phrases that improve
translation quality.

• Speech Recognition: Recognizes context within phrases, improving accuracy.
Trigram models, by considering two preceding words, offer a richer context
compared to Bigram models. This allows for improved predictions in tasks
requiring greater understanding of phrase structures and contextual relationships
between words.
The N-Gram model is a simple yet powerful probabilistic model used in
NLP for a variety of tasks. By considering the local context of words, N-Gram
models can capture linguistic patterns and are widely used in applications such
as speech recognition and machine translation. However, the choice of N and
the handling of unseen word pairs through smoothing techniques are crucial for
effective performance.
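As a final illustration of the smoothing point, the sketch below applies add-one (Laplace) smoothing to bigram estimates so that unseen word pairs receive a small non-zero probability; the toy corpus and function names are assumptions for this example:

from collections import Counter

def laplace_bigram_prob(prev, word, bigram_counts, unigram_counts, vocab_size):
    # Add-one smoothing: (count(prev, word) + 1) / (count(prev) + V).
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

corpus = ["I love NLP", "I love learning", "I study NLP"]  # toy corpus, assumed
unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))
vocab_size = len(unigram_counts)

# The unseen pair ("study", "learning") gets a small non-zero probability
# instead of zero, so sentences containing it are not ruled out entirely.
print(laplace_bigram_prob("study", "learning", bigram_counts, unigram_counts, vocab_size))  # 1/6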
