N-grams
N-grams are contiguous sequences of n items, typically words in the context of NLP. These items can be characters, words, or even syllables, depending on the granularity desired. The value of n determines the order of the N-gram.
N-grams Examples
Unigrams (1-grams): Single words, e.g., “cat,” “dog.”
Bigrams (2-grams): Pairs of consecutive words, e.g., “natural language,” “deep learning.”
Trigrams (3-grams): Triplets of consecutive words, e.g., “machine learning model,” “data science approach.”
4-grams, 5-grams, etc.: Sequences of four, five, or more consecutive words.
N-grams Formula
P(w_1, w_2, ..., w_n) ≈ ∏_{i=1}^{n} P(w_i | w_{i−(n−1)}, ..., w_{i−1})
w_i: the current word at position i
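For example, under a bigram model (n = 2), the sentence "The cat eats fish" factors as:

P(The, cat, eats, fish) ≈ P(The) · P(cat | The) · P(eats | cat) · P(fish | eats)

Each word is conditioned on only the single preceding word.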
The n in n-gram specifies the number of items to consider: unigram for n = 1, bigram for n = 2, trigram for n = 3, and so on. N-gram models are a specific type of language model that rely on how frequently sequences of tokens (words or characters) occur in a text.
Bigrams
For the sentence "The cat eats fish.", the pairs of consecutive words are:
(The, cat)
(cat, eats)
(eats, fish)
(fish, .)
These are all 2-token sequences, hence bigrams.
Trigrams
In the trigrams example, the 3-token sequences from the same sentence are:
(The, cat, eats)
(cat, eats, fish)
(eats, fish, .)
These are all 3-token sequences, hence trigrams.
N-gram Model
This formula estimates the probability of a word w_n given all the previous words in the sequence. But since it is impractical to condition on all previous words (especially for long sequences), n-gram models approximate this probability using only the last few words, specifically the last n−1 words:

P(w_i | w_1, ..., w_{i−1}) ≈ P(w_i | w_{i−(n−1)}, ..., w_{i−1})
N-gram Model Example Program
import nltk
nltk.download('punkt')
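# Newer NLTK versions may also need: nltk.download('punkt_tab')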
from nltk import ngrams
from nltk.tokenize import word_tokenize
# Example sentence
sentence = "N-grams enhance language processing tasks."
# Tokenize the sentence
tokens = word_tokenize(sentence)
# Generate bigrams
bigrams = list(ngrams(tokens, 2))
# Generate trigrams
trigrams = list(ngrams(tokens, 3))
# Print the results
print("Bigrams:", bigrams)
print("Trigrams:", trigrams)
Output:
Bigrams: [('N-grams', 'enhance'), ('enhance', 'language'), ('language',
'processing'), ('processing', 'tasks'), ('tasks', '.')]
Trigrams: [('N-grams', 'enhance', 'language'), ('enhance', 'language',
'processing'), ('language', 'processing', 'tasks'), ('processing', 'tasks', '.')]
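Counting such n-grams is the basis of an n-gram model. Below is a minimal sketch, not part of the original program, of how bigram counts over a hypothetical toy corpus yield a conditional probability estimate:

from collections import Counter

# Hypothetical toy corpus, already tokenized (assumption for illustration)
corpus = "the cat eats fish . the cat sleeps . the dog eats fish .".split()

# Count unigrams and bigrams
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

# Relative-frequency estimate: P(eats | cat) = count(cat, eats) / count(cat)
p_eats_given_cat = bigram_counts[("cat", "eats")] / unigram_counts["cat"]
print("P(eats | cat) =", p_eats_given_cat)  # 1 / 2 = 0.5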
Language modeling
Language modeling (LM) is the use of various statistical and probabilistic
techniques to determine the probability of a given sequence of words occurring
in a sentence.
Example: The sentence "I am going to school" is more probable than "School
going I to am".
Given "I am going to", the model might assign:
"school" → 0.7
"market" → 0.2
"banana" → 0.0
Parameter estimation
Parameter estimation is the process of finding the best values for a model's "knobs" (parameters) based on the data the model is trained on. The goal is to adjust the parameters so that the model can accurately perform the desired NLP task (e.g., predicting the next word, classifying text, translating languages).
Example: Parameter Estimation
Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) is a key method in statistical modeling, used to estimate parameters by finding the best fit to the observed data. MLE finds the values of model parameters that make the observed data most probable. It works by maximizing a likelihood function, which tells us how likely it is to observe the data given different parameter values.
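Formally, MLE chooses the parameter value that maximizes the likelihood of the observed data:

θ̂_MLE = argmax_θ L(θ) = argmax_θ P(data | θ)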
Likelihood Estimation
Likelihood estimation is a statistical method used to estimate the parameters of a probability distribution or a statistical model based on observed data. Unlike traditional estimation methods that focus directly on finding the "best-fitting" parameters, likelihood estimation frames the problem in terms of the likelihood function.
Example: Maximum Likelihood Estimation
Example: Estimating the Probability of a Coin Toss
Suppose you have a coin, and you don't know whether it's
fair (i.e., the probability of heads, P(H), might not be 0.5).
You toss the coin 10 times, and you observe:
7 Heads
3 Tails
Your goal is to estimate the probability of heads (θ) using
Maximum Likelihood Estimation.
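Working this example out, the likelihood of 7 heads and 3 tails is:

L(θ) = θ^7 (1 − θ)^3 (up to a constant binomial coefficient)

Maximizing the log-likelihood:

log L(θ) = 7 log θ + 3 log(1 − θ)
d/dθ log L(θ) = 7/θ − 3/(1 − θ) = 0 ⇒ θ̂ = 7/10 = 0.7

The MLE estimate is simply the observed fraction of heads.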
Example 2: Parameter Estimation
Bayesian Estimation is a method of statistical inference in which we estimate unknown parameters by combining:
Prior beliefs (what we assume or know before seeing the data), and
Observed data (evidence),
using Bayes' Theorem to calculate a posterior probability distribution for the parameter.
Bayesian Formula
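Bayes' Theorem for a parameter θ and observed data D:

P(θ | D) = P(D | θ) · P(θ) / P(D)

where P(θ) is the prior, P(D | θ) is the likelihood, P(D) is the evidence (a normalizing constant), and P(θ | D) is the posterior. In the coin example, a prior belief that the coin is roughly fair would pull the posterior estimate of θ toward 0.5, away from the pure MLE value of 0.7.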