Language Modelling

The document discusses Natural Language Processing (NLP) with a focus on language modeling, which predicts the probability of word sequences. It outlines two main types of language modeling: statistical and neural, detailing methods such as N-grams and Maximum Likelihood Estimation. Additionally, it covers evaluation techniques for language models, including intrinsic and extrinsic evaluations, along with metrics like cosine similarity, accuracy, F1 score, and perplexity.


Natural Language Processing
By
Dr. Pankaj Dadure
Assistant Professor
School of Computer Science
UPES Dehradun
Language Modelling
• Language modeling is the task of determining the
probability of a sequence of words.
• It is typically framed as predicting the next word or
character in a document; models trained this way can
be applied to a wide range of natural language tasks
such as text generation, text classification, and
question answering.
Methods of Language Modeling

There are two types of language modeling:

• Statistical Language Modeling: the development of probabilistic models that
are able to predict the next word in a sequence given the words that precede
it. N-gram language modeling is an example.

• Neural Language Modeling: neural network methods achieve better results
than classical methods, both as standalone language models and when
incorporated into larger models for challenging tasks like speech recognition
and machine translation. One way of building a neural language model is
through word embeddings.
Statistical Language Modeling
• It is also called probabilistic language modeling.
• Goal: compute the probability of a sentence or sequence of words:

P(W) = P(w1, w2, w3, w4, …, wn)

• Related task: the probability of an upcoming word:

P(w5 | w1, w2, w3, w4)


Reminder: The Chain Rule
• The definition of conditional probability:

P(A|B) = P(B|A) P(A) / P(B), rewritten as: P(A|B) = P(A, B) / P(B)

P(A|B) P(B) = P(A, B)
P(A, B) = P(A|B) P(B)

• More variables:
P(A, B, C, D) = P(A) P(B|A) P(C|A, B) P(D|A, B, C)
• The Chain Rule in general:
P(x1, x2, x3, …, xn) = P(x1) P(x2|x1) P(x3|x1, x2) … P(xn|x1, …, xn−1)
The chain rule applied to compute joint probability
of words in sentence
• P(its water is so transparent) = P(its) × P(water | its) × P(is | its water)
× P(so | its water is) × P(transparent | its water is so)

P(w1, w2, w3, …, wn) = ∏i P(wi | w1, w2, w3, …, wi−1)

Note: The chain rule shows the link between the joint probability of a
sequence and the conditional probability of a word given the previous
words.
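The multiplication above can be sketched in a few lines of Python. The conditional probabilities below are made-up toy values chosen for illustration, not estimates from any real corpus:

```python
# Chain-rule decomposition: the joint probability of a sentence is the
# product of each word's probability given all preceding words.
# These conditional probabilities are hypothetical toy values.
cond_probs = [
    ("its", 0.05),                           # P(its)
    ("water | its", 0.10),                   # P(water | its)
    ("is | its water", 0.40),                # P(is | its water)
    ("so | its water is", 0.20),             # P(so | its water is)
    ("transparent | its water is so", 0.05), # P(transparent | ...)
]

joint = 1.0
for _, p in cond_probs:
    joint *= p  # multiply successive conditional probabilities

print(f"P(its water is so transparent) = {joint:.2e}")  # 2.00e-05
```

Note how quickly the product shrinks: this is why real implementations work with log probabilities instead of raw products.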
Chain rule to Markov Model
• The previous equation suggests that we could estimate the joint probability of an
entire sequence of words by multiplying together a number of conditional
probabilities.
• But using the chain rule doesn’t really seem to help us! We don’t know any way to
compute the exact probability of a word given a long sequence of preceding words,
P(wn | w1, …, wn−1).
• As we said above, we can’t just estimate by counting the number of times every
word occurs following every long string, because language is creative, and any
particular context might have never occurred before!
• The intuition of the N-gram model is that instead of computing the probability of a
word given its entire history, we will approximate the history by just the last few
words
Markov Model
• Models that assign probabilities to sequences of words are called language models.
• The simplest model that assigns probabilities to sentences and sequences of words is
the n-gram.
• An n-gram is a sequence of n words: a 2-gram (which we’ll call a bigram) is a two-word
sequence like “please turn”, “turn your”, or “your homework”, and a 3-gram (a trigram)
is a three-word sequence like “please turn your” or “turn your homework”.
Maximum Likelihood Estimation of N-Gram Model
Parameters
• Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the
parameters of a probability distribution that best describe a given dataset.
• The fundamental idea behind MLE is to find the values of the parameters that maximize
the likelihood of the observed data, assuming that the data are generated by the
specified distribution.
• A parameter is a numerical characteristic of a distribution.
• Normal distributions, as we know, have mean (µ) and variance (σ²) as parameters.
Binomial distributions have the number of trials (n) and probability of success (p) as
parameters. Gamma distributions have shape (k) and scale (θ) as parameters.
Exponential distributions have the rate (λ), the inverse of the mean, as the parameter.
Maximum Likelihood Estimation for bigram
probabilities

P(wi | wi−1) = C(wi−1, wi) / C(wi−1)

For example, given the corpus:

<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like the green eggs and ham </s>

P(I | <s>) = 2/3 ≈ 0.67   P(Sam | <s>) = 1/3 ≈ 0.33   P(am | I) = 2/3 ≈ 0.67
P(</s> | Sam) = 1/2 = 0.5   P(Sam | am) = 1/2 = 0.5   P(do | I) = 1/3 ≈ 0.33
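The bigram estimates above can be reproduced directly from the toy corpus. This is a minimal sketch of MLE counting, not a full n-gram toolkit (no smoothing, no handling of unseen bigrams):

```python
from collections import Counter

# The three-sentence toy corpus from the example above.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like the green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs

def bigram_prob(prev, word):
    """MLE estimate: P(word | prev) = C(prev, word) / C(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("<s>", "I"))     # 2/3
print(bigram_prob("I", "am"))      # 2/3
print(bigram_prob("Sam", "</s>"))  # 1/2
```

Because any unseen bigram gets probability zero under plain MLE, real systems add smoothing (e.g. Laplace or Kneser–Ney) on top of these counts.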
Intrinsic evaluation
• Intrinsic evaluation - Aims to measure the quality of embeddings by assessing their
performance on specific NLP tasks that are related to the embedding space itself, such
as word similarity, analogy, and classification.
• Cosine similarity

• Spearman correlation

• Accuracy
Cosine Similarity
Cosine similarity measures the similarity between two vectors by computing the
cosine of the angle between them. In the context of embeddings, cosine
similarity is often used to measure the similarity between two words, or between
a word and its context. The formula for cosine similarity is as follows:

cosine_similarity(V1, V2) = (V1 · V2) / (‖V1‖ ‖V2‖)

where V1 and V2 are the embeddings of two words.
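The formula translates directly into code. The three-dimensional "embeddings" below are made-up toy vectors, not output from any trained model:

```python
import math

def cosine_similarity(v1, v2):
    """cos(theta) = (v1 . v2) / (||v1|| * ||v2||)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Hypothetical toy embeddings:
king  = [0.8, 0.3, 0.1]
queen = [0.7, 0.4, 0.1]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen))  # close to 1: similar words
print(cosine_similarity(king, apple))  # much lower: dissimilar words
```

Cosine similarity ignores vector length and compares only direction, which is why it is preferred over Euclidean distance for embeddings of differing magnitudes.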


Spearman Correlation
• Spearman correlation measures the
monotonic relationship between two
variables, which can be the similarity scores
of two sets of words or phrases computed
by humans and by embeddings.
• A high Spearman correlation indicates that
the embeddings are able to capture the
semantic relationships between words that
humans perceive.
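A minimal sketch of the rank-difference form of Spearman correlation, assuming no tied values (libraries such as SciPy handle ties properly). The similarity scores below are hypothetical:

```python
def spearman(x, y):
    """Spearman rank correlation for lists with no tied values:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between the ranks of x_i and y_i."""
    n = len(x)
    rank_x = {v: i for i, v in enumerate(sorted(x), start=1)}
    rank_y = {v: i for i, v in enumerate(sorted(y), start=1)}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical similarity scores for five word pairs,
# as judged by humans and as computed from embeddings:
human_scores = [9.1, 7.4, 6.0, 3.2, 1.5]
model_scores = [0.92, 0.80, 0.55, 0.40, 0.05]

print(spearman(human_scores, model_scores))  # 1.0: identical ranking
```

The scales differ, but the rankings agree perfectly, so the correlation is 1.0; Spearman cares only about the ordering, not the raw scores.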
Accuracy
• Accuracy measures the performance of embeddings on classification tasks, such as
sentiment analysis or topic classification.
• Given a dataset of labeled examples, the embeddings are used to represent each
example, and a classifier is trained on these representations. The accuracy of the
classifier on a held-out test set is then used as a measure of the quality of the
embeddings.
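The final accuracy computation is simply the fraction of correct predictions. The sentiment labels below are a hypothetical held-out test set:

```python
def accuracy(predictions, labels):
    """Fraction of test examples the classifier labels correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical predictions from an embedding-based sentiment classifier,
# compared against the true test-set labels:
preds  = ["pos", "neg", "pos", "pos", "neg"]
labels = ["pos", "neg", "neg", "pos", "neg"]
print(accuracy(preds, labels))  # 0.8
```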
Extrinsic evaluation
Extrinsic evaluation - aims to measure the quality of embeddings by assessing their
performance on downstream NLP tasks, such as machine translation or text classification,
that are not directly related to the embedding space itself.
F1 Score
F1 score is a metric commonly used in binary classification problems, such as sentiment
analysis or named entity recognition. It combines precision and recall into a single score
that ranges from 0 to 1. A high F1 score indicates that the embeddings are able to
capture the relevant features of the input data. The formula for F1 score is as follows:

F1 = 2 · (precision · recall) / (precision + recall)
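Starting from true-positive, false-positive, and false-negative counts, the F1 score can be sketched as follows; the confusion counts in the example are hypothetical:

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts from a binary sentiment classifier:
# precision = 8/10 = 0.8, recall = 8/12 = 0.667, F1 = 8/11
print(f1_score(tp=8, fp=2, fn=4))
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two, penalizing classifiers that trade one off against the other.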
Perplexity
• It measures how well a language model can predict a held-out test
set of text, given the embeddings as input. A low perplexity
indicates that the embeddings are able to capture the semantic and
syntactic structures of the language. For a test set W = w1 w2 … wN,
perplexity is the inverse probability of the test set, normalized by
the number of words:

PP(W) = P(w1, w2, …, wN)^(−1/N)
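Perplexity, the inverse probability of the test set normalized by its length, can be sketched from per-token probabilities; computing in log space avoids numerical underflow on long texts. The probabilities in the example are hypothetical model outputs:

```python
import math

def perplexity(token_probs):
    """PP = (prod p_i) ** (-1/N), computed in log space for stability."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# Hypothetical per-token probabilities a model assigns to a test sentence:
probs = [0.2, 0.1, 0.25, 0.05]
print(perplexity(probs))

# Sanity check: a model that assigns 1/4 to every token has perplexity 4,
# as if it were choosing uniformly among 4 words at each step.
print(perplexity([0.25] * 8))  # 4.0
```

This is why perplexity is often read as an effective branching factor: lower perplexity means the model is, on average, less "surprised" by each test token.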
