Assignment on N-gram Language Models
Packages
```python
from collections import Counter
import random
from wordcloud import WordCloud
import matplotlib.pyplot as plt
```
Dataset
```python
dataset_path = "GPAC.txt"
with open(dataset_path, "r", encoding="utf8") as file:
    corpus_text = file.read()
```
Create n-grams for n=1, 2, 3, 4.
• We will use the first 50,000,000 characters of the corpus text to create the n-grams for demonstration.
```python
def create_ngrams(text, n, stop_words={'(', ')', '።', '፥', '፡', '፣', '፤'}):
    # Use only the first 50,000,000 characters for demonstration
    words = text[:50000000].split()
    # Drop punctuation and stop tokens before building the n-grams
    words = [word for word in words if word not in stop_words]
    # Return the n-grams as tuples of n consecutive words
    return [tuple(words[i:i+n]) for i in range(len(words) - n + 1)]
```
N = 1 (Unigrams)
```python
unigrams = create_ngrams(corpus_text, 1)
print("N-grams for n=1: ")
print(unigrams[:5])
for i in range(5):
    print(unigrams[i][0])
```
N = 2 (Bigrams)
```python
bigrams = create_ngrams(corpus_text, 2)
print("N-grams for n=2: ")
for i in range(5):
    print(bigrams[i])
```
N = 3 (Trigrams)
```python
trigrams = create_ngrams(corpus_text, 3)
print("N-grams for n=3: ")
for i in range(5):
    print(trigrams[i])
```
N = 4 (Quadgrams)
```python
quadgrams = create_ngrams(corpus_text, 4)
print("N-grams for n=4: ")
for i in range(5):
    print(quadgrams[i])
```
Probabilities of n-grams and the top 10 most likely n-grams for all n
```python
# Precalculate the count of each n-gram
unigram_counts = Counter(unigrams)
bigram_counts = Counter(bigrams)
trigram_counts = Counter(trigrams)
quadgram_counts = Counter(quadgrams)
```
Unigram Probabilities
```python
# Calculate probabilities
unigram_probabilities = calculate_unigram_probabilities(unigrams)
top_unigrams = dict(sorted(unigram_probabilities.items(), key=lambda x: x[1], reverse=True)[:10])
```
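• The helper `calculate_unigram_probabilities` is not shown in this excerpt; a minimal sketch of what it likely computes, assuming simple relative frequencies (the count of each unigram divided by the total number of tokens):
```python
def calculate_unigram_probabilities(unigrams):
    # Relative frequency: count(w) / total number of tokens
    counts = Counter(unigrams)
    total = len(unigrams)
    return {unigram: count / total for unigram, count in counts.items()}
```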
Bigram Probabilities
```python
# Calculate probabilities
bigram_probabilities = calculate_bigram_probabilities(bigrams)
top_bigrams = dict(sorted(bigram_probabilities.items(), key=lambda x: x[1], reverse=True)[:10])
```
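• Likewise, `calculate_bigram_probabilities` is not shown; a sketch assuming the usual conditional definition P(w2 | w1) = count(w1, w2) / count(w1). The trigram and quadgram helpers below would follow the same pattern, dividing each n-gram count by the count of its (n-1)-word prefix:
```python
def calculate_bigram_probabilities(bigrams):
    # Conditional probability: count(w1, w2) / count(w1)
    bigram_counts = Counter(bigrams)
    first_word_counts = Counter(bigram[0] for bigram in bigrams)
    return {bigram: count / first_word_counts[bigram[0]]
            for bigram, count in bigram_counts.items()}
```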
Trigram Probabilities
```python
# Calculate probabilities
trigram_probabilities = calculate_trigram_probabilities(trigrams)
top_trigrams = dict(sorted(trigram_probabilities.items(), key=lambda x: x[1], reverse=True)[:10])
```
Quadgram Probabilities
```python
# Calculate probabilities
quadgram_probabilities = calculate_quadgram_probabilities(quadgrams)
top_quadgrams = dict(sorted(quadgram_probabilities.items(), key=lambda x: x[1], reverse=True)[:10])
```
Remove common stop words, recompute the bigram and trigram frequencies, and find the top 10 n-grams for n=1,2,3,4
```python
# stop_words_text is assumed to hold a whitespace-separated list of common Amharic stop words
common_stop_words = set(stop_words_text.split())
filtered_unigrams = create_ngrams(corpus_text, 1, common_stop_words)
filtered_bigrams = create_ngrams(corpus_text, 2, common_stop_words)
filtered_trigrams = create_ngrams(corpus_text, 3, common_stop_words)
filtered_quadgrams = create_ngrams(corpus_text, 4, common_stop_words)
```
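• The frequency recomputation itself is truncated in this excerpt; a minimal sketch of finding the top 10 filtered n-grams by count (variable names here are illustrative):
```python
filtered_bigram_counts = Counter(filtered_bigrams)
# Counter.most_common returns the 10 highest-frequency bigrams
print(filtered_bigram_counts.most_common(10))
```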
Create word clouds for unigrams, bigrams, and trigrams before and after stop word removal
```python
def plot_word_cloud(ngrams):
    ngram_counts = Counter(ngrams)
    # Join each n-gram tuple into a single phrase keyed to its frequency
    ngram_dict = {" ".join(ngram_key): count for ngram_key, count in ngram_counts.items()}
    # Create word cloud (a font_path to an Ethiopic-capable font may be needed for Amharic text)
    wordcloud = WordCloud(width=800, height=400).generate_from_frequencies(ngram_dict)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```
Word clouds for unigrams, bigrams, and trigrams before common word removal
```python
plot_word_cloud(unigrams)
plot_word_cloud(bigrams)
plot_word_cloud(trigrams)
```
Word clouds for unigrams, bigrams, and trigrams after common word removal
```python
plot_word_cloud(filtered_unigrams)
plot_word_cloud(filtered_bigrams)
plot_word_cloud(filtered_trigrams)
```
Let's take a random sentence and calculate its probability: "ኢትዮጵያ ታሪካዊ ሀገር ናት"
• Let's calculate the probability of the sentence using different n-gram models: Unigram, Bigram, Trigram, and Quadgram.
Unigram Estimation
• Finding the probability of the sentence using
Unigram Estimation
```python
def unigram_probability_estimation(sentence):
    # Find probability using the unigrams
    sentence_ngrams = create_ngrams(sentence, 1)
    probability = 1.0
    for ngram in sentence_ngrams:
        # Unseen n-grams get probability 0 (no smoothing)
        probability *= unigram_probabilities.get(ngram, 0)
    return probability
```
Bigram Estimation
• Finding the probability of the sentence using
Bigram Estimation
```python
def bigram_probability_estimation(sentence):
    # Find probability using the bigrams
    sentence_ngrams = create_ngrams(sentence, 2)
    probability = 1.0
    for ngram in sentence_ngrams:
        probability *= bigram_probabilities.get(ngram, 0)
    return probability
```
Trigram Estimation
• Finding the probability of the sentence using
Trigram Estimation
```python
def trigram_probability_estimation(sentence):
    # Find probability using the trigrams
    sentence_ngrams = create_ngrams(sentence, 3)
    probability = 1.0
    for ngram in sentence_ngrams:
        probability *= trigram_probabilities.get(ngram, 0)
    return probability
```
Quadgram Estimation
• Finding the probability of the sentence using
Quadgram Estimation
```python
def quadgram_probability_estimation(sentence):
    # Find probability using the quadgrams
    sentence_ngrams = create_ngrams(sentence, 4)
    probability = 1.0
    for ngram in sentence_ngrams:
        probability *= quadgram_probabilities.get(ngram, 0)
    return probability
```
Finding the probability of the sentence using the Chain Rule
```python
def chain_rule_probability_estimation(sentence):
    sentence = sentence.split()
    # Chain rule for the four-word example sentence:
    # P(w1) * P(w2|w1) * P(w3|w1,w2) * P(w4|w1,w2,w3)
    sentence_probability = 1.0
    sentence_probability *= unigram_counts[tuple(sentence[:1])] / len(unigrams)
    sentence_probability *= bigram_counts[tuple(sentence[:2])] / unigram_counts[tuple(sentence[:1])]
    sentence_probability *= trigram_counts[tuple(sentence[:3])] / bigram_counts[tuple(sentence[:2])]
    sentence_probability *= quadgram_counts[tuple(sentence[:4])] / trigram_counts[tuple(sentence[:3])]
    return sentence_probability
```
Generating random sentences
using n-grams to see what happens
as n increases
```python
def generate_random_sentence_for_unigrams(seed_word, ngram_probabilities, n, reps=10):
    sentence = [*seed_word]
    choices = list(ngram_probabilities.keys())
    for _ in range(reps):
        # random.choice samples uniformly over the observed n-grams
        next_word = random.choice(choices)
        sentence.append(" ".join(next_word))
    return " ".join(sentence)
```
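• The uniform `random.choice` above ignores both the learned probabilities and the preceding context. A hedged sketch of a context-aware variant (the function name and structure are illustrative, not from the original notebook) that samples the next word from the n-grams whose first n-1 words match the end of the sentence, weighted by their probabilities:
```python
def generate_random_sentence(seed_words, ngram_probabilities, n, reps=10):
    sentence = list(seed_words)
    for _ in range(reps):
        # Condition on the last n-1 generated words (empty context for unigrams)
        context = tuple(sentence[-(n - 1):]) if n > 1 else ()
        candidates = [(ngram, prob) for ngram, prob in ngram_probabilities.items()
                      if ngram[:n - 1] == context]
        if not candidates:
            break  # no continuation observed for this context
        ngrams, weights = zip(*candidates)
        next_ngram = random.choices(ngrams, weights=weights, k=1)[0]
        sentence.append(next_ngram[-1])
    return " ".join(sentence)
```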
Explanation
• As the value of n increases, the model takes
into account a greater amount of context
when generating text.
• 1. **Enhanced Contextual Relevance:** A higher n leads to sentences that are more contextually appropriate and coherent, as the model considers a longer sequence of previous words to predict the next one.
Evaluating these Language Models
Using Intrinsic Evaluation Method
```python
import math

def calculate_probability(sentence, n, probability_function):
    splitted_sentence = sentence.split()
    sentence_ngrams = [tuple(splitted_sentence[i:i+n])
                       for i in range(len(splitted_sentence) - n + 1)]
    probability = 1.0
    for ngram in sentence_ngrams:
        probability *= probability_function(ngram)
    return probability
```
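• The usual intrinsic metric built on these probabilities is perplexity. A minimal sketch (the helper name and structure are assumptions, not from the original notebook), using log probabilities to avoid numeric underflow:
```python
def calculate_perplexity(sentence, n, probability_function):
    splitted_sentence = sentence.split()
    sentence_ngrams = [tuple(splitted_sentence[i:i+n])
                       for i in range(len(splitted_sentence) - n + 1)]
    log_probability = 0.0
    for ngram in sentence_ngrams:
        p = probability_function(ngram)
        if p == 0:
            return float("inf")  # unseen n-gram and no smoothing
        log_probability += math.log(p)
    # Perplexity = exp(-(1/N) * sum of log probabilities)
    return math.exp(-log_probability / len(sentence_ngrams))
```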
Evaluating these Language Models
Using Extrinsic Evaluation Method
• We chose sentence completion as the task for evaluating these language models.
• We can reuse the sentence-generation functions created earlier, this time generating the next word for a given initial sentence.
Next Word Prediction
```python
def generate_next_words(seed_word, n):
    if n == 1:
        ngram_probabilities = unigram_probabilities
    elif n == 2:
        ngram_probabilities = bigram_probabilities
    elif n == 3:
        ngram_probabilities = trigram_probabilities
    else:
        ngram_probabilities = quadgram_probabilities
    # Generate a single next word with the selected model
    # (the remaining branches and this call reconstruct the truncated original)
    return generate_random_sentence_for_unigrams(seed_word, ngram_probabilities, n, reps=1)
```
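• A hypothetical usage, assuming the seed is passed as a sequence of words:
```python
# Predict the next word after "ኢትዮጵያ" using the bigram model
print(generate_next_words(["ኢትዮጵያ"], 2))
```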