K. HarshaVardhan
22 WU0104101
AIML-B
NLP ASSIGNMENT
1 . Computational Cost of Training the Skip-gram Model
and Negative Sampling
Core Function:
The Skip-gram model, a key part of the word2vec framework, generates word
embeddings.
It predicts surrounding words based on a given target word.
Developed by Tomas Mikolov et al., it aims to capture semantic relationships
between words.
Computational Challenges:
Training is extremely resource-intensive due to large vocabulary sizes.
Probability calculations, especially with the softmax function, add to the complexity.
Large NLP datasets exacerbate these computational demands.
Operational Mechanism:
The model inputs a target word and outputs predicted context words.
It calculates the probability of each context word appearing nearby.
The softmax function, which considers every vocabulary word, is used for probability
calculation.
Illustrative Example:
In a vocabulary of 1 million words, each training step must compute a score for all 1 million words and sum over them to normalize the softmax.
This extensive calculation significantly slows down the training process and demands
considerable computational power.
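To make this concrete, the short NumPy sketch below (an illustration only, with a deliberately reduced vocabulary size so it runs quickly) computes the full-softmax context distribution for a single target word. The dot products and the normalizing sum both touch every vocabulary word, which is exactly the cost that explodes at real vocabulary sizes.

```python
import numpy as np

# Illustrative sizes only; real vocabularies can reach millions of words.
V, d = 50_000, 100                                # vocabulary size, embedding dim
W_in = np.random.rand(V, d).astype(np.float32)    # input (target) embeddings
W_out = np.random.rand(V, d).astype(np.float32)   # output (context) embeddings

def softmax_context_probs(target_idx):
    """P(context word | target word) over the ENTIRE vocabulary: O(V * d) work."""
    scores = W_out @ W_in[target_idx]     # V dot products -- the expensive part
    scores -= scores.max()                # subtract max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # normalization sums over all V words

probs = softmax_context_probs(target_idx=42)
print(probs.shape)   # (50000,) -- one probability per vocabulary word
```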
Factors Contributing to High Computational Cost:
Large Vocabulary Size:
o NLP datasets commonly include millions of unique words.
o Calculating the softmax function across such extensive vocabularies
demands substantial memory and processing capabilities.
Frequent Probability Calculations:
o The Skip-gram model optimizes the probability of correct word-context
pairings.
o This optimization requires iterating through all possible words, resulting in
significant computational overhead.
Gradient Computation Complexity:
o Word embeddings are updated via backpropagation, necessitating
adjustments to each word's vector.
o With a large vocabulary, updating all embeddings in every step slows down
the training process.
Storage and Memory Issues:
o Storing and updating vectors for millions of words requires considerable
memory resources.
o The large size of the embedding matrix itself creates training difficulties on
hardware with limited resources.
Addressing Computational Burden:
Negative Sampling, introduced by Mikolov et al., aims to lessen the computational
load of Skip-gram training.
It reduces the number of words processed during each training step.
How Negative Sampling Functions:
Instead of calculating probabilities for the entire vocabulary, it uses a small set of
sampled words for model updates.
Positive Sample Selection:
o A valid word-context pair from the actual text is designated as a positive
example.
Negative Sample Selection:
o Rather than updating embeddings for all words, the model chooses a few
random words as negative examples (words unlikely to be context words).
o Typically, 5-20 negative samples are selected for each positive word-context
pair.
Simplified Probability Calculation:
o The model updates embeddings only for the target word, the correct context
word, and the selected negative words.
o This significantly decreases the required number of calculations.
Mathematical Formulation:
o Negative Sampling replaces the softmax function with a binary classification objective, written out below.
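For a target word w_I and an observed context word w_O, the per-pair objective introduced by Mikolov et al. is:

```latex
\log \sigma\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
      \left[ \log \sigma\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]
```

Here σ is the sigmoid function, v and v′ are the input and output embedding vectors, and the k negative words w_i are drawn from a noise distribution P_n(w), in practice the unigram distribution raised to the power 3/4.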
Advantages of Negative Sampling:
Faster Training Time:
o Instead of processing millions of words, it updates only a small sample.
o This dramatically reduces the number of calculations per step.
Lower Memory Consumption:
o The embedding matrix requires fewer updates, lessening memory demands.
Scalability:
o Negative Sampling enables efficient handling of large datasets.
o It makes training practical on standard hardware.
Improved Performance on Rare Words:
o Rare words receive updates through relevant negative examples, leading to
better representations.
Overall Impact:
Negative Sampling offers an efficient alternative by focusing on sampled words,
significantly reducing training time and resource needs.
This optimization allows word2vec to learn high-quality word representations
efficiently, contributing to its widespread use in NLP.
It does have drawbacks compared with more sophisticated sampling schemes that can yield higher accuracy.
In general, however, Negative Sampling greatly increases the speed of the training process.
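As a practical illustration, here is a minimal training sketch assuming the gensim library (4.x API) and a toy corpus rather than any specific dataset; sg=1 selects Skip-gram, negative=5 sets the number of negative samples per positive pair, and hs=0 disables hierarchical softmax so that Negative Sampling is used.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # embedding dimension
    window=2,         # context window size
    sg=1,             # 1 = Skip-gram (0 would be CBOW)
    negative=5,       # number of negative samples per positive pair
    hs=0,             # disable hierarchical softmax; use Negative Sampling
    min_count=1,
    epochs=50,
)

print(model.wv["cat"][:5])           # first few dimensions of the "cat" vector
print(model.wv.most_similar("cat"))  # nearest neighbours in the toy space
```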
2 . Comparison of Skip-gram and FastText Word
Embeddings
Introduction to Word Embeddings:
Word embeddings in NLP convert words into vectors, capturing semantic
relationships.
Key methods include the Skip-gram model (word2vec) and the FastText model.
Both generate meaningful representations but differ in approach, efficiency, and
linguistic structure handling.
Overview of Skip-gram Model:
Skip-gram encodes each word as a unique vector.
It predicts context words from a target word by scanning a large text corpus.
A neural network learns word relationships based on co-occurrence within a defined
window.
Strengths:
o Generates good-quality embeddings with small training sets.
o Captures semantic and syntactic relationships.
Limitations:
o Treats words independently, ignoring internal structure.
o Challenges with rare or out-of-vocabulary words.
o Limited to the information it was trained on.
Overview of FastText Model:
FastText, developed by Facebook AI, extends Skip-gram with subword embeddings.
It splits words into character n-grams (subword sequences).
Example: "delaying" becomes "delaying", "de", "lay", "ing", etc.
Benefits:
o Encodes morphological variations.
o Enhances word representations, especially for languages with complex word
forms (prefixes, suffixes, inflections).
o Useful for languages that combine words, like German.
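A small sketch of this decomposition in plain Python (the "<" and ">" boundary markers and the 3-6 character n-gram range follow FastText's conventions; restricted to 3-grams here for readability):

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the FastText-style character n-gram subwords of a word."""
    wrapped = f"<{word}>"              # mark the word boundaries
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams

subwords = char_ngrams("delaying", min_n=3, max_n=3)
print(sorted(subwords))
# ['<de', 'ayi', 'del', 'ela', 'ing', 'lay', 'ng>', 'yin']
```

In FastText, a word's final vector is the sum of the vectors of these subwords (plus a whole-word vector when the word is in the vocabulary), which is what lets related forms such as "run", "running", and "runner" share information.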
Key Differences Between Skip-gram and FastText:
Word Representation:
o Skip-gram: Assigns a single vector to each word, treating "run" and "running"
as distinct.
o FastText: Decomposes words into subwords, recognizing similarities between
"run," "running," and "runner."
o This subword approach is beneficial for morphologically rich languages like
German and Arabic.
o Example: the German compound "Untergrundbahnhöfen" ("underground railway stations") breaks down into the parts "unter" (under), "grund" (ground), "bahn" (rail), and "höfen" (yards), which subword n-grams can partially capture.
Handling Rare/Unseen Words:
o Skip-gram: Struggles with infrequent words; cannot generate embeddings for
out-of-vocabulary words.
o FastText: Infers meaning from subword components, approximating unseen
word meanings.
o FastText is therefore more effective with rare and unique words (see the sketch after this list).
Computational Efficiency:
o Skip-gram: More computationally efficient due to simpler vector processing.
o FastText: Slower, because each word must be processed through multiple subword n-grams.
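A minimal sketch of the out-of-vocabulary behaviour, again assuming gensim (4.x) and a toy corpus: FastText can return a vector for a word it never saw by combining the vectors of its character n-grams, while a Word2Vec Skip-gram model raises a KeyError for the same word.

```python
from gensim.models import Word2Vec, FastText

sentences = [
    ["running", "is", "healthy"],
    ["she", "was", "jogging", "daily"],
    ["he", "runs", "every", "morning"],
]

w2v = Word2Vec(sentences, vector_size=50, sg=1, min_count=1, epochs=50)
ft = FastText(sentences, vector_size=50, sg=1, min_count=1, epochs=50,
              min_n=3, max_n=6)   # subword n-gram range

print(ft.wv["runner"][:5])   # unseen word: built from its character n-grams
try:
    w2v.wv["runner"]
except KeyError:
    print("Word2Vec has no vector for the unseen word 'runner'")
```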
Performance and Use Cases:
Skip-gram:
o Performs well with large, frequent-word datasets.
o Suitable for tasks prioritizing speed and treating words as discrete units.
o Used in search engines, machine translation, and document classification.
FastText:
o Useful with noisy or incomplete datasets (user-generated content, tweets,
reviews).
o Preferred for multilingual NLP and languages with complex word formations.
o Good for colloquial and social media language.
o If a project uses formal, well-structured text, Skip-gram is sufficient; if it must handle misspellings or other linguistic variation, FastText is preferred.
Conclusion:
Both models have pros and cons, depending on the NLP task.
Skip-gram:
o More computationally efficient, suitable for large-scale operations and speed.
o Limited by its inability to handle unseen or rare words.
FastText:
o Slightly slower and more memory-intensive.
o Better generalization through subword information.
o Adept at handling morphologically rich languages and unseen words.
o Preferred when linguistic variance is a key consideration.
3 . Visualizing Word Embeddings with t-SNE – Skip-gram
vs. FastText
• Word Embeddings and Visualization:
o Word embeddings represent word relationships as dense numerical vectors.
o These vectors exist in high-dimensional spaces (often hundreds of dimensions), making direct interpretation challenging.
o t-SNE (t-distributed Stochastic Neighbor Embedding) is a common visualization technique.
▪ It reduces high-dimensional data to two or three dimensions.
▪ It preserves local structures, allowing for visual analysis of word relationships.
The following steps demonstrate this by training both models and visualizing their embeddings with t-SNE:
Step 1: Scraping Data
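A minimal sketch of this step and the rest of the pipeline, assuming requests and BeautifulSoup for scraping a Wikipedia article, gensim for training Skip-gram and FastText, and scikit-learn's TSNE with matplotlib for the 2-D plots (the actual data source, URL, and parameters used may differ):

```python
import re
import requests
from bs4 import BeautifulSoup
from gensim.models import Word2Vec, FastText
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# --- Step 1: scrape raw text (hypothetical source URL) ---
url = "https://en.wikipedia.org/wiki/Natural_language_processing"
html = requests.get(url, timeout=10).text
paragraphs = BeautifulSoup(html, "html.parser").find_all("p")
text = " ".join(p.get_text() for p in paragraphs).lower()

# --- Tokenize into sentences of words ---
sentences = [re.findall(r"[a-z]+", s) for s in text.split(".")]
sentences = [s for s in sentences if len(s) > 3]

# --- Train Skip-gram (Word2Vec) and FastText on the same corpus ---
w2v = Word2Vec(sentences, vector_size=100, window=5, sg=1, negative=5, min_count=5)
ft = FastText(sentences, vector_size=100, window=5, sg=1, negative=5, min_count=5)

# --- Project the most frequent words to 2-D with t-SNE and plot ---
def plot_tsne(model, title, top_n=200):
    words = model.wv.index_to_key[:top_n]
    vectors = model.wv[words]
    coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(vectors)
    plt.figure(figsize=(8, 8))
    plt.scatter(coords[:, 0], coords[:, 1], s=5)
    for (x, y), word in zip(coords, words):
        plt.annotate(word, (x, y), fontsize=7)
    plt.title(title)
    plt.show()

plot_tsne(w2v, "Skip-gram embeddings (t-SNE)")
plot_tsne(ft, "FastText embeddings (t-SNE)")
```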
Output: