Computational Intelligence Endsem

The document outlines various text representation techniques in machine learning, including Bag of Words (BoW), TF-IDF, Word2Vec, GloVe, and BERT, detailing their methodologies, advantages, and applications. It also discusses the Seq2Seq model for neural machine translation and evaluation metrics like BLEU and BERT Score. Additionally, it covers Neural Style Transfer and highlights the benefits of BERT over traditional language models.

Unit 5

Bag of Words (BoW)

- Bag of Words is a simple and commonly used method to represent text data in machine
learning.
- It creates a vocabulary of all unique words from a collection of documents and represents
each document as a vector based on word frequency.

How it works:
- It ignores grammar and word order.
- It only considers whether known words occur in the document and how often.
- Each document is represented as a vector.

Example:
Let’s say we have two sentences:
- Doc1: “I like NLP”
- Doc2: “NLP is fun”

Vocabulary = [I, like, NLP, is, fun]


BoW Vectors:
- Doc1: [1, 1, 1, 0, 0]
- Doc2: [0, 0, 1, 1, 1]
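
A minimal sketch of the same example using scikit-learn's CountVectorizer (assuming scikit-learn is installed; the default tokenizer drops one-letter tokens such as "I", so a custom token pattern is used here):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I like NLP", "NLP is fun"]

# token_pattern keeps one-letter words like "I"; lowercase=False keeps "NLP" intact
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b", lowercase=False)
bow = vectorizer.fit_transform(docs)

# Columns are sorted alphabetically, so the order differs from the hand-worked vectors above
print(vectorizer.get_feature_names_out())
print(bow.toarray())
```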

Advantages:
- Simple and easy to implement.
- Works well for small datasets.

Describe the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme and its significance in text representation?

- TF-IDF is a statistical measure that reflects how important a word is to a document in a collection by combining its frequency in the document and its rarity across the corpus.
- It is the product of two components:

1. Term Frequency (TF)

- Measures how frequently a term occurs in a document.
- Formula: TF(t, d) = (number of times term t appears in document d) / (total number of terms in document d)
- Purpose: Words appearing more times in a document are more relevant to that document.

2. Inverse Document Frequency (IDF)

- Measures how unique or rare a term is across all documents in the corpus.
- Formula: IDF(t) = log(N / n_t), where N is the total number of documents and n_t is the number of documents containing term t.

- Purpose: Reduces the weight of common terms (e.g., "is", "the") and increases the
importance of rare or meaningful terms.

3. TF-IDF Score

- Formula: TF-IDF(t, d) = TF(t, d) × IDF(t)

Interpretation: A high TF-IDF score indicates a word is frequent in a specific document but rare
across the corpus — hence, it's important for that document.
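
A short sketch with scikit-learn's TfidfVectorizer (an assumption: scikit-learn is available; note that sklearn applies a smoothed IDF and L2-normalizes each row, so the numbers differ slightly from the plain formulas above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["I like NLP", "NLP is fun", "NLP is everywhere"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())   # corpus vocabulary
# One TF-IDF vector per document; "nlp" appears in every document, so it gets a lower weight
print(tfidf.toarray().round(2))
```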

Significance in Text Representation

1. Helps convert textual data into numerical vectors, suitable for machine learning models.
2. Reduces the influence of common words and highlights meaningful terms.
3. Commonly used in text classification, information retrieval, clustering, and search engines.
4. Efficient and simple compared to deep learning methods for small and medium-sized datasets.

Word2Vec

- Word2Vec is a neural network-based technique that converts words into dense vector
representations where semantically similar words are closer in the vector space.
- Word2Vec is a predictive model: it predicts surrounding words from the current word (the Skip-gram variant) or the current word from its surrounding words (the CBOW variant).
- It trains faster on smaller datasets.
- It uses little memory because it does not store a full co-occurrence matrix.
- A well-known pretrained example is Google's Google News Word2Vec model.

Example:
"Paris" and "France" will be closer in vector space than "Paris" and "banana".

Advantages:
- Word2Vec can be applied to different types of text data (like news articles, social media posts,
etc.) and still learn meaningful representations.
- Dense and low-dimensional embeddings.

GloVe (Global Vectors)

- GloVe is a word embedding technique that uses global word co-occurrence statistics from a
corpus. It captures how often words appear together in the entire corpus.
- GloVe is a count-based model — it builds a co-occurrence matrix and learns embeddings from
it.
- It is efficient for large datasets but needs to compute a matrix first.
- Uses more memory since it stores and processes a large matrix during training.
- GloVe was developed at Stanford, and Stanford's pretrained GloVe vectors are widely used.

Example:
Words like “king” and “queen” will have similar vectors because they appear in similar contexts.

Advantages:
- Produces meaningful and consistent word vectors.
- Useful for many NLP applications.
- Works well with large datasets.
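
GloVe embeddings are usually consumed as pretrained vectors rather than trained from scratch. A minimal sketch that loads a downloaded GloVe file (an assumption: the file glove.6B.50d.txt from the Stanford GloVe release is present in the working directory):

```python
import numpy as np

# Each line of the file is: word value_1 value_2 ... value_50
embeddings = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        word, *values = line.split()
        embeddings[word] = np.asarray(values, dtype=np.float32)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["king"], embeddings["queen"]))   # high: similar contexts
print(cosine(embeddings["king"], embeddings["banana"]))  # much lower
```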

Compare and contrast Word2Vec and GloVe in terms of how they generate
word embeddings?

Aspect | Word2Vec | GloVe (Global Vectors)
Main Idea | Learns word meanings from the local context (neighbors) in a sentence. | Learns word meanings from how often words appear together across a large corpus.
Model Type | Predictive – predicts surrounding words from the current word (or vice versa). | Count-based – builds a word co-occurrence matrix, then learns embeddings from it.
Focus Area | Local context – a small window of nearby words. | Global context – statistics over the entire corpus.
Training Speed | Faster on smaller datasets. | Efficient for large datasets, but requires computing the matrix first.
Memory Usage | Low, since it does not store a full matrix. | Higher, since it stores a large matrix during training.
Use Case | Google News Word2Vec model. | Stanford's GloVe model.

Neural Word Embedding

- Unlike static methods, neural word embeddings like BERT generate contextualized
representations, meaning the same word will have different embeddings based on the sentence.

Example:
“He went to the bank to deposit money.”
“He sat by the river bank.”

The word “bank” has different meanings and will get different embeddings in BERT.
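
A brief sketch of this effect using the Hugging Face transformers library (assumptions: transformers and torch are installed and the bert-base-uncased checkpoint can be downloaded):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_embedding(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]              # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]                            # contextual vector of "bank"

v_money = bank_embedding("He went to the bank to deposit money.")
v_river = bank_embedding("He sat by the river bank.")

# Same surface word, different contexts -> similarity noticeably below 1.0
print(torch.cosine_similarity(v_money, v_river, dim=0).item())
```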

Advantages:
- Captures context and semantics accurately.
- Better for downstream tasks like translation, question answering, etc.

Disadvantages:
- Computationally expensive.
- Requires large resources for training and inference.

Explain the architecture of a Seq2Seq model and its role in neural machine
translation?

Seq2Seq Model Architecture (Sequence-to-Sequence)

- The Seq2Seq model is a type of encoder-decoder neural architecture designed to map variable-length input sequences to variable-length output sequences.
- It is heavily used in tasks like Neural Machine Translation (NMT), text summarization, and question answering.

It mainly consists of two parts: Encoder and Decoder

1. Encoder
- The encoder is responsible for processing the entire input sequence and encoding it into a
fixed-length context vector
- It is usually implemented using RNNs, LSTMs, or GRUs.
- The encoder reads the input sentence (source language) one word at a time.

2. Context Vector
- In the vanilla Seq2Seq model, the final hidden state of the encoder, h_T, is called the context vector.
- It is intended to carry all the semantic information from the input sequence to guide the decoder during output generation.

3. Decoder
- The decoder takes the context vector from the encoder and starts generating the output
sentence (target language) one word at a time.
- It uses the context vector and the previously generated words to predict the next word.
- It continues until it produces the end-of-sequence token.
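
A minimal encoder-decoder sketch in PyTorch (assumptions: this is a bare GRU-based vanilla Seq2Seq without attention, and the vocabulary sizes and dimensions are placeholders):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len) of token ids
        _, hidden = self.rnn(self.embed(src))     # hidden: (1, batch, hid_dim)
        return hidden                             # final hidden state = context vector

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, hidden):        # prev_token: (batch, 1), previously generated word
        output, hidden = self.rnn(self.embed(prev_token), hidden)
        return self.out(output), hidden           # logits over the target vocabulary
```

At inference time the decoder starts from a start-of-sequence token, receives the encoder's context vector as its initial hidden state, and feeds its own predictions back in until it emits the end-of-sequence token.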

Role of Seq2Seq in Neural Machine Translation

1. Sentence Understanding: The encoder converts the full sentence into a context vector that captures its meaning.
2. Language Generation: The decoder uses the context to generate the translated sentence, one word at a time.
3. Flexible Input/Output Lengths: It can handle input and output sentences of different lengths, which is common in translation.
4. Learnable from Data: It learns from parallel corpora (pairs of sentences in the source and target languages).

BLEU Score & BERT Score

BLEU Score

- BLEU is one of the oldest and most commonly used metrics for evaluating machine
translation output.
- It was introduced in 2002 by IBM researchers.
- The idea is simple: compare the machine’s output to one or more human reference
translations and check how much they match in terms of word sequences (n-grams).

How BLEU Works:

1. BLEU looks at how many n-grams (unigrams = 1 word, bigrams = 2 words, trigrams = 3 words, etc.) from the machine translation appear in the reference translation.
2. BLEU focuses on precision, i.e., how many of the words in the machine translation are correct, not whether it captured everything from the reference.
3. If the machine translation is too short, BLEU applies a brevity penalty to reduce the score.
4. The final BLEU score is the geometric mean of the n-gram precisions multiplied by the brevity penalty (a small worked example follows this list).
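
A quick sketch using NLTK's implementation (an assumption: nltk is installed; smoothing is added so that missing higher-order n-grams do not collapse the score to zero on such a short sentence):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # one or more human references
candidate = ["the", "cat", "sat", "on", "the", "mat"]    # machine translation output

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))
```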

BERT Score

- BERT Score is a modern evaluation metric that leverages deep learning models (like BERT)
to evaluate translation by checking the meaning (semantics) rather than just word overlap.
- It was introduced around 2019 to address the flaws in BLEU by using contextual embeddings.

How BERT Score Works:

1. BERT transforms both the candidate and reference sentences into contextual
embeddings—each word is represented as a vector that captures meaning based on context.

Example:
“He went to the bank to deposit money.” → BERT understands “bank” as a financial institution.
“He sat by the river bank.” → Now “bank” refers to a riverside. BERT captures this difference.

2. For each word in the candidate sentence, it finds the most similar word (in terms of meaning)
in the reference sentence using cosine similarity between their vector representations.
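
A short sketch using the bert-score package (an assumption: the bert-score library is installed; it downloads a pretrained model on first use):

```python
from bert_score import score

candidates = ["He sat by the river bank."]
references = ["He rested on the bank of the river."]

# Precision, recall and F1 computed from cosine similarities of contextual embeddings
P, R, F1 = score(candidates, references, lang="en")
print(F1.item())
```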

Neural Style Transfer

- Neural Style Transfer is a fascinating application of deep learning that involves the artistic
transformation of images by combining the content of one image with the style of another.
- The technique uses Convolutional Neural Networks (CNNs) to separate and recombine
content and style features from two input images.
- Neural Style Transfer typically involves a pre-trained CNN, where the content features are
extracted from the content image and the style features are extracted from the style image.
- The content and style features are then combined in a way that generates a new image with
the content of the first image but the artistic style of the second.
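
A condensed sketch of the loss computation in PyTorch (assumptions: torchvision's pretrained VGG-19 is used as the feature extractor, the chosen layer indices and loss weights are illustrative, and image loading/preprocessing is omitted):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen pretrained VGG-19, used only as a feature extractor
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

LAYERS = (0, 5, 10, 19, 28)   # illustrative choice of convolutional layers

def extract_features(img):
    """Return activations of the chosen layers for a (1, 3, H, W) image tensor."""
    feats, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYERS:
            feats.append(x)
    return feats

def gram_matrix(f):
    """Channel-wise feature correlations; these statistics encode the style."""
    _, c, h, w = f.shape          # batch size assumed to be 1
    f = f.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

def style_transfer_step(generated, content_feats, style_grams, optimizer,
                        content_weight=1.0, style_weight=1e6):
    """One optimization step on the generated image."""
    optimizer.zero_grad()
    g_feats = extract_features(generated)
    content_loss = F.mse_loss(g_feats[-1], content_feats[-1])   # match content at a deep layer
    style_loss = sum(F.mse_loss(gram_matrix(g), sg)             # match style statistics at several depths
                     for g, sg in zip(g_feats, style_grams))
    loss = content_weight * content_loss + style_weight * style_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the generated image is initialized as a copy of the content image with gradients enabled, content_feats and style_grams are precomputed from the content and style images, and the step above is repeated for a few hundred iterations with an optimizer such as Adam.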

Example:
- Content Image: A photo of you standing in front of a building.
- Style Image: A painting by Leonardo da Vinci.
- Output: Your photo now looks like it was painted by Leonardo da Vinci, while still showing you
and the building clearly.

Applications:
1. Art Creation: Turn real photos into artwork styled like famous paintings.
2. Social Media Filters: Some Instagram and Snapchat filters use NST to give images artistic effects.
3. Advertising & Design: Used for stylized visuals.
4. AI Art Tools: Apps like Prisma and DeepArt use NST.

How does BERT (Bidirectional Encoder Representations from Transformers) work, and what are its advantages over traditional language models?

- BERT is a pre-trained deep bidirectional Transformer-based language model introduced by Google in 2018.
- Unlike traditional unidirectional models, BERT reads text in both directions (left-to-right and
right-to-left) using the Transformer encoder architecture.
- BERT is pretrained on a massive amount of text data, learning contextual embeddings for
each word in a sentence.
- This pretraining allows BERT to understand the relationships between words, making it
effective in a wide range of NLP tasks.

Working of BERT

1. Input Formatting
- Adds special tokens: [CLS] at the start, [SEP] between sentences.
- Words are broken into sub-words using WordPiece.

2. Embeddings
- Each token gets Token + Segment + Position embeddings.

3. Transformer Encoder
- Only the encoder is used.
- Applies bidirectional self-attention to understand the full context.

4. Pre-training Tasks
- Masked Language Modeling (MLM): Predict masked words (illustrated in the sketch after this list).
- Next Sentence Prediction (NSP): Check if sentence B follows sentence A.

5. Fine-tuning
- The model is adapted to specific tasks with task-specific layers.

6. Output
- The [CLS] token embedding is used for classification.
- Full token embeddings are used for token-level tasks.
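
A tiny sketch of the Masked Language Modeling behaviour via the transformers fill-mask pipeline (an assumption: the transformers library is installed and bert-base-uncased can be downloaded):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses the words on BOTH sides of [MASK] to rank candidate fillers
for pred in fill("He went to the [MASK] to deposit money."):
    print(pred["token_str"], round(pred["score"], 3))
```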

Advantages of BERT over Traditional Language Models (BAPP)


1. Bidirectional Context Understanding
- Unlike traditional models that read text left-to-right or right-to-left, BERT reads both directions
at once, capturing full context of a word based on all surrounding words.

2. Ambiguity
- Since embeddings are context-aware, BERT can distinguish between different meanings of
the same word depending on the sentence.

3. Pre-trained
- BERT is pre-trained on massive data using unsupervised tasks, giving it a rich understanding
of language.

4. Parallel Processing
- Based on the Transformer architecture, BERT allows parallel computation, which speeds up
training and inference.
