NLP Midsem Paper August 2024 Regular Solution

Birla Institute of Technology & Science, Pilani

Work Integrated Learning Programmes Division


First Semester 2023-2024
M.Tech. in AIML

Mid-Semester Test
(EC-2 Regular Paper)

Course No.       : AIMLCZG530
Course Title     : Natural Language Processing
Nature of Exam   : Closed Book
Weightage        : 30%
Duration         : 2 Hours
Date of Exam     : 21-01-2024_FN
No. of Pages     : 3
No. of Questions : 5
Note to Students:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.

Question 1. [4 Marks] Introduction


a) Identify the type of NLP application (i.e. Text categorization, Language Modeling, Named Entity
Recognition) for each of the functionalities mentioned below. [2 marks]

● Classifying emails into spam or non-spam
● “Look ahead” typing, where the user is prompted with the next few words to type in an email
● Sentiment Analysis
● Classifying entities into pre-defined labels

Solution

Classifying emails into spam or non-spam → Text categorization

“Look ahead” typing, where the user is prompted with the next few words to type in an email → Language Modeling

Sentiment Analysis → Text categorization

Classifying entities into pre-defined labels → Named Entity Recognition

b) Which of the following two sentences has a “Structural ambiguity”, and which one has a “Lexical
ambiguity”? Explain the ambiguities in 1 or 2 sentences. [2 marks]
● She saw a girl with a binoculars on the beach
● I saw bats
Solution & Marking Scheme:
● She saw a girl with a binoculars on the beach
Structural ambiguity. – 1 mark
Interpretation 1: The girl she saw was on the beach and had the binoculars.
Interpretation 2: She was on the beach with her binoculars and saw a girl.
● I saw bats
Lexical ambiguity. – 1 mark
Interpretation 1: Bats as mammals
Interpretation 2: Bats as cricket bats

Question 2. [4 Marks] n-gram language modeling

a) A machine translation system has to decide which of the following is the right sequence of words
for a translation:
● I gave a book
● I book gave a
i. Given the following count of words in a corpus, compute the unigram probabilities for each
of the above sentences. Indicate which is the more likely sequence of words to be used
after translation. [2 marks]
ii. Now compute the bigram probabilities for each of the above sentences. Indicate which is
the more likely sequence of words. [2 marks]

Unigram count matrix

  <s>   I   gave   a   book
   4    2    2     3    2

Bigram count matrix (rows: Wn-1, columns: Wn)

          I   gave   a   book   </s>
  <s>     1    0     0    0      0
  I       0    1     1    0      0
  gave    0    0     1    0      0
  a       0    0     0    2      0
  book    0    0     0    0      1

Unigram probabilities:

Unigram probability matrix

  <s>     I      gave    a      book
  4/13   2/13   2/13    3/13   2/13

P(I gave a book) = P(I) x P(gave) x P(a) x P(book)
                 = (2/13) x (2/13) x (3/13) x (2/13) = 24/13^4 ≈ 8.4E-4
P(I book gave a) = P(I) x P(book) x P(gave) x P(a)
                 = (2/13) x (2/13) x (2/13) x (3/13) = 24/13^4 ≈ 8.4E-4
Since both sentences contain exactly the same unigrams, their unigram probabilities are identical, so the
unigram model gives no way of deciding which is the right sequence of words for translation.
Bigram probabilities:

Bigram probability matrix (rows: Wn-1, columns: Wn)

          I     gave   a     book   </s>
  <s>    1/4   0/4    0/4   0/4    0/4
  I      0/2   1/2    1/2   0/2    0/2
  gave   0/2   0/2    1/2   0/2    0/2
  a      0/3   0/3    0/3   2/3    0/3
  book   0/2   0/2    0/2   0/2    1/2

P(I gave a book) = P(I | <s>) x P(gave | I) x P(a | gave) x P(book | a) x P(</s> | book)
                 = (1/4) x (1/2) x (1/2) x (2/3) x (1/2) = 1/48 ≈ 0.021
P(I book gave a) = P(I | <s>) x P(book | I) x P(gave | book) x P(a | gave) x P(</s> | a)
                 = (1/4) x (0/2) x (0/2) x (1/2) x (0/3) = 0
The first sequence has a higher probability than the second sequence and will be chosen by the
machine translation system.
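The bigram computation above can be reproduced with a few lines of Python. This is a minimal sketch (not part of the original paper); the counts are hard-coded from the tables above.

```python
# Minimal sketch: scoring the two candidate translations with the
# unsmoothed bigram counts from the tables above.

unigram_counts = {"<s>": 4, "I": 2, "gave": 2, "a": 3, "book": 2}
bigram_counts = {
    ("<s>", "I"): 1, ("I", "gave"): 1, ("I", "a"): 1,
    ("gave", "a"): 1, ("a", "book"): 2, ("book", "</s>"): 1,
}

def bigram_prob(sentence):
    """P(sentence) under the (unsmoothed) bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        # P(cur | prev) = count(prev, cur) / count(prev)
        p *= bigram_counts.get((prev, cur), 0) / unigram_counts[prev]
    return p

print(bigram_prob("I gave a book"))   # 1/48 ≈ 0.0208
print(bigram_prob("I book gave a"))   # 0.0 (contains unseen bigrams)
```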

Question 3. [4 Marks]
One of the entries from a restaurant review website has the following statement: “The gulab jamoon is
really quite good here”. Use this statement as one of the entries in the training dataset, and the word
embeddings for each word in this statement as given in the Table below:

Explain how you will set up each of the following:

a. A Neural Network based sentiment classifier, with a hidden layer consisting of 3 nodes, that can flag
the restaurant / food reviews into one of the following classes: i) GOOD ii) BAD iii) AVERAGE.
[2 marks]
b. A Neural Network based “word predictor” that uses three context words and a 4-node hidden layer
to predict the immediately following word. Assume that the context words are “jamoon is really”,
and the network has to be trained for the word that follows, “quite”. [2 marks]

Your answer to the question should contain the following:

i. The architecture diagram of the Neural Network, clearly indicating the input layer (using,
appropriately, the values of the word embeddings given above), and schematics of all the other
required intermediate steps / layers / connections and the outputs generated.
ii. Names of the activation function(s) used in the intermediate and output layers
iii. The expected output (y)

Note: You are NOT required to calculate the values of the weights and the outputs.

SOLUTIONS

Neural Network Based Sentiment Classifier [2 marks]

Important points to be looked for in the answer:


● Pooling of embeddings of all the words in the sentence into a single vector. Pooling can be done
using either the sum or the mean of all the embeddings
● Ensure that the hidden layer has 3 nodes
● Ensure that the activation function is mentioned w.r.t the output of the hidden layer. It can be
either SIGMOID or RELU
● Ensure that 3 nodes are shown in the output layer, and the activation function is SOFTMAX
● Ensure that the probability values in the y vector are correctly mentioned – GOOD should have an
associated probability value of ‘1’, and the other two should have ‘0’ (a code sketch of this set-up
follows below)
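A minimal sketch of such a classifier, assuming 3-dimensional word embeddings with placeholder values (the embedding table from the question is not reproduced here) and mean-pooling of the sentence; layer sizes and activations follow the points above. This is an illustration, not the expected diagram-based exam answer.

```python
import numpy as np

# Placeholder embeddings for illustration only; the actual values come from the
# table given in the question.
embeddings = {
    "the": [0.1, 0.2, 0.1], "gulab": [0.5, 0.1, 0.3], "jamoon": [0.4, 0.2, 0.6],
    "is": [0.1, 0.1, 0.1], "really": [0.3, 0.3, 0.2], "quite": [0.2, 0.4, 0.3],
    "good": [0.6, 0.5, 0.4], "here": [0.1, 0.3, 0.2],
}
sentence = "the gulab jamoon is really quite good here".split()

# Input layer: mean-pool the word embeddings into a single vector.
x = np.mean([embeddings[w] for w in sentence], axis=0)          # shape (3,)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)   # hidden layer: 3 nodes
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)   # output layer: 3 classes (GOOD / BAD / AVERAGE)

h = np.maximum(0, W1 @ x + b1)                  # ReLU at the hidden layer
z = W2 @ h + b2
y_hat = np.exp(z) / np.exp(z).sum()             # SOFTMAX over the 3 classes

y_true = np.array([1.0, 0.0, 0.0])              # expected output: GOOD = 1, BAD = 0, AVERAGE = 0
print(y_hat, y_true)
```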

Neural Network Based Word Predictor [2 marks]

Important points to be looked for in the answer:


● Input layer is formed by concatenating the embeddings of the 3 context words “jamoon is really”
● Ensure 4 nodes are shown in the hidden layer.
● Output from the hidden layer can be either SOFTMAX or RELU
● It is mentioned that the output layer has as many nodes as there are words in the vocabulary
● Activation function at the output layer is SOFTMAX
● In the y vector, only the entry for the word “quite” should have a value of 1; all other entries should
be 0 (a code sketch of this set-up follows below)
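A minimal sketch of the word predictor described above, again assuming placeholder 3-dimensional embeddings and a small illustrative vocabulary: the input concatenates the embeddings of the context words “jamoon is really”, the hidden layer has 4 nodes, and the softmax output ranges over the vocabulary.

```python
import numpy as np

# Placeholder embeddings and a toy vocabulary, for illustration only.
embeddings = {
    "the": [0.1, 0.2, 0.1], "gulab": [0.5, 0.1, 0.3], "jamoon": [0.4, 0.2, 0.6],
    "is": [0.1, 0.1, 0.1], "really": [0.3, 0.3, 0.2], "quite": [0.2, 0.4, 0.3],
    "good": [0.6, 0.5, 0.4], "here": [0.1, 0.3, 0.2],
}
vocab = list(embeddings)                     # output layer has one node per vocabulary word
context = ["jamoon", "is", "really"]

# Input layer: concatenation of the three context-word embeddings (3 x 3 = 9 values).
x = np.concatenate([embeddings[w] for w in context])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 9)), np.zeros(4)                    # hidden layer: 4 nodes
W2, b2 = rng.normal(size=(len(vocab), 4)), np.zeros(len(vocab))  # output layer: |V| nodes

h = np.maximum(0, W1 @ x + b1)                          # ReLU at the hidden layer
z = W2 @ h + b2
y_hat = np.exp(z) / np.exp(z).sum()                     # SOFTMAX over the vocabulary

# Expected output: 1 for the word that actually follows ("quite"), 0 elsewhere.
y_true = np.zeros(len(vocab))
y_true[vocab.index("quite")] = 1.0
print(dict(zip(vocab, np.round(y_hat, 3))))
```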

Question 4. (4 Marks)
Consider the following statements:
a) The cat saw the dog.
b) The dog barked at the cat.
Convert all the words to lowercase and, WITHOUT further pre-processing the sentences (i.e. DO NOT
remove stop words / apply lemmatization / stemming / etc.), carry out the following tasks:
1. Create a vocabulary from the sentences [1 mark]
2. Establish the Term Frequency and Document Frequency [1 mark]
3. Establish the TF-IDF vector for each document [2 marks]

Note:
For Term Frequency use the simple formula (word frequency)/(sentence length)
Wherever required, use ‘Natural Logarithm’ in your calculations

SOLUTION
1. Vocabulary from the sentences [1]
V = [the,cat,saw,dog,barked,at]

2. Term Frequency and Document Frequency [1]


Term Frequency Calculation

"The cat saw the dog."        "The dog barked at the cat."
● the: 2/5                    ● the: 2/6
● cat: 1/5                    ● dog: 1/6
● saw: 1/5                    ● barked: 1/6
● dog: 1/5                    ● at: 1/6
                              ● cat: 1/6

Document Frequency (DF) Calculation

● the: 2
● cat: 2
● saw: 1
● dog: 2
● barked: 1
● at: 1

3. TF-IDF Calculation for each statement [1]

Note: Use ‘Natural Logarithm’ in your calculations


TF-IDF Scores

"The cat saw the dog."              "The dog barked at the cat."
the: (2/5) * log(2/2) = 0           the: (2/6) * log(2/2) = 0
cat: (1/5) * log(2/2) = 0           dog: (1/6) * log(2/2) = 0
saw: (1/5) * log(2/1) ≈ 0.1386      barked: (1/6) * log(2/1) ≈ 0.1155
dog: (1/5) * log(2/2) = 0           at: (1/6) * log(2/1) ≈ 0.1155
                                    cat: (1/6) * log(2/2) = 0

TF-IDF Vectors (vocabulary order: the, cat, saw, dog, barked, at)

"The cat saw the dog."        → [0, 0, 0.1386, 0, 0, 0]
"The dog barked at the cat."  → [0, 0, 0, 0, 0.1155, 0.1155]
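The numbers above can be checked with a short script. A minimal sketch, using TF = count / sentence length and IDF = ln(N / df) with the natural logarithm, as specified in the note to the question.

```python
import math

docs = ["the cat saw the dog", "the dog barked at the cat"]
tokenized = [d.split() for d in docs]
vocab = ["the", "cat", "saw", "dog", "barked", "at"]

# Document frequency: number of documents containing each term.
df = {w: sum(w in doc for doc in tokenized) for w in vocab}

def tfidf_vector(tokens):
    """TF = count / sentence length, IDF = ln(N / df)."""
    n_docs = len(tokenized)
    return [tokens.count(w) / len(tokens) * math.log(n_docs / df[w]) for w in vocab]

for doc, tokens in zip(docs, tokenized):
    print(doc, "->", [round(v, 4) for v in tfidf_vector(tokens)])
# the cat saw the dog        -> [0.0, 0.0, 0.1386, 0.0, 0.0, 0.0]
# the dog barked at the cat  -> [0.0, 0.0, 0.0, 0.0, 0.1155, 0.1155]
```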

Question 5.
Consider the following sentence:
"I bank on my best friend to accompany me to the bank located near the river bank."
Your task is to train a classifier such that, given the tuple ("bank","located") where "bank" is the
target word and "located" is the candidate context word, the classifier returns the probability that
"located" is a real context word for "bank".
● Provide the updated Input weight matrix for the Target Word after one iteration of the
Word2Vec algorithm
Support your answer with detailed steps and rationale on the logic and computation. [5 marks]
The following additional information is provided:
● Use Word2Vec with a Skip Gram Classifier with a Single Hidden Layer
● Negative Sampling words have been specified for you and they are "Purple", "Rain" and
● The One Hot Encoded Input Vectors are:

Bank [1 0 0 0 0]

Located [0 1 0 0 0]

Purple [0 0 1 0 0]

Rain [0 0 0 1 0]

● Initial Embedding Matrix for the Single Hidden Layer

Bank 0.1 0.2 0.3

Located 0.2 0.3 0.4

Purple 0.3 0.4 0.5

Rain 0.4 0.5 0.5

● Initial Embedding Matrix for the Output Layer

Bank 0.2 0.3 0.4

Located 0.3 0.4 0.5

Purple 0.4 0.5 0.6

Rain 0.5 0.4 0.6

Learning Rate = 0.05


Activation Function is Sigmoid

Solution:

Step 1 – Forward Propagation (Hidden Layer) [1 mark]

● The One Hot Encoded Input Matrix (I):

[1 0 0 0 0
 0 1 0 0 0
 0 0 1 0 0
 0 0 0 1 0]

● Initial Embedding Matrix (Winput):

[0.1 0.2 0.3
 0.2 0.3 0.4
 0.3 0.4 0.5
 0.4 0.5 0.5]

● Hidden Layer (h) for the Target word “bank” = Winput^T * I(bank)

= [0.1
   0.2
   0.3]
Step 2 – Forward Propagation (Sigmoid Output Layer) [2 marks]

Woutput (context rows) for (located, purple, rain):

[0.3 0.4 0.5
 0.4 0.5 0.6
 0.5 0.4 0.6]

Output Layer = Woutput * h

= [0.26
   0.32
   0.31]

Applying the Sigmoid Activation,

For the Positive Sample:   σ(x) = 1 / (1 + e^(-x))
For the Negative Samples:  1 / (1 + e^(x)), i.e. σ(-x)

= [0.5646
   0.4207
   0.4231]

Step 3 – Prediction Error [1 mark]

Prediction Error = Sigmoid output – 1-hot encoded target vector for the context word

= [0.5646     [1     [-0.4354
   0.4207  –   0  =    0.4207
   0.4231]     0]      0.4231]

Backward Propagation (computing the gradient with respect to Winput):

Derivative of the Loss with respect to the Input Word Embedding for the target word “bank”:

C^T * (Sig – t) = [0.2492 0.2054 0.2886]

where C is Woutput restricted to (located, purple, rain) and (Sig – t) is the prediction error above.

Step 4 - Updated Weight Matrix by applying Learning Rate [1 mark]

Learning Rate = 0.05 (given)

Wnewinput(bank) = [0.1 0.2 0.3] – 0.05 * [0.2492 0.2054 0.2886]

                = [0.088 0.190 0.286]

This is the updated Input weight matrix (row) for the Target Word “bank” after one iteration of the
Word2Vec algorithm.
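The whole iteration can be reproduced with a short numpy sketch. It follows the convention used in this solution (σ(x) for the positive sample, σ(-x) for the negative samples, then subtract the target vector), which is why it reproduces exactly the numbers shown above.

```python
import numpy as np

# Reproduces the hand computation above, with the solution's sign convention.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

W_in = np.array([[0.1, 0.2, 0.3],    # bank
                 [0.2, 0.3, 0.4],    # located
                 [0.3, 0.4, 0.5],    # purple
                 [0.4, 0.5, 0.5]])   # rain
W_out = np.array([[0.3, 0.4, 0.5],   # located (real context word)
                  [0.4, 0.5, 0.6],   # purple  (negative sample)
                  [0.5, 0.4, 0.6]])  # rain    (negative sample)
lr = 0.05

h = W_in[0]                          # hidden layer for target word "bank": [0.1, 0.2, 0.3]
scores = W_out @ h                   # [0.26, 0.32, 0.31]

# Sigmoid outputs as in the solution: sigma(x) for the positive row,
# sigma(-x) for the two negative rows.
sig = np.array([sigmoid(scores[0]), sigmoid(-scores[1]), sigmoid(-scores[2])])
t = np.array([1.0, 0.0, 0.0])        # 1 for the real context word, 0 for negatives
err = sig - t                        # [-0.4354, 0.4207, 0.4231]

grad_in = W_out.T @ err              # [0.2492, 0.2054, 0.2886]
W_in[0] -= lr * grad_in              # updated "bank" row of the input matrix
print(np.round(W_in[0], 3))          # [0.088 0.19  0.286]
```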

Question 6. [5 Marks]
a) Use an HMM tagger to disambiguate the POS tag for the word “chase” in the following sentence,
given the transition probabilities and emission probabilities below: [5 marks]
● “Cut to the chase”

Emission probabilities

         The    Cut    to    chase
  VB      0     0.5    0     0.5
  TO      0      0     1      0
  NN      0     0.5    0     0.5
  Det     1      0     0      0

Transition probabilities

         VB     TO     NN     Det
  <s>    0.2    0.01   0.2    0.6
  VB     0.0    0.35   0.47   0.70
  TO     0.83   0      0.47   0.4
  NN     0.40   0.2    0.2    0.2
  Det    0.12   0.0    0.23   0


SOLUTION
Cut to the chase

Possible taggings are:


i. VB 🡪 TO 🡪 Det 🡪 NN
ii. VB 🡪 TO 🡪 Det 🡪 VB
iii. NN 🡪 TO 🡪 Det 🡪 NN
iv. NN 🡪 TO 🡪 Det 🡪 VB

We are interested in disambiguating only the word “chase” in the above phrase. Hence the
computations will be:

P(NN|Det) = 0.23
P(VB|Det) = 0.12
P(chase | NN) = 0.5
P(chase | VB) = 0.5

For taggings (i) and (iii): P(NN|Det) x P(chase|NN) = 0.23 x 0.5 = 0.115

For taggings (ii) and (iv): P(VB|Det) x P(chase|VB) = 0.12 x 0.5 = 0.06

Hence NN is the preferred POS tag for “chase”, and tagging (i) is preferred, i.e.

VB 🡪 TO 🡪 Det 🡪 NN
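The comparison above can be checked with a few lines of Python. A minimal sketch, hard-coding only the probabilities needed for the last step (the tag of “chase” given that the previous word “the” is tagged Det).

```python
# Minimal sketch: disambiguating "chase" given the previous tag Det.
transition = {("Det", "NN"): 0.23, ("Det", "VB"): 0.12}   # P(tag | previous tag)
emission = {("NN", "chase"): 0.5, ("VB", "chase"): 0.5}   # P(word | tag)

scores = {
    tag: transition[("Det", tag)] * emission[(tag, "chase")]
    for tag in ("NN", "VB")
}
print(scores)                       # {'NN': 0.115, 'VB': 0.06}
print(max(scores, key=scores.get))  # NN
```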

Question 7. [5 Marks]
Fill up the Viterbi table for the sentence – ‘I will’. The tag transition probabilities and word emission
probabilities, for the corpus used, are given below:

Tag transition probabilities

          MD      VB     PRP
  MD      0.05    0.5    0.001
  VB      0.007   0      0.01
  PRP     0.91    0.01   0.0001
  START   0.01    0.49   0.5

Word emission probabilities

          I    will
  MD      0    0.7
  VB      0    0
  PRP     1    0

Viterbi Table

          I    will
  VB
  MD
  PRP

PRP: PERSONAL PRONOUN
MD: MODAL
VB: VERB BASE FORM

Answer:

Viterbi Table

          I      will
  VB      0      0
  MD      0      0.3185
  PRP     0.5    0
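A minimal sketch of the Viterbi computation that fills the table above, assuming the standard recurrence v_t(s) = max over s' of v_(t-1)(s') * P(s | s') * P(w_t | s), with the probabilities hard-coded from the tables.

```python
# Minimal Viterbi sketch for the sentence "I will", using the tables above.
states = ["VB", "MD", "PRP"]
trans = {  # P(next tag | previous tag)
    "START": {"MD": 0.01, "VB": 0.49, "PRP": 0.5},
    "MD":    {"MD": 0.05, "VB": 0.5,  "PRP": 0.001},
    "VB":    {"MD": 0.007, "VB": 0.0, "PRP": 0.01},
    "PRP":   {"MD": 0.91, "VB": 0.01, "PRP": 0.0001},
}
emit = {  # P(word | tag)
    "MD":  {"I": 0.0, "will": 0.7},
    "VB":  {"I": 0.0, "will": 0.0},
    "PRP": {"I": 1.0, "will": 0.0},
}

words = ["I", "will"]
# Initialisation: v[0][s] = P(s | START) * P(word_0 | s)
v = [{s: trans["START"][s] * emit[s][words[0]] for s in states}]
# Recursion: v[t][s] = max over s' of v[t-1][s'] * P(s | s') * P(word_t | s)
for w in words[1:]:
    v.append({s: max(v[-1][sp] * trans[sp][s] for sp in states) * emit[s][w]
              for s in states})

for w, col in zip(words, v):
    print(w, {s: round(p, 4) for s, p in col.items()})
# I    {'VB': 0.0, 'MD': 0.0, 'PRP': 0.5}
# will {'VB': 0.0, 'MD': 0.3185, 'PRP': 0.0}
```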
