2/7/22, 8:34 AM IR Midsem

IR Midsem
Description:
1. The exam contains 24 MCQs.
2. More than one option may be correct for each question.
3. Some questions are worth 1 point; the rest are worth 2 points.
4. There is no partial marking. Full marks are awarded for a question if and only if all the correct options and no wrong options are selected.
5. No negative marking.

Important Guidelines:
1. Open book
2. You may use a calculator (do not use a mobile phone calculator).
3. Kindly ensure your video is on.
4. No extension will be given.


If we use bigram indexes, which of the following words would be falsely enumerated by co*me? (2 points)

come

comment

income

coulome
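Here is a minimal Python sketch of wildcard lookup through a bigram index. It assumes single-`*` queries and `$` boundary markers, and the helper names are hypothetical. The index enumerates every vocabulary term that contains all of the query's bigrams. A post-filter against the original pattern then discards the false positives, such as `retired` for `red*`.

```python
from fnmatch import fnmatchcase


def bigrams(word):
    """Character bigrams of a term, padded with '$' boundary markers."""
    padded = f"${word}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def query_bigrams(pattern):
    """Bigrams of a wildcard query; no bigram may span the '*'."""
    grams = set()
    for part in f"${pattern}$".split("*"):
        grams |= {part[i:i + 2] for i in range(len(part) - 1)}
    return grams


def wildcard_lookup(pattern, vocabulary):
    """Return (terms enumerated by the bigram index, false positives among them)."""
    needed = query_bigrams(pattern)
    enumerated = [w for w in vocabulary if needed <= bigrams(w)]
    # Post-filtering: keep only the terms that really match the wildcard pattern.
    false_positives = [w for w in enumerated if not fnmatchcase(w, pattern)]
    return enumerated, false_positives


print(wildcard_lookup("red*", ["red", "redo", "retired"]))
# → (['red', 'redo', 'retired'], ['retired'])
```

To apply this to the question, call it with "co*me" and the four options above. The result depends on whether `$` markers are used.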

https://docs.google.com/forms/d/e/1FAIpQLSfLz0PuhiDjyQ5fV4QDqfagw-H3-9EuV4Iqvmn7ZZM_Qx0UJg/viewform 1/11

What is the minimum number of copies of data maintained in HDFS? (1 point)

What is the idf of a term that occurs in every document? (1 point)

log10(N)

log10(1/N)
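This is a short Python check, assuming the standard definition idf_t = log10(N / df_t). Here N is the number of documents and df_t is the number of documents containing t. The function name is illustrative.

```python
import math


def idf(n_docs, df):
    """Inverse document frequency idf_t = log10(N / df_t)."""
    return math.log10(n_docs / df)


# A term occurring in every document has df = N, so N / df = 1 and log10(1) = 0.
print(idf(1000, 1000))  # → 0.0
print(idf(1000, 10))    # → 2.0
```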

Rank the following documents in decreasing order of their tf-idf scores with respect to the query “All vehicles including car auto bike bus are stopped due to accident”. Vocabulary = {car, auto, bike, bus}. (Use tf-idf = tf x idf.) (2 points)

doc1, doc2, doc3

doc2, doc3, doc1

doc3, doc2, doc1

doc1, doc3, doc2
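The document table for this question is not reproduced here, so the sketch below uses hypothetical counts. `docA`, `docB` and `docC` are made-up documents. The code keeps only the query tokens that are in the vocabulary. It then scores each document as the sum of raw tf x log10(N / df) over those terms, with df counted over the documents passed in.

```python
import math

VOCABULARY = {"car", "auto", "bike", "bus"}
QUERY = "All vehicles including car auto bike bus are stopped due to accident"


def rank_by_tfidf(query, docs):
    """Rank documents by sum of tf * log10(N / df) over in-vocabulary query terms."""
    terms = [t for t in query.lower().split() if t in VOCABULARY]
    n = len(docs)
    df = {t: sum(1 for tf in docs.values() if tf.get(t, 0) > 0) for t in terms}
    scores = {
        name: sum(tf.get(t, 0) * math.log10(n / df[t]) for t in terms if df[t] > 0)
        for name, tf in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical term counts, not the table from the question.
docs = {
    "docA": {"car": 3, "bus": 1},
    "docB": {"car": 1, "auto": 2},
    "docC": {"bike": 1},
}
print(rank_by_tfidf(QUERY, docs)[0])  # → ['docB', 'docA', 'docC']
```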


In logarithmic merging with n = 3, there are 47 tokens to be processed. Find which of the indexes (Z0, I0, I1, I2, I3, I4), including the auxiliary index, are in use after all the tokens have been processed. See the table for the representation. (Consider Z0 < n.) (2 points)

1, 1, 1, 0, 1, 0

0, 1, 1, 1, 1, 0

1, 1, 1, 1, 1, 0

0, 0, 0, 1, 1, 1
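This is a small Python simulation of logarithmic merging. It assumes that the in-memory index Z0 is written out as soon as it holds n tokens and that I_i holds n * 2^i tokens. Under those assumptions, each flush of Z0 behaves like incrementing a binary counter whose bits are I0, I1, I2 and so on. The function names are hypothetical.

```python
def logarithmic_merge(num_tokens, n):
    """Return (tokens left in Z0, [I0 in use, I1 in use, ...]) after num_tokens."""
    z0 = 0
    disk = []  # disk[i] is True while index I_i is in use
    for _ in range(num_tokens):
        z0 += 1
        if z0 == n:  # Z0 is full: write it out, merging upward like a binary carry
            z0 = 0
            i = 0
            while i < len(disk) and disk[i]:
                disk[i] = False  # I_i is merged into the carry and freed
                i += 1
            if i == len(disk):
                disk.append(False)
            disk[i] = True  # the merged result is stored as I_i
    return z0, disk


def usage_flags(num_tokens, n, num_disk_indexes):
    """Flags (Z0, I0, I1, ...) in the 1/0 format of the answer options."""
    z0, disk = logarithmic_merge(num_tokens, n)
    disk = disk + [False] * (num_disk_indexes - len(disk))
    return [int(z0 > 0)] + [int(b) for b in disk]


# 7 tokens with n = 3: two flushes (binary 10) and one token left in Z0.
print(usage_flags(7, 3, 5))  # → [1, 0, 1, 0, 0, 0]
```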

Which of the following are functions of the parser in distributed indexing? (1 point)

Sorts and writes to a posting list.

Writes pairs into k partitions, where k ∈ N.

Reads document at a time and emits a pair.

Assigns a split into an idle machine.

Collects all pairs for one partition.
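This sketch shows a parser task in MapReduce-style distributed indexing. It reads one document at a time, emits (term, docID) pairs, and writes each pair into a term-partitioned segment. The three partitions a-f, g-p and q-z, and the function name, are assumptions chosen for illustration.

```python
# Hypothetical term partitions by first letter.
PARTITIONS = [("a-f", "a", "f"), ("g-p", "g", "p"), ("q-z", "q", "z")]


def parse(doc_id, text):
    """Parser task: read one document, emit (term, doc_id) pairs into term partitions."""
    segments = {name: [] for name, _, _ in PARTITIONS}
    for term in text.lower().split():
        # A term whose first character is outside a-z (e.g. a digit) matches
        # no partition and is skipped in this sketch.
        for name, lo, hi in PARTITIONS:
            if lo <= term[0] <= hi:
                segments[name].append((term, doc_id))
                break
    return segments


print(parse(1, "Caesar was killed"))
```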


How would the wildcard query qu*ry be expressed for lookup in the permuterm index? (1 point)

ry$*qu

ry$qu*

$qu*ry

qu*ry$
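Here is a Python sketch of the permuterm technique, assuming a single `*`. The index stores every rotation of term$. The query is rotated so that `*` sits at the end, which turns the lookup into a prefix search. The function names are illustrative.

```python
def rotations(term):
    """All permuterm rotations of term$, each of which points back to term."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]


def permuterm_key(query):
    """Rotate a single-'*' wildcard query so that '*' ends up at the end."""
    augmented = query + "$"
    star = augmented.index("*")
    return augmented[star + 1:] + augmented[:star] + "*"


print(permuterm_key("mon*"))  # → $mon*
print(permuterm_key("*mon"))  # → mon$*
```

A term matches when one of its rotations starts with the key minus its trailing `*`.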

Compute the edit distance between “cats” and “fast” (using insertion, deletion and substitution only). (2 points)
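The standard dynamic-programming (Levenshtein) recurrence can be sketched in Python as follows. Each cell holds the cheapest way to turn one prefix into the other, with every operation costing 1. The function name is illustrative.

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    m, n = len(s1), len(s2)
    # dp[i][j] = edit distance between s1[:i] and s2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[m][n]


print(edit_distance("kitten", "sitting"))  # → 3
```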

Which of the following does not improve the performance of distributed processing? (2 points)

None of the above

Maintaining checksums of data

Replication of data

Partitioning of data


Which of the following cannot run on HDFS? (1 point)

MapReduce

Spark

Oracle Database

HBase

Real-time processing is also called: (2 points)

Processing a group of events in less than a minute

Per day processing

Per event processing

Per hour processing

In which language is MapReduce written? (1 point)

Python

C++

Java

Scala


The observed word is “acress”. Use the table below to find the most suitable correct word. (The dictionary contains only the candidate words.) (2 points)

across

actress

access

acres
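The probability table for this question is not reproduced here. Assume it lists a channel probability P(x | w) and a prior P(w) for each candidate, as in the usual noisy channel model. The most suitable correction is then the candidate that maximizes their product. The numbers and candidate names in the usage line are hypothetical placeholders.

```python
def noisy_channel_correct(channel, prior):
    """Return argmax_w P(x | w) * P(w) over the candidate words w.

    channel: candidate -> P(observed word | candidate)
    prior:   candidate -> P(candidate), e.g. estimated from corpus frequency
    """
    return max(channel, key=lambda w: channel[w] * prior[w])


# Hypothetical probabilities, not the values from the question's table.
print(noisy_channel_correct({"w1": 0.5, "w2": 0.1}, {"w1": 0.1, "w2": 0.6}))  # → w2
```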

Which of the following is not a data ingestion tool? (2 points)

Spark

Kafka

Flume

Sqoop

What is the purpose of the NameNode? (2 points)

Store data

None of the above

Store metadata

Schedule jobs


For the query 'bord', state the word from the dictionary that has the second-smallest Jaccard coefficient using a character 2-gram index. (2 points)
Dictionary = {aboard, border, dropped, lord}.

border

lord

dropped

aboard
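This Python sketch assumes plain character 2-grams without `$` boundary markers; adding the markers changes the coefficients. The Jaccard coefficient is |A ∩ B| / |A ∪ B| over the two 2-gram sets. The function names are illustrative.

```python
def char_bigrams(word):
    """Set of character 2-grams of a word, e.g. 'bord' -> {'bo', 'or', 'rd'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_jaccard(query, dictionary):
    """Dictionary words sorted by increasing Jaccard coefficient with the query."""
    q = char_bigrams(query)
    return sorted(dictionary, key=lambda w: jaccard(q, char_bigrams(w)))


print(jaccard(char_bigrams("bord"), char_bigrams("border")))  # → 0.6
```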

The edit distance between any two strings s1 and s2 is upper bounded by which of the following? (|s| denotes the length of string s.) (1 point)

|s1| - |s2|

min( |s1| , |s2| )

max( |s1| , |s2| )

|s1| + |s2|
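One way to see the bound is to substitute character by character along the shorter string, then insert or delete the leftover characters of the longer one. That sequence always turns s1 into s2, so:

```latex
\operatorname{ed}(s_1, s_2) \le \min(|s_1|, |s_2|) + \bigl|\,|s_1| - |s_2|\,\bigr| = \max(|s_1|, |s_2|)
```

Equality holds, for example, when the strings share no characters, as in ed(ab, cd) = 2.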

Can the tf-idf weight of a term in a document exceed 1? (1 point)

True

False
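Nothing in the tf-idf weight caps it at 1 unless the vector is length-normalized. This quick Python check assumes the common log-weighted form w = (1 + log10 tf) x log10(N / df) and uses made-up counts.

```python
import math


def tf_idf(tf, df, n_docs):
    """Log-weighted tf-idf: (1 + log10 tf) * log10(N / df); 0 if the term is absent."""
    if tf == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


# Hypothetical counts: tf = 10, df = 1,000, N = 1,000,000, i.e. (1 + 1) * 3.
print(tf_idf(10, 1_000, 1_000_000))
```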


Suppose the length of each song's embedding vector is directly proportional to the song's popularity. You want to calculate the similarity between songs. Which of the following is/are true? (2 points)

If you switch from cosine similarity to dot product, popular songs become more similar to only other popular songs.

If you switch from cosine similarity to dot product, popular songs become more similar to all songs in general.

If you switch from dot product to cosine similarity, popular songs become less similar than less popular songs.

If you switch from dot product to cosine similarity, popular songs become more similar than less popular songs.

No change in song similarities when switching from cosine similarity to dot product

No change in song similarities when switching from dot product to cosine similarity
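This small Python illustration uses made-up 2-d vectors, where `popular` is three times `niche` in the same direction. Scaling a song's embedding scales its dot product with every other vector by the same factor. Cosine similarity ignores vector length.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical embeddings: same direction, different lengths (popularity).
niche = [1.0, 2.0]
popular = [3.0, 6.0]
others = [[2.0, 1.0], [0.5, 3.0]]

for other in others:
    print(dot(popular, other) / dot(niche, other),        # ratio: 3.0 for every song
          cosine(popular, other) - cosine(niche, other))  # ~0: cosine unchanged
```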

Paragraph for the next 3 questions

Q - a b c d
D1 - a a c c
D2 - b d
Here a, b, c, d are individual tokens.
For the query (Q) and documents (D1, D2) above, use the lnc.ltc weighting scheme to compute the ranking scores and answer the following:
(Round each calculation to 2 decimal places. Use log10.)
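Here is a Python sketch of lnc.ltc scoring for this setup. It assumes that D1 and D2 form the whole collection (N = 2), with df counted over those two documents. `ltc_query` weights the query by (1 + log10 tf) x log10(N / df) and cosine-normalizes the result. `lnc_doc` uses 1 + log10 tf with no idf and optional normalization. Together these helpers cover all three questions:

- Q1: the dot product of the normalized vectors.
- Q2: the Euclidean distance between them.
- Q3: the dot product with unnormalized documents.

The code keeps full precision instead of rounding each step to 2 decimals, so margins may differ slightly from a hand calculation.

```python
import math
from collections import Counter


def normalize(w):
    """Cosine (L2) normalization of a sparse weight vector."""
    length = math.sqrt(sum(x * x for x in w.values()))
    return {t: x / length for t, x in w.items()} if length else dict(w)


def ltc_query(query_tokens, docs):
    """ltc: (1 + log10 tf) * log10(N / df), then cosine normalization."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    w = {t: (1 + math.log10(tf)) * math.log10(n / df[t])
         for t, tf in Counter(query_tokens).items()
         if df[t] > 0}  # a term in no document has undefined idf; skip it
    return normalize(w)


def lnc_doc(doc_tokens, normalized=True):
    """lnc: 1 + log10 tf, no idf, optionally cosine-normalized."""
    w = {t: 1 + math.log10(tf) for t, tf in Counter(doc_tokens).items()}
    return normalize(w) if normalized else w


def dot(u, v):
    return sum(x * v.get(t, 0.0) for t, x in u.items())


def euclidean(u, v):
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


docs = {"D1": "a a c c".split(), "D2": "b d".split()}
q = ltc_query("a b c d".split(), list(docs.values()))
for name, tokens in docs.items():
    print(name,
          dot(q, lnc_doc(tokens)),                    # Q1: lnc.ltc score
          euclidean(q, lnc_doc(tokens)),              # Q2: distance (smaller is closer)
          dot(q, lnc_doc(tokens, normalized=False)))  # Q3: document not normalized
```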

Q1


Which of the following is/are true? (2 points)

D2 has a better/larger score than D1

Whichever has the better score, it is by a low margin (|difference| <= 0.02)

Whichever has the better score, it is by a high margin (|difference| > 0.02)

D1 has a better/larger score than D2

Q2

Now, if we take the Euclidean distance between the normalized vectors (instead of the dot product), which of the following is/are true? (The ranking order referred to in this question is the one obtained in Q1.) (2 points)

The ranking order remains the same and the margin is low (|difference| <= 0.02)

The ranking order remains the same and the margin is high (|difference| > 0.02)

The ranking order remains the same

The ranking order reverses

Q3


Now, if we take the dot product without normalizing the document vectors, which of the following is/are true? (The ranking order referred to in this question is the one obtained in Q1.) (2 points)

The ranking order remains the same and the margin is low (|difference| <= 0.02)

The ranking order remains the same and the margin is high (|difference| > 0.02)

The ranking order reverses

The ranking order remains the same

Paragraph for the next 2 questions

Q - a b c
D1 - a a d
D2 - b c a
D3 - a a
Here a, b, c, d are individual tokens.
While ranking the documents using the Binary Independence Model (BIM), in a particular iteration we get user feedback telling us that:
(i) All documents are relevant.
(ii) A term/token is relevant to a document if the document contains that specific term/token.
Now, for this particular iteration, answer the following:
(Use log10 wherever log is required.)
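This Python sketch of the iteration assumes the usual contingency-table form of the BIM term weight, with 0.5 added to every cell. Consider a term that occurs in df of the N documents, s of which are among the S relevant ones. Its weight is c_t = log10{ [(s + 0.5)/(S - s + 0.5)] / [(df - s + 0.5)/(N - df - S + s + 0.5)] }. RSV(D) is the sum of c_t over the query terms that occur in D. The function names are illustrative.

```python
import math


def bim_term_weight(s, df, n_relevant, n_docs):
    """Log-odds ratio c_t (log10) with 0.5 added to each contingency-table cell.

    s: relevant documents containing t; df: all documents containing t.
    """
    present_rel = s + 0.5
    absent_rel = n_relevant - s + 0.5
    present_nonrel = df - s + 0.5
    absent_nonrel = n_docs - df - n_relevant + s + 0.5
    return math.log10((present_rel / absent_rel) / (present_nonrel / absent_nonrel))


def bim_rsv(query, docs, relevant):
    """Term weights and RSV(D) = sum of c_t over query terms present in D."""
    weights = {}
    for t in set(query):
        df = sum(1 for d in docs if t in d)
        s = sum(1 for i in relevant if t in docs[i])
        weights[t] = bim_term_weight(s, df, len(relevant), len(docs))
    rsv = [sum(w for t, w in weights.items() if t in d) for d in docs]
    return weights, rsv


docs = ["a a d".split(), "b c a".split(), "a a".split()]
weights, rsv = bim_rsv("a b c".split(), docs, relevant={0, 1, 2})  # all documents relevant
print(weights, rsv)
```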

Q1

Which of the following is/are true? (Hint: Use the contingency table. For smoothing, add 0.5 to every count in the table.) (2 points)

The log-odds ratio for term ‘a’ is 0.845

The log-odds ratio for term ‘c’ is -0.14

The log-odds ratio for term ‘a’ is 0.645

The log-odds ratio for term ‘b’ is -0.22


Q2

Which of the following is/are true? (RSV(D) denotes the Retrieval Status Value for document D.) (2 points)

RSV(D1) = 1.69

RSV(D1) = 1.29

RSV(D1) = RSV(D3)

RSV(D2) = 0.60


This form was created inside of IIIT Delhi.
