1. What are language models? CO2 BL2 2 Marks
2. Describe the n-gram model with a specific example. CO2 BL2 2 Marks
3. Write two differences between bi-gram and tri-gram models. CO3 BL2 2 Marks
4. What is the chain rule of probability? CO2 BL2 2 Marks
5. When using the bigram model, what approximation(s) do we make to the exact probability formula? CO2 BL4 5 Marks
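For reference (standard textbook notation), the exact chain-rule decomposition and the bigram (first-order Markov) approximation that replaces it:

```latex
P(w_1 w_2 \dots w_n) = \prod_{k=1}^{n} P(w_k \mid w_1 \dots w_{k-1})
\;\approx\; \prod_{k=1}^{n} P(w_k \mid w_{k-1})
```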
6. What is the Markov assumption? CO2 BL2 2 Marks
7. What is maximum likelihood estimation used for? CO2 BL2 2 Marks
8. Given a word wₙ and the previous word wₙ₋₁, how do we normalize the bigram count? State the formula. CO2 BL4 10 Marks
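For reference, the standard maximum-likelihood estimate that question 8 asks for normalizes the bigram count by the unigram count of the preceding word:

```latex
P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n)}{\sum_{w} C(w_{n-1}\, w)} = \frac{C(w_{n-1}\, w_n)}{C(w_{n-1})}
```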
9. What is relative frequency in the n-gram model? CO2 BL2 2 Marks
10. What are the building blocks of a semantic system? CO2 BL2 5 Marks
11. Discuss lexical ambiguity. CO2 BL2 5 Marks
12. Discuss semantic ambiguity. CO2 BL2 5 Marks
13. Discuss syntactic ambiguity. CO2 BL2 5 Marks
14. What is the need for meaning representation? CO2 BL2 5 Marks
15. What is the major difference between lexical analysis and semantic analysis in NLP? CO2 BL2 5 Marks
16. Name two language modelling toolkits. CO2 BL2 2 Marks
17. With examples, explain the different types of part-of-speech attributes. CO2 BL3 5 Marks
18. Explain extrinsic evaluation of the N-gram model and the difficulties related to it. CO2 BL3 10 Marks
19. With an example, explain the path-based similarity check for two words. CO2 BL3 5 Marks
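A minimal sketch for experimenting with question 19, assuming NLTK's WordNet interface (the question does not prescribe a toolkit); path_similarity scores two synsets as 1/(1 + length of the shortest hypernym path between them):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

car = wn.synset('car.n.01')
bus = wn.synset('bus.n.01')
# Score lies in (0, 1]; identical synsets score 1.0, distant ones approach 0.
print(car.path_similarity(bus))
```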
20. Define Homonymy, Polysemy and Synonymy with examples. CO2 BL2 5 Marks
21. How does WordNet assist in extracting semantic information from a corpus? CO2 BL2 5 Marks
22. How does NLP employ computational lexical semantics? Explain. CO2 BL2 5 Marks
23. What are the problems with the basic path-based similarity measure, and how do information-content similarity metrics address them? CO2 BL2 10 Marks
24. Explain the extended Lesk algorithm with an example. CO2 BL2 5 Marks
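NLTK ships only the simplified Lesk variant, not the extended one, but it is a quick way to see the gloss-overlap idea behind question 24 (extended Lesk additionally scores the glosses of related synsets such as hypernyms):

```python
from nltk.wsd import lesk  # simplified Lesk, shown here only for intuition

context = "I went to the bank to deposit my money".split()
sense = lesk(context, "bank")
# Prints the synset whose gloss overlaps the context words the most.
print(sense, "-", sense.definition())
```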
25. State the differences in properties between rule-based POS tagging and stochastic POS tagging. CO2 BL3 5 Marks
26. What is stochastic POS tagging, and what are its properties? CO2 BL2 10 Marks
27. What is rule-based POS tagging, and what are its properties? CO2 BL2 10 Marks
28. Give examples to illustrate how the n-gram approach is utilized in word prediction. CO2 BL2 10 Marks
29. Describe transformation-based tagging and explain how it works. CO2 BL3 10 Marks
30. State the difference between structured data and unstructured data. CO2 BL3 10 Marks
31. What is semi-structured data? Explain with an example. CO2 BL3 5 Marks
32. How does a supervised machine learning algorithm contribute to text classification? CO2 BL3 5 Marks
33. List the uses of emotion analytics. CO2 BL3 5 Marks
34. Say you are an employee of a renowned food delivery company, and your superior has asked you to do a market survey to identify potential competition and zero in on the areas where your company needs to improve to become the top company in the market. How will you approach this task and accomplish the goal? CO2 BL6 10 Marks
35. Explain a classic search model with a diagram. CO2 BL5 5 Marks
36. Why is part-of-speech (POS) tagging required in NLP? CO2 BL3 5 Marks
37. What is a vocabulary in NLP? CO2 BL2 2 Marks
38. What do you mean by Information Extraction? CO2 BL2 2 Marks
39. What is morphological parsing? Explain the steps of a morphological parser. CO2 BL4 5 Marks
40. What is BOW (Bag of Words)? CO2 BL4 5 Marks
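A minimal bag-of-words sketch in plain Python (whitespace tokenization is an assumption; real systems use a proper tokenizer):

```python
from collections import Counter

def bag_of_words(text):
    # Word order is discarded; only per-word counts survive.
    return Counter(text.lower().split())

print(bag_of_words("the cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```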
41. State the difference between formal language and natural language. CO2 BL4 5 Marks
42. Assume there are 4 topics, namely Cricket, Movies, Politics and Geography, and 4 documents D1, D2, D3 and D4, each containing an equal number of words. These words are taken from a pool of 4 distinct words, namely {Shah Rukh, Wicket, Mountains, Parliament}, and the 4 words may repeat within each document. Assume you want to recreate document D3. Explain the process you would follow to achieve this, and reason how recreating document D3 can help us understand the topic of D3. CO2 BL6 10 Marks
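One way to experiment with the scenario in question 42, assuming scikit-learn's LDA implementation and writing the multi-word token as Shah_Rukh so the vectorizer keeps it whole (both choices, and the toy documents, are assumptions):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Wicket Wicket Shah_Rukh Parliament",       # D1 (hypothetical)
    "Shah_Rukh Shah_Rukh Wicket Mountains",     # D2
    "Parliament Parliament Mountains Wicket",   # D3
    "Mountains Mountains Shah_Rukh Parliament", # D4
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(X)
# Topic mixture for D3: regenerating D3 from these proportions and the
# per-topic word distributions is what reveals D3's dominant topic.
print(lda.transform(X)[2])
```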
43. What is text parsing? CO2 BL2 2 Marks
44. Explain sentiment analysis in market research. CO2 BL2 2 Marks
45. Describe Hidden Markov Models. CO2 BL2 2 Marks
46. State and explain in detail the main advantage of the Latent Dirichlet Allocation (LDA) methodology over Probabilistic Latent Semantic Analysis (PLSA) for building a recommender system. CO2 BL3 10 Marks
47. Explain in detail how the matrix factorization technique used for building recommender systems effectively boils down to solving a regression problem. CO2 BL3 5 Marks
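A compact sketch of the point in question 47: in alternating least squares, fixing one factor matrix turns each row update of the other into a small ridge-regression solve (the toy ratings and the regularizer λ are assumptions):

```python
import numpy as np

R = np.array([[5.0, 3.0, 0.0],   # hypothetical ratings; 0 = unobserved
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 4.0]])
mask = R > 0
k, lam = 2, 0.1
rng = np.random.default_rng(0)
U = rng.normal(size=(3, k))      # user factors
V = rng.normal(size=(3, k))      # item factors

for _ in range(50):              # alternating least squares
    for i in range(3):           # fix V: each user row is a ridge regression
        Vi = V[mask[i]]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(k),
                               Vi.T @ R[i, mask[i]])
    for j in range(3):           # fix U: each item row likewise
        Uj = U[mask[:, j]]
        V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(k),
                               Uj.T @ R[mask[:, j], j])

print(np.round(U @ V.T, 1))      # reconstruction; observed entries come out near R
```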
48. What are the two main approaches used in computational linguistics for part-of-speech (POS) tagging? CO2 BL2 5 Marks
49. What is WordNet? CO2 BL2 2 Marks
50. Describe the hierarchy of relationships in WordNet. CO2 BL2 5 Marks
51. How are morphological operations applied in NLP? CO2 BL3 5 Marks
52. Explain the concepts of hypernyms, hyponyms and heteronyms in WordNet. CO2 BL2 10 Marks
53. Discuss the advantages and disadvantages of the CBOW and Skip-gram models. CO2 BL5 10 Marks
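A minimal sketch for question 53, assuming gensim 4.x is available (the sg flag switches between CBOW and Skip-gram; the toy sentences are hypothetical):

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)  # CBOW
skip = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # Skip-gram
print(skip.wv.most_similar("cat", topn=2))
```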
54. Explain the process of text classification, focusing on the Naïve Bayes text classification algorithm. CO2 BL2 10 Marks
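A minimal end-to-end sketch of the pipeline question 54 describes, assuming scikit-learn and a hypothetical toy corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great movie, loved it",        # hypothetical labelled data
         "terrible plot, awful acting",
         "what a wonderful film",
         "boring and bad"]
labels = ["pos", "neg", "pos", "neg"]

# Vectorize into word counts, then fit a multinomial Naïve Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["a wonderful, wonderful movie"]))
```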
55. How do you use the Naïve Bayes model for collaborative filtering? CO2 BL3 5 Marks
56. Is lexical analysis different from semantic analysis? How? CO2 BL4 10 Marks
57. Define what N-grams are in the context of Natural Language Processing (NLP). CO2 BL2 5 Marks
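A minimal sketch of n-gram extraction over a pre-tokenized sentence (whitespace tokenization is an assumption):

```python
def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("I am Sam".split(), 2))  # [('I', 'am'), ('am', 'Sam')]
```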
58. What are word embeddings in the context of Natural Language Processing (NLP)? CO2 BL3 10 Marks
59. What is "vector semantics" in NLP, and why is it useful for understanding word meanings? CO2 BL3 10 Marks
60. Discuss a significant limitation of TF-IDF. CO2 BL4 2 Marks
61. Discuss the application of regular expressions in Natural Language Processing (NLP), emphasizing their role in text processing tasks. Provide examples. CO2 BL3 5 Marks
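Illustrative patterns for question 61 using Python's re module (the sample text and the patterns are hypothetical and deliberately simplistic):

```python
import re

text = "Contact us at info@example.com or call 555-0123."
print(re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text))  # crude e-mail matcher
print(re.findall(r"\b\d{3}-\d{4}\b", text))          # phone-like patterns
print(re.findall(r"[A-Za-z]+", text))                # simple word tokenization
```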
62. Explain the concept of N-grams in NLP and, with examples, discuss their importance in language modelling, demonstrating how N-grams capture sequential patterns in text data. CO2 BL3 10 Marks
63. Explain the significance of n-grams in the design of any text classification system using examples. CO2 BL3 5 Marks
64. Discuss the disadvantage of the uni-gram model in information extraction. CO2 BL1 5 Marks
65. Define homographs and provide an example. CO2 BL2 2 Marks
66. How is the Levenshtein distance algorithm used to find similar words to a given word? CO4 BL5 10 Marks
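A self-contained sketch for question 66: the classic dynamic-programming edit distance, then ranking a (hypothetical) word list by distance to a query:

```python
def levenshtein(a, b):
    # dp[i][j] = cost of turning the first i chars of a into the first j of b.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(a)][len(b)]

words = ["kitten", "mitten", "sitting", "written"]  # hypothetical lexicon
print(sorted(words, key=lambda w: levenshtein("kitten", w))[:3])
```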
67. Define heteronyms and provide an example. CO2 BL2 2 Marks
68. Explain the concept of polysemy and provide an example. CO2 BL2 2 Marks
69. Define synonyms and antonyms and provide examples of each. CO2 BL2 2 Marks
70. We are given the following corpus:
<s> I am sam </s>
<s> Sam I am </s>
<s> I am Sam </s>
<s> I do not like green eggs and Sam </s>
Using a bigram language model with add-one smoothing, what is P(Sam | am)? Include <s> and </s> in your counts just like any other token. CO2 BL5 10 Marks
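A sketch that computes the requested probability. The question does not say whether the lowercase "sam" in the first sentence should be folded into "Sam", so the case folding below is an assumption; without it the vocabulary grows to 12 types and the answer becomes (1 + 1) / (3 + 12) = 2/15.

```python
from collections import Counter

corpus = ["<s> I am sam </s>",
          "<s> Sam I am </s>",
          "<s> I am Sam </s>",
          "<s> I do not like green eggs and Sam </s>"]

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    toks = [t.lower() for t in line.split()]  # case folding: an assumption
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

V = len(unigrams)  # 11 types; <s> and </s> counted like any other token
p = (bigrams[("am", "sam")] + 1) / (unigrams["am"] + V)
print(p)  # (2 + 1) / (3 + 11) = 3/14 ≈ 0.214
```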
71. Comment on the validity of the following statements:
a) Rule-based taggers are non-deterministic
b) Stochastic taggers are language independent
c) Brill’s tagger is a rule-based tagger CO2 BL5 10 Marks