Question Bank – NLP Course for Final-Year Tech Students
Module 1: Regular Expressions, Tokenization, Edit Distance
Short Answer Questions
● Define regular expressions. Give two examples of NLP use cases.
● What is the difference between stemming and lemmatization?
● Explain the concept of edit distance with an example.
● What is sentence segmentation and why is it important in NLP?
● List the main types of tokenization used in modern NLP.
Long Answer Questions
● Compare and contrast word tokenization and subword tokenization.
● Explain how edit distance can be computed using dynamic programming.
● Discuss the importance of word normalization and provide examples of normalization techniques.
Application-Based Questions
● Write a Python regular expression to extract all email addresses from a paragraph (see the first sketch after this list).
● Implement a function that calculates the Levenshtein distance between two input strings (see the second sketch after this list).
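A minimal sketch for the email-extraction question, using Python's built-in re module. The pattern is a deliberately simplified approximation of email syntax, not a full RFC 5322 matcher:

```python
import re

# Simplified pattern: local part, "@", domain labels, and a 2+ letter TLD.
# Real-world email syntax is messier; this is illustrative, not RFC 5322.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return all email-like substrings found in text."""
    return EMAIL_RE.findall(text)

print(extract_emails("Write to alice@example.com or bob.smith@mail.co.uk."))
# ['alice@example.com', 'bob.smith@mail.co.uk']
```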
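A sketch for the Levenshtein question using the standard dynamic-programming recurrence, keeping only two rows of the DP table at a time:

```python
def levenshtein(a, b):
    """Edit distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1     # substitution cost (0 on match)
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```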
Module 2: N-gram Language Models
Short Answer Questions
● Define an N-gram. What is the difference between bigram and trigram models?
● What is perplexity in language models?
● Define smoothing in N-gram models and list its types.
● How does backoff differ from interpolation in smoothing?
Long Answer Questions
● Explain the concept of overfitting in the context of language models.
● Describe how to evaluate the performance of an N-gram language model using test data.
● Derive the formula for perplexity and explain its relation to entropy (a reference formulation follows this list).
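For reference, one standard statement of the target result, written in LaTeX:

```latex
% Perplexity of a test sequence W = w_1 ... w_N under a model P:
\mathrm{PP}(W)
  = P(w_1 w_2 \dots w_N)^{-1/N}
  = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\ln P(w_i \mid w_{1:i-1})\right)
% Equivalently, PP(W) = 2^{H(W)}, where H(W) is the per-word cross-entropy
% of the model on W (logs taken base 2): lower perplexity means lower
% cross-entropy, i.e. a better model of the test data.
```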
Application-Based Questions
● Build a bigram language model from a corpus and compute its perplexity on held-out test data (the sketch after this list covers both questions).
● Implement Laplace (add-one) smoothing for an N-gram model in Python.
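A single sketch covering both application questions: a bigram model with add-one (Laplace) smoothing and its perplexity on held-out data. The two-sentence corpus is a placeholder:

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over <s> ... </s> padded sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(sentences, unigrams, bigrams):
    """Perplexity under an add-one-smoothed bigram model."""
    V = len(unigrams)                       # vocabulary size, for smoothing
    log_prob, n = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            # Laplace smoothing: add 1 to every bigram count.
            p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

uni, bi = train_bigram(["the cat sat", "the dog sat"])
print(perplexity(["the cat sat"], uni, bi))
```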
Module 3: Naive Bayes, Sentiment Analysis, Vector Semantics
Short Answer Questions
● What assumptions does Naive Bayes make about features?
● Define precision, recall, and F1-score.
● What is cross-validation and why is it important in NLP classification tasks?
● Define TF-IDF and explain its importance.
● What is Pointwise Mutual Information (PMI) and how is it computed?
Long Answer Questions
● Describe how Naive Bayes can be used for sentiment analysis.
● Discuss the harms that can arise from biased or unethical classification models.
● Compare vector semantics with traditional lexical semantics.
Application-Based Questions
● Implement a Naive Bayes classifier for binary sentiment classification on movie reviews (see the first sketch after this list).
● Use scikit-learn to compute TF-IDF vectors for a set of text documents and compare their cosine similarities (see the second sketch after this list).
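A minimal sketch using scikit-learn's MultinomialNB over bag-of-words counts; the four inline reviews stand in for a real labeled corpus such as the IMDb review dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for a real labeled review corpus (1 = positive, 0 = negative).
reviews = ["a wonderful, moving film", "terrible plot and wooden acting",
           "loved every minute of it", "boring and far too long"]
labels = [1, 0, 1, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)
print(model.predict(["a wonderful film", "terrible and boring"]))
```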
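A sketch for the TF-IDF question; both TfidfVectorizer and cosine_similarity come from scikit-learn, and the documents are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",            # placeholder documents
        "the dog sat on the log",
        "transformers rely on self-attention"]

tfidf = TfidfVectorizer().fit_transform(docs)   # one TF-IDF row per document
print(cosine_similarity(tfidf))                 # 3x3 pairwise similarity matrix
```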
Module 4: RNNs, LSTMs, Transformers
Short Answer Questions
● What is the vanishing gradient problem in RNNs?
● Define an LSTM and describe its core components (input gate, forget gate, output gate, and cell state).
● What is the Encoder-Decoder architecture?
● Explain the role of attention in the Transformer model.
Long Answer Questions
● Compare RNN, LSTM, and GRU architectures. When would you use each?
● Describe how positional encoding works in Transformers (a reference sketch follows this list).
● Explain the concept of bidirectional RNNs with use cases.
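For the positional-encoding question, a NumPy sketch of the sinusoidal scheme from "Attention Is All You Need" (assumes an even model dimension):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encodings; rows are positions, columns dimensions."""
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1) positions
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2) dim indices
    angles = pos / (10000 ** (2 * i / d_model))   # per-dimension angle rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                  # odd dimensions: cosine
    return pe

print(positional_encoding(50, 128).shape)  # (50, 128)
```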
Application-Based Questions
● Build a simple character-level language model using an LSTM in PyTorch or TensorFlow (a PyTorch sketch follows this list).
● Implement scaled dot-product attention with NumPy (see the second sketch after this list).
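A PyTorch sketch of the character-level model; the one-line corpus, hyperparameters, and step count are placeholders, not recommended settings:

```python
import torch
import torch.nn as nn

text = "hello world "                         # placeholder one-line corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}    # char -> integer id

class CharLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=16, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)  # logits over next char

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.out(h)

# Inputs are every char but the last; targets are the sequence shifted by one.
ids = torch.tensor([[stoi[c] for c in text]])
x, y = ids[:, :-1], ids[:, 1:]

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):                       # toy training loop
    logits = model(x)                         # shape: (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```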
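A NumPy sketch of single-head, unbatched scaled dot-product attention (masking omitted for brevity):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```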
Module 5: Large Language Models, Masked Language Models
Short Answer Questions
● What is meant by "pretraining" in LLMs?
● Define masked language modeling with an example.
● What are contextual embeddings? How do they differ from static embeddings?
● Describe Named Entity Recognition (NER) as a sequence labeling task.
Long Answer Questions
● How do large language models (LLMs) like GPT differ from traditional language models?
● Discuss how LLMs are scaled and the challenges involved.
● Explain the difference between causal language modeling and masked language modeling.
Application-Based Questions
● Fine-tune a pre-trained BERT model for NER using Hugging Face Transformers (see the first sketch after this list).
● Use a pre-trained Transformer model (e.g., GPT-2) to generate text from a seed prompt (see the second sketch after this list).
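An outline for the fine-tuning question, assuming the CoNLL-2003 dataset is available through the datasets library; the checkpoint, hyperparameters, and output directory are placeholders:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          TrainingArguments, Trainer)

ds = load_dataset("conll2003")                       # standard NER benchmark
label_names = ds["train"].features["ner_tags"].feature.names
tok = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_and_align(batch):
    """Tokenize pre-split words and align NER tags to subword tokens."""
    enc = tok(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, aligned = None, []
        for wid in enc.word_ids(batch_index=i):
            # Label only the first subword of a word; -100 is ignored by the loss.
            aligned.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        enc["labels"].append(aligned)
    return enc

tokenized = ds.map(tokenize_and_align, batched=True)
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_names))
args = TrainingArguments("ner-out", per_device_train_batch_size=16,
                         num_train_epochs=1)
Trainer(model=model, args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForTokenClassification(tok)).train()
```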
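A sketch for the generation question using the Transformers pipeline API; the prompt and sampling parameters are illustrative:

```python
from transformers import pipeline

# "gpt2" is the smallest public GPT-2 checkpoint on the Hugging Face Hub.
generator = pipeline("text-generation", model="gpt2")
result = generator("Once upon a time in a quiet village,",
                   max_new_tokens=40, do_sample=True, top_p=0.9)
print(result[0]["generated_text"])
```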