Question Bank – NLP Course for Final Year Tech Students

Module 1: Regular Expressions, Tokenization, Edit Distance

Short Answer Questions


● Define regular expressions. Give two examples of NLP use cases.
● What is the difference between stemming and lemmatization? (A short illustrative sketch follows this list.)
● Explain the concept of edit distance with an example.
● What is sentence segmentation, and why is it important in NLP?
● List the main types of tokenization used in modern NLP.
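
As a quick illustration for the stemming-vs-lemmatization question, here is a minimal sketch using NLTK. It assumes the nltk package is installed and the WordNet data has been downloaded (e.g. via nltk.download("wordnet")); the example words are arbitrary.

    # Stemming crudely strips suffixes; lemmatization maps a word to its
    # dictionary form, optionally guided by a part-of-speech tag.
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "running", "better"]:
        print(word,
              "| stem:", stemmer.stem(word),
              "| lemma:", lemmatizer.lemmatize(word, pos="v"))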

Long Answer Questions


● Compare and contrast word tokenization and subword tokenization.
● Explain how edit distance can be computed using dynamic programming.
● Discuss the importance of word normalization and provide examples of normalization techniques.

Application-Based Questions
● Write a Python regular expression to extract all email addresses from a paragraph.
● Implement a function that calculates the Levenshtein distance between two input strings. (A reference sketch for both exercises follows this list.)
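
One possible reference sketch for these two exercises, not the only acceptable answer; the email pattern is deliberately simple and does not attempt full RFC 5322 coverage.

    import re

    def extract_emails(text):
        # A pragmatic pattern: local part, '@', dotted domain labels,
        # and an alphabetic top-level domain of length >= 2.
        return re.findall(r"[\w.+-]+@(?:[\w-]+\.)+[A-Za-z]{2,}", text)

    def levenshtein(a, b):
        # Classic dynamic programming: dp[i][j] is the minimum number of
        # insertions, deletions, and substitutions turning a[:i] into b[:j].
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i                      # delete all of a[:i]
        for j in range(n + 1):
            dp[0][j] = j                      # insert all of b[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution
        return dp[m][n]

    print(extract_emails("Contact alice@example.com or bob@mail.org."))
    print(levenshtein("intention", "execution"))  # 5 with unit costs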

Module 2: N-gram Language Models

Short Answer Questions


● Define an N-gram. What is the difference between bigram and trigram models?
● What is perplexity in language models?
● Define smoothing in N-gram models and list its types.
● How does backoff differ from interpolation in smoothing?

Long Answer Questions


● Explain the concept of overfitting in the context of language models.
● Describe how to evaluate the performance of an N-gram language model using test data.
● Derive the formula for perplexity and explain its relation to entropy. (The standard definitions are summarized after this list.)
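
For reference when working through the perplexity derivation, the standard definitions (in LaTeX notation) for a test set $W = w_1 \dots w_N$ are:

    \mathrm{PP}(W) = P(w_1 \dots w_N)^{-1/N}
                   = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \dots w_{i-1})}}

    H(W) = -\frac{1}{N} \log_2 P(w_1 \dots w_N), \qquad \mathrm{PP}(W) = 2^{H(W)}

so perplexity is two raised to the per-word cross-entropy of the model on the test set.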

Application-Based Questions
● Build a bigram language model using a corpus and compute its perplexity on test data.
● Implement Laplace (add-one) smoothing for an N-gram model in Python. (A combined sketch for both exercises follows this list.)
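
A compact sketch covering both exercises together; the two-sentence toy corpus and the <s>/</s> boundary markers are illustrative assumptions, not a prescribed dataset.

    import math
    from collections import Counter

    train = [["<s>", "i", "like", "nlp", "</s>"],
             ["<s>", "i", "like", "coffee", "</s>"]]
    test = [["<s>", "i", "like", "nlp", "</s>"]]

    unigrams = Counter(w for sent in train for w in sent)
    bigrams = Counter((sent[i], sent[i + 1])
                      for sent in train for i in range(len(sent) - 1))
    V = len(unigrams)  # vocabulary size for add-one smoothing

    def prob(prev, word):
        # Add-one (Laplace) smoothing: every bigram count is incremented
        # by 1 and the context count by V, so unseen bigrams keep
        # non-zero probability mass.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

    log_prob = 0.0
    n_tokens = 0
    for sent in test:
        for prev, word in zip(sent, sent[1:]):
            log_prob += math.log2(prob(prev, word))
            n_tokens += 1

    print("perplexity:", 2 ** (-log_prob / n_tokens))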

Module 3: Naive Bayes, Sentiment Analysis, Vector Semantics

Short Answer Questions


● What assumptions does Naive Bayes make about features?
● Define precision, recall, and F1-score.
● What is cross-validation, and why is it important in NLP classification tasks?
● Define TF-IDF and explain its importance.
● What is Pointwise Mutual Information (PMI), and how is it computed? (The standard formulas for PMI and the classification metrics are summarized after this list.)
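
For quick reference, the standard formulas (in LaTeX notation) behind two of these questions:

    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
    F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

    \mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}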

Long Answer Questions


● Describe how Naive Bayes can be used for sentiment analysis.
● Discuss the harms that can arise from biased or unethical classification models.
● Compare vector semantics with traditional lexical semantics.

Application-Based Questions
● Implement a Naive Bayes classifier for binary sentiment classification on movie reviews.
● Use Scikit-learn to compute TF-IDF vectors for a set of text documents and compare cosine similarities. (A minimal sketch covering both exercises follows this list.)
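
A minimal scikit-learn sketch for both exercises; the four toy "reviews" and their labels are placeholder assumptions, not a real dataset.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics.pairwise import cosine_similarity

    reviews = ["a wonderful, moving film",
               "dull plot and terrible acting",
               "great performances throughout",
               "boring and far too long"]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(reviews)

    # Multinomial Naive Bayes assumes conditional independence of the
    # features given the class, which is what the short-answer question
    # above probes.
    clf = MultinomialNB().fit(X, labels)
    print(clf.predict(vectorizer.transform(["a terrible, boring film"])))

    # Pairwise cosine similarities between the TF-IDF document vectors.
    print(cosine_similarity(X))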

Module 4: RNNs, LSTMs, Transformers

Short Answer Questions


● What is the vanishing gradient problem in RNNs?
● Define an LSTM and its core components (input gate, forget gate, output gate).
● What is the Encoder-Decoder architecture?
● Explain the role of attention in the Transformer model.

Long Answer Questions


● Compare RNN, LSTM, and GRU architectures. When would you use each?
● Describe how positional encoding works in Transformers. (A sinusoidal-encoding sketch follows this list.)
● Explain the concept of bidirectional RNNs with use cases.
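
For the positional-encoding question, a small NumPy sketch of the sinusoidal scheme from "Attention Is All You Need"; the seq_len and d_model values are arbitrary illustrative choices.

    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    import numpy as np

    def positional_encoding(seq_len, d_model):
        pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
        i = np.arange(0, d_model, 2)[None, :]    # (1, d_model / 2)
        angles = pos / np.power(10000.0, i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)   # even dimensions
        pe[:, 1::2] = np.cos(angles)   # odd dimensions
        return pe

    print(positional_encoding(seq_len=4, d_model=8).round(3))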

Application-Based Questions
● Build a simple character-level language model using an LSTM in PyTorch or TensorFlow.
● Implement scaled dot-product attention with NumPy. (A reference sketch follows this list.)
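
A reference sketch of scaled dot-product attention in NumPy, computing Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the random single-head inputs are illustrative.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)   # (n_q, n_k) similarity logits
        # Numerically stable softmax over the key dimension.
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V, weights

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
    output, attn = scaled_dot_product_attention(Q, K, V)
    print(output.shape, attn.shape)  # (3, 4) (3, 3)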

Module 5: Large Language Models, Masked Language Models

Short Answer Questions


● What is meant by "pretraining" in LLMs?
● Define masked language modeling with an example. (A small fill-mask illustration follows this list.)
● What are contextual embeddings? How do they differ from static embeddings?
● Describe Named Entity Recognition (NER) as a sequence labeling task.
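
A minimal illustration of masked language modeling, assuming the Hugging Face transformers library is installed (the model weights are downloaded on first use).

    from transformers import pipeline

    # BERT predicts a distribution over the vocabulary for the [MASK] slot.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for pred in unmasker("The capital of France is [MASK].", top_k=3):
        print(pred["token_str"], round(pred["score"], 3))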

Long Answer Questions


● How do large language models (LLMs) like GPT differ from traditional language models?
● Discuss how LLMs are scaled and the challenges involved.
● Explain the difference between causal language modeling and masked language modeling.

Application-Based Questions
● Fine-tune a pre-trained BERT model for NER using Hugging Face Transformers.
● Use a pre-trained Transformer model (e.g., GPT-2) to generate text from a seed prompt. (A short generation sketch follows this list.)
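
A short generation sketch for the second exercise, assuming the Hugging Face transformers library is installed; full BERT NER fine-tuning follows the library's token-classification examples and is too long to reproduce here. The seed prompt is an arbitrary example.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    out = generator("Natural language processing is",
                    max_new_tokens=30,
                    num_return_sequences=1,
                    do_sample=True)
    print(out[0]["generated_text"])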
