1. Text Summarization Tool using NLP
a) What is NLP and how does it help? (1.5 marks)
NLP (Natural Language Processing) means teaching computers to understand and work with human language. For a
summarization tool, it helps the program quickly read, understand, and condense large texts such as news articles.
b) Difference: Syntactic vs. Semantic analysis (1.5 marks)
- Syntactic = grammar/structure check.
Example: finding the subject and verb in "She runs fast."
- Semantic = meaning check.
Example: understanding whether "Apple" means the fruit or the company depending on the sentence.
c) How do n-grams help? (2 marks)
N-grams are sequences of n consecutive words, such as pairs/triples like "global warming" or "new policy". They
help find common phrases in the text that can be reused in summaries, as in the sketch below.
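A minimal sketch of pulling frequent bigrams out of a text in plain Python (the sample sentence is invented for illustration):

# Count word pairs (bigrams) and keep the most frequent ones as candidate key phrases.
from collections import Counter

text = "global warming is rising and global warming will shape new policy on energy"
words = text.lower().split()

# Pair each word with the word that follows it.
bigrams = list(zip(words, words[1:]))

# The most common bigrams suggest phrases worth keeping in a summary.
print(Counter(bigrams).most_common(3))
# e.g. [(('global', 'warming'), 2), ...]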
2. Spam Detection using NLP
a) What is NLP and why is it useful for spam detection? (1.5 marks)
NLP helps a program understand email text, so it can check whether a message is spam based on the words and patterns used.
b) Rule-based vs. Machine learning (1.5 marks)
- Rule-based: uses fixed, hand-written rules.
Example: mark an email as spam if it contains "You won a prize".
- ML-based: learns from past spam emails.
Example: a Naive Bayes model trained on many spam and non-spam emails.
c) How do n-grams help detect spam? (2 marks)
Spam emails often repeat word patterns like "free money now". N-gram models learn these patterns and use them to
flag spam, as in the sketch below.
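A minimal sketch of an ML-based filter that uses single words and bigrams as features with a Naive Bayes classifier (scikit-learn; the tiny labelled set is invented for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labelled data: 1 = spam, 0 = not spam (invented examples).
emails = ["free money now", "you won a prize", "meeting at 10 am", "project report attached"]
labels = [1, 1, 0, 0]

# ngram_range=(1, 2) turns both single words and word pairs into features.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(emails)

model = MultinomialNB()
model.fit(X, labels)

# Classify a new, unseen email.
test = vectorizer.transform(["claim your free money"])
print(model.predict(test))  # expected: [1] (spam)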
3. Perplexity in Language Models
a) What is perplexity? (2 marks)
It measures how well a model can predict the next word in a sentence: Perplexity = P(w1 ... wN)^(-1/N). Lower perplexity = better model.
b) Calculate perplexity (3 marks)
Given:
P(Dogs) = 0.25
P(bark | Dogs) = 0.15
P(at | bark) = 0.1
P(night | at) = 0.2
P(loudly | night) = 0.1
Multiply all the probabilities:
0.25 × 0.15 × 0.1 × 0.2 × 0.1 = 0.000075
Take the 5th root of 1 / 0.000075:
Perplexity = (1 / 0.000075)^(1/5) ≈ 6.68
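The same calculation can be checked in a few lines of Python:

# Chain-rule probabilities for "Dogs bark at night loudly".
probs = [0.25, 0.15, 0.1, 0.2, 0.1]

p_sentence = 1.0
for p in probs:
    p_sentence *= p              # joint probability = 0.000075

n = len(probs)                   # 5 words
perplexity = (1 / p_sentence) ** (1 / n)
print(round(p_sentence, 6), round(perplexity, 2))  # 7.5e-05 6.68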
4. Bigrams and Smoothing
a) Bigrams starting with AI: (1 mark)
"AI solves", "AI learns"
b) Why raw bigrams are bad? (1 mark)
If a word pair never appears in the training data, its probability becomes 0, so the model assigns zero probability to any sentence containing an unseen pair.
c) Add-1 smoothing: P(solves | AI) (3 marks)
Use the add-1 (Laplace) smoothing formula:
P(solves | AI) = (count(AI solves) + 1) / (count(AI) + V), where V is the vocabulary size
= (1 + 1) / (2 + 7) = 2/9 ≈ 0.22
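The same add-1 step, written out in Python with the counts from the question:

# Add-1 (Laplace) smoothing for P(solves | AI).
count_bigram = 1     # count("AI solves")
count_ai = 2         # count("AI")
vocab_size = 7       # V: number of distinct words in the corpus

p_smoothed = (count_bigram + 1) / (count_ai + vocab_size)
print(round(p_smoothed, 2))  # 0.22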
5. POS Tagging and NER
a) What is POS tagging? (2 marks)
It labels each word's grammatical role (part of speech) in a sentence.
Examples:
- "She eats cake." eats = verb
- "The cake is tasty." cake = noun
b) What is NER? (2 marks)
NER finds names of people, places, etc.
Examples:
- "India" = Location
- "Elon Musk" = Person
c) How POS helps NER? (1 mark)
It shows which words are nouns or proper nouns, helping to find names more accurately.
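A minimal sketch of both steps with spaCy (assumes the small English model en_core_web_sm has already been downloaded):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon Musk visited India.")

# POS tagging: each word gets a part-of-speech label.
for token in doc:
    print(token.text, token.pos_)    # e.g. Elon PROPN, visited VERB

# NER: proper nouns are grouped into named entities with a type.
for ent in doc.ents:
    print(ent.text, ent.label_)      # e.g. Elon Musk PERSON, India GPE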
6. CBOW (Word2Vec)
a) Goal of CBOW? (1.5 marks)
It learns word meanings by guessing a word using nearby words.
b) How CBOW works? (2 marks)
In "The cat sat on the mat", to guess "sat", CBOW looks at "The", "cat", "on", "the".
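A minimal sketch (plain Python) of how CBOW would form its (context words → target word) training examples with a window of 2:

sentence = "the cat sat on the mat".split()
window = 2

# The surrounding words are the input; the centre word is what CBOW tries to guess.
for i, target in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
    print(context, "->", target)
# e.g. ['the', 'cat', 'on', 'the'] -> sat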
c) One advantage and one limitation (1.5 marks)
+ Advantage: fast training
- Limitation: not great for rare words
7. Skip-gram (Word2Vec)
a) Goal of Skip-gram? (1.5 marks)
It guesses nearby words using the current word.
b) Example: (2 marks)
In "AI solves problems", for the word "solves", skip-gram tries to guess "AI" and "problems".
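The same idea in the opposite direction, sketched in plain Python: skip-gram forms (centre word → context word) pairs:

sentence = "AI solves problems".split()
window = 1

# Each centre word is used to guess every word inside its window.
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            print(center, "->", sentence[j])
# e.g. solves -> AI, solves -> problems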
c) Advantage and limitation (1.5 marks)
+ Advantage: works well for rare words
- Limitation: slower to train than CBOW
8. Word Embeddings
a) What are embeddings vs. one-hot? (4 marks)
- One-hot: a sparse vector with a single 1 at the word's position, like [0, 0, 1, 0]. It only says which word is present; every pair of words looks equally unrelated.
- Embeddings: dense vectors of real numbers that capture meaning, like [0.2, -0.3, 0.7].
Embeddings help find similar words (e.g., "king" and "queen" get nearby vectors), as in the sketch below.
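A small sketch of the difference (numpy; the embedding numbers are invented): one-hot vectors of different words are always orthogonal, while dense vectors can show that related words are close:

import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot: "king" and "queen" share no dimensions, so similarity is 0.
king_onehot = np.array([1, 0, 0, 0])
queen_onehot = np.array([0, 1, 0, 0])
print(cosine(king_onehot, queen_onehot))          # 0.0

# Embeddings (toy, made-up numbers): related words point in similar directions.
king_vec = np.array([0.8, 0.6, 0.1])
queen_vec = np.array([0.7, 0.7, 0.2])
print(round(cosine(king_vec, queen_vec), 2))      # close to 1 (about 0.99)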
b) How Word2Vec learns embeddings? (3 marks)
Trains a small neural network:
- CBOW: uses surrounding words to guess the middle word.
- Skip-gram: uses middle word to guess surrounding words.
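A minimal training sketch with gensim's Word2Vec (the tiny corpus is invented; the sg flag switches between the two modes):

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["ai", "solves", "problems"],
    ["dogs", "bark", "at", "night"],
]

# sg=0 -> CBOW (guess the middle word), sg=1 -> skip-gram (guess the context).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

print(model.wv["cat"][:5])           # first 5 dimensions of the learned embedding
print(model.wv.most_similar("cat"))  # words whose vectors are closest to "cat"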
c) vec("king") - vec("man") + vec("woman") vec("queen") (3 marks)
This shows that word vectors capture meaning and relationships such as gender.
Useful in search, translation, and chatbot understanding.
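A sketch of checking this analogy with pretrained GloVe vectors via gensim (the downloader fetches the vectors on first use, so it needs an internet connection):

import gensim.downloader as api

# Load small pretrained word vectors.
vectors = api.load("glove-wiki-gigaword-50")

# vec("king") - vec("man") + vec("woman") should land near vec("queen").
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected to return 'queen' as the closest word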