KOE088: Natural Language Processing
Unit V – Ambiguity Resolution
Ambiguity is one of the biggest challenges in Natural Language Processing (NLP).
Human beings use context and common sense to reach the correct interpretation
easily, but for machines this is much harder. In this unit we study the types of
ambiguity, probabilistic grammar models, statistical methods, and advanced
techniques such as best-first parsing in detail.
1. Introduction to Ambiguity
Ambiguity means that a word or sentence admits multiple interpretations.
Example:
“She saw the man with the telescope.”
- Did she look through the telescope?
- Or did the man she saw have the telescope?
An NLP system must resolve this ambiguity in order to choose the correct
interpretation.
2. Types of Ambiguity
a) Lexical Ambiguity:
A single word has multiple meanings.
Example:
“bank” – river bank or financial bank
b) Syntactic Ambiguity:
The structure of the sentence allows multiple meanings.
Example:
“I saw the man with the telescope.”
c) Semantic Ambiguity:
The meaning of the sentence is unclear.
Example:
“Visiting relatives can be boring.”
- Relatives who are visiting?
- The act of visiting relatives?
d) Pragmatic Ambiguity:
The context or the speaker’s intention is unclear.
Example:
“Can you pass the salt?” – Is it a question or a request?
3. Statistical Methods for Ambiguity Resolution
Statistical models use historical data and probabilities to predict the correct
meaning.
Example:
If “bank” means a financial bank 90% of the time in news articles, an NLP model
will interpret it that way.
Statistical Models (a sense-classifier sketch follows this list):
- Naive Bayes Classifier
- Logistic Regression
- Decision Trees
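A minimal sketch of this idea, using scikit-learn's MultinomialNB on a tiny
invented training set for the word “bank” (the sentences and labels here are
hypothetical, purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training data: sentences labelled with the sense of "bank".
train = [
    "deposit money at the bank", "the bank raised interest rates",
    "fishing on the river bank", "the bank of the stream was muddy",
]
labels = ["financial", "financial", "river", "river"]

# Bag-of-words features + Naive Bayes: the classic statistical baseline.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train, labels)
print(clf.predict(["she opened an account at the bank"]))  # -> ['financial']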
4. Probabilistic Language Processing
Probability is used to rank the possible interpretations of a sentence.
Probabilistic Context-Free Grammar (PCFG):
Each rule in CFG is assigned a probability.
Example:
S → NP VP [1.0]
NP → Det N [0.8] | N [0.2]
VP → V NP [0.7] | V [0.3]
During parsing, the derivation with the highest probability is chosen, as in the
NLTK sketch below.
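A minimal sketch of PCFG parsing with NLTK's ViterbiParser, which returns the
most probable parse (the toy lexical rules for 'the', 'dog', 'cat', 'chased'
are added here just to make the grammar usable):

import nltk

grammar = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.8] | N [0.2]
VP -> V NP [0.7] | V [0.3]
Det -> 'the' [1.0]
N -> 'dog' [0.5] | 'cat' [0.5]
V -> 'chased' [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)         # the highest-probability derivation
    print(tree.prob())  # its probability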
5. Estimating Probabilities
Probabilities are estimated from counts in a corpus.
Example:
- Frequency(“the cat”) = 1000
- Frequency(“cat the”) = 2
So “the cat” receives the much higher probability.
Methods (a worked sketch follows this list):
- Maximum Likelihood Estimation (MLE)
- Good-Turing Smoothing
- Laplace Smoothing
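A minimal worked sketch contrasting MLE with Laplace smoothing for the bigram
above; the unigram count and vocabulary size are assumed values, not from the
text:

# Toy counts: the bigram count is from the example above,
# the rest are assumed for illustration.
count_the_cat = 1000   # Frequency("the cat")
count_the = 1200       # assumed unigram count of "the"
vocab_size = 10000     # assumed vocabulary size

# MLE: P(cat | the) = count("the cat") / count("the")
p_mle = count_the_cat / count_the                            # ~0.833

# Laplace (add-one) smoothing keeps unseen bigrams above zero.
p_laplace = (count_the_cat + 1) / (count_the + vocab_size)   # ~0.089

print(p_mle, p_laplace)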
6. Part-of-Speech Tagging (POS Tagging)
POS tagging assigns each word its correct grammatical category: noun, verb,
adjective, etc.
Example:
“Time flies like an arrow.”
- “flies” can be a verb (as in birds fly)
- or a noun (a type of insect)
Models used for tagging (a short NLTK example follows this list):
- Hidden Markov Model (HMM)
- Conditional Random Fields (CRF)
- Deep Learning (Bi-LSTM, BERT)
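A minimal sketch using NLTK's off-the-shelf tagger (assumes the 'punkt' and
'averaged_perceptron_tagger' data packages have been fetched via
nltk.download):

import nltk

tokens = nltk.word_tokenize("Time flies like an arrow.")
print(nltk.pos_tag(tokens))
# Possible output (tags may vary with tagger version):
# [('Time', 'NNP'), ('flies', 'VBZ'), ('like', 'IN'),
#  ('an', 'DT'), ('arrow', 'NN'), ('.', '.')]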
7. Lexical Probabilities
Each word is associated with its probable senses and their likelihoods.
Example:
“bass”:
- Music instrument (70%)
- Type of fish (30%)
Lexical databases such as WordNet, together with corpus-based learning, are used
to determine these probabilities; a short WordNet sketch follows.
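A minimal sketch that lists the WordNet senses of “bass” via NLTK (assumes the
'wordnet' corpus is downloaded; WordNet orders senses roughly by their observed
frequency in sense-tagged corpora):

from nltk.corpus import wordnet as wn

# Each synset is one candidate sense of the word.
for syn in wn.synsets('bass'):
    print(syn.name(), '-', syn.definition())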
8. Probabilistic Context-Free Grammars (PCFGs)
PCFGs add probabilities to CFG rules in order to handle ambiguity.
Example:
Rule: VP → V NP [0.6]
VP → V [0.4]
If, in the sentence’s context, a noun phrase after the verb is more common, the
parser will choose that structure (see the ViterbiParser sketch in Section 4).
PCFG Parsing Tools:
- Stanford Parser
- NLTK Toolkit
- spaCy (advanced NLP engine)
9. Best-First Parsing
Best-first parsing is a heuristic-based approach in which the most promising
parse is explored first.
Steps:
- Keep the candidate (partial) parses in a priority queue
- Score each candidate based on its probability
- Always expand the highest-scoring candidate first (a minimal sketch follows
  the advantages below)
Advantages:
- Faster than exhaustive parsing
- Better for real-time applications
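A minimal, generic sketch of best-first search over parse states with a
priority queue; expand, score, and is_complete are hypothetical callbacks
standing in for a real parser's operations:

import heapq

def best_first_parse(initial, expand, score, is_complete):
    # expand(state)      -> successor partial parses (hypothetical)
    # score(state)       -> probability of the partial parse (higher = better)
    # is_complete(state) -> True when the parse covers the whole sentence
    heap = [(-score(initial), 0, initial)]  # negate: heapq is a min-heap
    tie = 1                                 # tie-breaker so states never compare
    while heap:
        _, _, state = heapq.heappop(heap)
        if is_complete(state):
            return state  # first complete parse popped = most promising found
        for nxt in expand(state):
            heapq.heappush(heap, (-score(nxt), tie, nxt))
            tie += 1
    return None           # no complete parse exists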
10. Semantics and Logical Form
Semantics defines the meaning of words and sentences.
A logical form is a structured representation that helps computers understand
that meaning.
Example:
Sentence: “Every student loves a teacher.”
Possible logical forms:
- ∀x(Student(x) → ∃y(Teacher(y) ∧ Loves(x,y))) (every student loves some teacher, possibly a different one each)
- ∃y(Teacher(y) ∧ ∀x(Student(x) → Loves(x,y))) (there is one teacher whom every student loves)
The ambiguity arises from the scope of the quantifiers.
11. Word Sense Disambiguation (WSD)
The goal of WSD is to choose the correct sense of a word in context.
Example:
Word: “bark”
- Dog’s sound
- Tree’s outer layer
Approaches (a Lesk sketch follows this list):
- Dictionary-based (WordNet)
- Supervised Learning
- Unsupervised Learning
- Knowledge-based (Lesk algorithm)
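A minimal sketch of the Lesk algorithm using NLTK's built-in implementation
(assumes the 'wordnet' corpus is downloaded; lesk returns the WordNet synset
whose definition overlaps most with the context words):

from nltk.wsd import lesk

context = "The dog's loud bark scared the mail carrier".split()
sense = lesk(context, 'bark')
if sense:
    print(sense.name(), '-', sense.definition())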
12. Encoding Ambiguity in Logical Form
To encode ambiguity, one logical form is generated per reading, and the best
one is then chosen.
Example:
“She gave a book to the girl with the red hair.”
- Who has the red hair: the girl, or the person giving the book?
Enumerating one logical form per reading makes the ambiguity explicit so that
it can be resolved.
13. Tools for Ambiguity Resolution
- WordNet: Lexical Database
- NLTK: Toolkit for parsing and WSD
- spaCy: Industrial-strength NLP toolkit
- AllenNLP: Deep-learning-based NLP research library
14. Real-Life Applications
a) Chatbots:
Interpreting the user’s sentence correctly
b) Machine Translation:
Resolving ambiguity in the source language while generating the target language
c) Virtual Assistants (Siri, Alexa):
“Set an alarm at 6” vs “Set an alarm, add 6 tasks”
d) Grammar Checkers:
Ambiguity resolution is needed to make the correct suggestion
15. Deep Learning in Ambiguity Resolution
Models such as BERT and GPT resolve ambiguity based on context; a
contextual-embedding sketch follows the example below.
Example:
“I went to the bank to deposit money.” – BERT understands that “bank” refers to
a financial institution here.
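A minimal sketch of the idea using the Hugging Face transformers library
(assumes transformers and torch are installed; the sentences are illustrative):
BERT gives the same word different vectors in different contexts, so the two
financial uses of “bank” come out closer to each other than to the river use.

from transformers import AutoTokenizer, AutoModel
import torch

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Contextual embedding of the token "bank" in this sentence.
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids("bank"))
    return hidden[idx]

v_money = bank_vector("I went to the bank to deposit money.")
v_loan  = bank_vector("The bank approved my loan application.")
v_river = bank_vector("We sat on the bank of the river.")

cos = torch.nn.functional.cosine_similarity
print(cos(v_money, v_loan, dim=0))   # financial vs financial: higher
print(cos(v_money, v_river, dim=0))  # financial vs river: lower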
16. Challenges and Limitations
- Little training data is available for low-resource languages
- The same word can carry different meanings across languages
- Long-distance dependencies make parsing complex
17. Summary
Ambiguity resolution is fundamental for NLP systems, and statistical,
probabilistic, and AI-based models are used to achieve it. In this unit we
studied the types of ambiguity, probabilistic grammars, POS tagging, WSD, and
logical representations in depth, with examples.
Using these models, NLP systems can make better predictions and understand
language more accurately.