Module I Introduction Part 2
ASET
M-Tech, III
Module I Introduction
Dr. Sweta Srivastava
1
Module I: Introduction
Two Approaches to Natural Language Processing
• Classical (Rule-Based) NLP
• Statistical (Data-Driven) NLP
2
Classical NLP – An Overview
•Human-authored rules
•Inspired by linguistic theories
•Modular architecture:
Classical NLP is structured in layers, and rules for processing are defined explicitly
by human experts.
Each stage processes a specific aspect of language, using grammar and domain
knowledge.
These rules are often written using Backus-Naur Form (BNF) or a context-free grammar (CFG), formal ways to describe the syntax of a language.
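As a concrete illustration of such rules, here is a minimal sketch using NLTK's CFG utilities; the toy grammar and sentence are illustrative assumptions, not taken from the slides, and the example assumes the nltk package is installed.

import nltk

# A tiny context-free grammar written in the BNF-like notation accepted by nltk.CFG
# (grammar and sentence are illustrative, not from the slides).
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'a'
    N  -> 'dog' | 'cat'
    V  -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased a cat".split()):
    print(tree)   # (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det a) (N cat))))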
3
Example of Classical NLP
5
The Rise of Statistical NLP Approaches
Classical Approach: Rule-Based
Statistical Approach: Data-Driven
8
Ambiguity in Language: Feature-Based Disambiguation
• Features = clues used for classification
• Examples:
  • Word position relative to the verb
  • Case markers in free word order languages
  • Morphological suffixes
• Example: English “beat” = noun or verb? Hindi “haraya” (verb + past tense suffix “aaya”) removes the ambiguity.
Features are central to statistical NLP. For instance, English relies on word order, while Hindi uses case markers like “ne” and “ko” to resolve roles.
English: fixed word order (“France beat Brazil”)
Hindi: “France ne Brazil ko haraya” = “Brazil ko France ne haraya”
Case markers encode roles, not word order. In Indian languages, verb forms often include rich morphological cues that reduce ambiguity.
10
NLP as a Sequence Labeling Task
Words:
Part-of-Speech (POS) Tagging
Assign grammatical categories to each word.
Example:
• The cat sat on the mat.
→ The/DET cat/NOUN sat/VERB on/ADP the/DET mat/NOUN
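A minimal sketch of POS tagging in code, using spaCy (the library used later in this module); it assumes the en_core_web_sm model has been downloaded.

import spacy

# Minimal POS tagging sketch (assumes: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
for token in nlp("The cat sat on the mat."):
    print(token.text, token.pos_)   # e.g. The DET, cat NOUN, sat VERB, ...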
NLP as a Sequence Labeling Task
Phrases:
Chunking (Shallow Parsing)
• Group words into syntactically correlated units.
Example:
• The quick brown fox → [NP The quick brown fox]
• jumps over the lazy dog → [VP jumps] [PP over] [NP the lazy dog]
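A small chunking sketch, assuming NLTK is available; the regular-expression grammar below is a common textbook NP pattern, not one prescribed by the slides.

import nltk

# Chunking with a regular-expression grammar over POS tags
# (tags supplied by hand to keep the example self-contained).
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN")]
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
print(chunker.parse(tagged))   # (S (NP The/DT quick/JJ brown/JJ fox/NN))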
13
NLP as a Sequence Labeling Task
Sentences:
Parsing (Syntactic Parsing)
• Generate full syntactic structure of a sentence.
Example (Constituency Parse):
• Sentence: The dog chased the cat.
  → (S (NP The dog) (VP chased (NP the cat)))
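The bracketed parse above can be loaded and inspected programmatically; a small sketch assuming NLTK is installed:

from nltk import Tree

# Load the bracketed constituency parse from the slide and inspect it.
tree = Tree.fromstring("(S (NP The dog) (VP chased (NP the cat)))")
tree.pretty_print()                                 # draws the tree as ASCII art
print(tree.label(), [st.label() for st in tree])    # S ['NP', 'VP']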
14
NLP as a Sequence Labeling Task
Paragraphs:
Co-reference Resolution
• Identify when different expressions refer to the same entity.
Example:
• John went to the store. He bought some milk.
→ "He" refers to "John"
15
Importance of NER (Named Entity Recognition)
Machine Translation (MT)
Preserves the correct translation of names and entities across languages.
Summarization
Ensures that key entities remain in the summary.
Importance of NER (Named Entity Recognition)
Question Answering (QA)
Identifies potential answers by locating named entities in documents.
Text: Dr. Smith will attend the conference in Tokyo on August 12.
• Extracted: Person: Dr. Smith, Location: Tokyo, Date: August 12
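A minimal sketch of extracting these entities with spaCy (assuming the en_core_web_sm model is installed; the exact labels can vary slightly between model versions):

import spacy

# NER sketch for the sentence above.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Dr. Smith will attend the conference in Tokyo on August 12.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Smith PERSON, Tokyo GPE, August 12 DATE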
17
Part-of-Speech (POS) Tagging
•Assigns grammatical categories to each word in a sentence.
•Tags are drawn from a predefined set.
•POS tagging helps in understanding syntactic roles and structure.
Token | Tag | Description
,     | ,   | Punctuation (comma)
Dr.   | NNP | Proper noun (title)
20
Chunking vs Full Parsing for Sentiment Phrases
Evaluate which method better captures adjective–noun sentiment phrases (e.g.,
“amazing food”, “terrible service”) in user reviews.
Chunking (shallow parsing):
Strengths:
• Fast and lightweight
• Directly captures flat phrases like [JJ NN]
• Works well for extracting local sentiment spans
Limitations:
• Cannot model nested or long-distance dependencies
• May miss sentiment when the structure is more complex, e.g., “The food, though overpriced, was surprisingly amazing.”
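A small sketch of the chunking-style extraction of flat adjective–noun phrases using spaCy's rule-based Matcher; the choice of tool and the review text are illustrative assumptions, not from the slides.

import spacy
from spacy.matcher import Matcher

# Match flat adjective-noun pairs, the [JJ NN] pattern mentioned above.
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("ADJ_NOUN", [[{"POS": "ADJ"}, {"POS": "NOUN"}]])

doc = nlp("The food was amazing but the service was terrible. Amazing food, terrible service.")
for _, start, end in matcher(doc):
    print(doc[start:end].text)   # e.g. "Amazing food", "terrible service"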
21
Chunking vs Full Parsing for Sentiment Phrases
Evaluate which method better captures adjective–noun sentiment phrases (e.g.,
“amazing food”, “terrible service”) in user reviews.
Full parsing:
Strengths:
• Handles complex structures and long dependencies
• Robust for syntax-based opinion mining
• Can extract non-adjacent sentiment elements (e.g., “food that looked amazing”)
Limitations:
• Slower, more computationally expensive
• May overcomplicate simple extraction tasks
22
Parsing (Sentence Labeling)
Parsing is the task of analyzing the syntactic structure of a sentence and
representing it as a hierarchical tree of its grammatical components.
Parsing (Sentence Labeling)
Feature | Benefit
Recognizes sentence units | Subject, predicate, objects
Captures nested phrases | Handles complex grammar
Provides structure | For reasoning, translation, QA
Enables deeper semantics | Beyond POS or NER
24
Ambiguity in Named Entity Detection
Explore how NLP models handle ambiguous named entities in real-world
sentences — particularly when the same word can refer to multiple entity types
depending on context.
This sentence contains three instances of the word “Amazon”, each referring to a
different entity type:
Ambiguity in Named Entity Detection
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Amazon delivers Amazon to Amazon.")
for ent in doc.ents:
    print(ent.text, ent.label_)

Output:
Amazon ORG
Amazon ORG
Amazon ORG

spaCy assigns the same label ("ORG") to all instances due to surface similarity and the lack of deeper context modeling.
Token        | Manually annotated label
Amazon (1st) | ORG (company)
Amazon (2nd) | PRODUCT or ORG (ambiguous)
Amazon (3rd) | GPE or LOC (location)
27
The Noisy Channel Model: Core Idea
•Rooted in Information Theory: Originally developed by Claude Shannon in the
1940s and later applied to speech recognition and signal processing in the 1960s.
28
Noisy Channel Model in NLP
We observe a distorted output t (e.g., a misspelled word, noisy sentence, or
incorrect tagging), and want to infer the most probable original sequence W (e.g.,
the correct sentence or word).
(wₙ, wₙ₋₁, ..., w₁) ──[Noisy Channel]──> (tₘ, tₘ₋₁, ..., t₁)
Here (wₙ, ..., w₁) is the correct sequence, the channel applies a noisy transformation, and (tₘ, ..., t₁) is the observed output from which we make a guess at the correct sequence.
29
Understanding argmax
• argmax(f(x)): the value of x for which the function f(x) is maximized.
• Example: argmax(f(x)) = 2 because the function reaches its peak value (4) when x = 2.
30
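A tiny numeric sketch of the same idea; the plotted function is assumed here to be f(x) = 4 - (x - 2)**2, which peaks at 4 when x = 2.

# Minimal sketch; the slide's plot is assumed to be f(x) = 4 - (x - 2)**2.
def f(x):
    return 4 - (x - 2) ** 2

xs = [x / 10 for x in range(-50, 51)]   # grid from -5.0 to 5.0
best_x = max(xs, key=f)                 # argmax over the grid
print(best_x, f(best_x))                # 2.0 4.0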
Bayes' Theorem
Bayes’ Theorem tells us how to update our belief about a hypothesis (A) after seeing some evidence (B). It reverses the direction of conditional probability: you might know how likely the evidence B is when A holds (the likelihood, P(B|A)), and Bayes’ Theorem turns that into P(A|B), the probability of the hypothesis given the evidence.
Ŵ = argmax_W P(W | t) = argmax_W P(t | W) · P(W)
Where:
•W: Original sentence or word (what we want to recover)
•t: Observed input (possibly distorted)
•P(W): How likely is the original word? (Language model)
•P(t∣W): How likely is this word to get distorted into what we saw? (Error model)
31
Bayes' Theorem
P(A ∩ B) = P(B ∩ A)
P(A) · P(B|A) = P(B) · P(A|B)
⟹ P(A|B) = P(A) · P(B|A) / P(B)
Choose the value (e.g., label, tag, word) whose posterior probability is highest given the observed data. This forms the mathematical basis for sequence labeling tasks, which are abundant in NLP.
32
Example
We need to find P(m | s): the probability of meningitis given a stiff neck.
P(m | s) << P(∼m | s)
Bayesian Decision Theory Principle
“Decide in favour of that value of a random variable which is the highest among other values of the variable, probabilistically.”
34
Some Issues
P(m | s) is estimated from counts as #(m ∩ s) / #(s).
Questions:
• Which is more reliable to compute, P(s | m) or P(m | s)?
• Which evidence is more sparse, P(s | m) or P(m | s)?
•Test of significance: The counts are always on a sample of population.
• Which probability count has sufficient statistics?
35
Argmax-Based Computation in Statistical NLP
Why the denominator P(B) disappears:
• P(B) is constant for all A
• Does not affect the argmax decision
• Can be ignored in the computation
Key steps in computing P(A) and P(B | A):
1. Look at the internal structures of A and B
   • In NLP, A and B are often long sequences (sentences, word sequences).
   • Break them into smaller components (e.g., words, n-grams).
2. Make independence assumptions
   • The full joint probability of sequences is hard to compute.
   • Assume certain variables are independent to simplify estimation.
3. Assemble the computation from smaller parts (see the sketch below)
   • Estimate probabilities for small components.
   • Combine them to approximate P(A) and P(B | A).
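A small illustrative sketch of step 3: approximating a sequence probability from smaller parts under a Markov-style independence assumption. The bigram probabilities are made-up numbers used purely for illustration.

from functools import reduce

# Approximate the probability of a sequence as a product of bigram probabilities
# (an independence assumption). Probabilities below are hypothetical.
bigram_p = {("<s>", "the"): 0.2, ("the", "dog"): 0.1, ("dog", "barks"): 0.3}

def sequence_prob(words):
    pairs = zip(["<s>"] + words, words)   # (previous word, current word)
    return reduce(lambda acc, pair: acc * bigram_p.get(pair, 1e-6), pairs, 1.0)

print(sequence_prob(["the", "dog", "barks"]))   # 0.2 * 0.1 * 0.3 = 0.006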
38
PoS Tagging: Example ASET
Tagged Output:
The/DET national/ADJ committee/NOU remarked/VRB on/PRP a/DET
number/NOU of/PRP other/ADJ issues/NOU
• PoS tags are assigned based on word properties, function in the sentence, and relationships to other words.
• Some words can function in multiple roles (e.g., adjectives functioning as nouns), requiring disambiguation. E.g.: committee/NOU
• Accurate PoS tagging is essential for downstream NLP tasks.

Word-by-word:
• The → Determiner → DET
• national → Adjective (qualifies “committee”) → ADJ
• committee → Noun → NOU
• remarked → Verb → VRB
• on → Preposition → PRP
• a → Determiner → DET
• number → Noun → NOU
• of → Preposition → PRP
• other → Adjective → ADJ
• issues → Noun → NOU
39
Part-of-Speech (PoS) Tagging – Statistical Formulation
From Example to General Model
• Sentence → sequence of words: w = (w1, w2, ..., wn)
• Goal: assign the best sequence of tags t = (t1, t2, ..., tn)
• Tags determined by:
  • Lexical properties of the words.
  • Context (neighbouring tags and words).

Trigram Tagger
• Assumes the tag of a word depends only on the previous two tags.
• Known as a trigram tagger.

HMM + Viterbi Decoding
• A Hidden Markov Model (HMM) provides the probabilistic framework.
• The Viterbi algorithm finds the most probable tag sequence efficiently.
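Putting these pieces together, the objective the trigram HMM tagger optimises can be written as follows (a standard formulation; the slide itself does not spell out the equation):

\hat{t} = \arg\max_{t_1,\ldots,t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \; P(w_i \mid t_i)

The Viterbi algorithm computes this argmax efficiently by dynamic programming instead of enumerating all tag sequences.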
40
PoS Tagging: Example
41
Statistical Spell Checking — Argmax Formulation
• P(W): prior probability — how likely a word is in the language.
  Acts as a filter: rare or invalid words get low probability.
• P(T|W): likelihood — how likely it is that W gets misspelled as T.
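A minimal noisy-channel spell-correction sketch combining the two terms; the candidate words and all probability values below are made-up for illustration and are not the values used in the later exercise.

# Choose argmax_W P(W) * P(T | W) over a candidate list (illustrative values only).
prior = {"the": 0.08, "than": 0.01}                        # P(W): language model
error = {("teh", "the"): 0.05, ("teh", "than"): 0.001}     # P(T | W): error model

def correct(typo, candidates):
    return max(candidates, key=lambda w: prior.get(w, 0) * error.get((typo, w), 0))

print(correct("teh", ["the", "than"]))   # "the"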
43
Statistical Spell Checking — Argmax Formulation
44
Confusion Matrices in Spell Checking
Operation                | Definition                                                  | Example
Substitution sub(x, y)   | Number of times letter x is replaced by letter y            | APPLE → AOPLE: sub(P, O) (P replaced by O)
Insertion ins(x, y)      | Number of times letter y is inserted after letter x         | FIGHT → FIGBHT: ins(G, B) (B inserted after G)
Deletion del(x, y)       | Number of times letter y is deleted when preceded by letter x | APPLE → APPE: del(P, L) (L deleted after P)
Transposition trans(x, y)| Number of times letters x and y are swapped                 | APPLE → APLPE: trans(P, L) (P and L swapped)
45
Confusion Matrices in Spell Checking
Where:
S = Substitution
I = Insertion
T = Transposition
X = Deletion
46
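This legend matches the error model of Kernighan, Church and Gale (1990), so the formula it refers to is presumably of the following form (a reconstruction, with t the observed typo and W the intended word whose i-th character is wᵢ):

P(t \mid W) =
\begin{cases}
\mathrm{del}(w_{i-1}, w_i) / \mathrm{count}(w_{i-1} w_i) & \text{deletion (X)} \\
\mathrm{ins}(w_{i-1}, t_i) / \mathrm{count}(w_{i-1}) & \text{insertion (I)} \\
\mathrm{sub}(t_i, w_i) / \mathrm{count}(w_i) & \text{substitution (S)} \\
\mathrm{trans}(w_i, w_{i+1}) / \mathrm{count}(w_i w_{i+1}) & \text{transposition (T)}
\end{cases}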
Confusion Matrices in Spell Checking
47
Example
49
…example
50
…example
51
…example
52
…example
• General Discussion
53
Probabilistic Speech Recognition
54
Speech Recognition
Speech recognition involves:
• identifying a single spoken word
• detecting where one word ends and the next begins
•Example:
•"I found a key on the road today" → clear boundaries
•But ambiguity exists:
•"I got a plate very quickly"
•Could be “I got up late today”
•Or “I got a plate today”
Disfluency in Speech
•Fillers like “umm”, “aahh” help speakers think
•Hearers process these automatically
• These affect accurate boundary detection in machines
56
Isolated Word Recognition
Steps:
1. Segmentation (word boundary detection)
2. Word identification
Training Process
•Collect speech corpus (dialogues, lectures, phone calls)
•Clean & mark boundaries
•Annotate:
• Word boundaries
• Parts of speech
• Stress/emphasis
•Example: “I went to the bank” → POS tags: pronoun, verb, preposition,
determiner, noun
57
Isolated Word Recognition
Probability Estimation
•Use annotated data to estimate P(W | S)
•Match input signal to possible words
•Choose word with highest probability
Example
•Speech signal: “DOG” (Da – Aa – Ga)
•Correct output: DOG
•But may also output: DAG, DEG (errors possible)
•Highlights difficulty in exact matching
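A toy sketch of this decision, reusing the noisy-channel factorisation from earlier in the module; all scores are hypothetical and only illustrate the argmax choice among DOG, DAG and DEG.

# Toy sketch: pick the most probable word for an acoustic input.
acoustic = {"DOG": 0.6, "DAG": 0.25, "DEG": 0.15}   # P(signal | word), hypothetical
prior    = {"DOG": 0.05, "DAG": 0.001, "DEG": 0.002}  # P(word), hypothetical

best = max(acoustic, key=lambda w: acoustic[w] * prior[w])
print(best)   # DOG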
58
Questions
59
Question 1
You are given a simple email classification task using the Naive Bayes algorithm.
The conditional probabilities for each word given the class are:
Use the Naive Bayes classifier to compute the posterior probabilities (unnormalized) for each
class and determine the most likely class label for the email.
Assume the Naive Bayes bag-of-words model with conditional independence between words.
60
Solution 1
We compute the unnormalized posterior probability for each class:
61
….. Solution 1
Comparison:
- Spam: 0.00021
- Ham: 0.0000004
Conclusion:
Since 0.00021 > 0.0000004, the Naive Bayes classifier predicts the email as:
Spam
62
Question 2: Argmax-Based Computation in NLP
You receive the following email:
"Win a free lottery ticket now!"
You are given the following probabilities from a Naive Bayes classifier:
Compute the class (Spam or Ham) with the highest probability using the formula:
P(Class | Email) ∝ P(Word1 | Class) × P(Word2 | Class) × ... × P(Class)
63
Solution 2
Spam score = 0.07 × 0.10 × 0.05 × 0.4 = 0.00014
Ham score = 0.01 × 0.02 × 0.005 × 0.6 = 0.0000006
Since 0.00014 > 0.0000006, the classifier predicts Spam.
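The same computation as a small Python check; the per-class factors are taken directly from the solution above, while the assignment of each factor to a particular word of the email is not specified on the slide.

from math import prod

# Recompute the two unnormalized scores from Solution 2 and pick the argmax.
spam_factors = [0.07, 0.10, 0.05]   # P(word_i | Spam), from the solution
ham_factors  = [0.01, 0.02, 0.005]  # P(word_i | Ham), from the solution
p_spam, p_ham = 0.4, 0.6            # class priors

scores = {"Spam": prod(spam_factors) * p_spam, "Ham": prod(ham_factors) * p_ham}
print(scores)                        # {'Spam': 0.00014, 'Ham': 6e-07}
print(max(scores, key=scores.get))   # Spam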
64
Question 3: Spell Checking
The observed (misspelled) word is "hte". Candidate corrections:
• "the"
• "hat"
• "hate"
Language-model priors:
P("the") = 0.08
P("hat") = 0.005
P("hate") = 0.003
Solution
66
Question 4
Label all parts of speech clearly and show the hierarchical syntactic
structure.
67
Solution
(S
  (NP (DT The) (JJ quick) (JJ brown) (NN fox))
  (VP (VBZ jumps)
    (PP (IN over)
      (NP (DT the) (JJ lazy) (NN dog))))
)
68
Question 5
69
Solution
The sentence “The cat sat on the mat.” is parsed as:
(S
  (NP (DT The) (NN cat))
  (VP (VBD sat)
    (PP (IN on)
      (NP (DT the) (NN mat))))
  (. .))

Note:
• The first “.” is the part-of-speech (POS) tag used for any sentence-terminating punctuation mark (period, question mark, exclamation mark) in the Penn Treebank tag set.
• The second “.” inside the parentheses is the actual period character from the sentence “The cat sat on the mat.”, so (. .) means “a period token, tagged as punctuation”.
Or in bracketed linear form:
(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat))))
(. .))
70
Thank You…!!
71