
ASET

ASET
M-Tech, III
Module I Introduction
Dr. Sweta Srivastava

1
Module I: Introduction

Two Approaches to Natural Language Processing
• Classical (Rule-Based) NLP
• Statistical (Data-Driven) NLP

2
Classical NLP – An Overview
•Human-authored rules
•Inspired by linguistic theories
•Modular architecture:

Phonetics → Phonology → Morphology → Syntax → Semantics → Pragmatics → Discourse

Classical NLP is structured in layers, and rules for processing are defined explicitly
by human experts.
Each stage processes a specific aspect of language, using grammar and domain
knowledge.
These rules are often written using the Backus-Naur Form (BNF) or context-free
grammar (CFG), which is a formal way to describe the syntax of a language.
3
Example of Classical NLP – Parsing Noun Phrases


•NP → N ("boy")
•NP → Adj + N ("little boy")
•NP → N + PP ("boy with toys")
•PP → P + NP ("with toys")

The importance of these rules is twofold:


•Linguistically, they model how natural language structures are built.
•Computationally, they enable machines to parse sentences into tree structures
(parse trees), which are necessary for understanding sentence meaning,
identifying relationships between entities, and performing downstream tasks like
translation or question answering.
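As an illustration, rules like these can be run directly with NLTK's CFG tools. The grammar and tiny lexicon below are a minimal sketch (assuming the nltk package is available), not a full grammar of English:

import nltk

# The NP/PP rules from above, plus a toy lexicon so the grammar is self-contained.
grammar = nltk.CFG.fromstring("""
NP -> N | Adj N | N PP
PP -> P NP
N -> 'boy' | 'toys'
Adj -> 'little'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("boy with toys".split()):
    print(tree)   # (NP (N boy) (PP (P with) (NP (N toys))))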
4
The Rise of Statistical NLP ASET

Emerged with the Advent of the Web

Explosion of digital content


• The internet enabled the creation and sharing of vast amounts of text in
machine-readable formats.
• Websites, blogs, forums, and online publications provided rich linguistic data.
Shift from rule-based to data-driven methods
• Earlier NLP systems relied on handcrafted rules and lexicons.
• Statistical NLP replaced manual encoding with automated learning from real-
world text.
Empirical foundation for language analysis
• Researchers could now observe actual language usage across domains and
cultures.
• Language models became more grounded in natural usage patterns.

5
The Rise of Statistical NLP ASET

Applies Machine Learning to Uncover Language Patterns

Adoption of statistical and probabilistic models


• Techniques like Hidden Markov Models (HMMs), Maximum Entropy models, and
Naive Bayes became standard.
• These models predict linguistic structures (e.g., tags, chunks, parse trees) based
on probabilities.

Learning from annotated examples


• Supervised learning uses labeled datasets to teach models how language
behaves.
• Annotation efforts (e.g., POS tagging, named entity recognition) became
foundational to model training.

Foundation for modern deep learning NLP


• Statistical NLP paved the way for neural network approaches, including word
embeddings and transformer architectures.
• Current models like BERT and GPT are descendants of this statistical paradigm,
scaled up with deep learning. 6
The Rise of Statistical NLP ASET

Leverages Large Corpora (Machine-Readable Text)

Availability of large-scale text datasets


• Examples include the Penn Treebank, British National Corpus, and Common
Crawl.
• These corpora span diverse genres, registers, and linguistic styles. Example:
News archives, Wikipedia, annotated linguistic datasets
Data-driven discovery of patterns
• Statistical methods uncover regularities in syntax, semantics, and discourse.
• Frequency analysis, word co-occurrence, and collocation became central
techniques.

Supports training and evaluation of models


• Large corpora enable supervised and unsupervised learning.
• Standard benchmarks allow consistent comparison across models and tasks.
7
NLP Stages and the Two Approaches
Classical Approach: Rule-Based
• Handcrafted linguistic rules and dictionaries.
• Deterministic systems, often language-specific.
• Strength: High precision in well-defined domains.
• Limitation: Difficult to scale and adapt to linguistic variability.

Statistical Approach: Data-Driven
• Uses machine learning to infer patterns from large text corpora.
• Probabilistic models and algorithms learn from examples.
• Strength: Scalable, adaptable, and effective for noisy or ambiguous input.
• Limitation: Requires large annotated datasets; may lack interpretability.

8
Ambiguity in Language ASET

“Visiting aunts can be a nuisance.”

Types of Ambiguity:

• Part-of-Speech (POS) Ambiguity
  • "Visiting" can be:
    • A gerund (noun): "The act of visiting aunts is a nuisance."
    • An adjective: "Aunts who are visiting can be a nuisance."

• Semantic Role Ambiguity
  • "Aunts" can be:
    • The agent (the ones doing the visiting)
    • The object (the ones being visited)

• Grammatical ambiguity – Does "visiting" function as a verb or an adjective?
• Semantic ambiguity – Are the aunts the ones doing the visiting, or are they being visited?
9
Ambiguity as Classification ASET

Ambiguity Resolution = Classification Task


• Classification in NLP = assigning labels based on context:
  • POS tags: "visiting" → adjective or gerund (verb functioning as a noun)?
  • Semantic roles: "aunts" → agent or object?

Feature-Based Disambiguation
• Features = clues used for classification
• Examples:
  • Word position relative to the verb
  • Case markers in free-word-order languages
  • Morphological suffixes

Features are central to statistical NLP. For instance, English relies on word order, while Hindi uses case markers like "ne" and "ko" to resolve roles.
• English: fixed word order ("France beat Brazil")
• Hindi: "France ne Brazil ko haraya" = "Brazil ko France ne haraya"
• Case markers encode roles, not word order.

Example: English "beat" can be a noun or a verb, whereas Hindi "haraya" (verb stem + past-tense suffix "-aaya") removes the ambiguity. In Indian languages, verb forms often include rich morphological cues that reduce ambiguity.
10
ASET

Sequence Labelling and Noisy Channel

11
NLP as a Sequence Labeling Task
Words:
Part-of-Speech (POS) Tagging
Assign grammatical categories to each word.
Example:
• The cat sat on the mat.
→ The/DET cat/NOUN sat/VERB on/ADP the/DET mat/NOUN

Named Entity Recognition (NER)


Identify entities like persons, organizations, dates, etc.
Example:
• Barack Obama was born in Hawaii.
→ Barack Obama/PER was/O born/O in/O Hawaii/LOC
"Pooja ne Pooja ke liye phool kharida"
• 1st “Pooja” → Person
• 2nd “Pooja” → Act of Worship

Word Sense Disambiguation (WSD)

Determine which sense of a word is used in context.


Example:
• He went to the bank to fish. → "bank" = riverbank
• He went to the bank to deposit money. → "bank" = financial institution
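As a small illustration of these labelling tasks, the snippet below runs POS tagging and NER with spaCy (assuming the en_core_web_sm model is installed; spaCy's own label names, e.g. PERSON and GPE, differ slightly from the PER/LOC shorthand used above):

import spacy

nlp = spacy.load("en_core_web_sm")

# POS tagging: one grammatical category per token
doc = nlp("The cat sat on the mat.")
print([(tok.text, tok.pos_) for tok in doc])

# NER: labelled spans for persons, locations, etc.
doc = nlp("Barack Obama was born in Hawaii.")
print([(ent.text, ent.label_) for ent in doc.ents])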
12
NLP as a Sequence Labeling Task
Phrases:
Chunking (Shallow Parsing)
• Group words into syntactically correlated units.
Example:
• The quick brown fox → [NP The quick brown fox]
• jumps over the lazy dog → [VP jumps] [PP over] [NP the lazy dog]
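A minimal chunking sketch with NLTK's RegexpParser is shown below; the NP pattern is a common textbook rule (optional determiner, any adjectives, then nouns), not the slide's own grammar:

import nltk

grammar = "NP: {<DT>?<JJ>*<NN.*>+}"   # assumed chunk rule for noun phrases
chunker = nltk.RegexpParser(grammar)

tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
print(chunker.parse(tagged))
# (S (NP The/DT quick/JJ brown/JJ fox/NN) jumps/VBZ over/IN (NP the/DT lazy/JJ dog/NN))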

13
NLP as a Sequence Labeling Task
Sentences:
Parsing (Syntactic Parsing)
• Generate full syntactic structure of a sentence.
Example (Constituency Parse):
• (The dog chased the cat)
(S (NP The dog) (VP chased (NP the cat)))

14
NLP as a Sequence Labeling Task
Paragraphs:
Co-reference Resolution
• Identify when different expressions refer to the same entity.
Example:
• John went to the store. He bought some milk.
→ "He" refers to "John"

15
Importance of NER (Named Entity Recognition)
Machine Translation (MT)
Preserves the correct translation of names and entities across languages.

• English: Apple announced new products in California.


• Incorrect MT: Fruit announced new products...
• Correct MT: Apple (company) is recognized and preserved as a named entity.

Information Retrieval (IR)


Improves relevance and accuracy by recognizing named entities in queries.

Query: Articles about Tesla


• Without NER: May retrieve documents about Nikola Tesla (person)
• With NER: Correctly focuses on Tesla (company)

Summarization
Ensures that key entities remain in the summary.

Original: Barack Obama met Angela Merkel in Berlin.


• Summary with NER: Obama and Merkel met in Berlin.
• Without NER: Might omit or misrepresent subjects entirely.
16
Importance of NER (Named Entity Recognition)
Question Answering (QA)
Identifies potential answers by locating named entities in documents.

• Question: Where was Einstein born?


• NER highlights: Ulm (LOC), enabling the system to extract a precise answer.

Information Extraction (IE)


Automatically extracts structured data (names, locations, dates) from unstructured text.

Text: Dr. Smith will attend the conference in Tokyo on August 12.
• Extracted: Person: Dr. Smith, Location: Tokyo, Date: August 12

17
Part-of-Speech (POS) Tagging
•Assigns grammatical categories to each word in a sentence.
•Tags are drawn from a predefined set.
•POS tagging helps in understanding syntactic roles and structure.

Tag | Meaning | Example
JJ | Adjective | new, red, tall
NN | Common noun, singular | student, tree
NNS | Common noun, plural | students, dogs
VB | Verb, base form | run, go, come
VBG | Verb, gerund | running, eating
VBZ | Verb, 3rd person singular | is, runs
NNP | Proper noun, singular | UJF, September
IN | Preposition / subordinating conjunction | in, with, because
DT | Determiner | the, a, an

18
POS- Example ASET

“During the conference, Dr. Singh presented groundbreaking research on


neural networks.”

POS Tagged Table:


Word POS Tag Tag Meaning
During IN Preposition
the DT Determiner

conference NN Common noun (singular)

, , Punctuation (comma)
Dr. NNP Proper noun (title)

Singh NNP Proper noun (person name)

presented VBD Verb (past tense)


groundbreaking JJ Adjective
research NN Common noun
on IN Preposition
neural JJ Adjective
networks NNS Common noun (plural)
19
. . Punctuation (period)
Name Identification vs. Named Entity Recognition (NER)
Aspect | Name Identification | NER (Named Entity Recognition)
What it does | Detects if a word is a name | Identifies what kind of name it is
Output | Binary → name or not | Labeled types → Person, Org, Date
Level of understanding | Surface-level | Semantic-level
Uses | Filtering, capitalization checks | Knowledge extraction, QA, translation, etc.
Example – "Amazon" | Yes, it's a name | Organization or Location, depending on context

20
Chunking vs Full Parsing for Sentiment Phrases
Evaluate which method better captures adjective–noun sentiment phrases (e.g.,
“amazing food”, “terrible service”) in user reviews.

These phrases are central to understanding opinions in texts like product,


restaurant, or service reviews.

Chunking (Shallow Parsing)


Groups tokens into base-level syntactic phrases (e.g., NP = Noun Phrase, VP =
Verb Phrase) without showing their internal hierarchical structure.
[NP amazing food] [VP was] [ADJP really good]

Strengths:
• Fast and lightweight
• Directly captures flat phrases like [JJ NN]
• Works well for extracting local sentiment spans

Limitations:
• Cannot model nested or long-distance dependencies
• May miss sentiment when the structure is more complex, e.g., "The food, though overpriced, was surprisingly amazing."
21
Chunking vs Full Parsing for Sentiment Phrases
Evaluate which method better captures adjective–noun sentiment phrases (e.g.,
“amazing food”, “terrible service”) in user reviews.

These phrases are central to understanding opinions in texts like product,


restaurant, or service reviews.

Full Parsing (Constituency or Dependency):


Builds a complete syntactic tree, showing hierarchical relationships between all
elements in the sentence.

Constituency Parse: (NP (JJ amazing) (NN food))


Dependency Parse Example: amod(food, amazing)   // amod = adjectival modifier

Strengths:
• Handles complex structures and long-distance dependencies
• Robust for syntax-based opinion mining
• Can extract non-adjacent sentiment elements (e.g., "food that looked amazing")

Limitations:
• Slower and more computationally expensive
• May overcomplicate simple extraction tasks
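As a rough sketch of the dependency route for adjective–noun sentiment pairs, the snippet below collects amod relations with spaCy (assuming en_core_web_sm; the example sentence is illustrative and the exact output depends on the model version):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("They served amazing food but terrible service.")

# Each amod arc links an adjective to the noun it modifies.
pairs = [(tok.text, tok.head.text) for tok in doc if tok.dep_ == "amod"]
print(pairs)   # expected along the lines of [('amazing', 'food'), ('terrible', 'service')]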
22
Parsing (Sentence Labeling)
Parsing is the task of analyzing the syntactic structure of a sentence and
representing it as a hierarchical tree of its grammatical components.

•Parsing creates a syntax tree from a sentence


•It labels syntactic roles (like noun phrase, verb phrase) at the sentence level
•It enables machines to understand relationships between words and phrases
•Crucial for machine translation, question answering, and semantic analysis

"The UJF campus is beautiful."

Bracketed Structure:
(S
  (NP (DT The) (JJ UJF) (NN campus))
  (VP (VBZ is)
    (ADJP (JJ beautiful))))

Tree View (Indented Structure):
S
├── NP
│   ├── DT → The
│   ├── JJ → UJF
│   └── NN → campus
└── VP
    ├── VBZ → is
    └── ADJP
        └── JJ → beautiful

Legend:
• S = Sentence
• NP = Noun Phrase
• VP = Verb Phrase
• ADJP = Adjective Phrase
• DT = Determiner
• JJ = Adjective
• NN = Noun
• VBZ = Verb (3rd person singular)

23
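The bracketed structure above can also be built and rendered programmatically; a small sketch with NLTK (assuming nltk is installed):

from nltk import Tree

t = Tree.fromstring(
    "(S (NP (DT The) (JJ UJF) (NN campus)) (VP (VBZ is) (ADJP (JJ beautiful))))")
t.pretty_print()   # draws the same tree as ASCII art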
Parsing (Sentence Labeling)
Feature Benefit
Recognizes sentence units Subject, predicate, objects
Captures nested phrases Handles complex grammar
Provides structure For reasoning, translation, QA
Enables deeper semantics Beyond POS or NER

24
Ambiguity in Named Entity Detection
Explore how NLP models handle ambiguous named entities in real-world
sentences — particularly when the same word can refer to multiple entity types
depending on context.

Amazon delivers Amazon to Amazon

This sentence contains three instances of the word “Amazon”, each referring to a
different entity type:

Word | Correct Entity Type | Explanation
Amazon (1st) | ORGANIZATION | The company (subject)
Amazon (2nd) | PRODUCT or ORGANIZATION (debated) | A package/item (object)
Amazon (3rd) | LOCATION | A geographic region (destination)
25
Ambiguity in Named Entity Detection
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Amazon delivers Amazon to Amazon.")
for ent in doc.ents:
    print(ent.text, ent.label_)

Output:
Amazon ORG
Amazon ORG
Amazon ORG

spaCy assigns the same label ("ORG") to all instances due to surface similarity and the lack of deeper context modelling.
Token Manually Annotated Label
Amazon (1st) ORG (company)
Amazon (2nd) PRODUCT or ORG (ambiguous)
Amazon (3rd) GPE or LOC (location)

• Ambiguity arises when a single token refers to multiple concepts depending on its sentence role (subject, object, location).
• Standard NER models struggle with coreference and contextual disambiguation.
• Manual annotation reveals the semantic richness that NER pipelines often miss.

26
Noisy Channel Model & Argmax-Based Computation

27
The Noisy Channel Model: Core Idea
•Rooted in Information Theory: Originally developed by Claude Shannon in the
1940s and later applied to speech recognition and signal processing in the 1960s.

•Metaphor: Think of communication as sending a message through a “noisy”


channel—like talking over a crackly phone line. The original message gets distorted,
and the receiver must reconstruct the most likely original message from the garbled
version they hear.

28
Noisy Channel Model in NLP
We observe a distorted output t (e.g., a misspelled word, noisy sentence, or
incorrect tagging), and want to infer the most probable original sequence W (e.g.,
the correct sentence or word).

(wₙ, wₙ₋₁, ..., w₁) ──[Noisy Channel]──> (tₘ, tₘ₋₁, ..., t₁)

The goal is to find the most likely W* given the observed t:

W* = argmax_W P(W | t)

(In the diagram, W is the correct sequence, the channel applies a noisy transformation, and W* is our guess at the correct sequence.)

29
Understanding argmax ASET

To apply the noisy channel model, we use argmax computation:

• max(f(x)) → gives the maximum value of a function.
• argmax(f(x)) → returns the value of x at which f(x) is maximum.

Example: if f(x) = -x² + 4x, then
• max(f(x)) = 4 (the highest output value of the function)
• argmax(f(x)) = 2, because the function reaches its peak value (4) when x = 2.

30
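A tiny numeric illustration of the max/argmax distinction for f(x) = -x² + 4x, using a brute-force search over a small grid:

# Brute-force max vs argmax for f(x) = -x**2 + 4*x
f = lambda x: -x**2 + 4*x
xs = [i * 0.1 for i in range(0, 41)]   # grid 0.0 .. 4.0

best_x = max(xs, key=f)                # argmax: the input that maximises f
print(best_x, f(best_x))               # 2.0 4.0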
Bayes' Theorem ASET

Bayes' Theorem tells us how to update our belief about a hypothesis (A) after seeing some evidence (B). It reverses the direction of conditional probability: you may know how likely the evidence B is when A holds (the likelihood P(B|A)), and Bayes' theorem lets you compute the reverse, P(A|B).

In spelling correction, translation, or speech recognition, we observe noisy data (e.g., a misspelling, a foreign word, an unclear sound) and want to recover the original:

W* = argmax_W P(W | t) = argmax_W P(t | W) · P(W)

Where:
• W: original sentence or word (what we want to recover)
• t: observed input (possibly distorted)
• P(W): how likely the original word is (language model)
• P(t|W): how likely W is to be distorted into what we observed (error model)
31
Bayes' Theorem ASET

P(A ∩ B) = P(B ∩ A)
P(A) · P(B|A) = P(B) · P(A|B)
⇒ P(A|B) = P(A) · P(B|A) / P(B)

Bayesian Decision Theory in NLP:

Choose the value (e.g., label, tag, word) whose posterior probability is highest given the observed data. This forms the mathematical basis for sequence labeling tasks, which are abundant in NLP.

32
Example ASET

An example: it is known that in a population, 1 in 50,000 has meningitis and 1 in 20 has a stiff neck. It is also observed that 50% of meningitis patients have a stiff neck. A doctor observes that a patient has a stiff neck. What is the probability that the patient has meningitis? (Mitchell, Machine Learning, 1997)

We need to find P(m|s), the probability of meningitis given the stiff neck:

P(m|s) = P(s|m) · P(m) / P(s) = 0.5 × (1/50,000) / (1/20) = 0.0002

P(m|s) << P(~m|s)

The patient is far more likely not to have meningitis.

33
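A quick arithmetic check of the Bayes computation above (values taken from the problem statement):

# P(m|s) = P(s|m) * P(m) / P(s)
p_m = 1 / 50000          # prior: P(meningitis)
p_s = 1 / 20             # evidence: P(stiff neck)
p_s_given_m = 0.5        # likelihood: P(stiff neck | meningitis)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)                       # 0.0002
print(p_m_given_s < 1 - p_m_given_s)     # True: far more likely NOT to have meningitis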


Bayesian Decision Theory Principle

"Decide in favour of that value of a random variable which has the highest probability among all values of the variable."

In other words: choose as the decision the value whose probability is the highest.

34
Some Issues ASET

p(m|s) could have been found directly as

p(m|s) ≈ #(m ∩ s) / #s

Questions:
• Which is more reliable to compute, p(s|m) or p(m|s)?
• Which evidence is more sparse, p(s|m) or p(m|s)?
• Test of significance: the counts are always taken on a sample of the population.
• Which probability count has sufficient statistics?

35
ASET

Why p(s|m) is often preferred
• Meningitis is rare (about 1 in 50,000 people).
• To compute p(s|m) we:
  • start with the small, well-defined group of meningitis patients;
  • measure how many of them have a stiff neck.
• This requires fewer observations and produces a more reliable estimate.

Why p(m|s) is harder to estimate directly
• Stiff neck is more common and often subjective.
• To compute p(m|s) directly, we:
  • start with a large group of people with a stiff neck;
  • identify which ones have meningitis.
• This requires much larger samples and is less reliable.

Practical note (medical context)
• Doctors and medical professionals tend to rely more on p(s|m) because:
  • meningitis diagnosis is definitive (lab tests);
  • stiff neck can be subjective or caused by many other conditions.
• Bayes' theorem lets us compute p(m|s) from p(s|m) and the prior p(m).

Note: When applying statistical or machine learning techniques (e.g., noisy channel models, argmax computations), choose the probability direction that is more reliable and easier to compute.
• Higher confidence comes from probabilities with larger, more reliable counts.
• Always assess statistical significance before generalizing sample results to the whole population.

36
Application in NLP ASET

• Despite surface differences, many NLP problems reduce to sequence


labelling:
• Examples:
– Part-of-Speech (PoS) Tagging
– Statistical Spell Checking
– Automatic Speech Recognition
– Probabilistic Parsing
– Machine Translation

37
Argmax-Based Computation in Statistical NLP

A* = argmax_A P(A | B) = argmax_A P(B | A) · P(A) / P(B) = argmax_A P(B | A) · P(A)

Why the denominator P(B) disappears:
• P(B) is constant for all A.
• It does not affect the argmax decision.
• It can therefore be ignored in the computation.

Key steps in computing P(A) and P(B|A):
1. Look at the internal structures of A and B
   • In NLP, A and B are often long sequences (sentences, word sequences).
   • Break them into smaller components (e.g., words, n-grams).
2. Make independence assumptions
   • The full joint probability of a sequence is hard to compute.
   • Assume certain variables are independent to simplify estimation.
3. Assemble the computation from smaller parts
   • Estimate probabilities for the small components.
   • Combine them to approximate P(A) and P(B|A).
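As a toy illustration of step 3 (assembling P(A) from smaller parts), the sketch below approximates the probability of a word sequence with a bigram independence assumption; the probabilities are assumed values, not estimates from a corpus:

from functools import reduce

# Assumed bigram probabilities P(w_i | w_{i-1})
bigram_p = {("<s>", "the"): 0.4, ("the", "cat"): 0.1, ("cat", "sat"): 0.2}

words = ["<s>", "the", "cat", "sat"]
p_A = reduce(lambda acc, pair: acc * bigram_p.get(pair, 1e-6),
             zip(words, words[1:]), 1.0)
print(p_A)   # 0.4 * 0.1 * 0.2 = 0.008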
38
PoS Tagging: Example ASET

Sentence: The national committee remarked on a number of other issues.

Tagged output:
The/DET national/ADJ committee/NOU remarked/VRB on/PRP a/DET number/NOU of/PRP other/ADJ issues/NOU

• The → Determiner → DET
• national → Adjective (qualifies "committee") → ADJ
• committee → Noun → NOU
• remarked → Verb → VRB
• on → Preposition → PRP
• a → Determiner → DET
• number → Noun → NOU
• of → Preposition → PRP
• other → Adjective → ADJ
• issues → Noun → NOU

• PoS tags are assigned based on word properties, function in the sentence, and relationships to other words.
• Some words can function in multiple roles (e.g., adjectives functioning as nouns), requiring disambiguation, e.g. committee/NOU.
• Accurate PoS tagging is essential for downstream NLP tasks.

39
Part-of-Speech (PoS) Tagging – Statistical Formulation

From example to general model:
• Sentence → sequence of words: w = (w1, w2, ..., wn)
• Goal: assign the best sequence of tags t = (t1, t2, ..., tn)
• Tags are determined by:
  • lexical properties of the words;
  • context (neighbouring tags and words).

Trigram tagger:
• Assumes the tag of a word depends only on the previous two tags.
• Hence known as a trigram tagger.

HMM + Viterbi decoding:
• A Hidden Markov Model (HMM) provides the probabilistic framework.
• The Viterbi algorithm finds the most probable tag sequence efficiently.
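Below is a minimal Viterbi sketch for an HMM tagger. For brevity it is simplified to a bigram model (the tag depends only on the previous tag, whereas the slide describes a trigram tagger), and the start/transition/emission probabilities are toy values chosen for illustration:

def viterbi(words, tags, start_p, trans_p, emit_p):
    # V[i][t] = probability of the best tag path for words[:i+1] ending in tag t
    V = [{t: start_p[t] * emit_p[t].get(words[0], 1e-6) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            prev, score = max(((p, V[i - 1][p] * trans_p[p][t]) for p in tags),
                              key=lambda x: x[1])
            V[i][t] = score * emit_p[t].get(words[i], 1e-6)
            back[i][t] = prev
    # Trace back the most probable tag sequence
    last = max(tags, key=lambda t: V[-1][t])
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))

tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {"DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
           "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
           "VERB": {"DET": 0.6,  "NOUN": 0.3, "VERB": 0.1}}
emit_p = {"DET": {"the": 0.7}, "NOUN": {"cat": 0.4, "mat": 0.4}, "VERB": {"sat": 0.5}}

print(viterbi(["the", "cat", "sat"], tags, start_p, trans_p, emit_p))
# ['DET', 'NOUN', 'VERB']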

40
PoS Tagging: Example (worked HMM tagging example shown as figures on slides 41–42)
Statistical Spell Checking — Argmax Formulation

W* = argmax_W P(W | T) = argmax_W P(T | W) · P(W)

• P(W): prior probability — how likely the word is in the language.
  Acts as a filter: rare or invalid words get low probability.
• P(T|W): likelihood — how likely it is that W is misspelled as T.

• Probabilities give scores → they induce a ranking over the candidate corrections.
• Pick the highest-ranked candidate as the correction.

43
Statistical Spell Checking — Argmax Formulation

44
Confusion Matrices in Statistical Spell Checking


We assume only a single error per misspelt word. The error can be of one of four types:

Error Type | Definition | Example | Notation
Substitution (sub(x,y)) | Number of times letter x is replaced by letter y | APPLE → AOPLE (P replaced by O) | sub(P, O)
Insertion (ins(x,y)) | Number of times letter y is inserted after letter x | FIGHT → FIGBHT (B inserted after G) | ins(G, B)
Deletion (del(x,y)) | Number of times letter y is deleted when preceded by letter x | APPLE → APPE (L deleted after P) | del(P, L)
Transposition (trans(x,y)) | Number of times letters x and y are swapped | APPLE → APLPE (P and L swapped) | trans(P, L)

45
Confusion Matrices in Statistical Spell Checking

(The four confusion matrices are given as a figure.)

Where:
S = Substitution, I = Insertion, T = Transposition, X = Deletion

46
Confusion Matrices in Statistical Spell Checking


Keyboard Neighbour Influence
Some errors (especially insertions and substitutions) occur more often when letters
are adjacent on a QWERTY keyboard.
Example: G is near F, H, T, Y, V, B — so typing G might accidentally produce GB,
GF, or GY.

47
Example ASET

• A typist is prone to the following substitution errors (counts taken from


past data):
Intended (x) Typed as (y) Count
t r 4
t y 3
h g 2
e r 1
e w 2

You are given a misspelt word: thr

•The intended word is in the set { the, thy, thr }.


•Only one substitution error can happen per word.

Using the substitution counts as probabilities find the most probable


intended word.
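As a rough sketch of how this ranking could be computed (not the slides' own worked solution, which follows on the next pages), the snippet below scores each candidate with P(W) · P(T|W), using the substitution counts above as relative likelihoods; the prior values and the normalisation by the total count are illustrative assumptions:

# Substitution counts from the table above
sub_counts = {("t", "r"): 4, ("t", "y"): 3, ("h", "g"): 2, ("e", "r"): 1, ("e", "w"): 2}
total = sum(sub_counts.values())   # 12

def likelihood(intended, typed):
    """P(typed | intended), assuming exactly one substitution error."""
    diffs = [(x, y) for x, y in zip(intended, typed) if x != y]
    if len(diffs) != 1:
        return 0.0
    return sub_counts.get(diffs[0], 0) / total

priors = {"the": 0.08, "thy": 0.001, "thr": 0.0001}   # assumed P(W) values
typed = "thr"
scores = {w: p * likelihood(w, typed) for w, p in priors.items()}
print(max(scores, key=scores.get), scores)
# "the" wins: it is the only candidate reachable by one of the observed substitutions (e -> r)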
48
Example (the worked solution is shown as figures on slides 49–53):
• Step 1 — Likelihoods P(T|W)
• Step 2 — Priors P(W)
• Step 3 — Compute scores S(W) = P(W) · P(T|W)
• General Discussion
Probabilistic Speech Recognition

54
Speech recognition ASET

• Speech recognition: converting spoken signals to text

Two sub-problems:
• Word Boundary Recognition – detecting where one word ends and the next begins
• Isolated Word Recognition – identifying a single spoken word

Importance: words may sound similar; context helps disambiguate.


55
Word Boundary Recognition ASET

•Humans often detect boundaries effortlessly

•Example:
•"I found a key on the road today" → clear boundaries
•But ambiguity exists:
•"I got a plate very quickly"
•Could be “I got up late today”
•Or “I got a plate today”

"The stuffy nose"


•Could be: "The stuffy nose" → blocked nasal passages
•Could be: "The stuff he knows" → the information he knows

•Hindi: "Aa Jaayenge"


•“Aaj aayenge” → will come today
•“Aa jaayenge” → we will come
•Spoken disfluencies complicate boundaries: "umm", "aah", pauses
•Listeners use these cues unconsciously

Disfluency in Speech
• Fillers like "umm", "aahh" help speakers think
• Hearers process these automatically
• These affect accurate boundary detection in machines
56
Isolated Word Recognition ASET

•Problem: Given speech signal S, find word W


•Goal: argmax over P(W | S)
•Best possible means: most probable word given the signal

Steps:
1. Segmentation (word boundary detection)
2. Word identification

Training Process
•Collect speech corpus (dialogues, lectures, phone calls)
•Clean & mark boundaries
•Annotate:
• Word boundaries
• Parts of speech
• Stress/emphasis
•Example: “I went to the bank” → POS tags: pronoun, verb, preposition,
determiner, noun

57
Isolated Word Recognition ASET

Probability Estimation
•Use annotated data to estimate P(W | S)
•Match input signal to possible words
•Choose word with highest probability

Example
•Speech signal: “DOG” (Da – Aa – Ga)
•Correct output: DOG
•But may also output: DAG, DEG (errors possible)
•Highlights difficulty in exact matching

58
ASET

Questions

59
Question 1 ASET

You are given a simple email classification task using the Naive Bayes algorithm.

You want to classify the email:


"win a lottery now" The label set is:
Y = { spam, ham }
The prior probabilities are:
P(spam) = 0.6
P(ham) = 0.4

The conditional probabilities for each word given the class are:

| Word | P(word | spam) | P(word | ham) |


|-------- -|-------------- -|--------------|
| win | 0.05 | 0.01 |
| lottery | 0.10 | 0.005 |
| now | 0.07 | 0.02 |

Use the Naive Bayes classifier to compute the posterior probabilities (unnormalized) for each
class and determine the most likely class label for the email.

Assume the Naive Bayes bag-of-words model with conditional independence between words.
60
Solution 1 ASET

Solution
We compute the unnormalized posterior probability for each class:

1. For class = spam:

P(spam) * P("win" | spam) * P("lottery" | spam) * P("now" | spam)


= 0.6 * 0.05 * 0.10 * 0.07
= 0.6 * 0.00035
= 0.00021
2. For class = ham:

P(ham) * P("win" | ham) * P("lottery" | ham) * P("now" | ham)


= 0.4 * 0.01 * 0.005 * 0.02
= 0.4 * 0.000001
= 0.0000004

61
….. Solution 1 ASET

Comparison:

- Spam: 0.00021
- Ham: 0.0000004

Conclusion:
Since 0.00021 > 0.0000004, the Naive Bayes classifier predicts the email as:

Spam
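A quick script to verify the arithmetic (probabilities taken from the question; the word "a" is outside the given vocabulary and is skipped):

p_spam, p_ham = 0.6, 0.4
spam_like = {"win": 0.05, "lottery": 0.10, "now": 0.07}
ham_like  = {"win": 0.01, "lottery": 0.005, "now": 0.02}

score_spam, score_ham = p_spam, p_ham
for w in "win a lottery now".split():
    if w in spam_like:
        score_spam *= spam_like[w]
        score_ham  *= ham_like[w]

print(score_spam, score_ham)                     # 0.00021  4e-07
print("spam" if score_spam > score_ham else "ham")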

62
Question 2: ASET
Argmax-Based Computation in NLP
You receive the following email:
"Win a free lottery ticket now!"
You are given the following probabilities from a Naive Bayes classifier:

Compute the class (Spam or Ham) with the highest probability using the formula:
P(Class | Email) ∝ P(Word1 | Class) × P(Word2 | Class) × ... × P(Class)
63
Solution 2 ASET

Solution:
Spam Score = 0.07 × 0.10 × 0.05 × 0.4 = 0.00014
Ham Score = 0.01 × 0.02 × 0.005 × 0.6 = 0.0000006

=> Argmax class = Spam

64
Question 3:Spell Checking ASET

A user types the word:

"hte"

Possible intended words are:

• "the"

• "hat"

• "hate"

You are given the following probabilities:

P("hte" | "the") = 0.6

P("hte" | "hat") = 0.01

P("hte" | "hate") = 0.005

P("the") = 0.08

P("hat") = 0.005

P("hate") = 0.003

Compute the most likely intended word using the formula: 65


Solution 3 ASET

Solution

P(word | "hte") ∝ P("hte" | word) × P(word)

Score("the") = 0.6 × 0.08 = 0.048


Score("hat") = 0.01 × 0.005 = 0.00005
Score("hate") = 0.005 × 0.003 = 0.000015

=> Argmax word = "the"

66
Question 4 ASET

Parsing Tree Assignment


• Draw the syntactic parsing tree for the following sentence:

"The quick brown fox jumps over the lazy dog."

Label all parts of speech clearly and show the hierarchical syntactic
structure.

67
Solution ASET

Below is the parsing tree (in bracketed


notation) of the sentence:

(S
(NP (DT The) (JJ quick) (JJ brown) (NN
fox))
(VP (VBZ jumps)
(PP (IN over)
(NP (DT the) (JJ lazy) (NN dog))))
)

68
Question 5: ASET

• Parse the following sentence using constituency grammar and represent it


in bracketed form:
• “The cat sat on the mat.”

69
Solution ASET

"The cat sat on the mat."

The sentence is parsed as:

(S
  (NP (DT The) (NN cat))
  (VP (VBD sat)
    (PP (IN on)
      (NP (DT the) (NN mat))))
  (. .))

Or in bracketed linear form:
(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))) (. .))

Note on (. .): the first "." is the part-of-speech tag used for any sentence-terminating punctuation mark (period, question mark, exclamation mark) in the Penn Treebank tag set. The second "." inside the parentheses is the actual period character from the sentence "The cat sat on the mat.", so (. .) means "a period token, tagged as punctuation".
70
ASET

Thank You…!!

71
