Module I Introduction Part 2
ASET
M-Tech, III
Module I Introduction
Dr. Sweta Srivastava
1
Module I: Introduction
Two Approaches to Natural Language Processing
• Classical (Rule-Based) NLP
• Statistical (Data-Driven) NLP
2
Classical NLP – An Overview
•Human-authored rules
•Inspired by linguistic theories
•Modular architecture:
Classical NLP is structured in layers, and rules for processing are defined explicitly
by human experts.
Each stage processes a specific aspect of language, using grammar and domain
knowledge.
These rules are often written using Backus-Naur Form (BNF) or a context-free grammar (CFG), formal ways to describe the syntax of a language.
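As a concrete illustration of such rules, here is a minimal sketch using NLTK's CFG utilities; the toy grammar and sentence are illustrative assumptions, not taken from the slides, and the example assumes the nltk package is installed.

import nltk

# A tiny context-free grammar written in the BNF-like notation accepted by nltk.CFG
# (grammar and sentence are illustrative, not from the slides).
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'a'
    N  -> 'dog' | 'cat'
    V  -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased a cat".split()):
    print(tree)   # (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det a) (N cat))))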
3
Example of Classical NLP
5
The Rise of Statistical NLP Approaches
Classical Approach: Rule-Based
Statistical Approach: Data-Driven
8
Ambiguity in Language: Feature-Based Disambiguation
• Features = clues used for classification
• Examples:
  • Word position relative to the verb
  • Case markers in free word order languages
  • Morphological suffixes
• Example: English “beat” = noun or verb? Hindi “haraya” (verb + past tense suffix “aaya”) removes the ambiguity.
Features are central to statistical NLP. For instance, English relies on word order, while Hindi uses case markers like “ne” and “ko” to resolve roles.
English: fixed word order (“France beat Brazil”)
Hindi: “France ne Brazil ko haraya” = “Brazil ko France ne haraya”
Case markers encode roles, not word order. In Indian languages, verb forms often include rich morphological cues that reduce ambiguity.
10
NLP as a Sequence Labeling Task
Words:
Part-of-Speech (POS) Tagging
Assign grammatical categories to each word.
Example:
• The cat sat on the mat.
→ The/DET cat/NOUN sat/VERB on/ADP the/DET mat/NOUN
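A minimal sketch of POS tagging in code, using spaCy (the library used later in this module); it assumes the en_core_web_sm model has been downloaded.

import spacy

# Minimal POS tagging sketch (assumes: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
for token in nlp("The cat sat on the mat."):
    print(token.text, token.pos_)   # e.g. The DET, cat NOUN, sat VERB, ...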
NLP as a Sequence Labeling Task
Phrases:
Chunking (Shallow Parsing)
• Group words into syntactically correlated units.
Example:
• The quick brown fox → [NP The quick brown fox]
• jumps over the lazy dog → [VP jumps] [PP over] [NP the lazy dog]
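A small chunking sketch, assuming NLTK is available; the regular-expression grammar below is a common textbook NP pattern, not one prescribed by the slides.

import nltk

# Chunking with a regular-expression grammar over POS tags
# (tags supplied by hand to keep the example self-contained).
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN")]
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
print(chunker.parse(tagged))   # (S (NP The/DT quick/JJ brown/JJ fox/NN))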
13
NLP as a Sequence Labeling Task
Sentences:
Parsing (Syntactic Parsing)
• Generate full syntactic structure of a sentence.
Example (Constituency Parse):
• Sentence: The dog chased the cat.
  → (S (NP The dog) (VP chased (NP the cat)))
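The bracketed parse above can be loaded and inspected programmatically; a small sketch assuming NLTK is installed:

from nltk import Tree

# Load the bracketed constituency parse from the slide and inspect it.
tree = Tree.fromstring("(S (NP The dog) (VP chased (NP the cat)))")
tree.pretty_print()                                 # draws the tree as ASCII art
print(tree.label(), [st.label() for st in tree])    # S ['NP', 'VP']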
14
NLP as a Sequence Labeling Task
Paragraphs:
Co-reference Resolution
• Identify when different expressions refer to the same entity.
Example:
• John went to the store. He bought some milk.
→ "He" refers to "John"
15
Importance of NER (Named Entity Recognition)
Machine Translation (MT)
Preserves the correct translation of names and entities across languages.
Summarization
Ensures that key entities remain in the summary.
Importance of NER (Named Entity Recognition)
Question Answering (QA)
Identifies potential answers by locating named entities in documents.
Text: Dr. Smith will attend the conference in Tokyo on August 12.
• Extracted: Person: Dr. Smith, Location: Tokyo, Date: August 12
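A minimal sketch of extracting these entities with spaCy (assuming the en_core_web_sm model is installed; the exact labels can vary slightly between model versions):

import spacy

# NER sketch for the sentence above.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Dr. Smith will attend the conference in Tokyo on August 12.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Smith PERSON, Tokyo GPE, August 12 DATE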
17
Part-of-Speech (POS) Tagging
•Assigns grammatical categories to each word in a sentence.
•Tags are drawn from a predefined set.
•POS tagging helps in understanding syntactic roles and structure.
Token | Tag | Description
,     | ,   | Punctuation (comma)
Dr.   | NNP | Proper noun (title)
20
Chunking vs Full Parsing for Sentiment Phrases
Evaluate which method better captures adjective–noun sentiment phrases (e.g.,
“amazing food”, “terrible service”) in user reviews.
Chunking (shallow parsing):
Strengths:
• Fast and lightweight
• Directly captures flat phrases like [JJ NN]
• Works well for extracting local sentiment spans
Limitations:
• Cannot model nested or long-distance dependencies
• May miss sentiment when the structure is more complex, e.g., “The food, though overpriced, was surprisingly amazing.”
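A small sketch of the chunking-style extraction of flat adjective–noun phrases using spaCy's rule-based Matcher; the choice of tool and the review text are illustrative assumptions, not from the slides.

import spacy
from spacy.matcher import Matcher

# Match flat adjective-noun pairs, the [JJ NN] pattern mentioned above.
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("ADJ_NOUN", [[{"POS": "ADJ"}, {"POS": "NOUN"}]])

doc = nlp("The food was amazing but the service was terrible. Amazing food, terrible service.")
for _, start, end in matcher(doc):
    print(doc[start:end].text)   # e.g. "Amazing food", "terrible service"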
21
Chunking vs Full Parsing for Sentiment Phrases
Evaluate which method better captures adjective–noun sentiment phrases (e.g.,
“amazing food”, “terrible service”) in user reviews.
Full parsing:
Strengths:
• Handles complex structures and long dependencies
• Robust for syntax-based opinion mining
• Can extract non-adjacent sentiment elements (e.g., “food that looked amazing”)
Limitations:
• Slower, more computationally expensive
• May overcomplicate simple extraction tasks
22
Parsing (Sentence Labeling)
Parsing is the task of analyzing the syntactic structure of a sentence and
representing it as a hierarchical tree of its grammatical components.
Parsing (Sentence Labeling)
Feature | Benefit
Recognizes sentence units | Subject, predicate, objects
Captures nested phrases | Handles complex grammar
Provides structure | For reasoning, translation, QA
Enables deeper semantics | Beyond POS or NER
24
Ambiguity in Named Entity Detection
Explore how NLP models handle ambiguous named entities in real-world
sentences — particularly when the same word can refer to multiple entity types
depending on context.
This sentence contains three instances of the word “Amazon”, each referring to a
different entity type:
Ambiguity in Named Entity Detection
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Amazon delivers Amazon to Amazon.")
for ent in doc.ents:
    print(ent.text, ent.label_)

Output:
Amazon ORG
Amazon ORG
Amazon ORG

spaCy assigns the same label ("ORG") to all instances due to surface similarity and the lack of deeper context modeling.
Token        | Manually annotated label
Amazon (1st) | ORG (company)
Amazon (2nd) | PRODUCT or ORG (ambiguous)
Amazon (3rd) | GPE or LOC (location)
27
The Noisy Channel Model: Core Idea
•Rooted in Information Theory: Originally developed by Claude Shannon in the
1940s and later applied to speech recognition and signal processing in the 1960s.
28
Noisy Channel Model in NLP
We observe a distorted output t (e.g., a misspelled word, noisy sentence, or
incorrect tagging), and want to infer the most probable original sequence W (e.g.,
the correct sentence or word).
(wₙ, wₙ₋₁, ..., w₁) ──[Noisy Channel]──> (tₘ, tₘ₋₁, ..., t₁)
Here (wₙ, ..., w₁) is the correct sequence, the channel applies a noisy transformation, and (tₘ, ..., t₁) is the observed output from which we make a guess at the correct sequence.
29
Understanding argmax
• argmax(f(x)): the value of x for which the function f(x) is maximized.
• Example: argmax(f(x)) = 2 because the function reaches its peak value (4) when x = 2.
30
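A tiny numeric sketch of the same idea; the plotted function is assumed here to be f(x) = 4 - (x - 2)**2, which peaks at 4 when x = 2.

# Minimal sketch; the slide's plot is assumed to be f(x) = 4 - (x - 2)**2.
def f(x):
    return 4 - (x - 2) ** 2

xs = [x / 10 for x in range(-50, 51)]   # grid from -5.0 to 5.0
best_x = max(xs, key=f)                 # argmax over the grid
print(best_x, f(best_x))                # 2.0 4.0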
Bayes' Theorem
Bayes’ Theorem tells us how to update our belief about a hypothesis (A) after seeing some evidence (B). It reverses the direction of conditional probability: you might know how likely the evidence B is when A holds (the likelihood, P(B|A)), and Bayes’ Theorem turns that into P(A|B), the probability of the hypothesis given the evidence.
Ŵ = argmax_W P(W | t) = argmax_W P(t | W) · P(W)
Where:
•W: Original sentence or word (what we want to recover)
•t: Observed input (possibly distorted)
•P(W): How likely is the original word? (Language model)
•P(t∣W): How likely is this word to get distorted into what we saw? (Error model)
31
Bayes' Theorem
P(A ∩ B) = P(B ∩ A)
P(A) · P(B|A) = P(B) · P(A|B)
⟹ P(A|B) = P(A) · P(B|A) / P(B)
Choose the value (e.g., label, tag, word) whose posterior probability is highest given the observed data. This forms the mathematical basis for sequence labeling tasks, which are abundant in NLP.
32
Example
We need to find P(m | s): the probability of meningitis given a stiff neck.
P(m | s) << P(∼m | s)
Bayesian Decision Theory Principle
“Decide in favour of that value of a random variable which is the highest among other values of the variable, probabilistically.”
34
Some Issues
P(m | s) is estimated from counts as #(m ∩ s) / #(s).
Questions:
• Which is more reliable to compute, P(s | m) or P(m | s)?
• Which evidence is more sparse, P(s | m) or P(m | s)?
•Test of significance: The counts are always on a sample of population.
• Which probability count has sufficient statistics?
35
Argmax-Based Computation in Statistical NLP
Why the denominator P(B) disappears:
• P(B) is constant for all A
• Does not affect the argmax decision
• Can be ignored in the computation
Key steps in computing P(A) and P(B | A):
1. Look at the internal structures of A and B
   • In NLP, A and B are often long sequences (sentences, word sequences).
   • Break them into smaller components (e.g., words, n-grams).
2. Make independence assumptions
   • The full joint probability of sequences is hard to compute.
   • Assume certain variables are independent to simplify estimation.
3. Assemble the computation from smaller parts (see the sketch below)
   • Estimate probabilities for small components.
   • Combine them to approximate P(A) and P(B | A).
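A small illustrative sketch of step 3: approximating a sequence probability from smaller parts under a Markov-style independence assumption. The bigram probabilities are made-up numbers used purely for illustration.

from functools import reduce

# Approximate the probability of a sequence as a product of bigram probabilities
# (an independence assumption). Probabilities below are hypothetical.
bigram_p = {("<s>", "the"): 0.2, ("the", "dog"): 0.1, ("dog", "barks"): 0.3}

def sequence_prob(words):
    pairs = zip(["<s>"] + words, words)   # (previous word, current word)
    return reduce(lambda acc, pair: acc * bigram_p.get(pair, 1e-6), pairs, 1.0)

print(sequence_prob(["the", "dog", "barks"]))   # 0.2 * 0.1 * 0.3 = 0.006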
38
PoS Tagging: Example ASET
Tagged Output:
The/DET national/ADJ committee/NOU remarked/VRB on/PRP a/DET
number/NOU of/PRP other/ADJ issues/NOU
• PoS tags are assigned based on word properties, function in the sentence, and relationships to other words.
• Some words can function in multiple roles (e.g., adjectives functioning as nouns), requiring disambiguation. E.g.: committee/NOU
• Accurate PoS tagging is essential for downstream NLP tasks.

Word-by-word:
• The → Determiner → DET
• national → Adjective (qualifies “committee”) → ADJ
• committee → Noun → NOU
• remarked → Verb → VRB
• on → Preposition → PRP
• a → Determiner → DET
• number → Noun → NOU
• of → Preposition → PRP
• other → Adjective → ADJ
• issues → Noun → NOU
39
Part-of-Speech (PoS) Tagging – Statistical Formulation
From Example to General Model
• Sentence → sequence of words: w = (w1, w2, ..., wn)
• Goal: assign the best sequence of tags t = (t1, t2, ..., tn)
• Tags determined by:
  • Lexical properties of the words.
  • Context (neighbouring tags and words).

Trigram Tagger
• Assumes the tag of a word depends only on the previous two tags.
• Known as a trigram tagger.

HMM + Viterbi Decoding
• A Hidden Markov Model (HMM) provides the probabilistic framework.
• The Viterbi algorithm finds the most probable tag sequence efficiently.
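Putting these pieces together, the objective the trigram HMM tagger optimises can be written as follows (a standard formulation; the slide itself does not spell out the equation):

\hat{t} = \arg\max_{t_1,\ldots,t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \; P(w_i \mid t_i)

The Viterbi algorithm computes this argmax efficiently by dynamic programming instead of enumerating all tag sequences.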
40
PoS Tagging: Example
41
Statistical Spell Checking — Argmax Formulation
• P(W): prior probability — how likely a word is in the language.
  Acts as a filter: rare or invalid words get low probability.
• P(T|W): likelihood — how likely it is that W gets misspelled as T.
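A minimal noisy-channel spell-correction sketch combining the two terms; the candidate words and all probability values below are made-up for illustration and are not the values used in the later exercise.

# Choose argmax_W P(W) * P(T | W) over a candidate list (illustrative values only).
prior = {"the": 0.08, "than": 0.01}                        # P(W): language model
error = {("teh", "the"): 0.05, ("teh", "than"): 0.001}     # P(T | W): error model

def correct(typo, candidates):
    return max(candidates, key=lambda w: prior.get(w, 0) * error.get((typo, w), 0))

print(correct("teh", ["the", "than"]))   # "the"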
43
Statistical Spell Checking — Argmax Formulation
44
Confusion Matrices in Spell Checking
Operation                | Definition                                                  | Example
Substitution sub(x, y)   | Number of times letter x is replaced by letter y            | APPLE → AOPLE: sub(P, O) (P replaced by O)
Insertion ins(x, y)      | Number of times letter y is inserted after letter x         | FIGHT → FIGBHT: ins(G, B) (B inserted after G)
Deletion del(x, y)       | Number of times letter y is deleted when preceded by letter x | APPLE → APPE: del(P, L) (L deleted after P)
Transposition trans(x, y)| Number of times letters x and y are swapped                 | APPLE → APLPE: trans(P, L) (P and L swapped)
45
Confusion Matrices in Spell Checking
Where:
S = Substitution
I = Insertion
T = Transposition
X = Deletion
46
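This legend matches the error model of Kernighan, Church and Gale (1990), so the formula it refers to is presumably of the following form (a reconstruction, with t the observed typo and W the intended word whose i-th character is wᵢ):

P(t \mid W) =
\begin{cases}
\mathrm{del}(w_{i-1}, w_i) / \mathrm{count}(w_{i-1} w_i) & \text{deletion (X)} \\
\mathrm{ins}(w_{i-1}, t_i) / \mathrm{count}(w_{i-1}) & \text{insertion (I)} \\
\mathrm{sub}(t_i, w_i) / \mathrm{count}(w_i) & \text{substitution (S)} \\
\mathrm{trans}(w_i, w_{i+1}) / \mathrm{count}(w_i w_{i+1}) & \text{transposition (T)}
\end{cases}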
Confusion Matrices in Spell Checking
47
Example
49
…example
50
…example
51
…example
52
…example
• General Discussion
53
Probabilistic Speech Recognition
54
Speech Recognition
Speech recognition involves:
• identifying a single spoken word
• detecting where one word ends and the next begins
•Example:
•"I found a key on the road today" → clear boundaries
•But ambiguity exists:
•"I got a plate very quickly"
•Could be “I got up late today”
•Or “I got a plate today”
Disfluency in Speech
•Fillers like “umm”, “aahh” help speakers think
•Hearers process these automatically
• These affect accurate boundary detection in machines
56
Isolated Word Recognition
Steps:
1. Segmentation (word boundary detection)
2. Word identification
Training Process
•Collect speech corpus (dialogues, lectures, phone calls)
•Clean & mark boundaries
•Annotate:
• Word boundaries
• Parts of speech
• Stress/emphasis
•Example: “I went to the bank” → POS tags: pronoun, verb, preposition,
determiner, noun
57
Isolated Word Recognition
Probability Estimation
•Use annotated data to estimate P(W | S)
•Match input signal to possible words
•Choose word with highest probability
Example
•Speech signal: “DOG” (Da – Aa – Ga)
•Correct output: DOG
•But may also output: DAG, DEG (errors possible)
•Highlights difficulty in exact matching
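A toy sketch of this decision, reusing the noisy-channel factorisation from earlier in the module; all scores are hypothetical and only illustrate the argmax choice among DOG, DAG and DEG.

# Toy sketch: pick the most probable word for an acoustic input.
acoustic = {"DOG": 0.6, "DAG": 0.25, "DEG": 0.15}   # P(signal | word), hypothetical
prior    = {"DOG": 0.05, "DAG": 0.001, "DEG": 0.002}  # P(word), hypothetical

best = max(acoustic, key=lambda w: acoustic[w] * prior[w])
print(best)   # DOG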
58
Questions
59
Question 1
You are given a simple email classification task using the Naive Bayes algorithm.
The conditional probabilities for each word given the class are:
Use the Naive Bayes classifier to compute the posterior probabilities (unnormalized) for each
class and determine the most likely class label for the email.
Assume the Naive Bayes bag-of-words model with conditional independence between words.
60
Solution 1
We compute the unnormalized posterior probability for each class:
61
….. Solution 1
Comparison:
- Spam: 0.00021
- Ham: 0.0000004
Conclusion:
Since 0.00021 > 0.0000004, the Naive Bayes classifier predicts the email as:
Spam
62
Question 2: Argmax-Based Computation in NLP
You receive the following email:
"Win a free lottery ticket now!"
You are given the following probabilities from a Naive Bayes classifier:
Compute the class (Spam or Ham) with the highest probability using the formula:
P(Class | Email) ∝ P(Word1 | Class) × P(Word2 | Class) × ... × P(Class)
63
Solution 2
Spam score = 0.07 × 0.10 × 0.05 × 0.4 = 0.00014
Ham score = 0.01 × 0.02 × 0.005 × 0.6 = 0.0000006
Since 0.00014 > 0.0000006, the classifier predicts Spam.
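The same computation as a small Python check; the per-class factors are taken directly from the solution above, while the assignment of each factor to a particular word of the email is not specified on the slide.

from math import prod

# Recompute the two unnormalized scores from Solution 2 and pick the argmax.
spam_factors = [0.07, 0.10, 0.05]   # P(word_i | Spam), from the solution
ham_factors  = [0.01, 0.02, 0.005]  # P(word_i | Ham), from the solution
p_spam, p_ham = 0.4, 0.6            # class priors

scores = {"Spam": prod(spam_factors) * p_spam, "Ham": prod(ham_factors) * p_ham}
print(scores)                        # {'Spam': 0.00014, 'Ham': 6e-07}
print(max(scores, key=scores.get))   # Spam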
64
Question 3: Spell Checking
The observed (misspelled) word is "hte". Candidate corrections:
• "the"
• "hat"
• "hate"
Language-model priors:
P("the") = 0.08
P("hat") = 0.005
P("hate") = 0.003
Solution
66
Question 4
Label all parts of speech clearly and show the hierarchical syntactic
structure.
67
Solution
(S
  (NP (DT The) (JJ quick) (JJ brown) (NN fox))
  (VP (VBZ jumps)
    (PP (IN over)
      (NP (DT the) (JJ lazy) (NN dog))))
)
68
Question 5
69
Solution
The sentence “The cat sat on the mat.” is parsed as:
(S
  (NP (DT The) (NN cat))
  (VP (VBD sat)
    (PP (IN on)
      (NP (DT the) (NN mat))))
  (. .))

Note:
• The first “.” is the part-of-speech (POS) tag used for any sentence-terminating punctuation mark (period, question mark, exclamation mark) in the Penn Treebank tag set.
• The second “.” inside the parentheses is the actual period character from the sentence “The cat sat on the mat.”, so (. .) means “a period token, tagged as punctuation”.
Or in bracketed linear form:
(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat))))
(. .))
70
Thank You…!!
71