Module I Introduction

The document provides an introduction to Natural Language Processing (NLP), covering its definition, goals, key applications, and biological underpinnings related to speech processing. It discusses various techniques for word boundary detection and the significance of phonetics and articulation in NLP. Additionally, it highlights examples of NLP applications, including machine translation, speech recognition, and chatbots.

Uploaded by s98230358
Copyright © All Rights Reserved

ASET

NLP
Module I Introduction
Dr. Sweta Srivastava

Module I: Introduction

In this module, we will get an overview of:

• What is NLP
• Modern day concepts of NLP
• Biology of Speech Processing
• Place and Manner of Articulation
• Word Boundary Detection
• Argmax based computations
• HMM and Speech Recognition

NLP

[Figure: Natural Language Processing at the intersection of computer science, artificial intelligence (AI), and linguistics]
NLP

• Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence (AI), and linguistics.

• It focuses on the interaction between computers and human (natural) languages, enabling machines to read, understand, interpret, and generate human language.

Goals of NLP

Understand: Comprehend spoken or written language

Interpret: Extract meaning, sentiment, and intent

Generate: Create language that is fluent and meaningful

Translate: Convert between languages

Interact: Enable natural conversation with machines

Key Applications of NLP

Machine Translation (Google Translate, DeepL)

Speech Recognition (Siri, Alexa, Google Assistant)

Text Classification (Spam filters, sentiment analysis)

Chatbots and Virtual Assistants

Information Extraction (News summarization, knowledge graphs)

Search Engines (Google Search, semantic search)

Text Generation (ChatGPT, content creation tools)


NLP Example

• Example: Kiliki, the constructed language of the Kalakeya tribe in Baahubali: The Beginning.


• This involved developing a vocabulary of 3000 words and a unique
script with 22 symbols.
• The language's creation demonstrates NLP principles by establishing
a structured system of communication, even though it's fictional.

• NLP principles: the process of creating Kiliki mirrors NLP concepts such as:
– Lexicon development: defining the words (morphemes) and their meanings (semantics).
– Syntax and grammar: establishing rules for how words are combined to form meaningful sentences (morphosyntax).
– Phonology and morphology: determining the sounds and structure of words.

NLP Example

Example: Dynamic Non-Player Character (NPC) Conversations

• Games like The Elder Scrolls, Cyberpunk 2077, and upcoming RPGs use or experiment with NLP so that non-player characters (NPCs) can:
– Understand player text or voice input
– Generate adaptive dialogue based on story context
– Maintain character personality over time

NLP Example

• Deepfake Voice & Lip Sync for Dubbing


• NLP + Computer Vision enables:
– Voice cloning of actors in multiple languages
– Lip sync correction using phoneme prediction
– Emotional speech synthesis to match original acting
• Used in shows like The Mandalorian (Luke Skywalker scenes).

NLP Example

• Example: Translating Ancient Texts


• NLP is used to digitize and analyze ancient Indian texts (e.g.,
Vedas, Tamil Sangam literature).
• Tools do:
– Optical Character Recognition (OCR)
– Named Entity Recognition (NER) for gods, kings, places
– Syntax parsing for poetic structure

11
NLP Example ASET

• Rap, Poetry, and Lyrics Generation


• Example: AI-generated Indian classical compositions or rap
lyrics
• NLP models trained on Bhajans, Shayari, or rap can:
– Generate rhythmically accurate lyrics
– Mimic poetic meter or classical ragas
• Tools like MuseNet, Jukebox, or fine-tuned GPT models are
used.

NLP Example

• Mental Health Chatbots


• Example: Wysa, Woebot
• Real-time conversation analysis to detect stress, anxiety, or
depression.
• Use sentiment analysis, emotion detection, and intent
recognition.
• Personalized and evolving conversations over time.

NLP Example

• Real-time Content Moderation


• Social media platforms like YouTube, Facebook, Instagram use
NLP to:
– Detect hate speech, nudity, or dangerous content in real time (visual content such as nudity is flagged with computer vision, alongside NLP for text and speech).
– Auto-flag or take action on livestreams or comments.

• Real-Time Multilingual Chat in Movies

• Imagine two characters speaking different languages (say, Telugu and Hindi); NLP enables:
– Instant understanding with on-screen or audio translation
– Dynamic cultural adaptation of idioms, slang, and metaphors

• Real-time Dubbing & Subtitling

• Example: Netflix or YouTube auto-dubbing and auto-subtitling
– NLP + Speech Recognition + Machine Translation = auto-generated dubs/subs in multiple languages.
– Uses models like Whisper (for speech) and MarianMT, M2M-100 (for translation).

Biology of Speech Processing
• Speech processing in humans involves a complex interaction
between auditory perception, language comprehension, and
motor control. Key components include:
– Auditory Input & Perception
– Speech Recognition & Language Processing
– Semantic & Syntactic Processing
– Motor Output for Speech

Biology of Speech Processing
1. Auditory Input & Perception
• Cochlea (inner ear): Converts sound waves into neural signals.
• Auditory nerve: Transmits signals to the brainstem and auditory
cortex.
• Primary auditory cortex (in temporal lobe): Analyzes basic sound
features like pitch and loudness.
2. Speech Recognition & Language Processing
• Wernicke’s area (posterior temporal lobe):
• Interprets words and meaning.
• Critical for understanding spoken language.
• Broca’s area (inferior frontal gyrus):
• Involved in speech production and grammatical processing.
• Helps organize words into sentences.

Biology of Speech Processing
3. Semantic & Syntactic Processing
• Angular gyrus & supramarginal gyrus:
• Interface between auditory and visual language input.
• Prefrontal cortex:
• Higher-order planning, working memory, and pragmatics
of language.

4. Motor Output for Speech


• Motor cortex: Controls muscles for articulation (lips,
tongue, larynx).
• Basal ganglia & cerebellum: Coordinate fine motor
control for fluent speech.

Relevance to NLP & Speech Technology
1. Speech Recognition
• Systems like automatic speech recognition (ASR) mimic how the brain decodes audio
into linguistic units (phonemes, words).
• Neural networks (especially CNNs and RNNs) are loosely inspired by hierarchical auditory processing.

2. Representation Learning
• Self-supervised learning (e.g., wav2vec 2.0 by Facebook AI) learns speech features, somewhat as the auditory cortex builds internal representations.
• Contrastive learning in these models is loosely analogous to how the brain differentiates between similar sounds in context.

3. Cognitive-Inspired Models
• Transformer-based ASR models and language models are often compared to cortical hierarchies in how they process sequential and contextual information.
4. Multimodal Integration
• Human speech understanding is often multisensory (visual cues like lip movement),
which is being incorporated into multimodal NLP models.

Phases of NLP

• Phonetics and Phonology
• Morphology
• Lexical Analysis
• Syntactic Analysis
• Semantic Analysis
• Pragmatics
• Discourse

Phonetics and Phonology

• Phonetics is the branch of linguistics that studies the physical production and perception of speech sounds. It focuses on how sounds are made, transmitted, and heard, regardless of any specific language.

Articulatory Phonetics
– Studies how speech sounds are produced by the human vocal tract.
– Focuses on the movement and position of: tongue, lips, teeth, vocal cords, velum (soft palate), etc.

Acoustic Phonetics
– Studies the physical properties of speech sounds as sound waves.
– Involves: frequency (pitch), amplitude (loudness), duration, and spectrogram analysis.

Auditory Phonetics
– Studies how sounds are perceived by the ear and brain.
– Deals with the hearing process, including ear structure, auditory nerve signals, and brain interpretation.
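
To make the acoustic side concrete, here is a minimal sketch in plain Python (no audio libraries assumed) that builds a crude spectrogram of a synthetic 220 Hz tone and reads off its dominant frequency. The 25 ms frames, 10 ms hop, Hann window, and 256-point DFT are common speech-processing conventions chosen here as assumptions, not values from the slides; all function names are illustrative.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Hann window of length n; tapering frame edges reduces spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame: list[float], n_fft: int) -> list[float]:
    """|DFT| for bins 0..n_fft/2 of one frame, zero-padded to n_fft samples (plain O(N^2) DFT)."""
    padded = frame + [0.0] * (n_fft - len(frame))
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft) for n, x in enumerate(padded)))
        for k in range(n_fft // 2 + 1)
    ]


def spectrogram(signal: list[float], frame_len: int, hop: int, n_fft: int) -> list[list[float]]:
    """Short-time magnitude spectra: one row of bin magnitudes per windowed frame."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        rows.append(magnitude_spectrum(frame, n_fft))
    return rows


def dominant_frequency(signal, sample_rate, frame_len=200, hop=80, n_fft=256) -> float:
    """Centre frequency (Hz) of the bin with the largest magnitude, averaged over frames."""
    rows = spectrogram(signal, frame_len, hop, n_fft)
    avg = [sum(col) / len(rows) for col in zip(*rows)]
    peak_bin = max(range(len(avg)), key=avg.__getitem__)
    return peak_bin * sample_rate / n_fft


# 0.1 s of a pure 220 Hz tone sampled at 8 kHz (200 samples = 25 ms frames, hop 80 = 10 ms).
sr = 8000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr // 10)]
print(dominant_frequency(tone, sr))  # 218.75, the 31.25 Hz bin nearest to 220 Hz
```

With a 256-point DFT at 8 kHz each bin spans 31.25 Hz, so the estimate lands on the nearest bin. A larger DFT size gives finer bins, and longer frames sharpen frequency resolution at the cost of time resolution.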

Phonology

• Phonology is the branch of linguistics that studies how sounds function within a particular language or languages. It focuses on the abstract, mental representations of sounds and the rules that govern how sounds behave and interact.

Phoneme:
The smallest sound unit that can distinguish meaning in a language.
Example: In English, /p/ vs. /b/ changes meaning: pat /pæt/ vs. bat /bæt/ → so /p/ and /b/ are separate phonemes.

Allophones:
Variants of a phoneme that do not change meaning. They often occur in specific environments and are predictable.
Example: The /p/ sound in pin [pʰɪn] (aspirated) vs. spin [spɪn] (unaspirated) → [pʰ] and [p] are allophones of the same phoneme /p/ in English.

Minimal Pairs:
A pair of words that differ by only one sound in the same position and have different meanings.
Example: bit /bɪt/ vs. bat /bæt/ → shows that /ɪ/ and /æ/ are separate phonemes.

Prosodic Features:
Stress, intonation, tone, and rhythm also fall under phonology.
Example: REcord (noun) vs. reCORD (verb): stress changes meaning.
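
The minimal-pair test is easy to automate once words are stored as phoneme sequences. A minimal sketch, assuming a tiny hand-written pronunciation table (the PRONUNCIATIONS entries and the function name are hypothetical, not taken from a real pronouncing dictionary):

```python
from itertools import combinations

# Hypothetical mini pronunciation table: word -> phoneme sequence (broad IPA).
PRONUNCIATIONS = {
    "pat": ("p", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "bit": ("b", "ɪ", "t"),
    "pit": ("p", "ɪ", "t"),
    "spin": ("s", "p", "ɪ", "n"),
}


def minimal_pairs(lexicon):
    """Return (word1, word2, phone1, phone2) for words that differ in exactly one position."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue  # a minimal pair must have the same number of segments
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, *diffs[0]))
    return pairs


for w1, w2, a, b in minimal_pairs(PRONUNCIATIONS):
    print(f"{w1} / {w2}: /{a}/ vs /{b}/ are contrastive")
```

Each reported pair is evidence that the two differing sounds are separate phonemes. Allophones such as [pʰ] and [p] never show up this way, because swapping them does not produce a different word.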

Place and Manner of Articulation
• Place of articulation (where in the vocal tract the constriction
occurs)
• Manner of articulation (how the airstream is affected as it flows
through the vocal tract)
• Voicing (whether the vocal cords vibrate)

A consonant chart organized along these three dimensions (such as the IPA consonant chart) is fundamental in Natural Language Processing (NLP) for tasks involving phonetics, speech recognition, and text-to-speech systems.

Place of Articulation

Bilabial (both lips, e.g., [p], [b], [m])

Labiodental (lip and teeth, e.g., [f], [v])

Dental (tongue and teeth, e.g., [θ], [ð])

Alveolar (tongue and alveolar ridge, e.g., [t], [d], [s], [z], [n], [l])

Post-alveolar (just behind the alveolar ridge, e.g., [ʃ], [ʒ])

Palatal (hard palate, e.g., [j])

Velar (soft palate, e.g., [k], [g], [ŋ])

Glottal (glottis, e.g., [h], [ʔ])


Manner of Articulation

Plosive (Stop): Complete blockage of airflow ([p], [b], [t], [d], [k], [g])

Nasal: Airflow through the nose ([m], [n], [ŋ])

Trill: Rapid vibration ([r], less common in English)

Tap or Flap: Quick touch of articulators ([ɾ])

Fricative: Partial blockage, turbulent airflow ([f], [v], [s], [z], [ʃ], [ʒ], [θ], [ð], [h])

Affricate: Plosive followed by fricative ([tʃ], [dʒ])

Approximant: Slight constriction ([w], [j], [ɹ])

Lateral Approximant: Air escapes around the sides ([l])


Voicing

Voiced sounds: vocal cords vibrate (e.g., [b], [d], [g], [v], [z])

Voiceless sounds: vocal cords do not vibrate (e.g., [p], [t], [k], [f], [s])

Applications in NLP:
•Speech Recognition: Understanding articulatory features helps in mapping
audio to phonemes.
•Speech Synthesis: Text-to-speech systems use these to mimic realistic
pronunciation.
•Phoneme Embedding: Used in neural networks for modeling phonetic
features.
•Accent Detection and Correction: Identifying misarticulations based on this
chart.
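
One common way to use these three dimensions computationally is to store each consonant as a (voicing, place, manner) triple. The sketch below assumes a small hand-built table covering a few English consonants (the FEATURES table and helper names are illustrative); it produces the conventional labels, natural classes, and a crude feature distance of the kind an accent-detection or ASR error analysis might use.

```python
# Hypothetical feature table for a few English consonants: (voicing, place, manner).
FEATURES = {
    "p": ("voiceless", "bilabial", "plosive"),
    "b": ("voiced", "bilabial", "plosive"),
    "m": ("voiced", "bilabial", "nasal"),
    "f": ("voiceless", "labiodental", "fricative"),
    "v": ("voiced", "labiodental", "fricative"),
    "t": ("voiceless", "alveolar", "plosive"),
    "d": ("voiced", "alveolar", "plosive"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "k": ("voiceless", "velar", "plosive"),
    "g": ("voiced", "velar", "plosive"),
    "ŋ": ("voiced", "velar", "nasal"),
}
NAMES = ("voicing", "place", "manner")


def describe(phone: str) -> str:
    """Conventional 'voicing place manner' label, e.g. 'voiced bilabial plosive'."""
    return " ".join(FEATURES[phone])


def natural_class(**wanted: str) -> list[str]:
    """All phones whose features match the given values, e.g. manner='plosive'."""
    return sorted(
        p for p, feats in FEATURES.items()
        if all(dict(zip(NAMES, feats))[k] == v for k, v in wanted.items())
    )


def feature_distance(a: str, b: str) -> int:
    """Number of differing features: a rough proxy for how confusable two phones are."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


print(describe("b"), natural_class(voicing="voiced", manner="plosive"))
```

For example, [p] and [b] differ only in voicing (distance 1), which is exactly the kind of near-miss a speech recognizer or an accent-correction tool would flag first.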

Word Boundary Detection

• Word boundary detection is the task of identifying where words begin and end in a text or speech stream.
• This is crucial in Natural Language Processing (NLP) for languages and contexts where words are not clearly separated (e.g., in speech or in languages like Chinese or Thai).

• In English, spaces separate words, so boundary detection is often straightforward.

• In speech recognition, text normalization, morphological analysis, or languages with no whitespace (e.g., Chinese, Japanese, Thai), word boundaries are not obvious and must be inferred.

Word Boundary Detection

Accurate detection impacts:

Tokenization
• The process of breaking text into smaller units like words, subwords, or
sentences.
Parsing
• Analyzing the grammatical structure of a sentence to identify
relationships between words.
Named entity recognition
• Identifying and classifying proper names in text into categories like
person, organization, or location.
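Machine translation
• Converting text from one language to another; incorrect boundaries give the system the wrong units to translate.

Techniques for Word Boundary Detection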
• Rule-Based Methods
– Use handcrafted rules based on grammar or known word lists.
– Pros: Simple, interpretable.
– Cons: Not scalable or robust to new data/languages.

• A rule-based approach applies a particular set of rules or patterns to capture specific structures, extract information, or perform tasks such as text classification.

• Common rule-based techniques include regular expressions and pattern matching.

Techniques for Word Boundary Detection: Rule-Based Methods
Rule Creation:
Based on the desired tasks, domain-specific linguistic rules are created
such as grammar rules, syntax patterns, semantic rules or regular
expressions.
Rule Application:
The predefined rules are applied to the inputted data to capture matched
patterns.
Rule Processing:
The text data is processed in accordance with the results of the matched
rules to extract information, make decisions or other tasks.
Rule Refinement:
The rules are refined iteratively to improve accuracy and performance: based on feedback from earlier runs, they are modified and updated when needed.
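
The create/apply/refine cycle can be sketched with Python's standard `re` module. The rule set here is an illustrative assumption, not a standard tokenizer: rules are tried in order at each position, so placing EMAIL before WORD is itself a small refinement that stops e-mail addresses from being split into pieces.

```python
import re

# Rule creation: hand-written patterns, tried in order at each position.
TOKEN_RULES = [
    ("URL", r"https?://\S+"),
    ("EMAIL", r"[\w.+-]+@[\w-]+\.[\w.]+"),
    ("NUMBER", r"\d+(?:[.,]\d+)*"),
    ("WORD", r"[A-Za-z]+(?:['’-][A-Za-z]+)*"),  # keeps don't and state-of-the-art whole
    ("PUNCT", r"[^\w\s]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text: str) -> list[tuple[str, str]]:
    """Rule application: return (token, rule_name) pairs; whitespace is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


print(tokenize("Email me at a.b@example.com, don't wait!"))
```

`m.lastgroup` reports which named alternative matched, which is convenient during rule refinement: you can see immediately which rule produced each token and adjust the order or the patterns.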

Techniques for Word Boundary Detection
• Dictionary-Based Matching
– Greedy algorithms (e.g., Maximum Matching) scan text to match known
words.
– Limitations: Struggle with unknown or ambiguous words.

Greedy Approach:
These algorithms make the locally optimal choice at each step, aiming for the
longest possible word match. For example, in the phrase "themendinehere", a
greedy algorithm might first match "theme" as the longest word, then continue
from "n", even though the intended reading is "the men dine here".
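
A minimal sketch of greedy maximum matching, assuming a toy dictionary chosen to reproduce the "themendinehere" example (the DICTIONARY contents and function names are illustrative). It also includes backward maximum matching, a common variant that scans from the right and happens to recover the intended reading here:

```python
def max_match(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    longest = max(map(len, dictionary))
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + longest), i, -1):  # longest candidate first
            if text[i:j] in dictionary:
                break
        else:
            j = i + 1  # no dictionary word starts here: emit a single character
        words.append(text[i:j])
        i = j
    return words


def backward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Backward maximum matching: scan from the end, taking the longest word that ends there."""
    longest = max(map(len, dictionary))
    words, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - longest), j):  # earliest start = longest candidate
            if text[i:j] in dictionary:
                break
        else:
            i = j - 1
        words.append(text[i:j])
        j = i
    return words[::-1]


DICTIONARY = {"the", "theme", "men", "dine", "here"}
print(max_match("themendinehere", DICTIONARY))           # ['theme', 'n', 'dine', 'here']
print(backward_max_match("themendinehere", DICTIONARY))  # ['the', 'men', 'dine', 'here']
```

Neither direction is reliable in general: both are greedy, so each fails whenever a longer dictionary word swallows part of the correct one. That weakness motivates the statistical approaches.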

Techniques for Word Boundary Detection:
Dictionary-Based Matching

[Link]
segmentation-maximal-matching-1-ed2ad5cab4fc

Techniques for Word Boundary Detection
• Statistical Approaches
– n-gram models: Estimate the likelihood of word sequences.
– Bayesian segmentation: Infers most probable word boundaries given a
probabilistic model.

Bayesian segmentation aims to find the optimal way to divide a sequence (e.g.,
a string of characters) into meaningful units (e.g., words).
How it works:
This approach uses a probabilistic model and Bayes' theorem to calculate the
probability of different segmentations. It looks for the segmentation that is most
likely given the model and the input sequence.
Example:
In a sequence like "thisisatest", Bayesian segmentation might infer that the
most likely word boundaries are "this is a test".
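
The "most likely segmentation" is an argmax over all ways of splitting the string: choose w1 … wn with w1 + … + wn = input that maximizes P(w1) × … × P(wn). A minimal sketch, assuming a unigram model with made-up probabilities (the UNIGRAM values and the unknown-word penalty are illustrative assumptions), solved with Viterbi-style dynamic programming. This is only the MAP step; fully Bayesian segmenters also learn the vocabulary and its probabilities from unsegmented text.

```python
import math

# Hypothetical unigram probabilities; in practice these are estimated from a corpus.
UNIGRAM = {"this": 0.02, "is": 0.03, "a": 0.04, "at": 0.02, "test": 0.01, "his": 0.01}


def word_logprob(word: str) -> float:
    """log P(word); unknown strings get a small probability that shrinks with length."""
    if word in UNIGRAM:
        return math.log(UNIGRAM[word])
    return math.log(1e-6) - len(word) * math.log(10)


def segment(text: str, max_len: int = 10) -> list[str]:
    """argmax over segmentations of the product of unigram probabilities (Viterbi-style DP)."""
    # best[i] = (log-prob of the best segmentation of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start][0] + word_logprob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the string to recover the words.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("thisisatest"))  # ['this', 'is', 'a', 'test']
```

Working in log-probabilities turns the product into a sum and avoids numeric underflow on long strings; the dynamic program explores every segmentation implicitly in O(n × max_len) steps instead of enumerating them.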

Techniques for Word Boundary Detection
• Machine Learning/ Deep Learning Approaches
– Supervised learning: Trained on labeled corpora (e.g., CRFs,
HMMs).
– Features used: Character sequences, POS tags, punctuation,
context windows.
– RNNs / LSTMs / Transformers:
– Learn context-aware features automatically.
– Can handle noisy or unseen data better than traditional ML.
– Often used in languages without whitespace or for speech data.
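
Supervised segmenters (CRFs, HMMs, LSTMs, Transformers) usually treat boundary detection as character-level sequence labelling. A minimal sketch of the label encoding, assuming a two-tag B/I scheme (B = begins a word, I = inside a word; four-tag B/M/E/S schemes are also common). Only the conversion to and from labels is shown; the model that predicts the labels is out of scope.

```python
def words_to_tags(words: list[str]) -> tuple[str, list[str]]:
    """Encode a segmentation as one label per character (training data for a tagger)."""
    chars, tags = [], []
    for word in words:
        for k, ch in enumerate(word):
            chars.append(ch)
            tags.append("B" if k == 0 else "I")
    return "".join(chars), tags


def tags_to_words(text: str, tags: list[str]) -> list[str]:
    """Decode per-character B/I labels (e.g. predicted by a CRF or LSTM) back into words."""
    words = []
    for ch, tag in zip(text, tags):
        if tag == "B" or not words:
            words.append(ch)  # start a new word
        else:
            words[-1] += ch   # extend the current word
    return words


text, tags = words_to_tags(["this", "is", "a", "test"])
print(text, "".join(tags))  # thisisatest BIIIBIBBIII
```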

Morphology

• Morphology is a fundamental linguistic concept used in Natural Language Processing (NLP) to analyze the structure of words.
• It helps machines understand how words are formed and how they relate to each other, which is crucial for tasks like machine translation, lemmatization, and text classification.

Types of Morphological Processing in NLP
1. Stemming
•Reduces a word to its base/root form by chopping off affixes.
•Often crude; may produce non-words.
•Example:
• "connection", "connected" → "connect"

2. Lemmatization
•Returns the dictionary base form (lemma) of a word.
•Considers context and POS (Part-of-Speech).
•Example:
• "was" → "be", "better" → "good"
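
The difference between the two can be sketched in a few lines. Both pieces below are toy assumptions rather than real algorithms: the suffix rules are a tiny stand-in for a stemmer such as Porter's, and the (word, POS) table stands in for a dictionary-backed lemmatizer (libraries such as NLTK provide real versions of both).

```python
# Toy suffix-stripping rules, checked in order (a tiny stand-in for a real stemmer).
SUFFIX_RULES = [("ies", "i"), ("ion", ""), ("ing", ""), ("ed", ""), ("s", "")]


def stem(word: str, min_stem: int = 3) -> str:
    """Chop the first matching suffix, as long as at least min_stem characters remain."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + replacement
    return word


# Hypothetical lemma dictionary keyed by (word, part of speech).
LEMMAS = {
    ("was", "VERB"): "be",
    ("better", "ADJ"): "good",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("studies", "VERB"): "study",
}


def lemmatize(word: str, pos: str) -> str:
    """Dictionary lookup that needs the POS; falls back to the word itself."""
    return LEMMAS.get((word, pos), word)


print([stem(w) for w in ["connection", "connected", "studies", "was"]])
print(lemmatize("was", "VERB"), lemmatize("better", "ADJ"))
```

The stemmer maps "connection" and "connected" to "connect" but also produces the non-word "studi" and cannot relate "was" to "be"; the lemmatizer handles those cases, but only because it is given the part of speech.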

Challenges in Morphological Analysis
Ambiguity:
One form may map to multiple lemmas
Word: saw
Possible lemmas:
• see (past tense of the verb): "I saw a bird."
• saw (noun, a tool): "He used a saw to cut wood."
Complex Word Forms:
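Words can pack many morphemes together; in agglutinative or polysynthetic languages (e.g., Turkish, Inuktitut), a single word can express what English needs a whole sentence for. Even English words can be long: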
uncharacteristically
• un- = not
• character = root noun
• -istic = relating to
• -ally = adverb form
In a way that is not typical of someone’s character

Sparse Data:
Rare or unseen morphological variants in training data.
Low-frequency or constructed words.
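
Decomposing a form like "uncharacteristically" can be sketched as greedy affix stripping. This is a simplified illustration under stated assumptions: the PREFIXES/SUFFIXES inventories and the minimum root length are hypothetical, and a real analyzer (for example a finite-state morphological analyzer) would also check that every intermediate stem is a real word.

```python
# Hypothetical affix inventories for illustration only.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ally", "istic", "ness", "ly"]


def split_morphemes(word: str, min_root: int = 4) -> list[str]:
    """Greedy affix stripping: peel prefixes from the left, then suffixes from the right."""
    prefixes, suffixes = [], []
    stripped = True
    while stripped:  # keep peeling prefixes until none applies
        stripped = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= min_root:
                prefixes.append(p + "-")
                word = word[len(p):]
                stripped = True
                break
    stripped = True
    while stripped:  # peel suffixes, outermost first
        stripped = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= min_root:
                suffixes.insert(0, "-" + s)  # keep suffixes in surface order
                word = word[: -len(s)]
                stripped = True
                break
    return prefixes + [word] + suffixes


print(split_morphemes("uncharacteristically"))  # ['un-', 'character', '-istic', '-ally']
```

The minimum root length is a crude guard against over-stripping; sparse or unseen forms are exactly where such hand-written inventories break down, which is the sparse-data problem listed above.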

Lexical Analysis in NLP

• Lexical analysis in NLP is the first step in processing natural language, where raw text is broken down into meaningful units called lexemes or tokens.
• It’s foundational to understanding and interpreting human language computationally.
• It involves dictionary access and obtaining the properties of each word.

• Example: dog
– noun (lexical property: its part of speech)
– takes "-s" in the plural (morphological property: how it changes form)
– animate (semantic property: a meaning-related feature)
– 4-legged, carnivore (semantic / world-knowledge properties)
• Challenge: lexical or word sense disambiguation

Lexical Analysis in NLP
Dictionary as a Core Component:
In NLP, the dictionary (lexicon) plays a central role by storing root words and embedding
important linguistic and semantic properties (e.g., "dog" is a noun, takes 's' in plural, is
animate, etc.).

Morphological Processing:

It breaks down words into root forms and affixes. For example, "dogs" is reduced to the
root "dog" by stripping the plural suffix.
Role in Advanced NLP Tasks:

Lexical knowledge is essential for tasks like Question Answering, where machines must understand meaning and infer real-world knowledge (e.g., answering how long a dog lives presupposes knowing that a dog is a living being with a finite lifespan).

Need for Semantic Understanding:

Answering questions like "How many years does a dog live?" requires deep semantic information and inferencing, which comes from rich lexical resources.
Main Challenge – Word Sense Disambiguation (WSD):

A key difficulty in lexical analysis is determining the correct meaning of a word in context, making WSD one of the most complex problems in NLP.
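
A lexicon entry like the "dog" example can be sketched as a small record, together with the morphological step that reduces "dogs" to its root. The field names, feature strings, and plural-stripping rule are assumptions for illustration; real lexicons (e.g. WordNet) are far richer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexicalEntry:
    """One lexicon record holding the kinds of properties listed for 'dog'."""
    lemma: str
    pos: str                                        # lexical property: part of speech
    plural_suffix: Optional[str]                    # morphological property
    semantic: set[str] = field(default_factory=set)  # semantic / world-knowledge features


LEXICON = {
    "dog": LexicalEntry("dog", "NOUN", "s", {"animate", "4-legged", "carnivore"}),
}


def analyze(token: str) -> Optional[tuple[LexicalEntry, str]]:
    """Return (entry, number) for a known root or root + plural suffix, else None."""
    if token in LEXICON:
        return LEXICON[token], "singular"
    for entry in LEXICON.values():
        # Morphological processing: strip the plural suffix and look up the root.
        if entry.plural_suffix and token == entry.lemma + entry.plural_suffix:
            return entry, "plural"
    return None


print(analyze("dogs"))
```

Once "dogs" is mapped back to the entry for "dog", every stored property (noun, animate, 4-legged, carnivore) becomes available to later stages such as question answering.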

Lexical Disambiguation in NLP
Part of Speech (POS) Disambiguation
Words often belong to multiple parts of speech depending on context.
Example: “Dog”
•Noun: “The dog barked loudly.” (animal)
•Verb: “He dogged her footsteps.” (to pursue)

Goal: Identify the correct grammatical category of a word in context.

Sense Disambiguation
A single word may have multiple meanings (senses) within the same
part of speech.
Example: “Dog” as a noun
•Sense 1: A domesticated animal.
•Sense 2: A despicable or detestable person.
Goal: Determine the correct sense of the word given its semantic
context.

Lexical Disambiguation in NLP
Importance of Word Relationships in Context
Understanding meaning requires looking at how words relate to each
other.
Examples:
•“The chair emphasized the need for adult education.”
→ "Chair" = person (not furniture)
•“Watch what you want, when you want.”
→ “Watch” can be a:
•Verb (to view, e.g., TV)
•Noun (a timepiece)
•“Groundbreaking ceremony” vs. “Groundbreaking research”
→ Same word ("groundbreaking"), different interpretations (event vs.
innovative idea)
•Context is key: Without analyzing surrounding words, disambiguation is
not possible.
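
Using context to choose a sense can be sketched with the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most content words with the sentence. The two glosses below paraphrase the "dog" noun senses from the earlier slide and, like the stop-word list, are assumptions for illustration; practical systems use WordNet glosses or contextual embeddings.

```python
STOPWORDS = {"a", "an", "the", "as", "or", "for", "of", "that", "who", "my", "is", "often", "to"}
STRIP_CHARS = ".,!?;:\"'"

# Hypothetical sense inventory: sense id -> gloss.
SENSES = {
    "dog/animal": "a domesticated animal often kept as a pet or for hunting",
    "dog/person": "a despicable or dishonest person who cheats others",
}


def content_words(text: str) -> set[str]:
    """Lower-cased words with punctuation and stop words removed."""
    words = {w.strip(STRIP_CHARS).lower() for w in text.split()}
    return words - STOPWORDS - {""}


def simplified_lesk(target: str, sentence: str, senses: dict[str, str]) -> str:
    """Pick the sense whose gloss overlaps most with the sentence's content words."""
    context = content_words(sentence) - {target}
    return max(senses, key=lambda s: len(context & content_words(senses[s])))


print(simplified_lesk("dog", "My neighbour keeps a dog as a pet.", SENSES))   # dog/animal
print(simplified_lesk("dog", "That dishonest dog cheats everyone.", SENSES))  # dog/person
```

Simplified Lesk breaks down when the context shares no words with any gloss, which is why examples like "chair" and "watch" usually need richer context models.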

Language Change Driven by Technology
• Language evolves constantly, and technological advancements are
one of the biggest drivers of new terms, redefined meanings, and
nuanced expressions.
• This is especially relevant in NLP and linguistics, where word
meanings must be understood in a contemporary context.

• "Justify" (word-processing context)
– Original meaning: to validate or defend something.
– New meaning (due to word processors): to align text to the left, right, centre, or both margins.
– Example: "Justify the right margin" = align text to the right.

• "Xeroxed" (verb from a brand)
– Origin: Xerox, a brand of photocopiers.
– New usage: to photocopy. Example: "I Xeroxed the document."
– A brand name transformed into a commonly used verb.

• "Digital Trace" (Internet-age expression)
– Refers to a user's activity footprint on the internet, including visited URLs, clicked links, browsed content, etc.
– Google and other search engines can track this behavior to build user profiles.

Language Change Driven by Technology

• "Communifaking" (mobile-age behavior)
– Coined from "communication" + "faking".
– Meaning: pretending to talk on a mobile phone when you're not.
– Use case: a humorous example is using this to escape a boring meeting.
– Reflects how mobile culture influences social behavior and language.

• "Helicopter Parenting" (metaphor for overparenting)
– Derived from a helicopter hovering overhead.
– Means: parents who constantly monitor or interfere in their children's lives.

• "Texto" and "Speako" (inspired by "typo")
– Typo = a typing mistake.
– Texto = a mistake while texting/SMS.
– Speako = a mistake while speaking.
– Emerging from casual communication and mobile culture.

• "Discomgooglation" (anxiety from no internet)
– Blend of "discomfort" + "Google".
– Refers to anxiety or discomfort caused by being unable to access the internet or use search engines.

• "Netizen"
– A blend of "Net" (the Internet) and "Citizen".
– Refers to a person who is native to the internet, someone who:
 • Is digitally savvy
 • Actively participates in online communities and platforms

Syntax Processing Stage in Natural Language Processing

• After the system has processed individual words, the next major step is
understanding the structure of a sentence — how words combine to form
phrases and sentences.
• This is essential for the system to derive meaning, relationships, and hierarchical
structure from natural language input.
• This goes beyond word-level analysis and looks at how words form phrases,
clauses, and sentences according to grammatical rules.

Key Components of Syntax Processing

Syntactic Categories
These are grammatical groupings:
•Noun Phrase (NP): e.g., "the cat", "I"
•Verb Phrase (VP): e.g., "likes mangoes"
•Prepositional Phrase (PP): e.g., "on the table"
•Adjective Phrase (ADJP): e.g., "very big"
•Sentence (S): composed of NP + VP (in English)

Constituency vs. Dependency
•Constituency Grammar: based on tree structures (e.g., phrase structure trees)
•Dependency Grammar: focuses on binary relations between words (e.g., "like" is the head of "mangoes")

Structure Detection: Example Breakdown
Let’s consider the sentence:

"I like mangoes"

We break it down into its syntactic components using a tree structure, whose labels are:

•S = Sentence
•NP = Noun Phrase
•VP = Verb Phrase
•V = Verb
•N = Noun

Dependency Structure:
•like → I (subject)
•like → mangoes (object)

Key Concept: Parsing
•Parsing = converting a sentence into a tree-like syntactic structure
•Enables machines to understand grammar rules, phrase dependencies, and relationships between words

Syntax Processing

Parsing Challenges in NLP:


•Ambiguity: Multiple valid structures possible for the same sentence
•Context-sensitivity: Meaning and structure can depend on context
•Grammar rules: Need to be defined clearly for different languages

Parser Types:
•Top-down parser: starts from the root (S) and tries to match the input sentence.
•Bottom-up parser: starts from the words and builds up to the root.
•Shift-reduce parser: a stack-based method to construct trees incrementally.
•Dependency parser: builds relationships between head words and their dependents.

Real-life Applications:
•Chatbots
•Machine Translation
•Grammar checkers
•Voice Assistants (e.g., Siri, Alexa)

Example grammar:
S → NP VP
NP → Det N | Pronoun
VP → V NP
Det → a | the
N → mangoes | apple
Pronoun → I
V → like | eat
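
A top-down parser for the example grammar can be sketched as recursive descent with backtracking. This is an illustrative implementation, not a production parser: it enumerates every parse, and it would loop forever on left-recursive rules (the example grammar has none). It also exposes that the grammar as written has no rule NP → N, so "I like mangoes" (a bare plural with no determiner) gets no parse until such a rule is added; that extension is an assumption made here, not part of the slide.

```python
# Example grammar from the slide: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Pronoun"]],
    "VP": [["V", "NP"]],
    "Det": [["a"], ["the"]],
    "N": [["mangoes"], ["apple"]],
    "Pronoun": [["I"]],
    "V": [["like"], ["eat"]],
}


def expand(symbol, tokens, pos, grammar):
    """Yield (tree, next_pos) for every way `symbol` can cover tokens starting at pos."""
    if symbol not in grammar:  # terminal: must equal the next input word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in grammar[symbol]:  # top-down: predict each rule for this symbol
        yield from expand_rhs(rhs, tokens, pos, grammar, symbol, [])


def expand_rhs(rhs, tokens, pos, grammar, label, children):
    """Match a right-hand side left to right; generators give backtracking for free."""
    if not rhs:
        yield (label, *children), pos
        return
    for child, nxt in expand(rhs[0], tokens, pos, grammar):
        yield from expand_rhs(rhs[1:], tokens, nxt, grammar, label, children + [child])


def parse(sentence, grammar=GRAMMAR):
    """All complete parses (trees as nested tuples) of a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return [tree for tree, end in expand("S", tokens, 0, grammar) if end == len(tokens)]


print(parse("I like the mangoes"))
print(parse("I like mangoes"))  # [] because no rule lets "mangoes" be an NP on its own
```

Adding the alternative with `{**GRAMMAR, "NP": GRAMMAR["NP"] + [["N"]]}` and passing it as `grammar` makes "I like mangoes" parse, which is a small example of the grammar-writing challenge listed above.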

Challenges in Syntactic Processing: Structural Ambiguity


• Structural ambiguity arises when a sentence can be interpreted in more
than one way due to multiple possible syntactic structures.

Scope Ambiguity
Occurs when it’s unclear how far a modifier or quantifier extends.

Example 1:
“The old men and women were taken to safe locations.”
•Reading A: [The old men] and women → Only men are old
•Reading B: The old [men and women] → Both men and women are old
Ambiguity: Does "old" modify both "men and women" or just "men"?

Example 2:
“No smoking areas will allow hookahs inside.”
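•Meaning A: [No-smoking] areas will allow hookahs inside → areas where smoking is banned will nevertheless allow hookahs.
•Meaning B: No [smoking areas] will allow hookahs inside → none of the areas where smoking is allowed will allow hookahs.
Ambiguity: Is “No” part of the compound “no-smoking” (describing the areas), or does it negate the whole statement about smoking areas?

Challenges in Syntactic Processing: Structural Ambiguity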


Prepositional Phrase (PP) Attachment Ambiguity
Occurs when a prepositional phrase can be attached to multiple parts of the
sentence.
Example 1:
“I saw the boy with a telescope.”
•Does “with a telescope” describe:
• The boy? (he has a telescope)
• Or how I saw him? (I used a telescope)
Example 2:
“I saw the mountain with a telescope.”
•Unlikely for the mountain to possess a telescope → likely describes the
instrument
Example 3:
“I saw the boy with the ponytail.”
•More likely: “the boy with the ponytail” (modifies the boy)
•Unlikely: instrument of seeing = ponytail
Role of World Knowledge: Helps resolve ambiguity in real-world contexts.
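
How many structures does a grammar actually license for these sentences? A bottom-up CKY chart can count them. A minimal sketch, assuming a small hand-written grammar in Chomsky Normal Form in which a PP may attach either to the VP (instrument reading) or to the NP (modifier reading); the rules and word tags are illustrative, not from the slides.

```python
from collections import defaultdict

# Hypothetical CNF grammar: binary rules (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase (instrument reading)
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase (modifier reading)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
WORD_TAGS = {
    "I": {"NP"}, "saw": {"V"}, "the": {"Det"}, "a": {"Det"},
    "with": {"P"}, "in": {"P"},
    "boy": {"N"}, "man": {"N"}, "telescope": {"N"},
    "binoculars": {"N"}, "park": {"N"}, "ponytail": {"N"},
}


def count_parses(sentence: str, goal: str = "S") -> int:
    """CKY: chart[i][j][A] = number of distinct trees rooted in A covering words i..j-1."""
    words = sentence.split()
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):  # length-1 spans come from the word tags
        for tag in WORD_TAGS[w]:
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):  # build longer spans from shorter ones (bottom-up)
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][goal]


print(count_parses("I saw the boy with a telescope"))  # 2
```

Syntax alone leaves the ambiguity open: "with a ponytail" gets exactly as many parses as "with a telescope", and "I saw the man with the binoculars in the park" gets five. Choosing among them takes semantic and world knowledge.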

Garden Path Sentences: A Case of Misleading Structure


• Garden path sentences are grammatically correct sentences that lead the reader to an incorrect initial interpretation.
• They force backtracking when the true syntactic structure becomes clear.

•Parsing algorithms fail or slow down because of the incorrect initial parse.
•They must re-evaluate the parse tree, costing time and computational effort.
•Garden path sentences are therefore used as test cases in evaluating NLP parsers.

Example: “The horse raced past the garden fell.”


Explanation:
•Initial parse: The horse raced past the garden (complete sentence?)
•Surprise: fell doesn’t fit!
•Required reanalysis:
• Correct interpretation: The horse [that was raced past the garden] fell.
• ‘That was’ (or ‘which was’) is elided (dropped) in this reduced relative clause, which is what misleads the initial parse.

A Case of Misleading Structure
Example: “The old man the boat.”
Explanation:
•Initial parse: The old man → assumed as noun phrase
•Problem: Where’s the verb?
•Reanalysis:
• “man” = verb (to operate/steer)
• “the old” = noun (adjective used as noun → old people)
• Correct interpretation:
The old [people] man [steer] the boat.
Key Learning:
•Part-of-speech ambiguity causes garden pathing.
•"man" as noun or verb.

Core Parsing Challenge

•Many NLP parsers (e.g., greedy, deterministic ones) commit early to a single analysis
•Garden path sentences expose this inflexibility
•Handling them requires:
•Lookahead mechanisms
•Probabilistic re-ranking
•Backtracking capabilities

1. Can modern transformers like BERT handle garden path sentences well?
2. What strategies can be added to a parser to avoid backtracking errors?
3. How would you rewrite garden path sentences to make them clearer?

Implications in NLP

• Parser must consider multiple interpretations.


• Disambiguation needs:
• Syntactic rules
• Semantic knowledge
• World knowledge (context matters!)
• Errors in parsing lead to misinterpretation, incorrect outputs, or
biases in systems like chatbots or machine translation.

Quiz

• Exercise 1: Analyze the sentence “I saw the man with the binoculars in the park.”

Possible Interpretations:
1. I used binoculars to see the man in the park.
➤ "with the binoculars" modifies the verb saw → instrument of seeing.
2. The man I saw had the binoculars.
➤ "with the binoculars" modifies the noun man → describes the man.
3. The binoculars are in the park.
➤ "in the park" modifies binoculars (less likely, but grammatically possible).
4. I saw the man while I was in the park.
➤ "in the park" modifies the verb saw → location of action.
5. The man was in the park, and I saw him.
➤ "in the park" modifies man → describes the man's location.
Ambiguity arises from the prepositional phrases (“with the binoculars”, “in the park”) and their attachment points (verb vs. noun).

Quiz

• Exercise 2: Fix this ambiguity

“The cameraman shot the man with the gun when he was near Tendulkar.”

Ambiguity type, question, and options:
• Lexical: What does "shot" mean? (a) Took a photo (camera); (b) Fired a weapon (gun)
• PP Attachment: Who has the gun? (a) The man; (b) The cameraman
• Clause Reference: Who was near Tendulkar? (a) The man; (b) The cameraman

• Meaning 1:
• The cameraman took a photo of a man who had a gun, while the man was near Tendulkar.
• Meaning 2:
• The cameraman took a photo of the man using his (the cameraman's) own (camera's) gun, while he (the cameraman) was near Tendulkar.
• Meaning 3:
• The cameraman used a gun to shoot the man, and the man was near Tendulkar.

Structural Ambiguity

Definition:
Occurs when a sentence can be parsed in more than one way due to its
structure.

Examples:
• I did not know my PDA (personal digital assistant) had a phone for 3 months.
• The cameraman shot the man with the gun when he was near Tendulkar.
• Jill had rubbed ointment on Mike the Irish Terrier, taken a look at the goldfish
belonging to the cook, which had caused anxiety in the kitchen by refusing its
ant’s egg.
• (Times of India) Aid for kins of cops killed in terrorist attacks.

This ambiguity arises because the phrase "for 3 months" can grammatically attach to either:
the verb "did not know" (modifying the knowing)
Or
the verb phrase "had a phone" (modifying the possession)
A parser must determine which attachment makes more sense based on context, something that’s often unclear without world knowledge or pragmatics.

Major Sources of Sentence Ambiguity
•Multiple meanings of words
•Multiple attachment points of prepositional phrases
•Clause attachment points

Lexical Ambiguity (multiple meanings of words)
Example sentence: He saw the bat in the cave.
Explanation:
•"Bat" → animal or sports equipment
•"Saw" → perceived, or used a tool

Attachment Ambiguity – Prepositional Phrases
Example sentence: I saw the man with the telescope.
Interpretations:
1. I used a telescope to see the man.
2. The man I saw was holding a telescope.

Clause Attachment Ambiguity
Example sentence: She said he left yesterday.
Interpretations:
1. She made the statement yesterday.
2. He left yesterday.

Disambiguation in NLP: Role of Higher-Level Knowledge
Why Disambiguation is Hard in NLP?
•Sentences can have multiple valid parses due to:
•Prepositional phrase (PP) attachments
•Clause attachment ambiguity

•Syntactic and semantic structures alone may not resolve ambiguity.

•Context and background knowledge are crucial.


Example:
“I saw the boy with a ponytail.”
Ambiguity Source: Prepositional Phrase Attachment
•Meaning 1: The boy has a ponytail.
•Meaning 2: The seeing was done using a ponytail (illogical).
A machine parser won’t know which meaning is correct unless it understands semantics.
Need for Higher-Level Knowledge
•Semantics – Word meaning and logic

•Pragmatics – Real-world plausibility

•Discourse – Use of surrounding sentences to build context

Semantics for Disambiguation
Sentence:
“I saw the boy with a ponytail.”

Parser’s output:
•Option A (correct): “The boy has a ponytail.”
•Option B (illogical): “I used a ponytail to see the boy.”

Why is Option B incorrect?

•World knowledge: A ponytail is not an instrument for seeing.
•Therefore, Option B is excluded by semantics.

Semantics resolves structural ambiguity when lexical meanings contradict certain interpretations.
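This kind of semantic filtering is commonly expressed as a selectional restriction: the verb "see" only accepts an instrument that can actually be used for seeing. The sketch below assumes hypothetical feature sets for a few nouns and a single hand-written restriction; it is not drawn from a real lexicon. It returns which attachments of "with X" survive the semantic check.

```python
# Hypothetical semantic features for a few nouns (illustration only).
FEATURES = {
    "telescope": {"physical_object", "viewing_instrument"},
    "binoculars": {"physical_object", "viewing_instrument"},
    "ponytail": {"physical_object", "hairstyle"},
}

# Assumed selectional restriction: the instrument of "see" must be a viewing instrument.
INSTRUMENT_RESTRICTION = {"see": "viewing_instrument"}


def allowed_attachments(verb, pp_noun):
    """Return the attachment sites of 'VERB the N with PP_NOUN' that pass the check."""
    sites = ["noun"]  # "the boy with a ponytail": attribute reading, unrestricted here
    required = INSTRUMENT_RESTRICTION.get(verb)
    if required is None or required in FEATURES.get(pp_noun, set()):
        sites.append("verb")  # instrument reading: "see ... using PP_NOUN"
    return sites


print(allowed_attachments("see", "ponytail"))
print(allowed_attachments("see", "telescope"))
```

For "ponytail" only the noun attachment survives, while "telescope" keeps both readings: semantics removes impossible interpretations but cannot choose between two plausible ones, which is where pragmatics and discourse come in.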
Pragmatics for Disambiguation
Sentence:
“Old men and women were taken to safe locations.”

Two interpretations:
1.(Old men) and (women) → only men are old.
2.(Old men and women) → both men and women are old.
Real-world reasoning:
Younger individuals may be expected to assist in active roles, such as helping with evacuation or response efforts.
Older individuals are typically prioritized for relocation to safe locations due to their greater vulnerability.
Therefore, it is more plausible that the word "old" qualifies only the group that is less likely to assist actively.

Semantics accepts both readings (both men and women can be old).
But pragmatics prefers the interpretation where only men are old.
Pragmatics uses contextual expectations and real-world knowledge to resolve ambiguity.
Discourse-Level Ambiguity

Sentence:
“No smoking areas allow hookahs inside.”
Two readings:
1. Interpretation A:
•Areas labeled “No smoking” do allow hookahs (but not cigars).
2. Interpretation B:
•No area allows hookahs inside at all (strict ban).
This ambiguity can be resolved by adding context (discourse), i.e. follow-up sentences that clarify the intended reading:
Sentence A:
“No smoking areas allow hookahs inside, except the one in Hotel Grand.”
•This implies that no area allows hookahs, with one exception.
•Supports Interpretation B: Hookahs are banned, except at Hotel Grand.
Q&A

• Explain the steps of text processing with an example


• What is tokenization in text processing?
• Differentiate between stemming and lemmatization.
• What are stop words? Give two examples.
• What are morphological and lexical analysis in NLP? Mention some challenges in each.
• Explain the role of part-of-speech tagging in lexical analysis with an example.
• What is the purpose of parsing?
• For the sample sentence "The cat sat on the mat.", show how parsing is done in the form of (a) a hierarchical tree of phrases and (b) a bracketed form.
• Show the dependency list and complete parse tree for the example sentence “The quick brown fox jumps over the lazy dog.”
• What are the important aspects of named entity tagging (NET)? Show with an example sentence how NET would be used in a translation task. What are the main challenges?
NLP Project Assignments
1. POS Tagging with Rule-Based vs NLTK Models
Implement a basic rule-based part-of-speech (POS) tagger using simple grammar rules for English, and compare its performance against NLTK’s statistical POS tagger. Write a short analysis of how both systems handle ambiguous words like “book” or “can.”

2. Extracting Noun Phrases Using Grammar Rules and spaCy
Write a set of grammar rules to extract English noun phrases (NPs) like “the tall boy,” and compare the results with noun phrases extracted using spaCy. Evaluate and discuss the strengths and limitations of both approaches.

3. Named Entity Recognition with Simple Rules
Build a basic rule-based Named Entity Recognition (NER) tool that detects entities such as names, dates, and places in raw text (e.g., news headlines) using pattern matching (e.g., capital letters, date formats).

4. Hindi Verb Suffix Analyzer
Create a Python program to identify suffixes in Hindi verbs that indicate tense and aspect (e.g., “aya,” “ti,” “te”). Use a small sample of Hindi sentences and annotate each verb with the correct tense label.
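As one possible starting point for the Named Entity Recognition with Simple Rules assignment, the sketch below combines the two cues the assignment mentions (date formats and capital letters) with a small place-name list. Everything here is an assumption chosen for illustration: the gazetteer PLACES, the three date formats, and the heuristic that any run of two or more capitalised words is a person's name. That last rule will, for instance, mislabel "Hotel Grand" and break on title-case headlines.

```python
import re

# Hypothetical gazetteer of place names (a real tool would load a larger list).
PLACES = {"Delhi", "Mumbai", "Noida", "London"}

MONTHS = (r"(?:January|February|March|April|May|June|July|"
          r"August|September|October|November|December)")
DATE_PATTERNS = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),           # 12/08/2024
    re.compile(rf"\b\d{{1,2}} {MONTHS} \d{{4}}\b"),      # 5 March 2024
    re.compile(rf"\b{MONTHS} \d{{1,2}}, \d{{4}}\b"),     # March 5, 2024
]
# Heuristic: two or more consecutive capitalised words, e.g. "Sachin Tendulkar".
CAPITALISED_RUN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def tag_entities(text):
    """Return (entity_text, label) pairs in order of position in the text."""
    found = []  # (start, end, text, label)

    def overlaps(start, end):
        # True if [start, end) intersects a span that is already labelled
        return any(start < e and s < end for s, e, _, _ in found)

    # Dates first, then gazetteer places, then capitalised runs as names.
    for pattern in DATE_PATTERNS:
        for m in pattern.finditer(text):
            if not overlaps(*m.span()):
                found.append((m.start(), m.end(), m.group(), "DATE"))
    for place in PLACES:
        for m in re.finditer(rf"\b{re.escape(place)}\b", text):
            if not overlaps(*m.span()):
                found.append((m.start(), m.end(), m.group(), "PLACE"))
    for m in CAPITALISED_RUN.finditer(text):
        if not overlaps(*m.span()):
            found.append((m.start(), m.end(), m.group(), "PERSON"))
    return [(t, label) for _, _, t, label in sorted(found)]


print(tag_entities("Sachin Tendulkar visits Mumbai on 5 March 2024"))
```

Running the patterns in a fixed priority order and skipping overlapping matches is a simple way to stop one rule from re-labelling text another rule has already claimed; comparing this output against hand-annotated headlines would be a natural next step for the assignment.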
Thank You…!!
