NLP Notes
MOD 1
Q. What is NLP? Applications of NLP?
Natural Language Processing (NLP) is a branch of Artificial Intelligence that
enables computers to understand, interpret, and generate human language in a
meaningful way, both in text and speech form.
The main purpose of NLP is to read and understand the human language and
deliver the output accordingly.
Applications :
1. Machine Translation – NLP enables automatic conversion of text or speech
from one language to another while maintaining the meaning and grammar. This
is useful in communication between people who speak different languages.
Example: Google Translate converting English sentences into Hindi, Marathi,
or any other language.
2. Sentiment Analysis – NLP techniques help identify the sentiment or opinion
expressed in text, such as positive, negative, or neutral. This is widely used in
businesses to understand customer satisfaction from reviews, social media
posts, or surveys.
Example: Analyzing Twitter posts to know public opinion about a new movie.
3. Chatbots & Virtual Assistants – NLP powers AI-based chat systems to
understand user queries and respond intelligently in human language. These
assistants can answer questions, perform tasks, and provide customer support
without human intervention.
Example: Amazon Alexa, Apple Siri, and WhatsApp business chatbots.
4. Text Summarization – It automatically produces a shorter and more concise
version of long documents while keeping the important information. This helps
in quickly reading large content.
Example: Summarizing a 10-page research paper into key bullet points for
quick review.
5. Speech Recognition – NLP with speech processing allows machines to
convert spoken words into written text. This is used in voice-controlled devices,
transcription services, and accessibility tools for differently-abled users.
Example: Voice typing in Google Docs or dictating messages on smartphones.
6. Information Retrieval – NLP helps search engines and databases to
understand the meaning of keywords and find the most relevant results. This
improves search accuracy and saves time for the user.
Example: Google search returning relevant articles when typing “best tourist
places in India.”
7. Spam Detection – NLP filters out unwanted or harmful messages by
analyzing text patterns, keywords, and structure. It protects users from phishing
attempts, scams, and irrelevant messages.
Example: Gmail automatically moving fraudulent emails to the spam folder.
Q. What is Ambiguity? Types?
Ambiguity occurs when a sentence or word has more than one possible
meaning, making it unclear which interpretation is correct without additional
context.
Types of Ambiguity in NLP :
1. Lexical Ambiguity
This occurs when a single word has more than one possible meaning. The
correct meaning can only be understood from context.
Example: “He went to the bank.” – Here, bank could mean a financial
institution or the side of a river.
NLP systems use techniques like Word Sense Disambiguation (WSD)
to resolve such cases (a small WSD sketch is given after this list).
2. Syntactic Ambiguity
This happens when the structure or grammar of a sentence allows more
than one interpretation. It is also called structural ambiguity.
Example: “I saw the man with the telescope.” – It’s unclear whether I had
the telescope or the man did.
Parsers and grammar rules are used in NLP to handle this type.
3. Semantic Ambiguity
This occurs when the meaning of a sentence is unclear, even though its
grammar is correct.
Example: “Visiting relatives can be boring.” – This could mean relatives
who visit you are boring, or that visiting them is boring.
Context understanding and meaning representation are required to solve
it.
4. Anaphoric Ambiguity
This happens when a pronoun or a reference word can refer to more than
one noun in the sentence.
Example: “John told Peter that he passed.” – It’s not clear whether John
or Peter passed.
NLP uses coreference resolution to find the correct reference.
5. Pragmatic Ambiguity
This occurs when the intended meaning depends on the situation,
background knowledge, or speaker’s intention.
Example: “Can you open the door?” – Literally asks about ability, but
usually it’s a polite request.
Understanding this requires context and real-world knowledge.
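The lexical ambiguity example above ("bank") can be tried in code with NLTK's built-in Lesk algorithm for Word Sense Disambiguation. A minimal sketch, assuming NLTK is installed and the WordNet data has been downloaded:

# Word Sense Disambiguation for the ambiguous word "bank" using NLTK's
# Lesk algorithm. Assumes nltk.download("wordnet") has been run beforehand.
from nltk.wsd import lesk

context = "He went to the bank to deposit his money".split()
sense = lesk(context, "bank")          # returns the WordNet Synset with the most overlap
if sense is not None:
    print(sense.name(), "-", sense.definition())
# Lesk is a simple overlap-based method, so the sense it picks depends heavily
# on the words present in the context sentence.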
MOD 2
Q. N-gram Numerical & Theory
Definition
An N-gram is a sequence of N words that appear together in a sentence or text.
For example, a 2-gram (bigram) looks at pairs of words, and a 3-gram (trigram)
looks at three words in a row. It is used to understand patterns in language.
Working
The model counts how often word sequences occur in a given text or corpus.
Using these counts, it calculates the probability of the next word given the
previous words (like P(next word | previous word(s))). This helps in predicting
or generating the next likely word. Used in Text prediction (like mobile
keyboard suggestions) , Speech recognition , Machine translation , Spelling
correction.
Example
Corpus: "The dog runs fast"
• Unigrams (N=1): [The], [dog], [runs], [fast]
• Bigrams (N=2): [The dog], [dog runs], [runs fast]
• Trigrams (N=3): [The dog runs], [dog runs fast]
If we know "The dog", the model predicts the next word is "runs" with the
highest probability.
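A minimal sketch of this counting idea in Python (toy corpus only, no smoothing), showing how P(runs | dog) is estimated from bigram counts:

# Toy bigram model: count unigrams and bigrams, then estimate P(word | previous).
from collections import Counter

corpus = "the dog runs fast".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # P(word | prev) = count(prev, word) / count(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("dog", "runs"))   # 1.0 in this tiny corpus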
Numerical :
Porter Stemmer
• Definition (3–4 lines):
The Porter Stemmer is a rule-based algorithm in NLP used to reduce
words to their root form by stripping common suffixes. It uses linguistic
rules, based on consonant (C) and vowel (V) sequences, to decide how
much of a word can be safely removed.
• Working (3–4 lines):
The algorithm defines words as sequences of consonants (C) and vowels
(V). For example, tree → (C V), trouble → (C V C V). It then applies
multiple steps of rules like removing “ing”, “ed”, “ly”, etc., but only
when a certain C–V pattern condition is satisfied. This ensures
meaningful stems are produced.
• Concept of Consonants & Vowels (3–4 lines):
o Vowels: a, e, i, o, u (and sometimes 'y' depending on position).
o Consonants: All other letters.
o The algorithm checks word structure using patterns of vowels and
consonants (e.g., m = number of VC sequences), and rules are
applied only if m is large enough (to avoid over-stemming).
• Example:
o Caresses → caress (rule: sses → ss)
o Ponies → poni (rule: ies → i)
o Troubling → troubl (rule: remove ing when the stem contains a vowel)
Algorithm of Porter Stemmer
Step 1: Remove common plural and past participle forms
• If the word ends with sses → replace with ss
• If the word ends with ies → replace with i
• If the word ends with ss → keep as ss
• If the word ends with s → remove s
Example:
• caresses → caress
• ponies → poni
• cats → cat
Step 2: Remove suffixes like -ed, -ing
• If the word ends with ed or ing and the stem contains a vowel → remove
ed/ing
• If after removal, the word ends with at → add e (e.g., conflat(ed) → conflate)
• If it ends with a double consonant (like tt, nn, pp, but not ll, ss, or zz) → remove the last consonant
• If the word is short and ends in a CVC pattern → add e (e.g., fil(ing) → file)
Example:
• hopping → hop
• hoped → hope
• tanned → tan
Step 3: Replace suffixes (-ational, -izer, -fulness, etc.)
• ization → ize
• ational → ate
• fulness → ful
• ousness → ous
Example:
• rationalization → rationalize
• hopefulness → hopeful
Step 4: Remove suffixes like -ic, -able, -ant
• icate → ic
• ative → (remove)
• alize → al
Example:
• communicate → communic
• formalize → formal
Step 5: Remove final suffixes
• If word ends in e → remove it if the word is long enough
• If double consonant at end → remove one consonant
Example:
• probate → probat
• rate → rate (unchanged, because short word)
Final Result: The word is reduced to its stem/root.
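These steps can be tried out with NLTK's standard PorterStemmer class; a short sketch (the full Porter implementation may differ slightly from the simplified steps above on some words):

# Running the Porter Stemmer on the example words from the steps above.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["caresses", "ponies", "cats", "hopping", "hoped", "tanned",
         "rationalization", "hopefulness", "relational", "probate"]
for w in words:
    print(w, "->", stemmer.stem(w))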
Q. Difference between Stemming and Lemmatization.
Aspect | Stemming | Lemmatization
1. Definition | Process of chopping off prefixes/suffixes to reduce a word to its stem. | Process of reducing a word to its base/dictionary form (lemma).
2. Approach | Rule-based, works by cutting affixes. | Dictionary + morphological analysis.
3. Output Meaning | Stemmed word may not have a real meaning (e.g., comput). | Always returns a meaningful word (e.g., compute).
4. Grammar Awareness | Ignores Part-of-Speech (POS) and grammar. | Considers POS and grammar for accurate results.
5. Accuracy | Less accurate, more crude. | More accurate, linguistically correct.
6. Speed | Faster because it is simple. | Slower due to dictionary lookups and analysis.
7. Example | Studies → Studi | Studies → Study
8. Use Cases | Useful when speed > accuracy (e.g., search engines). | Useful where correctness matters (e.g., chatbots, NLP tasks).
9. Resource Requirement | Requires minimal resources. | Requires linguistic resources like WordNet.
10. Algorithm | Porter Stemmer, Snowball Stemmer, Lancaster Stemmer. | WordNet Lemmatizer, spaCy Lemmatizer.
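A short comparison sketch with NLTK, reproducing the "Studies" row of the table (assumes the WordNet data has been downloaded with nltk.download("wordnet")):

# Stemming vs lemmatization on the same word using NLTK.
from nltk.stem import PorterStemmer, WordNetLemmatizer

word = "studies"
print("Stemming:     ", PorterStemmer().stem(word))            # studi
print("Lemmatization:", WordNetLemmatizer().lemmatize(word))   # study (noun is the default POS)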
Q. What is Edit Distance Algorithm?
It is a way to measure how different two words (or strings) are by counting the
minimum number of edits needed to transform one word into the other.
The allowed edits are:
1. Insertion → Add a character.
o Example: cat → cart (insert "r").
2. Deletion → Remove a character.
o Example: cart → cat (delete "r").
3. Substitution → Replace one character with another.
o Example: cat → cut (substitute "a" → "u").
Example : "kitten" → "sitting"
• kitten → sitten (substitute "k" → "s")
• sitten → sittin (substitute "e" → "i")
• sittin → sitting (insert "g")
Minimum edits = 3
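A minimal dynamic-programming sketch of this computation (standard Levenshtein distance with insertion, deletion, and substitution):

# Levenshtein edit distance via dynamic programming.
# dp[i][j] = minimum edits to turn the first i chars of a into the first j chars of b.
def edit_distance(a: str, b: str) -> int:
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i                                  # delete all i characters
    for j in range(len(b) + 1):
        dp[0][j] = j                                  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1   # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    return dp[len(a)][len(b)]

print(edit_distance("kitten", "sitting"))   # 3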
Working with Autocorrect:
In autocorrect systems (like in your phone or search engines), when you
misspell a word, the system calculates the edit distance between your input and
valid dictionary words.
• The word(s) with the lowest edit distance are suggested as corrections.
• Example: You type “recieve” → Autocorrect checks the dictionary.
o recieve → receive (edit distance = 2: the swapped "i" and "e" count as two substitutions; if a transposition is allowed as a single edit, as in Damerau–Levenshtein, the distance is 1)
o Since the distance is small, it suggests “Did you mean: receive?”
Applications :
1. Spell Checking → Fixing typos by finding closest words.
2. Search Engines → "Did you mean …?" suggestions.
3. Text Prediction → Suggesting likely next words.
4. Natural Language Processing (NLP) → Matching noisy input with
correct forms.
Q. What are Collocations? Significance?
Definition:
Collocations are pairs or groups of words that frequently appear together in
natural language, more often than would be expected by chance. They sound
“natural” to native speakers.
Examples
• "fast food" (common) vs "quick food" (rarely used)
• "make a decision" vs "do a decision"
• "strong tea" vs "powerful tea"
Types of Collocations
1. Adjective + Noun → strong tea, heavy rain
2. Verb + Noun → make a decision, commit a crime
3. Verb + Adverb → whisper softly, argue strongly
4. Noun + Noun → data mining, credit card
5. Adverb + Adjective → deeply concerned, highly recommended
Significance of Collocations in NLP
1. Improves Naturalness in Language Generation
o Collocations make machine-generated text sound more natural and
fluent.
o Example: "Strong tea" sounds natural, but "powerful tea" does not.
2. Enhances Machine Translation
o Correct collocations help in choosing context-appropriate
translations.
o Example: "Heavy rain" in English should not be translated word-
for-word as "strong rain."
3. Better Information Retrieval (IR)
o Search engines can use collocations to improve relevance.
o Example: Searching "machine learning algorithm" gives better
results than searching each word separately.
4. Context Understanding in NLP Models
o Collocations provide semantic clues about meaning.
o Example: "Fast food" has a different meaning than just "fast" +
"food."
5. Speech Recognition & Text Prediction
o Predictive text keyboards rely on collocations to suggest the next
word.
o Example: After typing "Happy," it suggests "birthday" instead of
"elephant."
MOD 3
Q. Discuss HMM (Hidden Markov Model ) with an example.
Hidden Markov Model (HMM)
1. Definition
A Hidden Markov Model (HMM) is a statistical model used to represent
systems that have hidden (unobservable) states but produce observable outputs.
It assumes that the system follows a Markov process, where the next state
depends only on the current state, not on past history.
The "hidden" part means that we cannot directly see the states, but we can
estimate them from the observed data.
Components of Hidden Markov Model (HMM)
An HMM consists of the following main components:
1. States (Hidden States)
• These represent the underlying system conditions that are not directly
observable.
• Example: In speech recognition, the hidden states may represent
phonemes (sounds); in weather prediction, they may represent sunny,
rainy, cloudy.
• At any point in time, the system is in one of these states.
2. Observations
• These are the visible outputs that we can measure or record.
• Each observation is generated from a corresponding hidden state.
• Example: In speech recognition, the sound waves or acoustic signals we
capture are the observations.
• Observations provide indirect evidence about which hidden state the
system is in.
3. Transition Probabilities (A matrix)
• These are the probabilities of moving from one hidden state to another.
• Represented as a matrix A, where each entry a_ij = probability of moving
from state i to state j.
• Example: If yesterday was "Rainy," the probability of today being
"Sunny" might be 30%, and "Rainy" again might be 70%.
• This captures the temporal dependency (sequence nature) of the states.
4. Emission Probabilities (B matrix)
• These represent the probability of an observation being generated from a
particular hidden state.
• Each state produces observations with certain likelihoods.
• Example: If the hidden state is "Rainy," the observation could be "people
carrying umbrellas" with probability 0.8.
• This models the relationship between hidden states and observed
outputs.
5. Initial State Distribution (π vector)
• This defines the probabilities of the system starting in each possible
hidden state.
• Example: At the beginning of the week, the probability of it starting as
"Sunny" may be 0.6, "Rainy" 0.3, and "Cloudy" 0.1.
• Important for initializing the model before observations begin.
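A compact sketch of these five components as Python dictionaries, using a reduced two-state version of the weather example (all probability values are illustrative assumptions; the Cloudy state is dropped so that each row sums to 1):

# Weather HMM written out as plain Python dicts; probabilities are illustrative.
states = ["Sunny", "Rainy"]

pi = {"Sunny": 0.6, "Rainy": 0.4}                        # initial state distribution
A  = {"Sunny": {"Sunny": 0.8, "Rainy": 0.2},             # transition probabilities
      "Rainy": {"Sunny": 0.3, "Rainy": 0.7}}
B  = {"Sunny": {"walk": 0.9, "umbrella": 0.1},           # emission probabilities
      "Rainy": {"walk": 0.2, "umbrella": 0.8}}

def joint_probability(hidden_path, obs_seq):
    # P(states, observations) = pi * product of transitions * product of emissions
    p = pi[hidden_path[0]] * B[hidden_path[0]][obs_seq[0]]
    for prev, curr, obs in zip(hidden_path, hidden_path[1:], obs_seq[1:]):
        p *= A[prev][curr] * B[curr][obs]
    return p

print(joint_probability(["Rainy", "Rainy"], ["umbrella", "umbrella"]))   # 0.4*0.8*0.7*0.8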
Q. Explain Context Free Grammar(CFG) in detail.
Definition :
A Context Free Grammar (CFG) is a formal grammar used in computational
linguistics and computer science to describe the syntax of programming
languages and natural languages. It generates strings (sentences) from a
language by applying production rules. In CFG, every production rule replaces
a single non-terminal symbol with a sequence of non-terminals and/or
terminals.
Components of CFG
A CFG is formally represented as a 4-tuple G = (V, Σ, R, S) where:
1. V (Variables / Non-Terminals):
o These are symbols that can be replaced using production rules.
o They act as placeholders for patterns in the language.
o Example: S, NP, VP where S = Sentence, NP = Noun Phrase, VP =
Verb Phrase.
2. Σ (Sigma - Terminals):
o These are the actual alphabet of the language (words or tokens) that
appear in the final sentences.
o They cannot be replaced further.
o Example: dog, eats, food.
3. R (Production Rules):
o A set of transformation rules of the form A → α,
where A is a non-terminal and α is a string consisting of terminals
and/or non-terminals.
o These rules define how sentences can be constructed step by step.
o Example: S → NP VP.
4. S (Start Symbol):
o A special non-terminal from where the derivation begins.
o It represents the entire sentence or structure.
o Example: S.
Working / Example
Let’s define a simple CFG for a basic English-like grammar:
• Variables (V): {S, NP, VP}
• Terminals (Σ): {dog, cat, runs, eats}
• Rules (R):
1. S → NP VP
2. NP → dog | cat
3. VP → runs | eats
• Start symbol (S): S
Derivation Example:
• Start: S
• Apply Rule 1: S → NP VP
• Replace NP: dog VP
• Replace VP: dog runs
Final sentence: dog runs
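The same toy grammar can be written and parsed with NLTK; a short sketch using its standard CFG and ChartParser classes:

# The toy CFG above in NLTK's grammar format, used to parse "dog runs".
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> 'dog' | 'cat'
VP -> 'runs' | 'eats'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("dog runs".split()):
    print(tree)          # (S (NP dog) (VP runs))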