Part-of-Speech Tagging
Part of Speech
Each word belongs to a word class. The word class of a word is known
as the part of speech (POS) of that word.
Most POS tags implicitly encode fine-grained specializations of eight basic
parts of speech:
noun, verb, pronoun, preposition, adjective, adverb,
conjunction, article
These categories are based on morphological and distributional
similarities (not semantic similarities).
Parts of speech are also known as:
word classes
morphological classes
lexical tags
Part of Speech (cont.)
A POS tag of a word describes the major and minor word
classes of that word.
A POS tag of a word gives a significant amount of information about
that word and its neighbours. For example, a possessive pronoun (my,
your, her, its) will most likely be followed by a noun, and a personal
pronoun (I, you, he, she) will most likely be followed by a verb.
Most words have a single POS tag, but some have more than
one (2, 3, 4, …)
For example, book/noun or book/verb
I bought a book.
Please book that flight.
Tag Sets
There are various tag sets to choose from.
The choice of the tag set depends on the nature of the application.
We may use a small tag set (more general tags) or
a large tag set (finer tags).
Some widely used part-of-speech tag sets:
Penn Treebank has 45 tags
Brown Corpus has 87 tags
C7 tag set has 146 tags
In a tagged corpus, each word is associated with a tag from
the used tag set.
English Word Classes
Part-of-speech can be divided into two broad categories:
closed class types -- such as prepositions
open class types -- such as noun, verb
Closed class words are generally also function words.
Function words play an important role in grammar
Some function words are: of, it, and, you
Function words are usually very short and occur frequently.
There are four major open classes.
noun, verb, adjective, adverb
A new word may easily enter an open class.
Word classes may change depending on the natural language, but all natural
languages have at least two word classes: noun and verb.
Nouns
Nouns can be divided into:
proper nouns -- names for specific entities such as Ankara, John, Ali
common nouns
Proper nouns do not take an article, but common nouns may.
Common nouns can be divided into:
count nouns -- they can be singular or plural -- chair/chairs
mass nouns -- used when something is conceptualized
as a homogeneous group -- snow, salt
Mass nouns cannot take the articles a and an, and they cannot be plural.
Verbs
The verb class includes words referring to actions and processes.
Verbs can be divided into:
main verbs -- open class -- draw, bake
auxiliary verbs -- closed class -- can, should
Auxiliary verbs can be divided into:
copula -- be, have
modal verbs -- may, can, must, should
Verbs have different morphological forms:
non-3rd-person-sg eat
3rd-person-sg - eats
progressive -- eating
past -- ate
past participle -- eaten
Adjectives
Adjectives describe properties or qualities
for color -- black, white
for age -- young, old
In Turkish, all adjectives can also be used as nouns.
kırmızı kitap red book
kırmızıyı the red one (ACC)
Adverbs
Adverbs normally modify verbs.
Adverb categories:
locative adverbs -- home, here, downhill
degree adverbs -- very, extremely
manner adverbs -- slowly, delicately
temporal adverbs -- yesterday, Friday
Because of the heterogeneous nature of adverbs, some
adverbs such as Friday may be tagged as nouns.
Major Closed Classes
Prepositions -- on, under, over, near, at, from, to, with
Determiners -- a, an, the
Pronouns -- I, you, he, she, who, others
Conjunctions -- and, but, if, when
Particles -- up, down, on, off, in, out
Numerals -- one, two, first, second
Prepositions
Occur before noun phrases
indicate spatial or temporal relations
Example:
on the table
under the chair
Prepositions occur very often. For example, here are some frequency
counts from a 16-million-word corpus (COBUILD):
of 540,085
in 331,235
for 142,421
to 125,691
with 124,965
on 109,129
at 100,169
Particles
A particle combines with a verb to form a larger unit
called a phrasal verb.
go on
turn on
turn off
shut down
Articles
A small closed class
Only three words in the class: a, an, the
Articles mark definiteness or indefiniteness.
Articles occur very often. For example, here are some frequency
counts from a 16-million-word corpus (COBUILD):
the 1,071,676
a 413,887
an 59,359
Almost 10% of words are articles in this corpus.
Conjunctions
Conjunctions are used to combine or join two phrases, clauses
or sentences.
Coordinating conjunctions -- and, or, but
join two elements of equal status
Example: you and me
Subordinating conjunctions -- that, who
combine a main clause with a subordinate clause
Example:
I thought that you might like milk
Part of Speech Tagging
Part-of-speech tagging is simply assigning the correct part
of speech to each word in an input sentence.
We assume that we have the following:
A set of tags (our tag set)
A dictionary that tells us the possible tags for each
word (including all morphological variants).
A text to be tagged.
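The dictionary assumption above can be sketched as a map from words to their possible tags. The entries and the names `lexicon` and `possible_tags` are illustrative, not from a real tagging dictionary:

```python
# A toy tagging dictionary: each word maps to its set of possible tags.
# Entries are illustrative only; a real dictionary covers all morphological variants.
lexicon = {
    "book": {"NOUN", "VERB"},   # I bought a book. / Please book that flight.
    "a": {"DET"},
    "flight": {"NOUN"},
}

def possible_tags(word):
    """Look up candidate tags; assume an unknown word could be any open-class tag."""
    return lexicon.get(word.lower(), {"NOUN", "VERB", "ADJ", "ADV"})

print(sorted(possible_tags("book")))   # ['NOUN', 'VERB']
```

A tagger's job is then to pick one tag from each word's candidate set, using context.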
Pronouns
Shorthand for referring to some entity or event.
Pronouns can be divided into:
personal -- you, she, I
possessive -- my, your, his
wh-pronouns -- who, what -- who is the president?
Part-of-Speech Tagging
Map from a sequence x1,…,xn of words to a sequence y1,…,yn of POS tags
The "Universal Dependencies" tag set (Nivre et al. 2016)
Why Part of Speech Tagging?
Can be useful for other NLP tasks
Parsing: POS tagging can improve syntactic parsing
MT: reordering of adjectives and nouns (say from Spanish to English)
Sentiment or affective tasks: may want to distinguish adjectives or other POS
Text-to-speech (how do we pronounce “lead” or "object"?)
Or linguistic or language-analytic computational tasks
Need to control for POS when studying linguistic change like creation of new words, or meaning shift
Or control for POS in measuring meaning similarity or difference
How difficult is POS tagging in English?
Roughly 15% of word types are ambiguous
• Hence 85% of word types are unambiguous
• Janet is always PROPN, hesitantly is always ADV
But those 15% tend to be very common.
So ~60% of word tokens are ambiguous
E.g., back
earnings growth took a back/ADJ seat
a small building in the back/NOUN
a clear majority of senators back/VERB the bill
enable the country to buy back/PART debt
I was twenty-one back/ADV then
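The type/token distinction above can be made concrete with a small computation. The corpus below is a toy fragment invented for illustration; the ratios it produces are not the real English figures:

```python
from collections import defaultdict

# Toy tagged corpus (word, tag) pairs; "back" appears with three different tags.
tokens = [("a", "DET"), ("back", "ADJ"), ("seat", "NOUN"),
          ("the", "DET"), ("back", "NOUN"), ("senators", "NOUN"),
          ("back", "VERB"), ("the", "DET"), ("bill", "NOUN")]

tags_seen = defaultdict(set)
for word, tag in tokens:
    tags_seen[word].add(tag)

# A word TYPE is ambiguous if it was seen with more than one tag.
ambiguous_types = {w for w, ts in tags_seen.items() if len(ts) > 1}
type_ratio = len(ambiguous_types) / len(tags_seen)
# TOKEN ambiguity counts every occurrence of an ambiguous type.
token_ratio = sum(1 for w, _ in tokens if w in ambiguous_types) / len(tokens)

print(round(type_ratio, 2), round(token_ratio, 2))  # 0.17 0.33
```

Even with one ambiguous type out of six, a third of the tokens are ambiguous, mirroring the 15%-of-types vs ~60%-of-tokens gap in real English data.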
POS tagging performance in English
How many tags are correct? (Tag accuracy)
About 97%
Accuracy hasn't changed in the last 10+ years:
HMMs, CRFs, and BERT perform similarly.
Human accuracy is about the same.
But baseline is 92%!
The baseline is the performance of the simplest possible method
"Most frequent class baseline" is an important baseline for many tasks
Tag every word with its most frequent tag
(and tag unknown words as nouns)
Partly easy because
Many words are unambiguous
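The most-frequent-class baseline just described can be sketched in a few lines. The training pairs below are toy data, and `baseline_tag` is an illustrative name:

```python
from collections import Counter, defaultdict

# Toy training corpus of (word, tag) pairs.
train = [("the", "DET"), ("back", "NOUN"), ("back", "VERB"),
         ("back", "NOUN"), ("bill", "NOUN"), ("will", "AUX")]

# Count how often each word receives each tag.
counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(word):
    """Tag each word with its most frequent training tag; unknown words get NOUN."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"

print([baseline_tag(w) for w in ["the", "back", "janet"]])
# ['DET', 'NOUN', 'NOUN']
```

Here "back" gets NOUN (seen twice vs once as VERB), and the unseen word "janet" falls back to NOUN, exactly the heuristic the 92% baseline uses.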
Sources of information for POS tagging
Janet will back the bill
will: AUX/NOUN/VERB?   back: NOUN/VERB?
Prior probabilities of word/tag
• "will" is usually an AUX
Identity of neighboring words
• "the" means the next word is probably not a verb
Morphology and word shape:
Prefixes -- unable: un- suggests ADJ
Suffixes -- importantly: -ly suggests ADV
Capitalization -- Janet: CAP suggests PROPN
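These shape cues are easy to extract programmatically. A minimal sketch (the function name and feature set are illustrative, not a standard API):

```python
def wordshape_features(word):
    """Extract simple morphology/shape cues of the kind POS taggers use as features."""
    return {
        "prefix_un": word.lower().startswith("un"),  # un-  -> often ADJ (unable)
        "suffix_ly": word.lower().endswith("ly"),    # -ly  -> often ADV (importantly)
        "capitalized": word[:1].isupper(),           # Cap  -> often PROPN (Janet)
    }

print(wordshape_features("Janet"))
# {'prefix_un': False, 'suffix_ly': False, 'capitalized': True}
```

In an HMM or CRF these cues would be hand-crafted features; a neural tagger learns similar cues from character representations.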
Standard algorithms for POS tagging
Supervised Machine Learning Algorithms:
• Hidden Markov Models
• Conditional Random Fields (CRF) / Maximum Entropy Markov Models (MEMM)
• Neural sequence models (RNNs or Transformers)
• Large Language Models (like BERT), finetuned
All require a hand-labeled training set; all achieve about equal
performance (97% on English)
All make use of information sources we discussed
• Via human created features: HMMs and CRFs
• Via representation learning: Neural LMs
Recalling Rule-Based Tagging: ENGTWOL
• Two-level morphology
• Lexical and disambiguation rules
• 56,000 English word stems
ENGTWOL LEXICON
First Stage: a two-level transducer proposes candidate POS tags
Second Stage
• 1,100 constraints are applied to remove incorrect POS tags
Stochastic Tagger – HMM
• Intuition: pick the most likely tag
• For a given word, the HMM tagger picks the tag that maximizes the
probability of the tag given its context
• HMM taggers choose a tag sequence for a whole sentence rather than a
single word
Bigram HMM Tagger
• The tagger chooses a tag t_i for word w_i that is most probable given the
previous tag t_(i-1) and the current word w_i:
  t_i = argmax_j P(t_j | t_(i-1), w_i)
• This can be simplified to:
  t_i = argmax_j P(t_j | t_(i-1)) P(w_i | t_j)
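The selection rule just described, a transition probability times an emission probability, can be sketched in Python. The probability values are toy numbers in the spirit of the classic "to race" example, and `trans`, `emit`, and `best_tag` are illustrative names:

```python
# Transition probabilities P(tag | previous tag) -- toy values for illustration.
trans = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}
# Emission probabilities P(word | tag) -- toy values for illustration.
emit = {("race", "VB"): 0.00012, ("race", "NN"): 0.00057}

def best_tag(prev_tag, word, tags=("VB", "NN")):
    """Choose t_i = argmax_t P(t | prev_tag) * P(word | t)."""
    return max(tags, key=lambda t: trans.get((prev_tag, t), 0.0)
                                   * emit.get((word, t), 0.0))

# 0.83 * 0.00012 > 0.00047 * 0.00057, so VB wins after "to".
print(best_tag("TO", "race"))  # VB
```

A full HMM tagger applies this idea over the whole sentence with the Viterbi algorithm rather than greedily per word.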
Example
• Consider the sentences (Brown Corpus):
Working:
• Consider that we only have these two sub-sequences:
• Let's recall the HMM tagger equation:
• We need to determine which of these two has the highest likelihood.
Using frequency counts from the Brown and Switchboard corpora
• Now we look at the second part of the tagger equation:
• The probability counterintuitively answers the question: “If we were
expecting a verb, how likely is it that this verb would be race?”
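For reference, the classic version of this race example in Jurafsky & Martin's textbook works out roughly as follows; the numeric estimates are that textbook's Brown/Switchboard figures, quoted here as an assumption since the original slide equations are missing:

```latex
\begin{align*}
P(\mathrm{VB}\mid \mathrm{TO}) &= 0.83 &
P(\mathrm{NN}\mid \mathrm{TO}) &= 0.00047 \\
P(\mathit{race}\mid \mathrm{VB}) &= 0.00012 &
P(\mathit{race}\mid \mathrm{NN}) &= 0.00057 \\
P(\mathrm{VB}\mid \mathrm{TO})\,P(\mathit{race}\mid \mathrm{VB}) &= 0.00010 &
P(\mathrm{NN}\mid \mathrm{TO})\,P(\mathit{race}\mid \mathrm{NN}) &= 0.00000027
\end{align*}
```

The transition term dominates: a verb is far more likely than a noun after "to", so the tagger picks VB for race.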
Final Answer:
Generative Tagging Models