Part-of-Speech Tagging
Part of Speech
Each word belongs to a word class. The word class of a word is known
as the part of speech (POS) of that word.
Most POS tags implicitly encode fine-grained specializations of eight basic
parts of speech:
noun, verb, pronoun, preposition, adjective, adverb,
conjunction, article
These categories are based on morphological and distributional
similarities (not semantic similarities).
Parts of speech are also known as:
word classes
morphological classes
lexical tags
Part of Speech (cont.)
A POS tag of a word describes the major and minor word
classes of that word.
A POS tag of a word gives a significant amount of information about
that word and its neighbours. For example, a possessive pronoun (my,
your, her, its) will most likely be followed by a noun, and a personal
pronoun (I, you, he, she) will most likely be followed by a verb.
Most words have a single POS tag, but some have more than
one (2, 3, 4, …)
For example, book/noun or book/verb
I bought a book.
Please book that flight.
Tag Sets
There are various tag sets to choose from.
The choice of the tag set depends on the nature of the application.
We may use a small tag set (more general tags) or
a large tag set (finer tags).
Some widely used part-of-speech tag sets:
Penn Treebank has 45 tags
Brown Corpus has 87 tags
C7 tag set has 146 tags
In a tagged corpus, each word is associated with a tag from
the used tag set.
English Word Classes
Part-of-speech can be divided into two broad categories:
closed class types -- such as prepositions
open class types -- such as noun, verb
Closed class words are generally also function words.
Function words play an important role in grammar
Some function words are: of, it, and, you
Function words are usually very short and occur frequently.
There are four major open classes.
noun, verb, adjective, adverb
A new word may easily enter an open class.
Word classes may change depending on the natural language, but all natural
languages have at least two word classes: noun and verb.
Nouns
Nouns can be divided into:
proper nouns -- names for specific entities such as Ankara, John, Ali
common nouns
Proper nouns do not take an article, but common nouns may.
Common nouns can be divided into:
count nouns -- they can be singular or plural -- chair/chairs
mass nouns -- used when something is conceptualized
as a homogeneous group -- snow, salt
Mass nouns cannot take the articles a and an, and they cannot be plural.
Verbs
The verb class includes words referring to actions and processes.
Verbs can be divided into:
main verbs -- open class -- draw, bake
auxiliary verbs -- closed class -- can, should
Auxiliary verbs can be divided into:
copula -- be, have
modal verbs -- may, can, must, should
Verbs have different morphological forms:
non-3rd-person-sg eat
3rd-person-sg - eats
progressive -- eating
past -- ate
past participle -- eaten
Adjectives
Adjectives describe properties or qualities
for color -- black, white
for age -- young, old
In Turkish, all adjectives can also be used as nouns.
kırmızı kitap red book
kırmızıyı the red one (ACC)
Adverbs
Adverbs normally modify verbs.
Adverb categories:
locative adverbs -- home, here, downhill
degree adverbs -- very, extremely
manner adverbs -- slowly, delicately
temporal adverbs -- yesterday, Friday
Because of the heterogeneous nature of adverbs, some
adverbs such as Friday may be tagged as nouns.
Major Closed Classes
Prepositions -- on, under, over, near, at, from, to, with
Determiners -- a, an, the
Pronouns -- I, you, he, she, who, others
Conjunctions -- and, but, if, when
Particles -- up, down, on, off, in, out
Numerals -- one, two, first, second
Prepositions
Occur before noun phrases
indicate spatial or temporal relations
Example:
on the table
under the chair
Prepositions occur very often. For example, here are some frequency
counts from a 16-million-word corpus (COBUILD):
of 540,085
in 331,235
for 142,421
to 125,691
with 124,965
on 109,129
at 100,169
Particles
A particle combines with a verb to form a larger unit
called a phrasal verb.
go on
turn on
turn off
shut down
Articles
A small closed class
Only three words in the class: a, an, the
Articles mark definiteness or indefiniteness.
Articles occur very often. For example, here are some frequency
counts from a 16-million-word corpus (COBUILD):
the 1,071,676
a 413,887
an 59,359
Almost 10% of words are articles in this corpus.
Conjunctions
Conjunctions are used to combine or join two phrases, clauses
or sentences.
Coordinating conjunctions -- and, or, but
join two elements of equal status
Example: you and me
Subordinating conjunctions -- that, who
combine a main clause with a subordinate clause
Example:
I thought that you might like milk
Part of Speech Tagging
Part-of-speech tagging is simply assigning the correct part
of speech to each word in an input sentence.
We assume that we have the following:
A set of tags (our tag set)
A dictionary that tells us the possible tags for each
word (including all morphological variants).
A text to be tagged.
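The dictionary assumption above can be sketched as a map from words to their possible tags. The entries and the names `lexicon` and `possible_tags` are illustrative, not from a real tagging dictionary:

```python
# A toy tagging dictionary: each word maps to its set of possible tags.
# Entries are illustrative only; a real dictionary covers all morphological variants.
lexicon = {
    "book": {"NOUN", "VERB"},   # I bought a book. / Please book that flight.
    "a": {"DET"},
    "flight": {"NOUN"},
}

def possible_tags(word):
    """Look up candidate tags; assume an unknown word could be any open-class tag."""
    return lexicon.get(word.lower(), {"NOUN", "VERB", "ADJ", "ADV"})

print(sorted(possible_tags("book")))   # ['NOUN', 'VERB']
```

A tagger's job is then to pick one tag from each word's candidate set, using context.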
Pronouns
Shorthand for referring to some entity or event.
Pronouns can be divided into:
personal -- you, she, I
possessive -- my, your, his
wh-pronouns -- who, what -- who is the president?
Part-of-Speech Tagging
Map from a sequence x1,…,xn of words to a sequence y1,…,yn of POS tags
The "Universal Dependencies" tag set (Nivre et al. 2016)
Why Part of Speech Tagging?
Can be useful for other NLP tasks
Parsing: POS tagging can improve syntactic parsing
MT: reordering of adjectives and nouns (say from Spanish to English)
Sentiment or affective tasks: may want to distinguish adjectives or other POS
Text-to-speech (how do we pronounce “lead” or "object"?)
Or linguistic or language-analytic computational tasks
Need to control for POS when studying linguistic change like creation of new words, or meaning shift
Or control for POS in measuring meaning similarity or difference
How difficult is POS tagging in English?
Roughly 15% of word types are ambiguous
• Hence 85% of word types are unambiguous
• Janet is always PROPN, hesitantly is always ADV
But those 15% tend to be very common.
So ~60% of word tokens are ambiguous
E.g., back
earnings growth took a back/ADJ seat
a small building in the back/NOUN
a clear majority of senators back/VERB the bill
enable the country to buy back/PART debt
I was twenty-one back/ADV then
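The type/token distinction above can be made concrete with a small computation. The corpus below is a toy fragment invented for illustration; the ratios it produces are not the real English figures:

```python
from collections import defaultdict

# Toy tagged corpus (word, tag) pairs; "back" appears with three different tags.
tokens = [("a", "DET"), ("back", "ADJ"), ("seat", "NOUN"),
          ("the", "DET"), ("back", "NOUN"), ("senators", "NOUN"),
          ("back", "VERB"), ("the", "DET"), ("bill", "NOUN")]

tags_seen = defaultdict(set)
for word, tag in tokens:
    tags_seen[word].add(tag)

# A word TYPE is ambiguous if it was seen with more than one tag.
ambiguous_types = {w for w, ts in tags_seen.items() if len(ts) > 1}
type_ratio = len(ambiguous_types) / len(tags_seen)
# TOKEN ambiguity counts every occurrence of an ambiguous type.
token_ratio = sum(1 for w, _ in tokens if w in ambiguous_types) / len(tokens)

print(round(type_ratio, 2), round(token_ratio, 2))  # 0.17 0.33
```

Even with one ambiguous type out of six, a third of the tokens are ambiguous, mirroring the 15%-of-types vs ~60%-of-tokens gap in real English data.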
POS tagging performance in English
How many tags are correct? (Tag accuracy)
About 97%
Accuracy hasn't changed in the last 10+ years:
HMMs, CRFs, and BERT perform similarly.
Human accuracy is about the same.
But baseline is 92%!
The baseline is the performance of the simplest possible method
"Most frequent class baseline" is an important baseline for many tasks
Tag every word with its most frequent tag
(and tag unknown words as nouns)
Partly easy because
Many words are unambiguous
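The most-frequent-class baseline just described can be sketched in a few lines. The training pairs below are toy data, and `baseline_tag` is an illustrative name:

```python
from collections import Counter, defaultdict

# Toy training corpus of (word, tag) pairs.
train = [("the", "DET"), ("back", "NOUN"), ("back", "VERB"),
         ("back", "NOUN"), ("bill", "NOUN"), ("will", "AUX")]

# Count how often each word receives each tag.
counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(word):
    """Tag each word with its most frequent training tag; unknown words get NOUN."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"

print([baseline_tag(w) for w in ["the", "back", "janet"]])
# ['DET', 'NOUN', 'NOUN']
```

Here "back" gets NOUN (seen twice vs once as VERB), and the unseen word "janet" falls back to NOUN, exactly the heuristic the 92% baseline uses.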
Sources of information for POS tagging
Janet will back the bill
will: AUX/NOUN/VERB?   back: NOUN/VERB?
Prior probabilities of word/tag
• "will" is usually an AUX
Identity of neighboring words
• "the" means the next word is probably not a verb
Morphology and word shape:
Prefixes -- unable: un- suggests ADJ
Suffixes -- importantly: -ly suggests ADV
Capitalization -- Janet: CAP suggests PROPN
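These shape cues are easy to extract programmatically. A minimal sketch (the function name and feature set are illustrative, not a standard API):

```python
def wordshape_features(word):
    """Extract simple morphology/shape cues of the kind POS taggers use as features."""
    return {
        "prefix_un": word.lower().startswith("un"),  # un-  -> often ADJ (unable)
        "suffix_ly": word.lower().endswith("ly"),    # -ly  -> often ADV (importantly)
        "capitalized": word[:1].isupper(),           # Cap  -> often PROPN (Janet)
    }

print(wordshape_features("Janet"))
# {'prefix_un': False, 'suffix_ly': False, 'capitalized': True}
```

In an HMM or CRF these cues would be hand-crafted features; a neural tagger learns similar cues from character representations.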
Standard algorithms for POS tagging
Supervised Machine Learning Algorithms:
• Hidden Markov Models
• Conditional Random Fields (CRF) / Maximum Entropy Markov Models (MEMM)
• Neural sequence models (RNNs or Transformers)
• Large Language Models (like BERT), finetuned
All require a hand-labeled training set; all achieve about equal
performance (97% on English)
All make use of information sources we discussed
• Via human created features: HMMs and CRFs
• Via representation learning: Neural LMs
Recalling Rule-Based Tagging: ENGTWOL
• Two-level morphology
• Lexical and disambiguation rules
• 56,000 English word stems
ENGTWOL LEXICON
First Stage: a two-level transducer proposes candidate POS tags
Second Stage
• 1,100 constraints are applied to remove incorrect POS tags
Stochastic Tagger – HMM
• Intuition: pick the most likely tag
• For a given word, the HMM tagger picks the tag that maximizes the
probability of the tag given its context
• HMM taggers choose a tag sequence for a whole sentence rather than a
single word
Bigram HMM Tagger
• The tagger chooses a tag t_i for word w_i that is most probable given the
previous tag t_(i-1) and the current word w_i:
  t_i = argmax_j P(t_j | t_(i-1), w_i)
• This can be simplified to:
  t_i = argmax_j P(t_j | t_(i-1)) P(w_i | t_j)
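The selection rule just described, a transition probability times an emission probability, can be sketched in Python. The probability values are toy numbers in the spirit of the classic "to race" example, and `trans`, `emit`, and `best_tag` are illustrative names:

```python
# Transition probabilities P(tag | previous tag) -- toy values for illustration.
trans = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}
# Emission probabilities P(word | tag) -- toy values for illustration.
emit = {("race", "VB"): 0.00012, ("race", "NN"): 0.00057}

def best_tag(prev_tag, word, tags=("VB", "NN")):
    """Choose t_i = argmax_t P(t | prev_tag) * P(word | t)."""
    return max(tags, key=lambda t: trans.get((prev_tag, t), 0.0)
                                   * emit.get((word, t), 0.0))

# 0.83 * 0.00012 > 0.00047 * 0.00057, so VB wins after "to".
print(best_tag("TO", "race"))  # VB
```

A full HMM tagger applies this idea over the whole sentence with the Viterbi algorithm rather than greedily per word.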
Example
• Consider the sentences (Brown Corpus):
Working:
• Consider that we only have these two sub-sequences:
• Let's recall the HMM tagger equation:
• We need to determine which of these two has the highest likelihood.
Using frequency counts from the Brown and Switchboard corpora
• Now we look at the second part of the tagger equation:
• The probability counterintuitively answers the question: “If we were
expecting a verb, how likely is it that this verb would be race?”
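For reference, the classic version of this race example in Jurafsky & Martin's textbook works out roughly as follows; the numeric estimates are that textbook's Brown/Switchboard figures, quoted here as an assumption since the original slide equations are missing:

```latex
\begin{align*}
P(\mathrm{VB}\mid \mathrm{TO}) &= 0.83 &
P(\mathrm{NN}\mid \mathrm{TO}) &= 0.00047 \\
P(\mathit{race}\mid \mathrm{VB}) &= 0.00012 &
P(\mathit{race}\mid \mathrm{NN}) &= 0.00057 \\
P(\mathrm{VB}\mid \mathrm{TO})\,P(\mathit{race}\mid \mathrm{VB}) &= 0.00010 &
P(\mathrm{NN}\mid \mathrm{TO})\,P(\mathit{race}\mid \mathrm{NN}) &= 0.00000027
\end{align*}
```

The transition term dominates: a verb is far more likely than a noun after "to", so the tagger picks VB for race.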
Final Answer:
Generative Tagging Models