Module2 Lecture3 POS

Part-of-speech (POS) tagging involves classifying words into their respective categories, such as nouns, verbs, and adjectives, based on their morphological and distributional properties. Various tag sets exist, with the choice depending on the application, and POS tagging is crucial for tasks like parsing and sentiment analysis. The accuracy of POS tagging in English is around 97%, with common algorithms including Hidden Markov Models and Conditional Random Fields.

Uploaded by

Sachin Kumar N

Part-of-Speech Tagging

Part of Speech

• Each word belongs to a word class. The word class of a word is known as the part of speech (POS) of that word.
• Most POS tags implicitly encode fine-grained specializations of eight basic parts of speech:
  • noun, verb, pronoun, preposition, adjective, adverb, conjunction, article
• These categories are based on morphological and distributional similarities (not semantic similarities).
• Part of speech is also known as:
  • word classes
  • morphological classes
  • lexical tags
Part of Speech (cont.)

• A POS tag of a word describes the major and minor word classes of that word.
• A POS tag of a word gives a significant amount of information about that word and its neighbours. For example, a possessive pronoun (my, your, her, its) will most likely be followed by a noun, and a personal pronoun (I, you, he, she) will most likely be followed by a verb.
• Most words have a single POS tag, but some have more than one (2, 3, 4, …).
  • For example, book/noun or book/verb
    • I bought a book.
    • Please book that flight.
Tag Sets

• There are various tag sets to choose from.
• The choice of tag set depends on the nature of the application.
  • We may use a small tag set (more general tags) or a large tag set (finer tags).
• Some widely used part-of-speech tag sets:
  • Penn Treebank has 45 tags
  • Brown Corpus has 87 tags
  • C7 tag set has 146 tags
• In a tagged corpus, each word is associated with a tag from the tag set used.
English Word Classes

• Parts of speech can be divided into two broad categories:
  • closed class types -- such as prepositions
  • open class types -- such as nouns and verbs
• Closed class words are generally also function words.
  • Function words play an important role in grammar.
  • Some function words are: of, it, and, you
  • Function words are usually very short and occur frequently.
• There are four major open classes:
  • noun, verb, adjective, adverb
  • A new word may easily enter an open class.
• Word classes may change depending on the natural language, but all natural languages have at least two word classes: noun and verb.
Nouns

• Nouns can be divided into:
  • proper nouns -- names for specific entities such as Ankara, John, Ali
  • common nouns
• Proper nouns do not take an article, but common nouns may.
• Common nouns can be divided into:
  • count nouns -- they can be singular or plural -- chair/chairs
  • mass nouns -- used when something is conceptualized as a homogeneous group -- snow, salt
• Mass nouns cannot take the articles a and an, and they cannot be plural.
Verbs

• The verb class includes words referring to actions and processes.
• Verbs can be divided into:
  • main verbs -- open class -- draw, bake
  • auxiliary verbs -- closed class -- can, should
• Auxiliary verbs can be divided into:
  • copula -- be, have
  • modal verbs -- may, can, must, should
• Verbs have different morphological forms:
  • non-3rd-person-sg -- eat
  • 3rd-person-sg -- eats
  • progressive -- eating
  • past -- ate
  • past participle -- eaten
Adjectives

• Adjectives describe properties or qualities:
  • for color -- black, white
  • for age -- young, old
• In Turkish, all adjectives can also be used as nouns.
  • kırmızı kitap -- red book
  • kırmızıyı -- the red one (ACC)
Adverbs

• Adverbs normally modify verbs.
• Adverb categories:
  • locative adverbs -- home, here, downhill
  • degree adverbs -- very, extremely
  • manner adverbs -- slowly, delicately
  • temporal adverbs -- yesterday, Friday
• Because of the heterogeneous nature of adverbs, some adverbs such as Friday may be tagged as nouns.
Major Closed Classes

• Prepositions -- on, under, over, near, at, from, to, with
• Determiners -- a, an, the
• Pronouns -- I, you, he, she, who, others
• Conjunctions -- and, but, if, when
• Particles -- up, down, on, off, in, out
• Numerals -- one, two, first, second
Prepositions

• Occur before noun phrases.
• Indicate spatial or temporal relations.
• Examples:
  • on the table
  • under the chair
• They occur very frequently. For example, some frequency counts from a 16-million-word corpus (COBUILD):
  • of 540,085
  • in 331,235
  • for 142,421
  • to 125,691
  • with 124,965
  • on 109,129
  • at 100,169
Particles

• A particle combines with a verb to form a larger unit called a phrasal verb:
  • go on
  • turn on
  • turn off
  • shut down
Articles

• A small closed class.
• Only three words in the class: a, an, the.
• Articles mark definiteness or indefiniteness.
• They occur very frequently. For example, frequency counts from a 16-million-word corpus (COBUILD):
  • the 1,071,676
  • a 413,887
  • an 59,359
• Almost 10% of the words in this corpus are articles.
Conjunctions

• Conjunctions are used to combine or join two phrases, clauses or sentences.
• Coordinating conjunctions -- and, or, but
  • join two elements of equal status
  • Example: you and me
• Subordinating conjunctions -- that, who
  • combine a main clause with a subordinate clause
  • Example: I thought that you might like milk
Part of Speech Tagging

• Part-of-speech tagging is simply assigning the correct part of speech to each word in an input sentence.
• We assume that we have the following:
  • A set of tags (our tag set)
  • A dictionary that tells us the possible tags for each word (including all morphological variants)
  • A text to be tagged
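As a minimal sketch of these three ingredients (the tag names and lexicon entries here are invented for illustration, not taken from the lecture):

```python
# Minimal sketch of the three ingredients for POS tagging (illustrative data).
TAG_SET = {"NOUN", "VERB", "DET", "PRON"}

# Dictionary mapping each word form to its possible tags.
LEXICON = {
    "i": {"PRON"},
    "bought": {"VERB"},
    "a": {"DET"},
    "book": {"NOUN", "VERB"},   # ambiguous: "a book" vs. "book a flight"
}

def possible_tags(word):
    """Look up the candidate tags for a word (unknown words default to NOUN)."""
    return LEXICON.get(word.lower(), {"NOUN"})

# The text to be tagged, with its candidate tags per word.
text = ["I", "bought", "a", "book"]
candidates = [possible_tags(w) for w in text]
```

The tagger's job is then to pick one tag from each candidate set.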
Pronouns

• Shorthand for referring to some entity or event.
• Pronouns can be divided into:
  • personal -- you, she, I
  • possessive -- my, your, his
  • wh-pronouns -- who, what -- Who is the president?
Part-of-Speech Tagging

• Map a sequence x1, …, xn of words to a sequence y1, …, yn of POS tags.
• A widely used tag inventory is the "Universal Dependencies" tag set (Nivre et al. 2016).
Why Part of Speech Tagging?

• Can be useful for other NLP tasks:
  • Parsing: POS tagging can improve syntactic parsing
  • MT: reordering of adjectives and nouns (say, from Spanish to English)
  • Sentiment or affective tasks: may want to distinguish adjectives or other POS
  • Text-to-speech (how do we pronounce "lead" or "object"?)
• Also useful for linguistic or language-analytic computational tasks:
  • Need to control for POS when studying linguistic change, like the creation of new words or meaning shift
  • Or control for POS in measuring meaning similarity or difference
How difficult is POS tagging in English?

• Roughly 15% of word types are ambiguous
  • Hence 85% of word types are unambiguous
  • Janet is always PROPN, hesitantly is always ADV
• But those 15% tend to be very common
  • So ~60% of word tokens are ambiguous
• E.g., back:
  • earnings growth took a back/ADJ seat
  • a small building in the back/NOUN
  • a clear majority of senators back/VERB the bill
  • enable the country to buy back/PART debt
  • I was twenty-one back/ADV then
POS tagging performance in English
• How many tags are correct? (Tag accuracy)
  • About 97%
  • Hasn't changed in the last 10+ years
  • HMMs, CRFs, and BERT perform similarly
  • Human accuracy is about the same
• But the baseline is 92%!
  • The baseline is the performance of the stupidest possible method
  • The "most frequent class baseline" is an important baseline for many tasks
  • Tag every word with its most frequent tag (and tag unknown words as nouns)
• Partly easy because many words are unambiguous
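A minimal sketch of the most frequent class baseline (the toy training data here is invented for illustration):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count (word, tag) pairs and keep each word's most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_baseline(words, most_frequent_tag):
    """Tag each word with its most frequent tag; unknown words become NOUN."""
    return [most_frequent_tag.get(w.lower(), "NOUN") for w in words]

# Toy training corpus (illustrative).
train = [
    [("a", "DET"), ("back", "ADJ"), ("seat", "NOUN")],
    [("the", "DET"), ("back", "NOUN")],
    [("the", "DET"), ("back", "NOUN"), ("door", "NOUN")],
]
model = train_baseline(train)
print(tag_baseline(["the", "back", "bill"], model))
# → ['DET', 'NOUN', 'NOUN']  ("back": NOUN 2 vs ADJ 1; "bill" unseen → NOUN)
```

Despite its simplicity, this method already reaches about 92% token accuracy on English, which is why it is the standard baseline.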
Sources of information for POS tagging

• Example: Janet will back the bill (will: AUX/NOUN/VERB?  back: NOUN/VERB?)
• Prior probabilities of word/tag
  • "will" is usually an AUX
• Identity of neighboring words
  • "the" means the next word is probably not a verb
• Morphology and wordshape:
  • Prefixes -- unable: un- → ADJ
  • Suffixes -- importantly: -ly → ADV
  • Capitalization -- Janet: CAP → PROPN
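These wordshape cues can be turned into simple boolean features; a minimal sketch (the feature names are invented here):

```python
def wordshape_features(word):
    """Extract simple morphology/wordshape cues of the kind listed above."""
    return {
        "prefix_un": word.lower().startswith("un"),   # un-  -> often ADJ
        "suffix_ly": word.lower().endswith("ly"),     # -ly  -> often ADV
        "capitalized": word[:1].isupper(),            # Cap  -> often PROPN
    }

print(wordshape_features("Janet"))
# → {'prefix_un': False, 'suffix_ly': False, 'capitalized': True}
```

Feature dictionaries like this are exactly what CRF/MEMM taggers consume, while neural taggers learn comparable cues from character representations.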
Standard algorithms for POS tagging

• Supervised machine learning algorithms:
  • Hidden Markov Models (HMM)
  • Conditional Random Fields (CRF) / Maximum Entropy Markov Models (MEMM)
  • Neural sequence models (RNNs or Transformers)
  • Large pretrained language models (like BERT), finetuned
• All require a hand-labeled training set, and all have about equal performance (97% on English)
• All make use of the information sources we discussed:
  • Via human-created features: HMMs and CRFs
  • Via representation learning: neural LMs
Recalling Rule-Based Tagging: ENGTWOL

• Two-level morphology
• Lexical and disambiguation rules
• 56,000 English word stems in the ENGTWOL lexicon
• First stage: a two-level transducer proposes candidate POS tags
• Second stage: about 1,100 constraints are applied to remove incorrect POS tags
Stochastic Tagger – HMM

• Intuition: pick the most likely tag.
• For a given word, an HMM tagger picks a tag that maximises:

    P(word | tag) × P(tag | previous n tags)

• HMM taggers choose a tag sequence for a whole sentence rather than a single word.
Bigram HMM tagger

• The tagger chooses a tag t_i for word w_i that is most probable given the previous tag t_(i-1) and the current word w_i:

    t_i = argmax_j P(t_j | t_(i-1), w_i)

• This can be simplified to:

    t_i = argmax_j P(t_j | t_(i-1)) × P(w_i | t_j)
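A minimal greedy sketch of this bigram rule (the toy probabilities are invented for illustration; a real HMM tagger would estimate them from a tagged corpus and decode the whole sentence, e.g. with Viterbi):

```python
def bigram_tag(words, tags, p_trans, p_emit, start="<s>"):
    """Greedy bigram HMM tagging: t_i = argmax_t P(t | t_prev) * P(w_i | t)."""
    prev, out = start, []
    for w in words:
        best = max(tags, key=lambda t: p_trans.get((prev, t), 0.0)
                                       * p_emit.get((t, w), 0.0))
        out.append(best)
        prev = best
    return out

# Toy transition probabilities P(tag | prev_tag) and emissions P(word | tag).
p_trans = {("<s>", "PRON"): 0.4, ("PRON", "VERB"): 0.8, ("VERB", "DET"): 0.6,
           ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.01}
p_emit = {("PRON", "i"): 0.3, ("VERB", "bought"): 0.1,
          ("DET", "a"): 0.5, ("NOUN", "book"): 0.02, ("VERB", "book"): 0.01}
tags = ["PRON", "VERB", "DET", "NOUN"]
print(bigram_tag(["i", "bought", "a", "book"], tags, p_trans, p_emit))
# → ['PRON', 'VERB', 'DET', 'NOUN']  ("book" after DET scores NOUN over VERB)
```

Note how the DET→NOUN transition outweighs the ambiguity of "book", which is the behaviour the equation above is designed to capture.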
Example

• Consider the following sentences (Brown Corpus):
  • Secretariat is expected to race tomorrow.
  • People continue to inquire the reason for the race for outer space.

Working:

• Suppose we only have these two candidate sub-sequences:
  • to/TO race/VB
  • to/TO race/NN
• Recall the HMM tagger equation: t_i = argmax_j P(t_j | t_(i-1)) × P(w_i | t_j)
• We need to determine which of these two has the higher likelihood.
Using Frequency Counts from the Brown and Switchboard Corpora

• First, the tag-transition probabilities:
  • P(NN | TO) = .00047
  • P(VB | TO) = .83
• Now we look at the second part of the tagger equation, the lexical likelihoods:
  • P(race | NN) = .00057
  • P(race | VB) = .00012
• This probability counterintuitively answers the question: "If we were expecting a verb, how likely is it that this verb would be race?"

Final Answer:

• P(VB | TO) × P(race | VB) = .00010
• P(NN | TO) × P(race | NN) = .00000027
• So to/TO race/VB is the more likely tag sequence.
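A quick arithmetic check of this comparison, assuming the Brown/Switchboard figures quoted in Jurafsky & Martin for the "to race" example:

```python
# Bigram HMM comparison for "to race": transition * likelihood for each tag.
p_trans = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}        # P(tag | TO)
p_emit = {("VB", "race"): 0.00012, ("NN", "race"): 0.00057}  # P(race | tag)

score_vb = p_trans[("TO", "VB")] * p_emit[("VB", "race")]
score_nn = p_trans[("TO", "NN")] * p_emit[("NN", "race")]
print(f"VB: {score_vb:.8f}  NN: {score_nn:.8f}")  # VB wins by ~370x
assert score_vb > score_nn
```

The dominant factor is the transition probability: verbs follow "to" far more often than nouns do, which overwhelms the slightly higher lexical likelihood of race as a noun.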
Generative Tagging Models
