Morphological Models in NLP - Detailed Study Notes
1. Dictionary Lookup Model
Definition:
The Dictionary Lookup model is a basic approach in which word forms and their analyses are stored in a lexicon. Input words are matched against these dictionary entries.
Working:
- Searches for the word in a precompiled dictionary.
- No dynamic generation of new forms.
Structure:
- Lexicon: Stores words with their grammatical information.
- Matcher: Looks up the input word and retrieves its features.
Example:
"unhappiness" -> un- + happy + -ness
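A minimal sketch of the lexicon + matcher structure, assuming a toy in-memory lexicon. The names LEXICON and lookup and the entry fields are hypothetical; a real system would load a much larger precompiled dictionary.

```python
# Hypothetical toy lexicon: surface form -> precompiled analysis.
LEXICON = {
    "unhappiness": {"analysis": "un- + happy + -ness", "pos": "NOUN"},
    "cats": {"analysis": "cat + -s", "pos": "NOUN", "number": "PL"},
    "went": {"analysis": "go + PAST", "pos": "VERB"},  # irregular form, stored directly
}


def lookup(word):
    """Matcher: return the stored entry for `word`, or None if it is not listed."""
    return LEXICON.get(word.lower())
```

For example, lookup("unhappiness")["analysis"] gives "un- + happy + -ness", while lookup("unkindness") returns None even though its parts are ordinary morphemes: the model only retrieves what was stored and never builds new forms.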
Advantages:
+ Fast retrieval; handles irregular words (e.g. "went" -> go + PAST).
Disadvantages:
- Cannot handle unknown (out-of-vocabulary) words; the lexicon is static and must be updated by hand.
Applications:
- Spell checking, POS tagging.
2. Finite-State Morphology
Definition:
Uses finite-state machines (FSMs), namely finite-state automata and transducers, to analyze and generate word forms.
Structure:
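- FSA (finite-state automaton): Accepts or rejects word forms (validation).
- FST (finite-state transducer): Maps between lexical and surface forms (e.g. cat+PL <-> cats).
- Lexicon + Rules: The lexicon lists stems and affixes; rules handle spelling and phonological changes.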
Example:
"cats" -> cat + PL
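A minimal sketch of an FST analyzer, assuming a toy lexicon of regular noun stems and a single rule (surface "s" after a stem maps to lexical +PL). The class FST and the function build_noun_fst are hypothetical illustrations, not the API of XFST, foma or HFST. The machine may be nondeterministic, so the sketch tracks every active (state, output-so-far) pair while reading the input one symbol at a time.

```python
from collections import defaultdict


class FST:
    """A tiny nondeterministic finite-state transducer (illustrative only)."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(input_symbol, output_string, next_state)]
        self.final = {}                # final state -> output emitted on acceptance
        self.n_states = 1              # state 0 is the start state

    def new_state(self):
        self.n_states += 1
        return self.n_states - 1

    def add_arc(self, src, inp, out, dst):
        self.arcs[src].append((inp, out, dst))

    def transduce(self, word):
        """Return every lexical form the FST produces for the surface `word`."""
        configs = {(0, "")}  # active (state, output so far) pairs
        for ch in word:
            configs = {(dst, out + o)
                       for state, out in configs
                       for inp, o, dst in self.arcs[state] if inp == ch}
        return sorted(out + self.final[s] for s, out in configs if s in self.final)


def build_noun_fst(stems):
    """Hypothetical builder: copy each stem, then optionally map surface 's' to +PL."""
    fst = FST()
    for stem in stems:
        state = 0
        for ch in stem:  # identity arcs: lexical letter == surface letter
            nxt = fst.new_state()
            fst.add_arc(state, ch, ch, nxt)
            state = nxt
        fst.final[state] = "+SG"            # bare stem is singular
        plural = fst.new_state()
        fst.add_arc(state, "s", "", plural)  # surface -s becomes lexical +PL
        fst.final[plural] = "+PL"
    return fst
```

For example, build_noun_fst(["cat", "dog"]).transduce("cats") returns ["cat+PL"], and a non-word such as "cas" returns []. Real systems also combine the lexicon with spelling-rule transducers (e.g. for "fox" -> "foxes"), which this sketch omits.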
Tools:
XFST (Xerox), foma, HFST (Helsinki Finite-State Technology)
Advantages:
+ Fast: runs in time linear in word length (O(n)) for deterministic machines; good for regular, pattern-based rules.
Disadvantages:
- Limited to regular (finite-state) patterns; phenomena such as full reduplication are hard to model.
Applications:
- Spell checking, machine translation (MT), speech systems.
3. Unification-Based Morphology
Definition:
A feature-based model that describes morphemes with attribute-value pairs (feature structures) and combines them by unification.
Structure:
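- Lexicon: A feature structure for each morpheme.
- Morphotactics (allowed morpheme orders) and phonological rules.
- Unification engine: Merges compatible feature structures and rejects conflicting ones.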
Example:
"walks" -> [walk: V] + [-s: 3rd person, singular, present] -> [V, 3rd person, singular, present]
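A minimal sketch of the unification step, assuming feature structures are represented as nested Python dicts. The function unify and the feature names (pos, lemma, agr, tense) are hypothetical choices for illustration; real unification engines also support shared (re-entrant) values, which this sketch omits.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts); return the result, or None on a clash."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val          # feature only in b: just add it
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            merged = unify(a_val, b_val)  # recurse into nested structures
            if merged is None:
                return None
            result[key] = merged
        elif a_val != b_val:
            return None                  # conflicting atomic values: unification fails
    return result


# Hypothetical lexicon entries for the "walks" example.
WALK = {"pos": "V", "lemma": "walk"}
S_SUFFIX = {"pos": "V", "agr": {"person": 3, "number": "sg"}, "tense": "pres"}  # verbal -s
DOG = {"pos": "N", "lemma": "dog"}
```

unify(WALK, S_SUFFIX) yields {"pos": "V", "lemma": "walk", "agr": {"person": 3, "number": "sg"}, "tense": "pres"}, while unify(DOG, S_SUFFIX) returns None because pos N clashes with V; a separate plural -s entry would be needed for nouns.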
Advantages:
+ Expressive; handles morphologically complex languages and agreement constraints.
Disadvantages:
- Slower than finite-state methods, prone to ambiguity, and complex to develop.
Applications:
- Parsing, MT, grammar tools.
4. Function-Based Morphology
Definition:
Focuses on the grammatical functions expressed by the parts of a word.
Examples:
"running" -> Present Participle
"cats" -> Plural Noun
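A minimal sketch of rule-based function assignment, assuming the part of speech is already known (e.g. from a tagger) and that functions are signalled by English suffixes. The rule table RULES, the DEFAULTS fallback and grammatical_function are hypothetical; the sketch only mirrors the examples above.

```python
# Hypothetical rule table: (part of speech, suffix) -> grammatical function.
RULES = {
    ("VERB", "ing"): "Present Participle",
    ("VERB", "ed"): "Past Tense",
    ("VERB", "s"): "3rd Person Singular Present",
    ("NOUN", "s"): "Plural Noun",
}
DEFAULTS = {"VERB": "Base Form", "NOUN": "Singular Noun"}


def grammatical_function(word, pos):
    """Return the grammatical function signalled by the word's ending."""
    # Check longer suffixes first so that, e.g., "ing" is tried before "s".
    for (rule_pos, suffix), function in sorted(RULES.items(), key=lambda r: -len(r[0][1])):
        if rule_pos == pos and word.endswith(suffix):
            return function
    return DEFAULTS.get(pos, "Unknown")
```

grammatical_function("running", "VERB") gives "Present Participle" and grammatical_function("cats", "NOUN") gives "Plural Noun". Extending the model means adding table entries, but plain suffix matching also mislabels words such as "sing" or "bus", which hints at why the approach struggles with rich morphology.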
Advantages:
+ Simple, rule-based, and easy to extend with new rules.
Disadvantages:
- Not well suited to languages with rich morphology (many affixes and inflection patterns).
Applications:
- POS tagging, grammar correction.
5. Morphology Induction
Definition:
Discovers morphemes automatically from large amounts of raw text, without labeled data (unsupervised).
Process:
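1. Identify frequent word endings (candidate suffixes).
2. Compare words sharing these endings to find common roots.
3. Use statistics (e.g. frequency counts) to keep reliable splits and discard noise.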
Example:
"play, played, playing" -> play (root), -ed/-ing
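A minimal sketch of the three steps, assuming a plain word list as input and simple counting as the "statistics". The function names and thresholds (max_suffix_len, min_stem_len, min_count) are hypothetical choices; practical induction systems use more principled statistical models.

```python
from collections import Counter, defaultdict


def induce_suffixes(words, max_suffix_len=4, min_stem_len=3, min_count=2):
    """Return candidate suffixes and how many stems they were seen with."""
    vocab = set(words)
    # Steps 1-2: for every candidate stem, record the endings that follow it.
    endings_by_stem = defaultdict(set)
    for w in vocab:
        for i in range(min_stem_len, len(w) + 1):
            suffix = w[i:]
            if len(suffix) <= max_suffix_len:
                endings_by_stem[w[:i]].add(suffix)
    # Step 3: count a suffix once per stem that shows at least two different
    # endings (the empty ending counts), then keep suffixes seen with enough stems.
    counts = Counter()
    for endings in endings_by_stem.values():
        if len(endings) >= 2:
            for s in endings:
                if s:  # skip the empty ending
                    counts[s] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


def find_roots(words, suffixes):
    """Roots are words that also occur with at least one induced suffix."""
    vocab = set(words)
    return sorted(w for w in vocab if any(w + s in vocab for s in suffixes))
```

On the toy vocabulary play/played/playing, walk/walked/walking, jump/jumps/jumped, induce_suffixes returns {"ed": 3, "ing": 2} and find_roots returns ["jump", "play", "walk"]. The suffix "-s" is dropped because it occurs with only one stem here, which shows why the method needs large amounts of data.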
Advantages:
+ Learns without hand-written rules; useful for low-resource languages that lack annotated data.
Disadvantages:
- Needs large amounts of text; prone to noisy, incorrect segmentations.
Applications:
- MT; NLP for languages whose morphology has not yet been described.