Speech and Language Processing Guide
The author and publisher of this book have used their best efforts in preparing this book. These efforts
include the development, research, and testing of the theories and programs to determine their
effectiveness. The author and publisher shall not be liable in any event for incidental or consequential
damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
Copyright © 2000 by Pearson Education, Inc.
This edition is published by arrangement with Pearson Education, Inc. and Dorling Kindersley
Publishing, Inc.
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold,
hired out, or otherwise circulated without the publisher's prior written consent in any form of binding or
cover other than that in which it is published and without a similar condition including this condition being
imposed on the subsequent purchaser and without limiting the rights under copyright reserved above, no
part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in
any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior
written permission of both the copyright owner and the above-mentioned publisher of this book.
ISBN 978-81-317-1672-4
First Impression, 2008
Second Impression, 2009
Third Impression
This edition is manufactured in India and is authorized for sale only in India, Bangladesh, Bhutan,
Pakistan, Nepal, Sri Lanka and the Maldives. Circulation of this edition outside of these territories is
UNAUTHORIZED.
Published by Dorling Kindersley (India) Pvt. Ltd., licensees of Pearson Education in South Asia.
Head Office: 7th Floor, Knowledge Boulevard, A-8(A), Sector-62, Noida 201309 (U.P.), India.
Registered Office: 14 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India.
Printed in India by Saurabh Printers Pvt. Ltd.
Contents
Preface
1 Introduction
  1.1 Knowledge in Speech and Language Processing
  1.2 Ambiguity
  1.3 Models and Algorithms
  1.4 Language, Thought, and Understanding
  1.5 The State of the Art and the Near-Term Future
  1.6 Some Brief History
    Foundational Insights: 1940s and 1950s
    The Two Camps: 1957-1970
    Four Paradigms: 1970-1983
    Empiricism and Finite State Models Redux: 1983-1993
    The Field Comes Together: 1994-1999
    On Multiple Discoveries
    A Final Brief Note on Psychology
  1.7 Summary
  Bibliographical and Historical Notes
I Words
2 Regular Expressions and Automata
  2.1 Regular Expressions
    Basic Regular Expression Patterns
    Disjunction, Grouping, and Precedence
    A Simple Example
    A More Complex Example
    Advanced Operators
    Regular Expression Substitution, Memory, and ELIZA
  2.2 Finite-State Automata
    Using an FSA to Recognize Sheeptalk
    Another Example
    Non-Deterministic FSAs
    Using an NFSA to Accept Strings
    Recognition as Search
    Relating Deterministic and Non-Deterministic Automata
  2.3 Regular Languages and FSAs
  2.4 Summary
  Bibliographical and Historical Notes
  Exercises
3 Morphology and Finite-State Transducers
  3.1 Survey of (Mostly) English Morphology
    Inflectional Morphology
    Derivational Morphology
  3.2 Finite-State Morphological Parsing
    The Lexicon and Morphotactics
    Morphological Parsing with Finite-State Transducers
    Orthographic Rules and Finite-State Transducers
  3.3 Combining FST Lexicon and Rules
  3.4 Lexicon-Free FSTs: The Porter Stemmer
  3.5 Human Morphological Processing
  3.6 Summary
  Bibliographical and Historical Notes
  Exercises
4 Computational Phonology and Text-to-Speech
  4.1 Speech Sounds and Phonetic Transcription
  4.2 The Phoneme and Phonological Rules
  4.3 Phonological Rules and Transducers
  4.4 Advanced Issues in Computational Phonology
    Templatic Morphology
    Optimality Theory
  4.5 Machine Learning of Phonological Rules
  4.6 Mapping Text to Phones for TTS
    Pronunciation Dictionaries
    Beyond Dictionary Lookup: Text Analysis
    An FST-based Pronunciation Lexicon
  4.7 Prosody in TTS
    Phonological Aspects of Prosody
    Phonetic or Acoustic Aspects of Prosody
    Prosody in Speech Synthesis
  4.8 Human Processing of Phonology and Morphology
  4.9 Summary
  Bibliographical and Historical Notes
  Exercises
5 Probabilistic Models of Pronunciation and Spelling
  5.1 Dealing with Spelling Errors
  5.2 Spelling Error Patterns
  5.3 Detecting Non-Word Errors
  5.4 Probabilistic Models
  5.5 Applying the Bayesian Method to Spelling
  5.6 Minimum Edit Distance
  5.7 English Pronunciation Variation
  5.8 The Bayesian Method for Pronunciation
    Decision Tree Models of Pronunciation Variation
  5.9 Weighted Automata
    Computing Likelihoods from Weighted Automata: The Forward Algorithm
    Decoding: The Viterbi Algorithm
    Weighted Automata and Segmentation
    Segmentation for Lexicon-Induction
  5.10 Pronunciation in Humans
  5.11 Summary
  Bibliographical and Historical Notes
6 N-grams
  6.1 Counting Words in Corpora
  6.2 Simple (Unsmoothed) N-grams
    More on N-grams and Their Sensitivity to the Training Corpus
  6.3 Smoothing
    Add-One Smoothing
    Witten-Bell Discounting
    Good-Turing Discounting
  6.4 Combining Backoff with Discounting
  6.5 Deleted Interpolation
  6.6 N-grams for Spelling and Pronunciation
    Context-Sensitive Spelling Error Correction
    N-grams for Pronunciation Modeling
  6.7 Entropy
    Cross Entropy for Comparing Models
    The Entropy of English
  Bibliographical and Historical Notes
7 HMMs and Speech Recognition
  7.1 Speech Recognition Architecture
  7.2 Overview of Hidden Markov Models
  7.3 The Viterbi Algorithm Revisited
  7.4 Advanced Methods for Decoding
    A* Decoding
  7.5 Acoustic Processing of Speech
    Sound Waves
    How to Interpret a Waveform
    Spectra
    Feature Extraction
  7.6 Computing Acoustic Probabilities
  7.7 Training a Speech Recognizer
  7.8 Waveform Generation for Speech Synthesis
    Pitch and Duration Modification
    Unit Selection
  7.9 Human Speech Recognition
  7.10 Summary
  Bibliographical and Historical Notes
  Exercises
II Syntax
8 Word Classes and Part-of-Speech Tagging
  8.1 (Mostly) English Word Classes
  8.2 Tagsets for English
  8.3 Part-of-Speech Tagging
  8.4 Rule-Based Part-of-Speech Tagging
  8.5 Stochastic Part-of-Speech Tagging
    The Actual Algorithm for HMM Tagging
  8.6 Transformation-Based Tagging
    How TBL Rules Are Applied
    How TBL Rules Are Learned
  8.7 Other Issues
    Multiple Tags and Multiple Words
    Unknown Words
    Class-based N-grams
  8.8 Summary
  Bibliographical and Historical Notes
  Exercises
9 Context-Free Grammars for English
  9.1 Constituency
  9.2 Context-Free Rules and Trees
  9.3 Sentence-Level Constructions
  9.4 The Noun Phrase
    Before the Head Noun
    After the Noun
  9.5 Coordination
  9.6 Agreement
  9.7 The Verb Phrase and Subcategorization
  9.8 Auxiliaries
  9.9 Spoken Language Syntax
    Disfluencies
  Grammar Equivalence and Normal Form
  Finite-State and Context-Free Grammars
10 Parsing with Context-Free Grammars
  10.1 Parsing as Search
    Top-Down Parsing
    Bottom-Up Parsing
    Comparing Top-Down and Bottom-Up Parsing
  10.2 A Basic Top-Down Parser
    Adding Bottom-Up Filtering
  10.3 Problems with the Basic Top-Down Parser
    Left-Recursion
    Ambiguity
    Repeated Parsing of Subtrees
  10.4 The Earley Algorithm
  10.5 Finite-State Parsing Methods
  10.6 Summary
  Bibliographical and Historical Notes
  Exercises
11 Features and Unification
  11.1 Feature Structures
  11.2 Unification of Feature Structures
  11.3 Features Structures in the Grammar
    Agreement
    Head Features
    Subcategorization
    Long-Distance Dependencies
  11.4 Implementing Unification
    Unification Data Structures
    The Unification Algorithm
  11.5 Parsing with Unification Constraints
    Integrating Unification into an Earley Parser
    Unification Parsing
  11.6 Types and Inheritance
    Extensions to Typing
    Other Extensions to Unification
  11.7 Summary
  Bibliographical and Historical Notes
  Exercises
12 Lexicalized and Probabilistic Parsing
  12.1 Probabilistic Context-Free Grammars
    Probabilistic CYK Parsing of PCFGs
    Learning PCFG Probabilities
  12.2 Problems with PCFGs