Speech and Language Processing Guide

Uploaded by

Nitin Sharma
Copyright
© Attribution Non-Commercial (BY-NC)
The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

Copyright © 2000 by Pearson Education, Inc. This edition is published by arrangement with Pearson Education, Inc. and Dorling Kindersley Publishing, Inc.

This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold, hired out, or otherwise circulated without the publisher's prior written consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser and without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior written permission of both the copyright owner and the above-mentioned publisher of this book.

ISBN 978-81-317-1672-4

First Impression, 2008
Second Impression, 2009
Third Impression

This edition is manufactured in India and is authorized for sale only in India, Bangladesh, Bhutan, Pakistan, Nepal, Sri Lanka and the Maldives. Circulation of this edition outside of these territories is UNAUTHORIZED.

Published by Dorling Kindersley (India) Pvt. Ltd., licensees of Pearson Education in South Asia.

Head Office: 7th Floor, Knowledge Boulevard, A-8(A), Sector-62, Noida 201309 (U.P.), India.
Registered Office: 14 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India.

Printed in India by Saurabh Printers Pvt. Ltd.
Contents

Preface

1 Introduction
  1.1 Knowledge in Speech and Language Processing
  1.2 Ambiguity
  1.3 Models and Algorithms
  1.4 Language, Thought, and Understanding
  1.5 The State of the Art and the Near-Term Future
  1.6 Some Brief History
    Foundational Insights: 1940s and 1950s
    The Two Camps: 1957-1970
    Four Paradigms: 1970-1983
    Empiricism and Finite State Models Redux: 1983-1993
    The Field Comes Together: 1994-1999
    On Multiple Discoveries
    A Final Brief Note on Psychology
  1.7 Summary
  Bibliographical and Historical Notes

I Words

2 Regular Expressions and Automata
  2.1 Regular Expressions
    Basic Regular Expression Patterns
    Disjunction, Grouping, and Precedence
    A Simple Example
    A More Complex Example
    Advanced Operators
    Regular Expression Substitution, Memory, and ELIZA
  2.2 Finite-State Automata
    Using an FSA to Recognize Sheeptalk
    Another Example
    Non-Deterministic FSAs
    Using an NFSA to Accept Strings
    Recognition as Search
    Relating Deterministic and Non-Deterministic Automata
  2.3 Regular Languages and FSAs
  2.4 Summary
  Bibliographical and Historical Notes
  Exercises

3 Morphology and Finite-State Transducers
  3.1 Survey of (Mostly) English Morphology
    Inflectional Morphology
    Derivational Morphology
  3.2 Finite-State Morphological Parsing
    The Lexicon and Morphotactics
    Morphological Parsing with Finite-State Transducers
    Orthographic Rules and Finite-State Transducers
  3.3 Combining FST Lexicon and Rules
  3.4 Lexicon-Free FSTs: The Porter Stemmer
  3.5 Human Morphological Processing
  3.6 Summary
  Bibliographical and Historical Notes
  Exercises

4 Computational Phonology and Text-to-Speech
  4.1 Speech Sounds and Phonetic Transcription
  4.2 The Phoneme and Phonological Rules
  4.3 Phonological Rules and Transducers
  4.4 Advanced Issues in Computational Phonology
    Templatic Morphology
    Optimality Theory
  4.5 Machine Learning of Phonological Rules
  4.6 Mapping Text to Phones for TTS
    Pronunciation Dictionaries
    Beyond Dictionary Lookup: Text Analysis
    An FST-based Pronunciation Lexicon
  4.7 Prosody in TTS
    Phonological Aspects of Prosody
    Phonetic or Acoustic Aspects of Prosody
    Prosody in Speech Synthesis
  4.8 Human Processing of Phonology and Morphology
  4.9 Summary
  Bibliographical and Historical Notes
  Exercises

5 Probabilistic Models of Pronunciation and Spelling
  5.1 Dealing with Spelling Errors
  5.2 Spelling Error Patterns
  5.3 Detecting Non-Word Errors
  5.4 Probabilistic Models
  5.5 Applying the Bayesian Method to Spelling
  5.6 Minimum Edit Distance
  5.7 English Pronunciation Variation
  5.8 The Bayesian Method for Pronunciation
    Decision Tree Models of Pronunciation Variation
  5.9 Weighted Automata
    Computing Likelihoods from Weighted Automata: The Forward Algorithm
    Decoding: The Viterbi Algorithm
    Weighted Automata and Segmentation
    Segmentation for Lexicon-Induction
  5.10 Pronunciation in Humans
  5.11 Summary
  Bibliographical and Historical Notes

6 N-grams
  6.1 Counting Words in Corpora
  6.2 Simple (Unsmoothed) N-grams
    More on N-grams and Their Sensitivity to the Training Corpus
  6.3 Smoothing
    Add-One Smoothing
    Witten-Bell Discounting
    Good-Turing Discounting
  6.4 Combining Backoff with Discounting
  6.5 Deleted Interpolation
  6.6 N-grams for Spelling and Pronunciation
    Context-Sensitive Spelling Error Correction
    N-grams for Pronunciation Modeling
  6.7 Entropy
    Cross Entropy for Comparing Models
    The Entropy of English
  Bibliographical and Historical Notes

7 HMMs and Speech Recognition
  7.1 Speech Recognition Architecture
  7.2 Overview of Hidden Markov Models
  7.3 The Viterbi Algorithm Revisited
  7.4 Advanced Methods for Decoding
    A* Decoding
  7.5 Acoustic Processing of Speech
    Sound Waves
    How to Interpret a Waveform
    Spectra
    Feature Extraction
  7.6 Computing Acoustic Probabilities
  7.7 Training a Speech Recognizer
  7.8 Waveform Generation for Speech Synthesis
    Pitch and Duration Modification
    Unit Selection
  7.9 Human Speech Recognition
  7.10 Summary
  Bibliographical and Historical Notes
  Exercises

II Syntax

8 Word Classes and Part-of-Speech Tagging
  8.1 (Mostly) English Word Classes
  8.2 Tagsets for English
  8.3 Part-of-Speech Tagging
  8.4 Rule-Based Part-of-Speech Tagging
  8.5 Stochastic Part-of-Speech Tagging
    The Actual Algorithm for HMM Tagging
  8.6 Transformation-Based Tagging
    How TBL Rules Are Applied
    How TBL Rules Are Learned
  8.7 Other Issues
    Multiple Tags and Multiple Words
    Unknown Words
    Class-based N-grams
  8.8 Summary
  Bibliographical and Historical Notes
  Exercises

9 Context-Free Grammars for English
  9.1 Constituency
  9.2 Context-Free Rules and Trees
  9.3 Sentence-Level Constructions
  9.4 The Noun Phrase
    Before the Head Noun
    After the Noun
  9.5 Coordination
  9.6 Agreement
  9.7 The Verb Phrase and Subcategorization
  9.8 Auxiliaries
  9.9 Spoken Language Syntax
    Disfluencies
  9.10 Grammar Equivalence and Normal Form
  9.11 Finite-State and Context-Free Grammars

10 Parsing with Context-Free Grammars
  10.1 Parsing as Search
    Top-Down Parsing
    Bottom-Up Parsing
    Comparing Top-Down and Bottom-Up Parsing
  10.2 A Basic Top-Down Parser
    Adding Bottom-Up Filtering
  10.3 Problems with the Basic Top-Down Parser
    Left-Recursion
    Ambiguity
    Repeated Parsing of Subtrees
  10.4 The Earley Algorithm
  10.5 Finite-State Parsing Methods
  10.6 Summary
  Bibliographical and Historical Notes
  Exercises

11 Features and Unification
  11.1 Feature Structures
  11.2 Unification of Feature Structures
  11.3 Features Structures in the Grammar
    Agreement
    Head Features
    Subcategorization
    Long-Distance Dependencies
  11.4 Implementing Unification
    Unification Data Structures
    The Unification Algorithm
  11.5 Parsing with Unification Constraints
    Integrating Unification into an Earley Parser
    Unification Parsing
  11.6 Types and Inheritance
    Extensions to Typing
    Other Extensions to Unification
  11.7 Summary
  Bibliographical and Historical Notes
  Exercises

12 Lexicalized and Probabilistic Parsing
  12.1 Probabilistic Context-Free Grammars
    Probabilistic CYK Parsing of PCFGs
    Learning PCFG Probabilities
  12.2 Problems with PCFGs
