Introduction to Speech & Natural Language Processing
Lecture 0
Introducing the Course
Krishnendu Ghosh & S R Mahadeva Prasanna
Course Details
Credit: 1 (1-0-0-0-1)
Course: Introduction to Speech and Natural Language Processing
Course Type: Elective
Course Overview
This course provides a foundational understanding of Speech
Processing and Natural Language Processing (NLP). It focuses on the
core principles, techniques, and applications. The course covers text
processing, language models, speech signal processing, automatic
speech recognition (ASR), and text-to-speech synthesis (TTS).
Course Objectives
By the end of this course, students will:
Understand basic text and speech processing techniques.
Learn about language models and embeddings.
Explore speech feature extraction, ASR, and TTS models.
Syllabus
Unit 1: Lexical Processing in NLP (2 Hours)
What are NLP and Speech Processing?
Text Normalization: Tokenization, Stemming, Lemmatization.
Language Modeling and Word Embeddings: N-grams, Word2Vec, GloVe.
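As a rough illustration of two Unit 1 topics, the sketch below tokenizes a toy corpus and estimates bigram probabilities by maximum likelihood. The function names, the regular-expression tokenizer, and the three-sentence corpus are all assumptions made for this example, not course material. A practical language model would also need smoothing for unseen bigrams.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bigram_probabilities(sentences):
    """Estimate P(w2 | w1) by maximum likelihood from raw sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modeled.
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        # Every token except the last one acts as the context of one bigram.
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / context_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

# Toy corpus, invented for illustration only.
corpus = ["The cat sat.", "The cat ran.", "A dog sat."]
probs = bigram_probabilities(corpus)
print(probs[("the", "cat")])  # 1.0: "the" is always followed by "cat"
print(probs[("cat", "sat")])  # 0.5: "cat" is followed by "sat" in one of two cases
```

Word2Vec and GloVe differ from this count-based model in that they learn dense vectors for words rather than a table of conditional probabilities.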
Unit 2: Syntactic Processing in NLP (2 Hours)
Sequence Labeling for Parts-of-Speech (POS) Tagging and Named
Entity Recognition (NER).
Context-Free Grammars and Constituency Parsing. Dependency
Parsing.
Unit 3: Semantic Processing in NLP (2 Hours)
Word Sense Disambiguation (WSD) – Understanding word meanings in
context.
Semantic Role Labeling (SRL) – Assigning roles like agent, object, etc.
Coreference Resolution – Identifying references to the same entity.
Unit 4: Phonetics & Speech Signal Processing (3 Hours)
Basics of Speech Production & Phonetics.
Feature Extraction: MFCC, Spectrograms, PLP Features.
Deep Learning for Speech: Introduction to wav2vec, WavLM.
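To make the feature-extraction topic concrete, here is a minimal sketch of a magnitude spectrogram: the signal is cut into overlapping frames, each frame is windowed, and a discrete Fourier transform is taken per frame. The frame length, hop length, sample rate, and test tone are assumed values chosen for illustration. The naive DFT is used only to keep the example dependency-free, and an FFT routine would normally replace it. MFCCs build on this representation by adding a mel filterbank, a logarithm, and a discrete cosine transform.

```python
import cmath
import math

def frame_signal(signal, frame_length, hop_length):
    """Split a signal into overlapping frames, dropping any incomplete final frame."""
    return [
        signal[start:start + frame_length]
        for start in range(0, len(signal) - frame_length + 1, hop_length)
    ]

def hamming_window(length):
    """Return the Hamming window coefficients for one frame."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, for illustration only)."""
    n_samples = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n, x in enumerate(frame)))
        for k in range(n_samples // 2 + 1)
    ]

def spectrogram(signal, frame_length=64, hop_length=32):
    """Return one magnitude spectrum per windowed frame."""
    window = hamming_window(frame_length)
    return [
        magnitude_spectrum([x * w for x, w in zip(frame, window)])
        for frame in frame_signal(signal, frame_length, hop_length)
    ]

# Synthetic input: a 1000 Hz sine sampled at 8000 Hz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# The strongest bin of the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]))      # 7 33 (frames, frequency bins)
print(peak_bin * sample_rate / 64)  # 1000.0
```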
Unit 5: Automatic Speech Recognition (ASR) & Text-to-Speech (TTS) (3 Hours)
ASR Pipeline: Feature extraction → Acoustic modeling → Decoding.
HMM vs. DNN-based ASR systems.
End-to-End ASR Models: wav2vec, Whisper.
TTS Pipeline: Text preprocessing → Prosody → Synthesis.
Deep Learning-based TTS Models: Tacotron, FastSpeech, WaveNet.
Challenges in Speech Synthesis (Low-Resource Languages, Prosody Control).
Textbooks
Speech and Language Processing – Daniel Jurafsky & James H. Martin
(3rd Edition, Draft Available Online)
Springer Handbook of Speech Processing – Jacob Benesty, M. Mohan
Sondhi, Yiteng Arden Huang
Reference books / Material
Natural Language Processing with Python (NLTK Book) – Steven Bird,
Ewan Klein, Edward Loper
Spoken Language Processing: A Guide to Theory, Algorithm, and
System Development – Xuedong Huang, Alex Acero, Hsiao-Wuen Hon
Fundamentals of Speech Recognition – Lawrence Rabiner,
Biing-Hwang Juang
Grading Policy
Theoretical Assignments (2) 14%
Quizzes (12) 36%
End-Term Exam 30%
Classroom Notes 5%
Activeness in Classes 5%
Attendance 5%
X-Factor (Originality, Creativity, or Initiative) 5%