Understanding Mel Spectrograms

The document discusses the concepts of mel spectrogram, cepstrum, and spectrum in audio signal processing. It explains how audio signals are captured digitally, transformed using Fourier transform, and represented on the mel scale for better human perception. Additionally, it covers the advantages and disadvantages of Mel-Frequency Cepstral Coefficients (MFCCs) and their applications in speech and music processing.

Uploaded by

Bala Murugan


MEL SPECTROGRAM, FREQUENCY, CEPSTRUM AND SPECTRUM

Dr. C Santhosh Kumar,

ECE Department
INTRODUCTION-MEL SPECTROGRAM

• A signal is a variation in a certain quantity over time. For audio, the quantity that varies is air
pressure.

• How do we capture this information digitally?

• We can take samples of the air pressure over time.

• The rate at which we sample the data can vary, but is most commonly 44.1 kHz, or 44,100
samples per second.

• What we have captured is a waveform for the signal, and this can be interpreted, modified,
and analyzed with computer software.
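
As a rough illustration of sampling, the sketch below generates a sine "pressure" waveform at 44,100 samples per second. The function name `sample_tone`, the 440 Hz tone, and the 10 ms duration are assumptions chosen for the example, not values from the slides.

```python
import numpy as np

SAMPLE_RATE = 44_100  # samples per second (44.1 kHz, as quoted above)

def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return sample times and a sine waveform sampled at sample_rate.

    Hypothetical helper: a pure tone stands in for measured air pressure.
    """
    n_samples = int(round(duration_s * sample_rate))
    t = np.arange(n_samples) / sample_rate  # time of each sample, in seconds
    return t, np.sin(2 * np.pi * freq_hz * t)

# 10 ms of a 440 Hz tone (both values are illustrative assumptions)
t, waveform = sample_tone(440.0, 0.01)
print(len(waveform), "samples, spaced", t[1] - t[0], "seconds apart")
```

The array `waveform` is the kind of digital representation that software can then interpret, modify, and analyze.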
The Fourier Transform

• An audio signal is composed of several single-frequency sound waves.

• When taking samples of the signal over time, we only capture the resulting
amplitudes.

• The Fourier transform is a mathematical formula that allows us to decompose a
signal into its individual frequencies and each frequency's amplitude.

• In other words, it converts the signal from the time domain into the frequency
domain. The result is called a spectrum.
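
A minimal sketch of this decomposition, assuming NumPy's FFT routines and a made-up test signal (two tones at 50 Hz and 120 Hz, sampled at 8 kHz for one second). None of these numbers come from the slides.

```python
import numpy as np

sample_rate = 8_000                          # Hz (assumed for the example)
t = np.arange(sample_rate) / sample_rate     # one second of sample times

# Time domain: the sum of two single-frequency waves
signal = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Frequency domain: the spectrum of the real-valued signal
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

# Scale magnitudes so each tone's original amplitude can be read off directly
amplitudes = 2 * np.abs(spectrum) / len(signal)

# The two largest spectral peaks sit at the two tone frequencies
strongest = np.sort(freqs[np.argsort(amplitudes)[-2:]])
print(strongest)
```

The samples only record the combined amplitude at each instant, yet the spectrum recovers both the component frequencies and their amplitudes.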
The Spectrogram
The Mel Scale
• Studies have shown that humans do not perceive frequencies on a linear scale.

• We are better at detecting differences in lower frequencies than higher frequencies.

• For example, we can easily tell the difference between 500 and 1000 Hz, but we will
hardly be able to tell a difference between 10,000 and 10,500 Hz, even though the
distance between the two pairs is the same.

• In 1937, Stevens, Volkmann, and Newman proposed a unit of pitch such that equal
distances in pitch sounded equally distant to the listener.

• This is called the mel scale.
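
The slides do not give a formula for the mel scale. The sketch below assumes one widely used approximation, m = 2595 · log10(1 + f / 700), and uses it to compare the two frequency pairs from the example above. The function names are hypothetical.

```python
import math

def hz_to_mel(freq_hz):
    """Convert Hz to mels using a common approximation (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Both pairs are 500 Hz apart, but their distances in mels differ a lot
low_gap = hz_to_mel(1000) - hz_to_mel(500)
high_gap = hz_to_mel(10_500) - hz_to_mel(10_000)
print(f"500 -> 1000 Hz:      {low_gap:.1f} mels")
print(f"10000 -> 10500 Hz:   {high_gap:.1f} mels")
```

Under this approximation the low-frequency pair is several times further apart in mels than the high-frequency pair, which matches the perceptual claim.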

The Mel Spectrogram
• A mel spectrogram is a spectrogram where the frequencies are converted to
the mel scale.
SUMMARY ON MEL SPECTROGRAM

1. We took samples of air pressure over time to digitally represent an audio signal.

2. We mapped the audio signal from the time domain to the frequency domain using
the fast Fourier transform, and we performed this on overlapping windowed
segments of the audio signal.

3. We converted the y-axis (frequency) to a log scale and the color dimension
(amplitude) to decibels to form the spectrogram.

4. We mapped the y-axis (frequency) onto the mel scale to form the mel spectrogram.
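
The steps above can be sketched from scratch with NumPy. This is one possible implementation under stated assumptions, not the method used to make the slides: the Hann window, the FFT size of 1024, the hop of 256, the 40 triangular mel filters, and the Hz-to-mel formula are all choices made for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Common approximation of the mel scale (assumed, not from the slides)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def power_spectrogram(signal, n_fft=1024, hop=256):
    """Step 2: FFT of overlapping, Hann-windowed segments.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def mel_filterbank(sample_rate, n_fft, n_mels=40):
    """Step 4: triangular filters with edges equally spaced in mels."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, centre, hi = edges_hz[m:m + 3]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrogram_db(signal, sample_rate, n_fft=1024, hop=256, n_mels=40):
    power = power_spectrogram(signal, n_fft, hop)
    mel_power = mel_filterbank(sample_rate, n_fft, n_mels) @ power
    # Step 3: express the amplitude dimension in decibels
    return 10.0 * np.log10(mel_power + 1e-10)

# Step 1: one second of a sampled 1 kHz tone (illustrative test signal)
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

mel_db = mel_spectrogram_db(tone, sample_rate)
print(mel_db.shape)  # (mel bands, frames)
```

Each row of `mel_db` is one mel band and each column is one windowed segment, so plotting it as an image gives the mel spectrogram.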
Mel-Frequency Cepstral Coefficients
Cepstrum
Each cepstral term mirrors a spectral one:
Spectrum → Cepstrum
Frequency → Quefrency
Filtering → Liftering
Harmonic → Rhamonic


A historical note on the cepstrum
• Developed while studying echoes in seismic signals (1960s)
• Audio feature of choice for speech recognition / identification (1970s)
• Music processing (2000s)
Computing the cepstrum
Time-domain signal → Spectrum → Log spectrum → Cepstrum
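
The chain from time-domain signal to cepstrum can be sketched as follows. This assumes the real cepstrum (log of the magnitude spectrum, followed by an inverse transform), which is one of several definitions. The function name, the small offset inside the logarithm, and the random test signal are assumptions for the example.

```python
import numpy as np

def real_cepstrum(signal):
    """Time-domain signal -> spectrum -> log spectrum -> cepstrum."""
    spectrum = np.fft.fft(signal)                     # spectrum
    log_spectrum = np.log(np.abs(spectrum) + 1e-12)   # log magnitude (offset avoids log(0))
    return np.fft.ifft(log_spectrum).real             # inverse transform gives the cepstrum

# Illustrative input: 512 samples of seeded random noise
rng = np.random.default_rng(0)
x = rng.standard_normal(512)
c = real_cepstrum(x)
print(c.shape)
```

The index of `c` is quefrency, measured in samples. Dividing by the sample rate converts it to seconds.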
Visualising the cepstrum
Signal → (DFT) → Power spectrum → (log) → Log power spectrum → (IDFT) → Cepstrum
[Figure: cepstrum of the log power spectrum, with the 1st rhamonic marked]
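
To make the 1st rhamonic concrete, the sketch below builds a synthetic harmonic signal and looks for the largest cepstral peak in a plausible pitch range. The 200 Hz fundamental, 16 kHz sample rate, Hann window, and 100 to 400 Hz search range are all assumptions made for the example.

```python
import numpy as np

sample_rate = 16_000   # Hz (assumed)
f0 = 200.0             # fundamental frequency of the test signal (assumed)
n = 2048
t = np.arange(n) / sample_rate

# Harmonic signal: partials at multiples of f0 with amplitudes 1/k
signal = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 40))

# Signal -> (DFT) -> power spectrum -> (log) -> log power spectrum -> (IDFT) -> cepstrum
windowed = signal * np.hanning(n)
power_spectrum = np.abs(np.fft.rfft(windowed)) ** 2
log_power = np.log(power_spectrum + 1e-12)
cepstrum = np.fft.irfft(log_power)

# Quefrency axis in seconds; search only quefrencies matching 100-400 Hz pitch
quefrency = np.arange(len(cepstrum)) / sample_rate
lo, hi = sample_rate // 400, sample_rate // 100
peak = lo + int(np.argmax(cepstrum[lo:hi]))

estimated_f0 = 1.0 / quefrency[peak]
print(f"1st rhamonic near {quefrency[peak] * 1000:.2f} ms, f0 estimate {estimated_f0:.1f} Hz")
```

The harmonics are evenly spaced in the log power spectrum, and that regular spacing shows up as a peak at a quefrency equal to the pitch period, which is why the cepstrum has been used for pitch estimation.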
MFCCs advantages
• Describe the “large” structures of the spectrum
• Ignore fine spectral structures
• Work well in speech and music processing
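
A small sketch of why the first few coefficients capture only the "large" structures. MFCC pipelines typically apply a discrete cosine transform (DCT) to log mel-band energies and keep the lowest coefficients. Here a synthetic vector stands in for one frame of log mel energies, and the 40 bands, 13 kept coefficients, and hand-written DCT matrix are assumptions for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; each row is one cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis

n_bands = 40   # number of mel bands (assumed)
n_keep = 13    # number of coefficients kept (a common choice, assumed here)
x = np.arange(n_bands)

# Stand-in for log mel energies: a smooth envelope plus a fine ripple
envelope = 3.0 * np.cos(np.pi * 2 * (2 * x + 1) / (2 * n_bands))
ripple = 0.5 * np.cos(np.pi * 30 * (2 * x + 1) / (2 * n_bands))
log_mel = envelope + ripple

D = dct_matrix(n_bands)
coeffs = D @ log_mel          # all 40 cepstral coefficients
mfcc_like = coeffs[:n_keep]   # keep only the low-order ones

# Rebuilding from the kept coefficients recovers the envelope, not the ripple
smoothed = D[:n_keep].T @ mfcc_like
print(np.max(np.abs(smoothed - envelope)))
```

In this constructed case the ripple lives entirely in a high-order coefficient, so truncating to 13 coefficients removes it exactly. Real spectra separate less cleanly, but the same smoothing effect applies.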
MFCCs disadvantages
• Not robust to noise
• Extensive knowledge engineering
• Not efficient for synthesis
MFCCs applications
• Speech processing
  • Speech recognition
  • Speaker recognition
• Music processing
  • Music genre classification
  • Mood classification
  • Automatic tagging
THANK YOU
