Understanding Mel Spectrograms

The document discusses the concepts of mel spectrogram, cepstrum, and spectrum in audio signal processing. It explains how audio signals are captured digitally, transformed using Fourier transform, and represented on the mel scale for better human perception. Additionally, it covers the advantages and disadvantages of Mel-Frequency Cepstral Coefficients (MFCCs) and their applications in speech and music processing.

Uploaded by

Bala Murugan

MEL SPECTROGRAM, FREQUENCY, CEPSTRUM AND SPECTRUM

Dr. C Santhosh Kumar
ECE Department
INTRODUCTION-MEL SPECTROGRAM

• A signal is a variation in a certain quantity over time. For audio, the quantity that varies is air
pressure.

• How do we capture this information digitally?

• We can take samples of the air pressure over time.

• The rate at which we sample the data can vary, but it is most commonly 44.1 kHz, or 44,100
samples per second.

• What we have captured is a waveform for the signal, and this can be interpreted, modified,
and analyzed with computer software.
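As a rough illustration of sampling, the sketch below generates the samples of a pure tone at 44.1 kHz using only the Python standard library. The function name `sample_sine`, the 440 Hz tone, and the 10 ms duration are assumptions chosen for the example; they do not come from the slides.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (44.1 kHz, as in the text)

def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE, amplitude=1.0):
    """Return one amplitude value per sampling instant for a sine tone.

    This stands in for a microphone measuring air pressure at regular
    intervals; a real recording would come from an audio device or file.
    """
    n_samples = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]

# Hypothetical example: 10 ms of a 440 Hz tone.
waveform = sample_sine(freq_hz=440.0, duration_s=0.01)
print(len(waveform))  # number of samples captured in 10 ms
```

The resulting list is the waveform: a sequence of amplitudes indexed by time, with no explicit frequency information.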
The Fourier Transform

• An audio signal is composed of several single-frequency sound waves.

• When taking samples of the signal over time, we only capture the resulting
amplitudes.

• The Fourier transform is a mathematical formula that allows us to decompose a
signal into its individual frequencies and each frequency's amplitude.

• In other words, it converts the signal from the time domain into the frequency
domain. The result is called a spectrum.
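A minimal sketch of this decomposition follows. It uses a naive discrete Fourier transform (DFT) written with the standard library rather than a fast Fourier transform, purely so the arithmetic is visible. The 800 Hz sample rate and the 50 Hz and 120 Hz components are assumptions picked so that each tone falls exactly on a DFT bin.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        mags.append(abs(acc))
    return mags

# Hypothetical test signal: a 50 Hz tone plus a weaker 120 Hz tone.
sample_rate = 800
n = 400
signal = [
    math.sin(2 * math.pi * 50 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 120 * i / sample_rate)
    for i in range(n)
]

mags = dft_magnitudes(signal)
bin_hz = sample_rate / n  # frequency spacing between adjacent bins

# The two largest bins identify the two component frequencies.
peaks = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = sorted(k * bin_hz for k in peaks)
print(peak_freqs)
```

The time-domain samples only record the summed amplitude, while the spectrum separates the two tones and shows that the 50 Hz component is the stronger one.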
The Spectrogram
The Mel Scale
• Studies have shown that humans do not perceive frequencies on a linear scale.

• We are better at detecting differences in lower frequencies than higher frequencies.

• For example, we can easily tell the difference between 500 and 1000 Hz, but we will
hardly be able to tell a difference between 10,000 and 10,500 Hz, even though the
distance between the two frequencies in each pair is the same.

• In 1937, Stevens, Volkmann, and Newman proposed a unit of pitch such that equal
distances in pitch sounded equally distant to the listener.

• This is called the mel scale.
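The slides do not give a formula for the mel scale. The sketch below assumes one widely used approximation, m = 2595 · log10(1 + f / 700); other variants exist, so treat the constants as an assumption rather than the definition. It compares the two frequency pairs from the example above.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel using a common approximation (assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

# Both pairs are 500 Hz apart on the linear frequency axis.
low_gap = hz_to_mel(1000) - hz_to_mel(500)
high_gap = hz_to_mel(10500) - hz_to_mel(10000)

print(f"500 -> 1000 Hz:     {low_gap:.1f} mel")
print(f"10000 -> 10500 Hz:  {high_gap:.1f} mel")
```

Under this approximation the low-frequency pair is several times further apart in mel than the high-frequency pair, which matches the perceptual claim.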


The Mel Spectrogram
• A mel spectrogram is a spectrogram where the frequencies are converted to
the mel scale.
SUMMARY ON MEL SPECTROGRAM

1. We took samples of air pressure over time to digitally represent an audio signal.

2. We mapped the audio signal from the time domain to the frequency domain using
the fast Fourier transform, and we performed this on overlapping windowed
segments of the audio signal.

3. We converted the y-axis (frequency) to a log scale and the color dimension
(amplitude) to decibels to form the spectrogram.

4. We mapped the y-axis (frequency) onto the mel scale to form the mel spectrogram.
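The steps above can be sketched end to end with the standard library. This is an illustration under several assumptions not stated in the slides: a naive DFT stands in for the FFT, the window is a Hann window, the mel mapping uses triangular filters with the approximation m = 2595 · log10(1 + f / 700), and the parameters (`frame_len=128`, `hop=64`, `n_mels=10`, an 8 kHz sample rate) are chosen only to keep the example small. In practice a numerical or audio library would normally do this work.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        out.append(abs(acc) ** 2)
    return out

def stft_power(signal, frame_len, hop):
    """Power spectra of overlapping Hann-windowed segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(power_spectrum(frame))
    return frames

def mel_filterbank(n_mels, frame_len, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = frame_len // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1))
                for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / frame_len for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def mel_spectrogram_db(signal, sample_rate, frame_len=128, hop=64, n_mels=10):
    """Return a list of frames, each a list of n_mels band energies in dB."""
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    result = []
    for spec in stft_power(signal, frame_len, hop):
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        # Small offset avoids log(0) for silent bands.
        result.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return result

# Hypothetical input: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]
mel_db = mel_spectrogram_db(tone, sample_rate)
print(len(mel_db), "frames x", len(mel_db[0]), "mel bands")
```

Each row of `mel_db` is one time slice of the mel spectrogram; plotting the rows side by side with color mapped to the dB value gives the familiar image.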
Mel-Frequency Cepstral Coefficients
Cepstrum

Spectral term    Cepstral term
Spectrum         Cepstrum
Frequency        Quefrency
Filtering        Liftering
Harmonic         Rhamonic


A historical note on the cepstrum
• Developed while studying echoes in seismic signals (1960s)
• Audio feature of choice for speech recognition / identification (1970s)
• Music processing (2000s)
Computing the cepstrum

Time-domain signal → Spectrum → Log spectrum → Cepstrum
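The chain from time-domain signal to cepstrum can be sketched as follows. The code assumes the real cepstrum, taken here as the inverse DFT of the log magnitude spectrum (using the log power spectrum instead only scales the result by two), and again uses a naive DFT from the standard library. The test signal, a decaying burst plus a half-amplitude echo delayed by 20 samples, is a hypothetical example in the spirit of the echo studies the cepstrum was developed for.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) DFT or inverse DFT over all N bins."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * i / n)
                  for i, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out

def real_cepstrum(signal):
    """Signal -> spectrum -> log magnitude spectrum -> inverse DFT."""
    spectrum = dft(signal)
    # Small offset guards against log(0); it is an implementation choice.
    log_mag = [math.log(abs(c) + 1e-12) for c in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]

# Hypothetical signal: decaying burst plus an echo 20 samples later.
n = 256
delay = 20
source = [0.9 ** i for i in range(n)]
echoed = [source[i] + (0.5 * source[i - delay] if i >= delay else 0.0)
          for i in range(n)]

cep = real_cepstrum(echoed)

# Skip the lowest quefrencies, which describe the burst itself,
# and look for the strongest remaining peak.
peak_quefrency = max(range(5, n // 2), key=lambda q: cep[q])
print(peak_quefrency)  # quefrency (in samples) of the strongest peak
```

The echo appears as a periodic ripple in the log spectrum, so the inverse transform concentrates it into a peak at the quefrency equal to the echo delay.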
Visualising the cepstrum

Signal --DFT--> Power spectrum --log--> Log power spectrum --IDFT--> Cepstrum

[Figure: cepstrum plot with the 1st rhamonic marked]
MFCCs advantages
• Describe the “large” structures of the spectrum
• Ignore fine spectral structures
• Work well in speech and music processing
MFCCs disadvantages
• Not robust to noise
• Extensive knowledge engineering
• Not efficient for synthesis
MFCCs applications
• Speech processing
   • Speech recognition
   • Speaker recognition
• Music processing
   • Music genre classification
   • Mood classification
   • Automatic tagging
THANK YOU
