Module 5 Audio Compression.

Define the unit decibel (dB). What aspect of the audio signal does it measure?

Distinguish between silence and companding strategies for lossy audio compression.
Audible noise (sound) is generated by physically vibrating material.

Any object, from a guitar string to a human vocal cord, can produce sound when it vibrates.
When an object vibrates, it disturbs the surrounding air molecules, causing them to move
back and forth.

The vibration produces pressure waves in the air; these pressure waves travel through the air and ultimately cause our eardrums to vibrate.

The vibration of our eardrums is converted into electrical pulses.

The brain interprets these electrical signals as sound, allowing us to perceive and understand
the auditory information. The brain processes various aspects of the sound, such as its pitch,
volume, and timbre, enabling us to recognize and differentiate between different sounds.
Since sound is a pressure wave, it takes on continuous values.

Like any other wave, sound has three important attributes: its speed, amplitude, and period.

▹ Amplitude is the measure of the sound wave's strength or its maximum displacement
from a rest position.

▹ The speed of sound refers to how fast sound waves travel through a medium, such as air, water, or steel. It is determined by the medium's properties, especially its temperature, density, and elasticity.

Frequency measures how many cycles (vibrations or oscillations) a sound wave completes in one second. It is expressed in hertz (Hz), where one hertz equals one cycle per second.

The speed of sound depends mostly on the medium it passes through and on the temperature.

The human ear is sensitive to a wide range of sound frequencies, normally from about 16–20 Hz to about 20–22 kHz, depending on the person's age and health. This is the range of audible frequencies.

The sensitivity of the human ear to sound level depends on the frequency.

The range of the human voice is much more limited. It is only from about 500 Hz to about 2 kHz

The period of a sound wave is the time it takes for one complete cycle of the wave to pass a given
point. It's the reciprocal of the frequency, representing the duration of one cycle in seconds

WHAT IS A DECIBEL:

The problem with measuring noise intensity is that the ear is sensitive to a very wide range of sound levels (amplitudes), from 1 to about 10^11.

It is inconvenient to deal with measurements in such a wide range, which is why the units of sound loudness use a logarithmic scale.

The (base-10) logarithm of 1 is zero, and the logarithm of 10^11 is 11.

Using logarithms, we only have to deal with numbers in the range 0 through 11. We multiply the logarithm by 10 (or by 20) to get the decibel system of measurement.

The decibel (dB) unit is defined as the base-10 logarithm of the ratio of two physical quantities whose units are powers. The logarithm is then multiplied by the convenient scale factor 10. (If the scale factor is not used, the result is measured in units called "bel".)

Thus, we have

Level = 10 log10(P1/P2) dB,

where P1 and P2 are measured in units of power, such as watt or joule/sec.

The numerator, P1, is the power (in microwatts) of the sound whose intensity level is being measured.

It is convenient to select as the denominator P2 the number of microwatts that produces the faintest audible sound. This number is shown by experiment to be 10^-6 microwatt = 10^-12 watt.

E.g., if P1 = 1 watt, the level is 10 log10(1/10^-12) = 10 × 12 = 120 dB.

When working with the sound pressure Pr instead of the power (the power P is proportional to the square of the sound pressure Pr), the scale factor becomes 20:

SPL = 20 log10(Pr1/Pr2) dB,

where SPL stands for sound pressure level.

E.g., let the sound level initially be 70 dB. What happens if P1 is doubled? The level increases by 10 log10 2 ≈ 3 dB, to about 73 dB.
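A small illustrative Python sketch (not part of the original notes) reproduces these computations:

```python
import math

def level_db(p_watts, p_ref=1e-12):
    """Sound level in dB relative to the threshold of hearing (10^-12 W)."""
    return 10 * math.log10(p_watts / p_ref)

print(level_db(1.0))                      # 120.0 dB for a 1-watt source
print(level_db(2e-7) - level_db(1e-7))    # doubling the power adds ~3.01 dB
```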

Digitization of Sound

• Analog audio represents sound waves as continuously varying electrical signals

• Digital audio represents sound using discrete numerical values, typically in binary form
(0s and 1s).

• Digitization is the process of representing various types of information in a form that can be stored and processed by a digital device. It is the combined operation of sampling, quantization, and encoding, also called analog-to-digital (A/D) conversion.

• A digital-to-analog converter (DAC) converts the numeric samples back into voltages that are continuously fed into a speaker.

For audio, typical sampling rates are from 8 kHz to 48 kHz. This range is determined by the Nyquist theorem.
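As a rough sketch of the three operations (sampling, quantization, encoding), assuming a signal already normalized to [−1, +1]:

```python
import math

def digitize(signal, duration_s, rate_hz, bits):
    """Sample an analog signal (a function of time) at the given rate and
    quantize each sample to a signed integer of the given bit depth."""
    levels = 2 ** (bits - 1)                  # e.g. 32768 for 16-bit audio
    samples = []
    for i in range(int(duration_s * rate_hz)):
        t = i / rate_hz                       # sampling instant
        x = signal(t)                         # analog value in [-1, +1]
        samples.append(round(x * (levels - 1)))  # quantization + encoding
    return samples

def tone(t):
    return math.sin(2 * math.pi * 440 * t)   # a 440 Hz test tone

pcm = digitize(tone, duration_s=0.01, rate_hz=8000, bits=16)
```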

Audio sampling technique: PULSE CODE MODULATION (PCM)

Nyquist Theorem

• The Nyquist theorem states how frequently we must sample to be able to recover the original
sound.

For correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal (or twice the bandwidth of the signal). This rate is called the Nyquist rate.

• The range of human hearing is typically from 16–20 Hz to 20,000–22,000 Hz, depending
on the person and on age. When sound is digitized at high fidelity, it should therefore be
sampled at a little over the Nyquist rate of 2×22000 = 44000 Hz. This is why high-quality
digital sound is based on a 44,100-Hz sampling rate.
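A quick illustration of why the rate matters (the 5 kHz tone here is a hypothetical example, not one from the notes):

```python
import math

def sample_tone(freq_hz, rate_hz, n=8):
    """Return the first n samples of a cosine at the given frequency."""
    return [round(math.cos(2 * math.pi * freq_hz * i / rate_hz), 3)
            for i in range(n)]

# A 5 kHz tone sampled at only 8 kHz (below its Nyquist rate of 10 kHz)
# yields exactly the same samples as a 3 kHz tone: it aliases to 8 - 5 = 3 kHz.
print(sample_tone(5000, 8000))
print(sample_tone(3000, 8000))
```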

The Human Auditory System

The frequency range of the human ear is from about 20 Hz to about 20,000 Hz, but the ear's sensitivity to sound is not uniform. It depends on the frequency, and experiments indicate that in a quiet environment the ear's sensitivity is maximal for frequencies in the range 2 kHz to 4 kHz.

The existence of the hearing threshold suggests an approach to lossy audio compression: simply delete any audio samples that are below the threshold. If a signal at frequency f is weaker than the hearing threshold at f, it (the signal) should be deleted.
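A minimal sketch of this idea, assuming the signal has already been transformed into (frequency, level) components and that threshold_db supplies the threshold-in-quiet curve:

```python
def drop_inaudible(spectrum, threshold_db):
    """Keep only the spectral components that rise above the hearing
    threshold at their own frequency; the rest need not be encoded.

    spectrum:     list of (frequency_hz, level_db_spl) pairs
    threshold_db: function mapping a frequency in Hz to the hearing
                  threshold at that frequency, in dB SPL
    """
    return [(f, lvl) for f, lvl in spectrum if lvl >= threshold_db(f)]
```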

Psychoacoustic Modelling

Psychoacoustic modelling helps in understanding how humans perceive sound.

Key Concepts in Psychoacoustic Modelling

1. Frequency Masking (Simultaneous Masking)


o Occurs when a loud sound (the masker) makes nearby frequencies less audible.
o Used in perceptual audio coding (e.g., MP3, AAC) to remove inaudible frequencies.
2. Temporal Masking
o A sound is masked by another that occurs just before or after it.
o Helps reduce data in audio compression without perceptible loss.
3. Critical Bands and the Bark Scale
o The ear perceives sound in frequency bands rather than individual frequencies.
o The Bark scale represents these critical bands, aiding in perceptual audio analysis.

The range of audible frequencies can be partitioned into a number of critical bands
▹ Critical bands are frequency bands within which two tones may interfere with each other
and be perceived as a single auditory event.
The critical band concept helps us understand how the threshold at a specific frequency is
affected by nearby sounds. If a sound occurs within the critical band of a certain frequency, it
has the potential to raise the threshold at that frequency.

Here's a breakdown of how it works:

1. Critical Bands: Each frequency has its own critical band, which is like a listening zone
or range of frequencies that the ear perceives as a group. This critical band widens as
the frequency increases.
2. Threshold Increase: When a sound occurs within the critical band of a particular
frequency, it can raise the threshold or sensitivity level at that frequency. This means
that the ear becomes less sensitive to other sounds in that frequency range because it's
already occupied by a significant sound.
3. Effects of Nearby Sounds: Sounds occurring outside of the critical band of a specific
frequency typically don't affect the threshold at that frequency. However, sounds within
the critical band can mask or obscure other sounds nearby, making them harder to
detect.

The width of a critical band is called its size. The widths of the critical bands introduce a new unit, the Bark: one Bark is the width (in Hz) of one critical band. The Bark is defined as

1 Bark = f/100 for frequencies f < 500 Hz,
1 Bark = 9 + 4 log2(f/1000) for frequencies f ≥ 500 Hz.

In audio compression, knowledge of critical bands is utilized to allocate bits more efficiently.
Rather than allocating the same number of bits to every frequency component, more bits can be
assigned to critical bands with significant audio information while fewer bits are allocated to
less critical bands where the human ear is less sensitive.
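A direct Python transcription of the piecewise Bark definition given above:

```python
import math

def hz_to_bark(f_hz):
    """Bark value of a frequency, per the piecewise definition above."""
    if f_hz < 500:
        return f_hz / 100
    return 9 + 4 * math.log2(f_hz / 1000)

print(hz_to_bark(200))    # 2.0
print(hz_to_bark(1000))   # 9.0
print(hz_to_bark(8000))   # 21.0
```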

Frequency masking (also known as auditory masking)


When two sounds fall within the same critical band, they interfere with each other more than if
they were in separate bands. This interference can result in masking, where one sound makes
another sound harder to hear.

How One Signal Raises the Threshold of Another (Masking Effect)

When a strong (loud) signal is present in a critical band, it raises the threshold of hearing for
other signals in that band. This means weaker signals that would normally be heard become
inaudible. This effect is called simultaneous masking and can occur in two ways:

Upward Masking: A low-frequency (bass) sound masks higher-frequency sounds. This


happens because low-frequency sounds create broader excitation patterns in the cochlea.

Downward Masking: A high-frequency sound slightly masks lower frequencies, though this
effect is weaker.

▹ A louder sound within a critical band can mask a quieter sound occurring nearby in frequency.
▹ Because of the ear's limited perception of frequencies, the threshold at a frequency f is raised by a nearby sound only if the sound is within the critical band of f.

A strong sound source raises the normal threshold in its vicinity, with the result that a nearby weaker sound x, one that would normally be audible because it is above the threshold in quiet, is masked and becomes inaudible.

A good lossy audio compression method should identify this case and delete the signals corresponding to sound x, since they cannot be heard anyway.

NOISE REJECTION BY MEANS OF MASKING

a) Signal to noise ratio (SNR): The ratio of the power of the correct signal to the power of the noise is called the signal to noise ratio (SNR), a measure of the quality of the signal. The SNR is usually measured in decibels (dB):

SNR = 10 log10(Psignal/Pnoise).

b) Signal to mask ratio (SMR): The SMR at a given frequency is expressed as the difference (in dB) between the SPL of the masker and the masking threshold at that frequency.

c) Mask to noise ratio (MNR): The MNR at a given frequency is expressed as the difference (in dB) between the masking threshold at that frequency and the noise level. To make the noise inaudible, its level should be below the masking threshold, i.e., the MNR should be positive.
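These three ratios are simple to express in code; a small sketch (function names are illustrative):

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB (ratio of powers)."""
    return 10 * math.log10(signal_power / noise_power)

def smr_db(masker_spl_db, mask_threshold_db):
    """Signal-to-mask ratio: masker SPL minus the masking threshold."""
    return masker_spl_db - mask_threshold_db

def mnr_db(mask_threshold_db, noise_level_db):
    """Mask-to-noise ratio; positive means the noise is inaudible."""
    return mask_threshold_db - noise_level_db
```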


Temporal masking may occur when a strong sound A of frequency f is preceded or followed in time by a weaker sound B at a nearby (or the same) frequency. If the time interval between the sounds is short, sound B may not be audible.

Sounds that occur in an interval around the masking sound (both before and after the masking tone) can be masked. If the masked sound occurs prior to the masking tone, this is called premasking or backward masking; if the sound being masked occurs after the masking tone, the effect is called postmasking or forward masking.

Psychoacoustic Model Summary

A psychoacoustic model is a mathematical model that simulates the human auditory system's
perception of sound. It aims to replicate how humans hear and perceive audio signals, taking
into account factors such as frequency masking, temporal masking, and spatial localization.

The first step in the psychoacoustic model is to obtain a spectral profile of the signal being encoded. The audio input is windowed and transformed into the frequency domain using a filter bank or a frequency-domain transform. The sound pressure level (SPL) is calculated for each spectral band. If the algorithm uses a subband approach, then the SPL for a band is computed from the SPL of each frequency component within that band.

OTHER DEFINITIONS:

The dynamic range is the ratio of the maximum to minimum absolute values of the signal, Vmax/Vmin. Expressed in decibels, it is 20 log10(Vmax/Vmin).

SOUND COMPRESSION METHODS

▹ Conventional compression methods, such as RLE, statistical, and dictionary-based methods, can be used to losslessly compress sound files, but the results depend heavily on the specific sound.

▹ Better sound compression can be attained by developing lossy methods that take advantage of our perception of sound and discard data to which the human ear is not sensitive.

▹ two approaches

▹ silence compression

▹ companding
SILENCE COMPRESSION USES THE FOLLOWING PARAMETERS (a sketch follows the list):

1. A parameter that specifies the largest sample that should be suppressed.
2. A parameter that specifies the shortest run-length of small samples, typically 2 or 3.
3. A parameter that specifies the minimum number of consecutive large samples that should terminate a run of silence.
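A toy sketch of silence compression driven by the three parameters above (the ("SIL", length) token format is hypothetical):

```python
def silence_compress(samples, threshold=2, min_run=3, min_loud=2):
    """Replace sufficiently long runs of near-silent samples with a
    run-length token; short loud blips do not terminate a run.

    threshold: largest absolute value still treated as silence (param 1)
    min_run:   shortest run of small samples that is suppressed (param 2)
    min_loud:  consecutive large samples needed to end a run (param 3)
    """
    out, i, n = [], 0, len(samples)
    while i < n:
        if abs(samples[i]) <= threshold:
            j, loud = i, 0
            while j < n and loud < min_loud:       # scan the silence run
                loud = loud + 1 if abs(samples[j]) > threshold else 0
                j += 1
            j -= loud                              # exclude terminating loud samples
            if j - i >= min_run:
                out.append(("SIL", j - i))         # suppress: emit run-length token
            else:
                out.extend(samples[i:j])           # run too short: keep as-is
            i = j
        else:
            out.append(samples[i])
            i += 1
    return out
```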
▹ Compression:

▸ In the compression stage, the dynamic range of the signal is reduced.

▸ This is done by applying a non-linear function that reduces the amplitude of the
signal at higher levels while leaving the lower levels relatively unchanged.

▸ By compressing the dynamic range, weaker signals are amplified, and stronger
signals are attenuated, leading to a more uniform signal level.

▸ Because the dynamic range has been reduced, the signal requires less bandwidth
or storage space compared to the original uncompressed signal.

▹ Expansion

▸ At the receiving end, the compressed signal is expanded back to its original
dynamic range.

▸ This is achieved by applying the inverse of the compression function used in the
first stage.

▸ The weaker signals are attenuated, and the stronger signals are amplified,
restoring the original signal's dynamic range.
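An illustrative compressor/expander pair (a square-root law chosen purely for demonstration; the actual standards use the μ-law and A-law functions described below):

```python
import math

def compress(x):
    """Map a sample in [-1, +1] through a square-root law: small
    amplitudes are boosted, large ones squeezed."""
    return math.copysign(math.sqrt(abs(x)), x)

def expand(y):
    """Inverse mapping: restore the original dynamic range."""
    return math.copysign(y * y, y)

x = 0.04
assert abs(expand(compress(x)) - x) < 1e-12   # exact round trip before quantization
print(compress(0.04), compress(0.81))         # 0.2 and 0.9: the range is flattened
```

The gain appears when the compressed value is quantized: a uniform quantizer applied after compress() spends more of its levels on quiet samples, which is where the ear is most sensitive.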
The mapped 15-bit numbers can be decoded back into the original 16-bit samples by the inverse formula.

Disadvantage: reducing 16-bit numbers to 15 bits does not produce much compression. Hence, more sophisticated methods, such as μ-law and A-law, are commonly used; both have been made international standards.

Law Encoders:

▹ Refers to a device or algorithm used in companding systems.

▹ Companding, short for compression-expansion, is a technique used to improve the signal-to-noise ratio (SNR) of an analog signal by compressing the dynamic range of the signal before transmission or recording and then expanding it back to its original range at the receiving end.

▹ A law encoder is responsible for applying a specific mathematical function or encoding scheme to the input signal before compression.

▹ This encoding scheme determines how the input signal is mapped to the compressed domain.

▹ The most common law encoders are A-law and μ-law encoders.

▹ Each compresses the dynamic range of the input signal logarithmically, but with a different mathematical function.

▹ μ-law encoding allocates more quantization levels to low-amplitude signals and fewer levels to high-amplitude signals.

▹ In practice it is implemented by a piecewise linear function that maps the input signal to a quantized output value based on its amplitude.

The output of the compression function lies in the same interval [−1, +1] as the input; it is then scaled to the range [−256, +255] and represented as an 8-bit code. Bigger samples are decoded with more noise, and smaller samples are decoded with less noise.
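For reference, the exact μ-law compression function (the standard definition, with μ = 255 in the North American telephone standard) maps a normalized sample x in [−1, +1] to:

```latex
F(x) = \operatorname{sgn}(x)\,\frac{\ln(1 + \mu\,|x|)}{\ln(1 + \mu)},
\qquad -1 \le x \le 1,\ \mu = 255.
```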

Logarithms are slow to calculate, so the μ-law encoder performs much simpler calculations that produce an approximation.

▹ The 8-bit codeword has the format P S2 S1 S0 Q3 Q2 Q1 Q0. Here, P represents the sign of the input sample.

▹ Bits S2, S1, and S0 are the segment code.

▹ Bits Q3, Q2, Q1, and Q0 are the quantization code.

Encoder:

The μ-law encoder determines the codeword as follows (a sketch follows these steps):

a. It adds a bias of 33 to the absolute value of the input sample.

b. It determines the segment code by finding the bit position of the most significant 1-bit among bits 5 to 12 of the biased input and subtracting 5 from that position.

c. The 4-bit quantization code is set to the 4 bits following the bit position determined in step b.

d. The encoder ignores the remaining bits of the input sample and inverts the codeword at its output.
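A minimal Python sketch of these four steps, assuming 14-bit signed input samples (sign plus 13-bit magnitude, clamped to 8158 so that the biased value fits in bits 5 through 12); sign-bit polarity varies between implementations:

```python
def mu_law_encode(sample):
    """Encode one signed PCM sample (magnitude <= 8158) into an 8-bit
    mu-law codeword, following steps a-d above."""
    BIAS = 33
    p = 0 if sample >= 0 else 1             # P: sign bit (polarity is a convention)
    mag = min(abs(sample), 8158) + BIAS     # step a: add the bias of 33

    msb = mag.bit_length() - 1              # most significant 1-bit (position 5..12)
    segment = msb - 5                       # step b: 3-bit segment code S2 S1 S0

    quant = (mag >> (msb - 4)) & 0x0F       # step c: the 4 bits following the MSB

    codeword = (p << 7) | (segment << 4) | quant
    return codeword ^ 0xFF                  # step d: invert the codeword
```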
A-law Encoding

▹ A-law encoding is a logarithmic companding algorithm used in telecommunications, particularly in Europe (μ-law is the variant used in North America and Japan).

▹ It compresses the dynamic range of the input signal logarithmically, allocating more
quantization levels to low-amplitude signals and fewer levels to high-amplitude signals.

▹ A-law encoding is defined by a nonlinear piecewise function that maps the input signal
to a quantized output value based on its amplitude.

▹ It is characterized by a higher resolution for low-level signals, which improves the representation of quiet sounds and enhances the signal-to-noise ratio (SNR) of the transmitted or recorded audio.
The A-law encoder inputs 13-bit samples and generates an 8-bit codeword with the same format as the μ-law encoder.

It sets the P bit to the sign of the input sample.

It then determines the segment code by:

1. Determining the bit position of the most significant 1-bit among the seven most significant bits of the input.
2. If such a 1-bit is found, the segment code becomes that position minus 4; otherwise, the segment code becomes zero.

The 4-bit quantization code is set to the four bits following the bit position determined in step 1, or to half the input value if the segment code is zero.

The encoder ignores the remaining bits of the input sample, and it inverts bit P and the even-numbered bits of the codeword before it is output (a sketch follows).
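A matching sketch of the A-law steps, assuming 13-bit signed samples (sign plus 12-bit magnitude); as with μ-law, the sign convention is implementation-dependent:

```python
def a_law_encode(sample):
    """Encode one signed PCM sample (magnitude < 4096) into an 8-bit
    A-law codeword, following the steps above."""
    p = 0 if sample >= 0 else 1            # P: sign bit (convention varies)
    mag = min(abs(sample), 4095)

    msb = mag.bit_length() - 1             # most significant 1-bit position
    if msb >= 5:                           # step 1: a 1-bit among the seven MSBs (bits 5..11)
        segment = msb - 4                  # step 2: that position minus 4
        quant = (mag >> (msb - 4)) & 0x0F  # the 4 bits following the MSB
    else:
        segment = 0                        # step 2: no 1-bit among the seven MSBs
        quant = (mag >> 1) & 0x0F          # half the input value

    codeword = (p << 7) | (segment << 4) | quant
    return codeword ^ 0xD5                 # invert P and the even-numbered bits
```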
The two methods are similar; they differ
mostly in their quantizations (midtread vs. midriser).
