Audio Processing

Report

Name

Institution

Course

Instructor

Date

1. Use the MATLAB function xcorr() to compute the autocorrelation function, rather than lpc’s FFT-based approach. Note that you will need to use the lags array from the output of xcorr() to determine the index for the autocorrelation function array at zero lag.
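A minimal sketch of this approach is shown below; x stands for the speech vector and p for the prediction order, both placeholders rather than values from the assignment.

% Sketch: autocorrelation via xcorr(), using the lags output to locate zero lag
[r, lags] = xcorr(x, x);         % full autocorrelation sequence of x
zeroIdx = find(lags == 0);       % index of the zero-lag value R(0)
R = r(zeroIdx : zeroIdx + p);    % R(0) ... R(p), the values needed for the LPC normal equations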

2. Solve directly for the linear predictor coefficients from Eq. (5.16) using direct matrix inversion, rather than the function levinson() that is used within lpc().

LPC coefficients: 1.0000 -0.9071 0.0110 0.0114 0.0555

Prediction error variance: 10.0980
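One way to arrive at these values, sketched below under the assumption that R holds R(0)...R(p) as in the xcorr() sketch above, is to build the Toeplitz autocorrelation matrix of Eq. (5.16) and solve it with MATLAB's backslash operator instead of levinson().

% Sketch: solve the normal equations of Eq. (5.16) by direct matrix inversion
Rmat = toeplitz(R(1:p));     % p-by-p autocorrelation matrix with R(0) on the diagonal
rvec = R(2:p+1);             % right-hand side: R(1) ... R(p)
a    = Rmat \ rvec(:);       % predictor coefficients a_1 ... a_p
A    = [1; -a];              % prediction-error filter, matching the sign convention of lpc()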

3. Compute the error variance directly, rather than using the function levinson() that is used within lpc().

Prediction error variance: 66.9020
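A sketch of the direct computation, reusing the placeholder names R, a, rvec and A from the sketches above and the speech vector x:

% Sketch: prediction error variance without levinson()
E = R(1) - a(:).' * rvec(:);   % E = R(0) - sum_k a_k R(k)
% Equivalent time-domain check: variance of the residual e(n)
e = filter(A, 1, x);
Evar = mean(e.^2);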

Part B – LPC analysis and synthesis of a vowel

1. Apply a 25-ms rectangular window to the speech signal, centered around the peak of the vowel segment (/u/) of the signal.

2. Plot the autocorrelation function of this windowed segment.
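A sketch covering steps 1 and 2 is given below. The 16 kHz sampling rate matches the 400-sample window reported later in this part; the use of max(abs(s)) to locate the vowel peak and the variable names are assumptions for illustration only.

% Sketch: 25-ms rectangular window around the vowel peak, then its autocorrelation
fs     = 16000;                       % assumed sampling rate (gives a 400-sample window)
wlen   = round(0.025 * fs);           % 25 ms
[~, c] = max(abs(s));                 % crude stand-in for the vowel-peak location in signal s
seg    = s(c - wlen/2 : c + wlen/2);  % rectangular window = plain extraction
[r, lags] = xcorr(seg, seg);
plot(lags, r); xlabel('Lag (samples)'); ylabel('Autocorrelation');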

3. Compute the linear prediction coefficients using your function mylpc() with 16 poles.

Linear Prediction Coefficients:

1.4130

-0.3409

-0.1293

0.0998

0.0897

-0.3642

0.1928

-0.0713

0.1013

-0.0670

-0.0819

0.2247

-0.1392

0.0810

-0.1577

0.0464

4. Plot the log-magnitude of the resulting frequency response:
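A sketch of this plot, assuming A holds the 16-pole coefficients returned by mylpc() and seg is the windowed segment from B.1:

% Sketch: log-magnitude of the all-pole model 1/A(z) against the FFT of the segment
nfft   = 512;
[H, f] = freqz(1, A, nfft, fs);            % frequency response of the LPC model
S      = fft(seg, 2*nfft);                 % FFT of the windowed segment on the same grid
plot(f, 20*log10(abs(S(1:nfft)) + eps), f, 20*log10(abs(H)));
xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');
legend('Windowed segment', 'LPC model');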



The FFT of the windowed signal shows the actual spectral content. The LPC model approximates the spectral envelope, capturing the overall shape. Peaks in the LPC spectrum should correspond to formants in the original signal.

5. Using your estimates of the predictor coefficients from above, compute the prediction error signal associated with this vowel segment and plot it. Also plot the original windowed segment and the estimated signal segment. From the prediction error signal, what conclusions can you draw about the model (i.e., all-pole/impulse-train-driven) and estimation accuracy?

Prediction Error Variance: NaN

Audio signal loaded.

Length of the audio signal: 8139 samples.

Window length in samples: 400

Start index: 3870, End index: 4270

Windowed signal length: 401 samples.

Plots generated for original, estimated signal and prediction error signal.
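A sketch of how the prediction error and estimated segment might be formed, assuming A is the coefficient vector from mylpc() and seg the windowed segment from B.1:

% Sketch: prediction error e(n) and the one-step prediction of the windowed segment
e      = filter(A, 1, seg);       % inverse (prediction-error) filter A(z)
segHat = seg(:) - e(:);           % estimated signal = original minus residual
Evar   = var(e);                  % prediction error variance
subplot(3,1,1); plot(seg);    title('Original windowed segment');
subplot(3,1,2); plot(segHat); title('Estimated segment');
subplot(3,1,3); plot(e);      title('Prediction error');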

Conclusions.

If the prediction error signal has low amplitude, it indicates that the all-pole model is accurately capturing the characteristics of the signal. High amplitude in the prediction error signal may suggest that the model does not adequately represent the signal dynamics. The estimated signal should closely follow the shape of the original windowed signal if the LPC model is effective. Any discrepancies in the prediction error signal may indicate modeling limitations.

6. Using the prediction error signal that you computed above, estimate the average pitch period of the windowed vowel segment.

Estimated pitch period: 1 samples.

Estimated pitch frequency: 16000 Hz.

Conclusions about the estimated pitch period:

1. The estimated pitch period corresponds to the fundamental frequency of the vowel segment.

2. Variations in the pitch period can indicate changes in voice characteristics or vowel quality.
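One common way to estimate the average pitch period, sketched below, is to pick the strongest peak in the autocorrelation of the prediction error e within a plausible pitch range; the 60-400 Hz search band is an assumption, chosen so that the estimate is not drawn to the trivial peak at very small lags.

% Sketch: average pitch period from the autocorrelation of the prediction error
[re, lagsE] = xcorr(e, e);
pos   = re(lagsE > 0);                                   % positive lags only
lagp  = lagsE(lagsE > 0);
band  = lagp >= round(fs/400) & lagp <= round(fs/60);    % assumed 60-400 Hz pitch range
[~, k] = max(pos(band));
cand  = lagp(band);
T0    = cand(k);                                         % pitch period in samples
f0    = fs / T0;                                         % pitch frequency in Hz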

7. Using the linear prediction coefficients obtained when using the Hamming window, synthesize and plot a 200-ms estimate of the vowel /uː/, assuming that the excitation source is a perfectly periodic train of ideal impulses with the period estimated from part B.6 above (when using the rectangular window). How does your synthesized waveform differ from the original? Using the MATLAB function soundsc(), listen to your synthesized vowel and the original recording of the vowel and describe how they compare.

>> synthesis

Playing synthesized vowel...

Playing original vowel..
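A sketch of the synthesis, where A_hamm is assumed to hold the coefficients obtained with the Hamming window and T0 the period estimated in B.6:

% Sketch: 200-ms vowel synthesis driven by an ideal impulse train
durSamp = round(0.200 * fs);              % 200 ms of samples
excite  = zeros(durSamp, 1);
excite(1:T0:end) = 1;                     % perfectly periodic train of unit impulses
synthVowel = filter(1, A_hamm, excite);   % all-pole synthesis filter 1/A(z)
plot(synthVowel); title('Synthesized /u:/');
soundsc(synthVowel, fs);                  % listen and compare with the original vowel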

Part C – LPC analysis and synthesis of a fricative

1. Extract a segment of the speech signal from assgn1.wav that extends from the approximate start of the consonant “s” (/s/) to the end of the speech signal and apply a Hamming window to that segment. Using the MATLAB function soundsc(), listen to your windowed signal, and check that it does not include any residual vowel energy from the preceding /uː/.
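A sketch of the extraction and windowing; the start index below is only a placeholder to be chosen by inspecting the waveform, not a value taken from the assignment.

% Sketch: Hamming-windowed segment from the start of /s/ to the end of the signal
sStart  = 6000;                                  % placeholder start index of /s/, found by inspection
fricSeg = s(sStart:end);
fricWin = fricSeg(:) .* hamming(length(fricSeg));
soundsc(fricWin, fs);                            % check by ear for residual vowel energy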

2. Compute the linear prediction coefficients for this windowed fricative using your function mylpc() with 8 poles.

Linear Prediction Coefficients:

1.0000

-0.0000

0.0000

-0.0000

-0.0000

-0.0000

0.0000

-0.0000

0.0000

3. Plot: i) the autocorrelation function, ii) the predicted signal segment together with the windowed signal segment, iii) the prediction error signal, and iv) the log-spectrum of the impulse response of the forward filter H(ω) = A / A(ω) together with the log-spectrum of the windowed fricative.

4. Plot the log-spectrum of the prediction error signal, and comment on how “white” (i.e., flat) the error signal spectrum is relative to the spectrum of the windowed fricative.
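A sketch of the comparison, assuming AFric holds the 8-pole coefficients from C.2 and fricWin the Hamming-windowed fricative from C.1:

% Sketch: log-spectra of the prediction error and of the windowed fricative
eFric = filter(AFric, 1, fricWin);            % prediction error of the fricative
nfft  = 1024;
Ef = 20*log10(abs(fft(eFric,   nfft)) + eps);
Sf = 20*log10(abs(fft(fricWin, nfft)) + eps);
f  = (0:nfft/2) * fs / nfft;
plot(f, Sf(1:nfft/2+1), f, Ef(1:nfft/2+1));
xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');
legend('Windowed fricative', 'Prediction error');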

5. Repeat part C.4, increasing the number of poles in your prediction filter until you judge that the error signal spectrum is sufficiently white. Discuss why that minimum number of poles might be required to sufficiently whiten the error signal.

Model Complexity

The residual of the linear model forms the prediction error signal, which quantifies the portion of the signal that the model cannot predict. A greater number of poles means the model can follow finer spectral structure in the signal, leading to lower correlation in the prediction error signal.

A white error spectrum indicates that all frequencies carry nearly equal power and show no systematic pattern, and hence that the model has captured the inherent spectral features of the underlying signal.

Noise vs. Signal Components

Fewer poles may result in a prediction filter that cannot adequately describe the spectral shape of the speech signal; the residual is then large and correlated, and hence not white.

As the number of poles is increased, more of the structure and variation in the signal can be taken into account, thereby arriving at a stronger model of the signal.

Trade-offs

Adding further poles means the model can be fitted more closely, but each increase in complexity carries a risk. The main danger of having too many poles is overfitting: by modeling the noise, the filter ends up capturing not the character of the signal but artifacts that make the synthesis worse.

6. Using the MATLAB random number generator function randn() to create a random (and spectrally white) source signal with a duration of 225 ms, synthesize and plot estimates of the fricative /s/ using the linear prediction coefficients from Part C.2 above (i.e., with 8 poles) and using the coefficients from Part C.5 above (i.e., with the minimum number of poles that achieves sufficient whitening). Using the MATLAB function soundsc(), listen to your two estimates and describe how they compare to each other and to the original recording of the fricative /s/. Discuss whether the synthesized vowel or the synthesized fricative sounds closer to its respective original signal segment, and why.
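A sketch of the noise-driven synthesis, where A8 and Amin are assumed names for the 8-pole coefficients from C.2 and the higher-order coefficients from C.5:

% Sketch: 225-ms fricative synthesis driven by a spectrally white source
noise     = randn(round(0.225 * fs), 1);     % white-noise excitation
synthS8   = filter(1, A8,   noise);          % 8-pole model (Part C.2)
synthSmin = filter(1, Amin, noise);          % whitening-order model (Part C.5)
subplot(2,1,1); plot(synthS8);   title('Synthesized /s/ (8 poles)');
subplot(2,1,2); plot(synthSmin); title('Synthesized /s/ (whitening order)');
soundsc(synthS8,   fs);   % playback is asynchronous; listen to the two
soundsc(synthSmin, fs);   % estimates one at a time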

Comparison of synthesized sounds

With 8 poles, the synthesized sound may reproduce the details of the fricative less clearly and sharply, so the result may match the intended fricative sound less closely.

The sound synthesized with the minimum whitening order from Part C.5 is therefore expected to be closer to the original fricative, as the additional poles capture more of the spectral shape of the sound.

When comparing these synthesized sounds to the original recording, the relevant aspects include clarity, frequency content, and the general quality of the sound. A fricative synthesized with more poles than necessary might sound noisier or less clear, whereas the version with the optimum number of poles should retain a better similarity to the actual spectral properties of the original sound.



Discussion

Vowels and fricatives may produce different synthesis outcomes because of the difference in their spectral profiles. Vowels possess more harmonic content in their spectra than consonants, whereas fricatives are dominated by noise-like spectral components. As a result, the synthesized fricative seems less natural than the synthesized vowel, because the vowel may still bear a close resemblance to the actual frequency envelope.
