Report
Name
Institution
Course
Instructor
Date
1. Use the MATLAB function xcorr() to compute the autocorrelation function, rather than
lpc’s FFT-based approach. Note that you will need to use the lags array from the output of
xcorr() to determine the index for the autocorrelation function array at zero lag.
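As a minimal sketch of this step, the zero-lag index can be located from the lags output and the non-negative lags sliced out from there. The test signal, the order p, and the variable names are assumptions for illustration, not the code used for this report; xcorr() requires the Signal Processing Toolbox.

```matlab
% Sketch: biased autocorrelation via xcorr (Signal Processing Toolbox).
% The test signal and the order p are illustrative assumptions.
x = filter(1, [1 -0.9], randn(1000, 1));   % hypothetical AR(1) test signal
p = 4;                                      % hypothetical model order

[r, lags] = xcorr(x, 'biased');   % r covers lags -(N-1) ... (N-1)
i0 = find(lags == 0);             % index of the zero-lag sample
r_pos = r(i0 : i0 + p);           % R(0), R(1), ..., R(p)
```

For a signal of N samples the zero lag sits at index N, so r_pos(1) is the mean power of x.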
2. Solve directly for the linear predictor co-efficients from Eq (5.16) using direct matrix
inversion, rather than the function levinson() that is used within lpc().
LPC coefficients: 1.0000 -0.9071 0.0110 0.0114 0.0555
Prediction error variance: 10.0980
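The direct solution can be sketched as follows, assuming the normal equations take the usual Toeplitz form R*alpha = r. The autocorrelation values are hypothetical (an AR(1) process with R(k) = 0.9^k), chosen so that the answer is known in advance. The backslash operator is used here as an assumption of convenience; inv(R)*r gives the same result.

```matlab
% Sketch: solve the normal equations R*alpha = r directly (no levinson()).
% r_pos is assumed to hold R(0)...R(p) as a column; values are illustrative.
r_pos = [1.00; 0.90; 0.81; 0.729; 0.6561];   % hypothetical R(k) = 0.9^k
p = numel(r_pos) - 1;

R = toeplitz(r_pos(1:p));        % p-by-p autocorrelation matrix
alpha = R \ r_pos(2:p+1);        % predictor coefficients (same as inv(R)*r)
a = [1; -alpha];                 % lpc() sign convention: A(z) = 1 + a(2)z^-1 + ...
```

For this input only the first predictor coefficient is non-zero, so a is approximately [1 -0.9 0 0 0].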
3. Compute the error variance directly, rather than using the function levinson() that is
used within lpc().
Prediction error variance: 66.9020
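One way to compute the error variance directly is E = R(0) - sum(alpha(k)*R(k)). The sketch below uses hypothetical autocorrelation values (R(k) = 0.9^k) rather than the data of this report. The result scales with the normalization of the autocorrelation, so it is only comparable with the value from lpc() when the same scaling is used; lpc() is understood to use the biased (1/N) estimate.

```matlab
% Sketch: prediction error variance without levinson().
% r_pos holds R(0)...R(p); the values are illustrative assumptions.
r_pos = [1.00; 0.90; 0.81; 0.729; 0.6561];   % hypothetical R(k) = 0.9^k
p = numel(r_pos) - 1;

alpha = toeplitz(r_pos(1:p)) \ r_pos(2:p+1);   % predictor coefficients
E = r_pos(1) - alpha' * r_pos(2:p+1);          % E = R(0) - sum(alpha_k * R(k))
```

Here E comes out as 1 - 0.9*0.9 = 0.19.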
Part B – LPC analysis and synthesis of a vowel
1. Apply a 25-ms rectangular window to the speech signal, centered around the peak of
the vowel segment (/u/) of the signal.
2. Plot the autocorrelation function of this windowed segment.
3. Compute the linear prediction coefficients using your function mylpc() with 16 poles.
Linear Prediction Coefficients:
1.4130
-0.3409
-0.1293
0.0998
0.0897
-0.3642
0.1928
-0.0713
0.1013
-0.0670
-0.0819
0.2247
-0.1392
0.0810
-0.1577
0.0464
4. Plot the log-magnitude of the resulting frequency response:
The FFT of the windowed signal shows the actual spectral content. The LPC model
approximates the spectral envelope, capturing the overall shape. Peaks in the LPC spectrum
should correspond to formants in the original signal.
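A minimal sketch of how the LPC envelope can be evaluated on the same frequency grid as the FFT of the windowed signal. The coefficients are a hypothetical 2-pole example, not the 16 coefficients listed above, and fs = 16000 Hz is an assumption inferred from 400 samples spanning 25 ms.

```matlab
% Sketch: log-magnitude LPC envelope from 1/A evaluated on the FFT grid.
fs = 16000;                       % assumed sampling rate (400 samples = 25 ms)
a = [1 -1.2 0.8];                 % hypothetical 2-pole A(z), not the report's
nfft = 1024;

A = fft(a(:), nfft);                      % A(e^{jw}) at nfft points
H_dB = -20*log10(abs(A(1:nfft/2+1)));     % 20*log10|1/A| for 0 ... fs/2
f = (0:nfft/2)' * fs / nfft;              % frequency axis in Hz
[~, k] = max(H_dB);
peak_hz = f(k);                           % envelope peak, a formant candidate
% plot(f, H_dB) can be overlaid on the FFT log-magnitude of the windowed signal.
```

The envelope peak falls close to the pole angle of the example filter (near 2.1 kHz), which is how formant candidates can be read off the LPC spectrum.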
5. Using your estimates of the predictor coefficients from above, compute the prediction
error signal associated with this vowel segment and plot it. Also plot the original
windowed segment and the estimated signal segment. From the prediction error signal,
what conclusions can you draw about the model (i.e., all-pole/impulse-train-driven) and
estimation accuracy?
Prediction Error Variance: NaN
Audio signal loaded.
Length of the audio signal: 8139 samples.
Window length in samples: 400
Start index: 3870, End index: 4270
Windowed signal length: 401 samples.
Plots generated for original, estimated signal and prediction error signal.
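The log above reports 401 samples for a 400-sample window. A minimal sketch of index arithmetic that yields exactly the window length is given below; the stand-in signal, the center index, and fs = 16000 Hz are assumptions for illustration.

```matlab
% Sketch: extract exactly L samples centered on a chosen index.
fs = 16000;                        % assumed sampling rate
x = randn(8139, 1);                % hypothetical stand-in for the recording
center = 4070;                     % hypothetical index of the vowel peak

L = round(0.025 * fs);             % 25 ms -> 400 samples
i1 = center - floor(L/2);          % first sample of the segment
i2 = i1 + L - 1;                   % last sample, so the segment has L samples
seg = x(i1:i2);                    % rectangular window = plain slice
```

With these numbers the segment runs from 3870 to 4269.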
Conclusions.
A low-amplitude prediction error signal indicates that the all-pole model captures the
characteristics of the signal accurately, while a high-amplitude error suggests that the model
does not adequately represent the signal dynamics. If the LPC model is effective, the estimated
signal closely follows the shape of the original windowed signal. For a voiced vowel, the error
that remains should resemble a train of pulses at the pitch period, which is consistent with the
impulse-train-driven model. Other structure in the error points to modeling limitations.
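The prediction error can be obtained by passing the segment through the inverse filter A(z). The sketch below uses a hypothetical 1-pole filter and a synthetic impulse-train excitation, so the expected residual is known; it is not the code that produced the results above.

```matlab
% Sketch: prediction error e(n) = x(n) - xhat(n) by inverse filtering.
a = [1 -0.9];                             % hypothetical A(z), lpc() sign convention
exc = zeros(200, 1); exc(1:50:end) = 1;   % hypothetical impulse-train excitation
x = filter(1, a, exc);                    % all-pole "vowel" segment

e = filter(a, 1, x);                      % prediction error (residual)
xhat = x - e;                             % predicted (estimated) signal
err_var = mean(e.^2);                     % prediction error variance
```

When the model matches the signal, the residual reproduces the impulse train. A NaN variance, as reported above, usually points to a NaN in the segment or in the coefficients, which can be checked with any(isnan(x)) and any(isnan(a)).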
6. Using the prediction error signal that you computed above, estimate the average pitch
period of the windowed vowel segment.
Estimated pitch period: 1 samples.
Estimated pitch frequency: 16000 Hz.
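Conclusions about the estimated pitch period:
1. The pitch period is the reciprocal of the fundamental frequency of the vowel segment. An
estimate of 1 sample (16000 Hz) lies far outside the range of voiced speech, which suggests that
the peak search picked a lag next to zero rather than the first pitch peak. Restricting the search
to plausible pitch lags would avoid this.
2. Variations in the pitch period can indicate changes in voice characteristics or vowel quality.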
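A minimal sketch of a pitch estimate that searches the residual autocorrelation only over lags corresponding to a plausible pitch range. The 60 to 400 Hz range, fs = 16000 Hz, and the synthetic residual are assumptions for illustration.

```matlab
% Sketch: pitch period from the residual autocorrelation, searching only
% lags that correspond to an assumed pitch range of 60-400 Hz.
fs = 16000;                            % assumed sampling rate
e = zeros(400, 1); e(1:125:end) = 1;   % hypothetical residual, period 125 samples

r = conv(e, flipud(e));                % autocorrelation, zero lag at index numel(e)
i0 = numel(e);
lag_min = floor(fs/400);               % shortest period considered (40 samples)
lag_max = ceil(fs/60);                 % longest period considered (267 samples)
[~, k] = max(r(i0+lag_min : i0+lag_max));
T0 = lag_min + k - 1;                  % pitch period in samples
f0 = fs / T0;                          % pitch frequency in Hz
```

Excluding the lags around zero keeps the search from returning the zero-lag peak or its immediate neighbor.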
7. Using the linear prediction coefficients obtained when using the Hamming window,
synthesize and plot a 200-ms estimate of the vowel /uː/, assuming that the excitation
source is a perfectly periodic train of ideal impulses with the period estimated from
part B.6 above (when using the rectangular window). How does your synthesized
waveform differ from the original? Using the MATLAB function soundsc(), listen to
your synthesized vowel and the original recording of the vowel and describe how they
compare.
>> synthesis
Playing synthesized vowel...
Playing original vowel..
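The synthesis step can be sketched as an impulse train driving the all-pole filter 1/A(z). The coefficients, the pitch period, and fs below are hypothetical placeholders, not the values estimated in this report.

```matlab
% Sketch: 200-ms vowel synthesis from a periodic impulse train.
fs = 16000;                       % assumed sampling rate
T0 = 125;                         % hypothetical pitch period in samples
a = [1 -1.2 0.8];                 % hypothetical stable A(z)

n = round(0.2 * fs);              % 200 ms -> 3200 samples
exc = zeros(n, 1);
exc(1:T0:end) = 1;                % ideal impulses every T0 samples
y = filter(1, a, exc);            % all-pole synthesis filter 1/A(z)
% soundsc(y, fs);                 % uncomment to listen
```

Because the excitation is perfectly periodic, the output repeats exactly from one period to the next after the initial transient, which is one reason a synthesized vowel can sound more buzzy than a natural one.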
Part C – LPC analysis and synthesis of a fricative
1. Extract a segment of the speech signal from assgn1.wav that extends from the
approximate start of the consonant “s” (/s/) to the end of the speech signal and apply a
Hamming window to that segment. Using the MATLAB function soundsc(), listen to your
windowed signal, and check that it does not include any residual vowel energy from the
preceding /uː/.
2. Compute the linear prediction coefficients for this windowed fricative using your
function mylpc() with 8 poles.
Linear Prediction Coefficients:
1.0000
-0.0000
0.0000
-0.0000
-0.0000
-0.0000
0.0000
-0.0000
0.0000
3. Plot: i) the autocorrelation function, ii) the predicted signal segment together with
the windowed signal segment, iii) the prediction error signal, and iv) the log-spectrum of
the impulse response of the forward filter H(z) = A/A(z) together with the log-spectrum of the
windowed fricative.
4. Plot the log-spectrum of the prediction error signal, and comment on how “white”
(i.e., flat) the error signal spectrum is relative to the spectrum of the windowed fricative.
5. Repeat part C.4, increasing the number of poles in your prediction filter until you
judge that the error signal spectrum is sufficiently white. Discuss why that minimum
number of poles might be required to sufficiently whiten the error signal.
Model Complexity
The residual of the linear model is the prediction error signal, which quantifies the portion of
the signal that the model cannot predict. A greater number of poles lets the model follow
higher-order structure in the signal, which lowers the correlation remaining in the prediction
error signal.
A white error spectrum has nearly equal power at all frequencies and no systematic pattern,
which indicates that the model has captured the spectral structure of the original signal.
Noise vs. Signal Components
With too few poles, the prediction filter cannot adequately describe the speech signal, so the
residual remains large and correlated, and therefore is not white.
As the number of poles is increased, more of the structure and variation in the signal can be
taken into account, which gives a stronger model of the signal.
Trade-offs
Adding further poles lets the model fit the signal better, but each increase in complexity brings
a smaller gain in accuracy. Having too many poles is a real risk: by modeling the noise, the
filter ends up mapping not the character of the signal but artifacts that make the synthesis
worse.
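One way to judge whiteness numerically rather than by eye is the spectral flatness of the residual (the ratio of the geometric to the arithmetic mean of its power spectrum), tracked as the pole count grows. The sketch below is an illustration on a hypothetical AR(4) noise signal, not the fricative from this report, and the flatness measure is an assumed criterion rather than the one used above.

```matlab
% Sketch: spectral flatness of the residual versus number of poles.
x = filter(1, [1 -0.5 0.6 -0.3 0.2], randn(4000, 1));  % hypothetical AR(4) noise
N = numel(x);
r = conv(x, flipud(x)) / N;        % biased autocorrelation, zero lag at index N

orders = 1:8;
sfm = zeros(size(orders));
for i = 1:numel(orders)
    p = orders(i);
    alpha = toeplitz(r(N : N+p-1)) \ r(N+1 : N+p);   % normal equations
    e = filter([1; -alpha], 1, x);                   % residual for this order
    P = abs(fft(e)).^2;                              % raw power spectrum
    sfm(i) = exp(mean(log(P))) / mean(P);            % larger means flatter
end
```

The flatness should rise with the order and then level off once the order reaches that of the underlying process; the point where it stops improving is a candidate for the minimum number of poles.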
6. Using the MATLAB number generator function randn() to create a random (and
spectrally white) source signal with a duration of 225 ms, synthesize and plot estimates of
the fricative /s/ using the linear prediction coefficient from Part C.2 above (i.e., with 8
poles) and using the coefficients from Part C.5 above (i.e., with the minimum number of
poles that achieve sufficient whitening). Using the MATLAB function soundsc(), listen to
your two estimates and describe how they compare to each other and to the original
recording of the fricative /s/. Discuss whether the synthesized vowel or the synthesized
fricative sounds closer to its respective original signal segment, and why.
Comparison of synthesized sounds
With 8 poles, the details of the synthesized fricative may come out less clear and sharp, so the
overall sound may fit the intended fricative less well.
The sound synthesized with the minimum number of poles needed for sufficient whitening is
therefore expected to be closer to the original fricative, as the additional poles capture more of
the spectral shape of the sound.
When comparing these synthesized sounds to the recording, the points to listen for are clarity,
frequency band, and general sound quality. A fricative synthesized with more poles than
necessary might sound noisier or less clear, whereas the fricative with the optimum number of
poles should better preserve the spectral properties of the original sound.
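The fricative synthesis can be sketched as white noise from randn() driving the all-pole filter 1/A(z). The coefficients and fs below are hypothetical placeholders, not the 8-pole or higher-order coefficients from this report.

```matlab
% Sketch: 225-ms fricative synthesis from a white-noise source.
fs = 16000;                         % assumed sampling rate
a = [1 0.5 0.3];                    % hypothetical stable A(z)

src = randn(round(0.225 * fs), 1);  % spectrally white source, 3600 samples
y = filter(1, a, src);              % all-pole synthesis filter 1/A(z)
e = filter(a, 1, y);                % inverse filtering recovers the source
% soundsc(y, fs);                   % uncomment to listen
```

Running the same source through both coefficient sets makes the listening comparison depend only on the filters.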
Discussion
Vowels and fricatives may produce different outcomes because of the difference in their
spectral profiles. Vowels have more harmonic content in their spectra than consonants, whereas
fricatives are dominated by noise-like spectral components. As a result, the synthesized
fricative seems less natural than the synthesized vowel, because the latter may still bear a
resemblance to the actual frequency envelope.