
Lecours 1968

The document discusses the spectral analysis of animal sounds, particularly focusing on the harmonics present in the calls of the male killer whale and the implications for bioacoustics. It highlights the importance of filter bandwidth in analyzing nonstationary signals like speech, and presents findings from experiments comparing different spectrum analyzers for speech recognition. The results indicate that narrow-band filters are preferable for recognizing vowels, while wide-band filters are better for classifying consonants.


Correspondence

Comments on "Spectral Analysis of the Calls of the Male Killer Whale"

In the above paper,¹ Singleton and Poulter point to an abundance of harmonics present in spectral analyses of many animal sounds. They comment that workers in bioacoustics have believed that some of these harmonics may be artifacts of the equipment, and cite Busnel and Watkins. Busnel² does not refer to artifacts in analysis and does not even mention harmonics; he referred to frequency shifts, that he thought might have been caused by Doppler, as artifacts. The reference to my work is to a recent paper.³ I did not suggest, either in my paper or in the oral presentation, "that harmonics induced by altering the signal are in any way spurious or nonexistent," as Singleton and Poulter infer. I did state that a knowledge of the analyzing filter bandwidth (which directly affects response time) is critical for the interpretation of spectrographic analysis. For example, a train of pulses may be portrayed in the analysis either by discrete pulses or by its equivalent harmonic structure, depending entirely on the analyzing filter bandwidth employed.

WILLIAM A. WATKINS
Woods Hole Oceanographic Institution
Woods Hole, Mass. 02543

Manuscript received November 13, 1967.
¹ R. C. Singleton and T. C. Poulter, IEEE Trans. Audio and Electroacoustics, vol. AU-15, pp. 104-113, June 1967.
² R. G. Busnel, "Information in the human whistled language and sea mammal whistling," in Whales, Dolphins, and Porpoises, K. S. Norris, Ed. Berkeley, Calif.: University of California Press, 1966, pp. 544-568.
³ W. A. Watkins, "The harmonic interval: fact or artifact in spectral analysis of pulse trains," in Marine Bioacoustics II, W. N. Tavolga, Ed. New York: Pergamon, 1967, pp. 15-42. (Paper presented at the 1966 Symp. on Marine Bioacoustics.)

Authors' Reply⁴

Watkins' paper³ was unavailable at the time of writing our paper, and apparently our memory of his oral presentation was faulty. Perhaps we were misled by his title, but in any event we apologize for misinterpreting his remarks.

We would agree with Watkins that the effective filter bandwidth, or equivalent information, should be included in reporting spectral analysis results. In the case of a digital analysis, this information can be conveyed by giving the time duration of the data set, the data window function used, and the smoothing applied to the estimated power spectral density function. With a sample of length T, the spectral estimates are spaced 1/T apart in the frequency domain. In the analysis of a nonstationary signal, as for example speech data, the choice of T will represent a compromise between resolution in time and in frequency.

R. C. SINGLETON
T. C. POULTER
Stanford Research Institute
Menlo Park, Calif. 94025

⁴ Manuscript received December 18, 1967.

Adaptive Spectral Analysis for Speech-Sound Recognition

Abstract: A pattern recognition algorithm has been used to compare the usefulness of two types of spectrum analyzers for speech recognition.

A number of speech researchers have noted that it might be useful to allow the parameters of speech spectrum analyzers to vary according to the characteristics of the speech sounds being processed. In particular, it seems clear that the use of narrow-band filters should enhance the formant pattern of vowels, and that wide-bandwidth filters should bring out the noise-like structure of consonants.

The possible uses of a variable-parameter spectrum analyzer for the purpose of recognizing simple speech sounds have been investigated. A pattern recognition algorithm has been used to compare an analyzer having 32 active Gaussian filters of 100-Hz bandwidth and one having 16 filters of 200-Hz bandwidth but twice the time resolution. Experimental results have been obtained with the four stop consonants /p/, /b/, /t/, /d/, taken two by two in initial position, and with the pairs of vowels /a/, /ɜ/ and /e/, /i/. Narrow-band filters have been found to be preferable for recognizing vowels, and wide-band filters appear to be better for classifying stop consonants when only the data o[…] main body of the consonant are u[…]

Manuscript received May 29, 1968.
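The Authors' Reply notes that, for a digital analysis of a data set of length T, the spectral estimates are spaced 1/T apart, so the choice of T trades time resolution against frequency resolution. A minimal NumPy sketch follows, assuming a hypothetical 10-kHz sampling rate (the letters give none). It shows that, under this relation, the two resolutions compared in the abstract, 100 Hz and 200 Hz, correspond to data lengths of 10 ms and 5 ms. Treating the DFT bin spacing as a stand-in for filter bandwidth is a simplification: as the reply points out, the effective bandwidth also depends on the data window and on any smoothing.

```python
import numpy as np

FS_HZ = 10_000.0  # hypothetical sampling rate; not stated in the letters


def bin_spacing_hz(duration_s: float, fs_hz: float = FS_HZ) -> float:
    """Spacing of the DFT spectral estimates for a data set of length T = duration_s."""
    n = int(round(duration_s * fs_hz))           # number of samples in the data set
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)    # 0, fs/n, 2*fs/n, ...
    return float(freqs[1] - freqs[0])            # fs/n, which equals 1/T


if __name__ == "__main__":
    for t_s in (0.010, 0.005):
        df = bin_spacing_hz(t_s)
        # T * spacing stays at 1: halving T doubles the spacing (coarser frequency resolution).
        print(f"T = {t_s * 1e3:.0f} ms -> spacing {df:.0f} Hz, T*df = {t_s * df:.2f}")
```

The loop reports spacings of 100 Hz and 200 Hz. The product of the two quantities stays at 1, which is the compromise the reply describes.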
[Figs. 1 and 2: oscilloscope spectrograms; frequency axis vertical, 0.0 to 3.2 kHz; time axis horizontal]
Fig. 1. Spectrogram of "What did you talk about?" with 100-Hz filters. Horizontal scale: 10 ms/div.
Fig. 2. Spectrogram of "What did you talk about?" with 200-Hz filters. Horizontal scale: 5 ms/div.
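The letter describes the analyzer's filters as effectively nearly Gaussian in time and frequency, and it invokes Gabor's relation Δf·Δt = constant. For a Gaussian envelope, with duration and bandwidth measured as energy-weighted rms spreads, the product is exactly 1/(4π) whatever the width. The NumPy sketch below checks this numerically for two hypothetical envelope widths, 2 ms and 1 ms. The rms definitions and the widths are assumptions chosen for illustration. They are not the bandwidth definitions behind the analyzer's 100-Hz and 200-Hz modes.

```python
import numpy as np


def rms_duration_and_bandwidth(sigma_s: float, fs_hz: float = 100_000.0, span_s: float = 0.2):
    """Energy-weighted rms duration (s) and rms bandwidth (Hz) of exp(-t^2 / (2 sigma^2))."""
    t = np.arange(-span_s / 2, span_s / 2, 1.0 / fs_hz)
    g = np.exp(-t**2 / (2.0 * sigma_s**2))

    # rms duration: spread of the energy |g(t)|^2 about t = 0
    e_t = g**2 / np.sum(g**2)
    dur = np.sqrt(np.sum(e_t * t**2))

    # rms bandwidth: spread of the energy spectrum |G(f)|^2 about f = 0
    spec = np.abs(np.fft.fft(g))**2
    f = np.fft.fftfreq(t.size, d=1.0 / fs_hz)
    e_f = spec / np.sum(spec)
    bw = np.sqrt(np.sum(e_f * f**2))
    return dur, bw


if __name__ == "__main__":
    for sigma in (0.002, 0.001):  # hypothetical envelope widths
        dur, bw = rms_duration_and_bandwidth(sigma)
        print(f"sigma = {sigma * 1e3:.0f} ms: rms duration {dur * 1e3:.3f} ms, "
              f"rms bandwidth {bw:.1f} Hz, product {dur * bw:.4f}")
    print(f"1/(4*pi) = {1 / (4 * np.pi):.4f}")
```

Halving the envelope width roughly doubles the rms bandwidth, and the product stays close to 1/(4π) ≈ 0.0796. This mirrors, in idealized form, the trade between the two analyzing modes.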

TABLE I
Number of Sounds Correctly Classified on a Given Total
(t0 = Starting Point of the Consonant)

              Time Interval             Recognition Performance
Sounds     From (ms)     To (ms)          100 Hz       200 Hz
p-b        t0            t0+40            24/52        40/55
p-t        t0            t0+20            62/90        47/90
           t0            t0+40            39/49        21/46
t-d        t0+30         t0+50            34/57        47/61
           t0            t0+20            71/96        79/96
b-t        t0+20         t0+40            47/64        39/64
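The classifications counted in Table I were made with a linear hyperplane method. As the letter describes it, each hyperplane is drawn perpendicular to, and through the middle of, the "hyperline" joining two "hyperpoints" of different affiliation. The minimal NumPy sketch below implements only that single geometric step, treating each hyperpoint as a vector of filter-channel outputs. The function names and the four-channel example vectors are hypothetical. The full procedure is not reproduced here, including Braverman's algorithm as modified by Turski and the merging of unaffiliated regions.

```python
import numpy as np


def bisecting_hyperplane(a, b):
    """Hyperplane w.x = c perpendicular to the segment a-b and passing through its midpoint."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = b - a                        # normal vector: direction of the "hyperline" from a to b
    c = float(w @ (a + b) / 2.0)     # offset chosen so the midpoint lies on the plane
    return w, c


def side(x, w, c):
    """Return 1 if x lies on b's side of the hyperplane, -1 on a's side, 0 if on the plane."""
    return int(np.sign(np.asarray(x, dtype=float) @ w - c))


if __name__ == "__main__":
    # Hypothetical 4-channel spectral patterns for two sounds of different affiliation.
    a = [1.0, 0.0, 0.0, 0.0]
    b = [0.0, 1.0, 0.0, 0.0]
    w, c = bisecting_hyperplane(a, b)
    print(side([0.2, 0.9, 0.1, 0.0], w, c))  # 1: assigned to b's category
    print(side([0.9, 0.2, 0.0, 0.0], w, c))  # -1: assigned to a's category
```

The plane bisects the segment, so a point falls on b's side exactly when it is closer (in Euclidean distance) to b than to a. A single such plane therefore acts as a nearest-point rule for that pair of hyperpoints.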

An adaptive spectral analyzer is an instrument that can match its parameters to those of an incoming signal. A spectral analyzer with variable parameters has been designed by Thomas [1]. It is an analyzer of the bank-of-filters type, with active filters that are read and quenched simultaneously at fixed intervals.

Increasing the rate of quenching (the time resolution) increases the frequency bandwidth of the filters (decreases the frequency resolution). The active filters have a sin(x)/x frequency response. However, when the input signal to the analyzer is multiplied by a raised cosine synchronized to the quenching pulse, with a period equal to the interval between two successive readings, the filters have nearly Gaussian time and frequency characteristics. The output of the analyzer can be seen on a display oscilloscope, as shown in Figs. 1 and 2; the frequency axis is vertical and the time axis runs horizontally from left to right. The energy present in the different regions of the spectrum is indicated by intensity modulation as well as some horizontal deflection. Figs. 1 and 2 give an idea of the differences between the two analyzing modes selected for the same speech sounds. The formant structure is much better defined in the 100-Hz mode (Fig. 1), while the noise structure is enhanced in the 200-Hz display.

Arguments in favor of adaptive spectrum analysis can be drawn from the study of speech characteristics and of the human hearing system. Gabor [2], having applied Heisenberg's uncertainty principle to signal analysis, showed that the ear can trade time and frequency resolution between the limits of an uncertainty relation of the form

$$\Delta f \cdot \Delta t = \text{constant}.$$

Different authors [3]-[5] have supported his conclusions with measurements of phenomena related to the uncertainty relation. Others [6]-[8] came to similar conclusions through studies of auditory signal detectability in noise; Creelman [7], for example, interprets his results "as suggesting that observers are able, with high amplitude complex signals as well as with sinusoids, to match their receptive systems to the signals to be detected." It appears, then, that the existence of an auditory uncertainty relation is virtually certain. This relation applies across the whole audio-frequency range, at least for time intervals from 3 to 300 milliseconds, although an exact measure of it has yet to be found. It should also be mentioned that phoneticians and speech researchers use large-bandwidth filters in spectrum analysis when they want to study the fine details and the temporal effects in speech sounds, and narrow-bandwidth filters when studying the formant structure.

In the first phase of the experiments, data were recorded on the same speech sounds for different analyzing modes; a few speech sounds were then selected for

524 IEEE TRANSACTIONS ON AUDIO AND ELECTROACOUSTICS DECEMBER 1968


learning, and the discriminant function finally arrived at was tested.

A linear hyperplane method was chosen for this work. It has been proposed and used by Braverman [9] for the recognition of printed digits. Turski [10] has published a modification of the algorithm, and large parts of it were used for the present work. The hyperplanes were drawn perpendicular to, and at the middle of, the "hyperline" joining two "hyperpoints" of different affiliation. Once all the speech samples reserved for learning had been used, the regions without affiliation were incorporated into a neighboring region affiliated to some category.

Four speakers were used. Most of the experiments were done with the stop consonants. It is questionable whether large-bandwidth filters would really be profitable for distinguishing consonants from one another, but it seems certain that they would make the recognition of vowels more difficult. Because of limitations in computer time, it was decided to take the consonants in pairs. About 35 samples of each consonant were recorded, 5 of which were reserved for learning.

The starting point of a stop consonant was taken to be the earliest point at which two adjacent samples at frequencies higher than 600 Hz would equal or exceed a threshold just high enough not to be exceeded by occasional noise. The data on the precursive voicing were disregarded.

A portion of the speech sound had to be selected for study, and the segmentation had to be carried out not only between phonemes but also inside the phoneme. For example, one part of a stop consonant consists of a burst of noise; another contains the frequency transitions from the burst to the vowel. One would expect that the optimum mode of the analyzer would not always be the same for all parts of the consonant. At first, one had to proceed by trial and error. After some time, it became convenient to study three portions: the first 20 ms of the consonant, the first 40 ms, and the second part of the consonant, going from 20 or 30 ms to 50 or 60 ms after the start of the vowel. Every pair of stop consonants was tested in these three time intervals. The experiments were repeated with new learning samples. Table I summarizes the results in the cases where significant differences were obtained.

One conclusion can be drawn from these results. The 100-Hz mode has shown itself superior to the 200-Hz mode for recognizing vowel sounds, and for classifying consonants with the help of the frequency transitions between consonants and vowels. In other words, the 100-Hz representation has shown itself more capable of classifying sounds correctly in those cases where, according to our knowledge of speech characteristics, sounds are characterized, and should be differentiated, by their resonance pattern.

The 200-Hz mode has, on the whole, shown itself superior to the 100-Hz mode at the start of stop consonants. This holds in particular when trying to distinguish between two stop consonants whose loci were in the same frequency region. For the initial part of the six pairs of stop consonants studied, the 200-Hz mode proved to be significantly superior to the 100-Hz mode for three pairs (p-b, p-t, d-t). Both displays seemed to be nearly equivalent in two other cases (p-d, b-t), and the 200-Hz display appeared somewhat better for the remaining pair (d-b). In general, the 200-Hz mode was at an advantage for the initial part of the stop consonants, although this might not be true in every case.

This study is incomplete in the sense that it represents only a first step in the study of an important problem. It might have been more interesting to compare 100-Hz and 400-Hz filters. However, it can already be seen that small differences in the design of a spectrum analyzer can produce substantial differences in recognition performance. No single mode can be said to always perform better under given conditions, and this makes it difficult to implement an automatic adaptive spectrum analyzer. It might be a better idea to use, in parallel, a few spectrum analyzers with different filtering characteristics.

MICHEL LECOURS
Laval University
Quebec, Canada
J. J. SPARKES
University of Essex
Essex, England

REFERENCES
[1] R. S. Thomas, "A real-time audio spectral analyser using active filters with adjustable parameters," Ph.D. dissertation, University of London, London, England, 1964.
[…] 1959.
[4] Chic-an Liang and L. A. Chistovich, "Frequency difference limens as a function of tonal duration," Soviet Phys.-Acoust., p. 75, July 1960.
[5] A. R. Sekey, "A study of auditory perception in the time frequency domain," Ph.D. dissertation, University of London, London, England, March 1962.
[6] D. M. Green, T. G. Birdsall, and W. P. Tanner, "Signal detection as a function of signal intensity and duration," J. Acoust. Soc. Am., vol. 29, p. 523, 1957.
[7] C. D. Creelman, "Detection of complex signals as a function of signal bandwidth and duration," J. Acoust. Soc. Am., vol. 33, p. 89, 1961.
[8] D. M. Green, "Auditory perception of a noise signal," J. Acoust. Soc. Am., vol. 32, p. 121, 1960.
[9] F. M. Braverman, "Experiments on machine learning to recognize visual patterns," Automation and Remote Control, vol. 25, p. 315, 1962.
[10] W. Turski, "A learning automaton for solving stability problems of differential equations," Conlprctutia, vol. 1, p. 57, 1964.
[11] M. Lecours, "Adaptive spectral analysis for speech-sound recognition," Ph.D. dissertation, University of London, London, England, May 1967.

The ac Resistance of Carbon Microphones

IEEE Standard No. 258¹ gives in paragraph 5.3 a method for determining the impedance of a carbon microphone. The measurement taken, namely the ratio of the dc voltage drop across the microphone to the dc exciting current, is actually the effective dc resistance of the acoustically excited microphone. This has been called the "speaking resistance." Since a carbon microphone is virtually nonreactive, it is often wrongly assumed that its speaking resistance is equal to its effective ac source resistance or its impedance. Similarly, under "quiet" conditions, i.e., with very low

Manuscript received June 5, 1968.
¹ "IEEE Standard on Test Procedure for Close-Talking Pressure-Type Microphones," IEEE Trans. Audio and Electroacoustics, vol. AU-14, pp. 156-162, December 1966.

CORRESPONDENCE 525
