Evaluating Spatial Sound Systems
Mark F. Bocko
Audio & Music Engineering
Audio Engineers love specs …
• Predicting which speakers will sound good …
How many speakers are enough?
[Figure: loudspeaker layouts, including NHK 22.2]
Quantitatively evaluate any spatial sound reproduction method in any space …
• Incorporate quantitative models of binaural hearing into audio system design tools
• Identify the computable quantities that correspond to what listeners report they hear (locations, spatial extent of sources, diffusiveness)
• Make the design of systems for creating spatial audio more deterministic and less trial and error
• Both for free space sound reproduction
• And for headphone based reproduction
Framework
1. Specify listening space & speaker placement
2. Specify virtual acoustic sources to be created
3. Compute signals driving each loudspeaker (Your favorite method)
4. Compute acoustic field at listener (directional IR)
5. Compute sound field-listener interaction (head model)
6. Compute percepts (binaural fusion model)
7. Infer virtual acoustic source properties
Compare & Assess
Outline
• How the ear works – very briefly
• Meddis hair cell model
• Cross-correlation model of directional hearing
• Audio coherence and spatial hearing
• Interaural time and level differences
• Spectral coloring from source elevation
• Correlograms
• Examples
Human Auditory System
[Figure: cross-section of the cochlea. Labels: Reissner Membrane, Scala Vestibuli, Tectorial Membrane, Organ of Corti, Scala Tympani, Basilar Membrane]
©2013 by American Physiological Society
Meddis Hair Cell Model
• Around 3000 inner hair cells along the length of the basilar membrane
• Neuron firing is irregular and clustered near signal peaks
[Figure: three panels vs. Time (sec), 0 to 0.05: Input Signal; Hair Cell Deflection ~ Firing Probability; Neuronal Pulse Stream]
Meddis Hair Cell Model
• Spontaneous firing rate
[Figure: three panels vs. Time (sec), 0 to 0.05: Input Signal; Hair Cell Deflection ~ Firing Probability; Neuronal Pulse Stream]
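The two behaviours named on the hair cell slides (pulses clustered near signal peaks, plus a spontaneous firing rate) can be illustrated with a much simpler stand-in than the Meddis model itself. The sketch below is not the Meddis model: it just half-wave rectifies the input and draws one random pulse per sample. The sample rate, tone frequency, `SPONTANEOUS_P` and `GAIN` are all assumed values chosen for illustration.

```python
import math
import random

FS = 20_000            # sample rate in Hz (assumed)
DURATION = 0.05        # seconds, matching the time axis of the slide's plots
TONE_HZ = 200.0        # hypothetical input tone
SPONTANEOUS_P = 0.02   # per-sample firing probability with no input (assumed)
GAIN = 0.3             # extra firing probability per unit of deflection (assumed)

def firing_probability(deflection):
    """Half-wave rectify: only positive deflection raises the firing probability."""
    return min(1.0, SPONTANEOUS_P + GAIN * max(0.0, deflection))

def pulse_stream(signal, rng):
    """One random 0/1 pulse per sample, drawn with that sample's firing probability."""
    return [1 if rng.random() < firing_probability(s) else 0 for s in signal]

rng = random.Random(4)
n = int(FS * DURATION)
tone = [math.sin(2.0 * math.pi * TONE_HZ * i / FS) for i in range(n)]
pulses = pulse_stream(tone, rng)
silent_pulses = pulse_stream([0.0] * n, rng)   # spontaneous firing only

# Count pulses in the positive half-cycles versus the negative half-cycles.
near_peaks = sum(p for p, s in zip(pulses, tone) if s > 0.0)
in_troughs = sum(p for p, s in zip(pulses, tone) if s <= 0.0)
print(near_peaks, in_troughs, sum(silent_pulses))
```

Because the pulses are random draws, two runs with different seeds give different pulse trains for the same input, which is the property the later slides rely on when they attribute image spread to the statistics of neuronal pulses.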
Binaural Fusion Model
• Represent as a bi-directional delay line
[Figure: site of binaural fusion. Labels: From left ear, From right ear, Low Freq, High Freq, Output, Site of Binaural Fusion, To right cochlea, To left cochlea]
Binaural fusion mechanism → 2 msec windowed cross-correlation
• The lag where the peak in the cross-correlation appears is the Interaural Time Difference
[Figure: delay line from right ear and delay line from left ear (2 msec); window W(T) of width TW applied to xr(t) at times t1, t2, t3 and to xl(t) at times t1 - 𝜏, t2 - 𝜏, t3 - 𝜏]
• Jeffress, L. A. (1948). A place theory of sound localization. Journal of Comparative and Physiological Psychology, 41(1), 35.
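A minimal sketch of the windowed cross-correlation idea, assuming a 48 kHz sample rate, a rectangular 2 msec window and a lag search range of ±1 msec (none of these are specified on the slide). The function name `window_itd` and the test signal are hypothetical.

```python
import random

def window_itd(left, right, start, win_len, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of one short window.

    A positive lag means the right-ear signal is a delayed copy of the left.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(left[n] * right[n + lag] for n in range(start, start + win_len))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

FS = 48_000              # sample rate in Hz (assumed)
WIN = int(0.002 * FS)    # 2 msec window
MAX_LAG = 48             # search +/- 1 msec of lag (assumed range)
TRUE_DELAY = 12          # right ear lags the left by 12 samples = 0.25 msec

rng = random.Random(0)
left = [rng.gauss(0.0, 1.0) for _ in range(1000)]
right = [0.0] * TRUE_DELAY + left[:-TRUE_DELAY]   # delayed copy of the left signal

lag = window_itd(left, right, start=200, win_len=WIN, max_lag=MAX_LAG)
itd_msec = 1000.0 * lag / FS
print(lag, itd_msec)
```

Repeating the call with `start` advanced by one window length at a time would give one ITD estimate roughly every 2 msec.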
Interaural Time Difference and source direction (in the horizontal plane)
• Perceived ITD (direction to source) is determined by location of the peak in the short-time cross-correlation function
• Low frequency limit of Rayleigh diffraction around sphere:
$\mathrm{ITD} = \frac{3d}{2c}\sin(\theta)$
• c is the speed of sound
• ITD = 0 when 𝜃 = 0
• ITD = (3/2)*(d/c) when 𝜃 = 90°
• Note: Factor of 3/2 is due to diffraction around the listener's head
[Figure: sources Sl, S, Sr (pan positions 0, 50, 100) at ±30° from a listener with ears L and R; dimensions d and 2d marked]
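The low-frequency formula ITD = (3/2)(d/c) sin(𝜃) is easy to evaluate and invert. In the sketch below the value of d (0.18 m, taken here as the ear-to-ear distance) and the speed of sound (343 m/s) are assumptions, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 C)
HEAD_D = 0.18            # m, assumed value for d

def itd_low_freq(theta_deg, d=HEAD_D, c=SPEED_OF_SOUND):
    """Low-frequency ITD in seconds: (3/2) * (d/c) * sin(theta)."""
    return 1.5 * (d / c) * math.sin(math.radians(theta_deg))

def angle_from_itd(itd, d=HEAD_D, c=SPEED_OF_SOUND):
    """Invert the formula: source angle in degrees for a given ITD in seconds."""
    s = itd / (1.5 * d / c)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))   # clamp to valid range

for theta in (0, 30, 90):
    print(theta, "deg ->", round(itd_low_freq(theta) * 1e6, 1), "microseconds")
```

The inverse function is what turns a measured cross-correlation peak lag back into a lateral angle.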
Role of coherence in binaural hearing
3 sec white noise bursts from two sources, S1 and S2
• S1 alone
• S2 alone
• S1 + S2, the same signal
• S1 + S2, different signals
Demonstration of lateralization as a function of noise burst duration
• Play a series of uncorrelated stereo noise bursts of decreasing duration (2 sec, 1 sec, 0.5 sec, 0.2 sec, 0.1 sec, 50 msec, 20 msec, 10 msec, 5 msec, 2 msec, 1 msec)
[Audio demo: series of uncorrelated 2 msec stereo noise bursts]
• At about 2 msec and less, each burst is identified with a specific location
• The cross-correlation function always has a peak somewhere! But it is different each time.
• The auditory percept being computed by the brain is updated about every 2 milliseconds
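The point that the cross-correlation "always has a peak somewhere" can be checked numerically. This sketch generates independent left and right 2 msec Gaussian noise bursts and records where the cross-correlation peak lands for each one. The sample rate, lag range and burst count are assumed values.

```python
import random

def peak_lag(left, right, max_lag):
    """Lag with the largest cross-correlation between two equal-length bursts."""
    n = len(left)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(n, n - lag)   # keep both indices in range
        val = sum(left[i] * right[i + lag] for i in range(lo, hi))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

FS = 48_000               # sample rate in Hz (assumed)
BURST = int(0.002 * FS)   # 2 msec burst
MAX_LAG = 40              # lag search range in samples (assumed)

rng = random.Random(1)
lags = []
for _ in range(20):
    left = [rng.gauss(0.0, 1.0) for _ in range(BURST)]
    right = [rng.gauss(0.0, 1.0) for _ in range(BURST)]   # independent of left
    lags.append(peak_lag(left, right, MAX_LAG))
print(lags)   # one peak lag per burst; the values scatter across the lag range
```

Each burst yields a definite peak lag, but the lag changes from burst to burst, which is consistent with each short burst being heard at its own location.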
Auditory “Sluggishness”
• How quickly can a listener follow time-varying binaural cues?
• Evidence for a 200 - 300 msec threshold
• Distribution of 2 msec window ITD’s has a “memory” of 100 - 300 msec
• Your brain averages over a hundred or more 2 msec windows and constructs a histogram of interaural time differences.
[Figure: “L” click vs. Sample Number, and its Cross-correlation Function (Norm X-Corr vs. Lag (samples))]
[Audio demo: series of L, C, R located clicks (10 msec, 50 msec, 100 msec, 250 msec, 500 msec)]
[Figure: Histogram of ITD’s]
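One way to picture the histogram "memory" is a sliding buffer of the most recent 2 msec window estimates, with the reported ITD taken as the histogram peak. The sketch below assumes a 200 msec memory (inside the 100 - 300 msec range quoted above) and a hypothetical stream of jittered ITD estimates, in samples, for a source that jumps from one side to the other.

```python
import random
from collections import Counter, deque

WINDOW_MSEC = 2
MEMORY_MSEC = 200                              # assumed, within 100 - 300 msec
MEMORY_WINDOWS = MEMORY_MSEC // WINDOW_MSEC    # number of windows remembered

def running_itd_peak(itd_stream, memory=MEMORY_WINDOWS):
    """Yield the most common ITD among the last `memory` window estimates."""
    recent = deque(maxlen=memory)              # old estimates drop out automatically
    for itd in itd_stream:
        recent.append(itd)
        yield Counter(recent).most_common(1)[0][0]

rng = random.Random(2)

def noisy_estimates(true_itd, count):
    """Hypothetical per-window ITD estimates: true value plus integer jitter."""
    return [true_itd + round(rng.gauss(0.0, 1.0)) for _ in range(count)]

# Source at ITD = -8 samples for 300 windows, then it jumps to +8 samples.
stream = noisy_estimates(-8, 300) + noisy_estimates(8, 300)
peaks = list(running_itd_peak(stream))
print(peaks[299], peaks[305], peaks[-1])
```

Just after the jump the histogram peak still sits on the old side, and it only moves once enough new windows have accumulated, which is the sluggish behaviour described on the slide.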
Correlograms: frequency dependent interaural time differences
• 2-D (ITD & frequency) map encodes source location
• Brain decodes these maps to source locations
• ITD → lateral position of source
• Frequency dependence → source elevation
[Figure: correlogram with Frequency, Delay and ITD axes. Stereo speaker pair, center panning (anechoic conditions)]
Procedure
• For a given head model …
• Compute the reference correlograms for all possible sound source directions
• Specify the multi-channel reproduction system, the influence of the room, and
the signals driving each speaker (for whatever method you choose)
• Compute the resulting correlogram
• Project the computed correlogram onto the reference set to infer the direction
• One may infer a superposition of source directions
• Specific methods
• Decompose into spherical harmonics (orthogonality helps)
• Error minimization
• Machine learning
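The projection step in the procedure above can be sketched with a toy reference set. Here each "reference correlogram" is replaced by a 1-D stand-in (a bump over lag, centred at the low-frequency ITD for that direction), and the inferred direction is the reference with the largest normalized inner product. The stand-in patterns, the 5 degree grid, the head size and the sample rate are all assumptions; a real reference set would be 2-D (ITD and frequency) and computed from a head model.

```python
import math

C = 343.0        # speed of sound in m/s (assumed)
D = 0.18         # value of d in metres (assumed)
FS = 48_000      # sample rate in Hz (assumed)
LAGS = range(-40, 41)

def reference_pattern(theta_deg, width=2.0):
    """Stand-in for a reference correlogram: a bump at the ITD lag for theta."""
    itd_samples = 1.5 * (D / C) * math.sin(math.radians(theta_deg)) * FS
    return [math.exp(-0.5 * ((lag - itd_samples) / width) ** 2) for lag in LAGS]

def normalized_projection(a, b):
    """Cosine similarity between two patterns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

DIRECTIONS = list(range(-90, 91, 5))
REFERENCES = {theta: reference_pattern(theta) for theta in DIRECTIONS}

def infer_direction(pattern):
    """Direction whose reference pattern best matches the computed pattern."""
    return max(DIRECTIONS, key=lambda th: normalized_projection(pattern, REFERENCES[th]))

# A broadened pattern from a hypothetical source at 22 degrees (off the grid).
observed = reference_pattern(22.0, width=3.0)
print(infer_direction(observed))
```

Keeping the full list of projection values, rather than only the maximum, is one way to represent a superposition of source directions.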
So how does the method work? … assessing the effect of reverberation
[Figure: Aula Carolina (Aachen)]
Reverberation broadens the source image
[Figure: histogram, "250 Trials - Stereo Loudspeakers @ +/- 30 degrees - Delta = 0 (center pan)". Number of Trials vs. Perceived Incident Angle of Sound Source in Degrees (-30 to 30). Series: Reverb, Anechoic]
Note: Random nature of nerve impulse stream creates a spread of image width, even in a non-reverberant space
Spatial Blur – experimental measurements
The model reproduces the observed angular acuity.
Spread arises from statistics of neuronal pulses.
Blauert, J., “Spatial Hearing: The Psychophysics of Human Sound Localization”, MIT Press 1983.
Spatial acuity with one ear!
If you don’t believe the cross-correlation model look at this!
Blauert, J., “Spatial Hearing: The Psychophysics of Human Sound Localization”, MIT Press 1983.
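Modeling Stereo Reproduction
$R_{LR}(t,\tau) = R_s(t,\tau) + f^2(\omega)\,R_s(t,\tau) + f(\omega)\,R_0\left[\delta(\tau + \tau_0) + \delta(\tau - \tau_0)\right]$
• $f(\omega)$: frequency dependence of head diffraction
• $\tau_0$ = left-right ear delay
• $R_s(t,\tau)$ is the cross-correlation of Sl and Sr
[Figure: speakers Sl and Sr, listener ears L and R, dimension d]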
Stereo Sweet Spot calculation
• Compute peak of distribution of ITD’s for a real source at the intended location
• Compute peak of distribution of ITD’s for the stereo rendered intended source
• Infer the apparent source direction from peak of ITD distribution
• This example is for coherent sources; the formalism can also be used with partially coherent sources, i.e., real signals in reverberant spaces.
[Figure: x-y plan view showing L Speaker, R Speaker, Intended and Apparent source positions; listener at (x0,y0) at distances Dl and Dr from the speakers; origin (0,0)]
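The sweet spot calculation starts from the geometry in the figure: the distances Dl and Dr from the listener position (x0, y0) to the two speakers. The sketch below covers only that geometric input, not the ITD distribution itself. It assumes speakers at ±30 degrees and 2 m from the origin, and a 1/D level fall-off; the function name is hypothetical.

```python
import math

C = 343.0      # speed of sound in m/s (assumed)
RADIUS = 2.0   # speaker distance from the origin in metres (assumed)
ANGLE = math.radians(30.0)

# Speakers at +/- 30 degrees as seen from the origin (0, 0), listener facing +y.
LEFT_SPK = (-RADIUS * math.sin(ANGLE), RADIUS * math.cos(ANGLE))
RIGHT_SPK = (RADIUS * math.sin(ANGLE), RADIUS * math.cos(ANGLE))

def speaker_arrival(x0, y0):
    """Return Dl, Dr, the left speaker's time lead (msec) and level lead (dB)."""
    dl = math.dist((x0, y0), LEFT_SPK)
    dr = math.dist((x0, y0), RIGHT_SPK)
    lead_msec = 1000.0 * (dr - dl) / C       # > 0: left wavefront arrives first
    level_db = 20.0 * math.log10(dr / dl)    # > 0: left speaker is louder at (x0, y0)
    return dl, dr, lead_msec, level_db

for x0 in (0.0, -0.25, -0.5):
    dl, dr, lead, level = speaker_arrival(x0, 0.0)
    print(x0, round(dl, 3), round(dr, 3), round(lead, 3), round(level, 2))
```

At the origin the two arrivals coincide; moving off-center makes the nearer speaker both earlier and louder, which is what pulls the apparent source away from the intended one.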
Main Points
• Integrated a quantitative neurological model into a spatial audio analysis tool
• Randomness of auditory nerve firing events is important
• Predicts measured angular acuity
• Two time scales are in play
• Short ( ~ 2 msec) window for cross correlation in brainstem
• Longer ( ~ 100 msec) histogram “memory” (higher level processing)
• We can predict what a listener will tell you they hear
• Location and spread of sound source
• There’s a lot left to do …
• Integrate with room modeling software for a complete analysis package
• Create synthesis tools – find the designs and algorithms that best reproduce a desired spatial
sound effect
• Continue to refine auditory models
• Distance cues
END
Cochlea
Cross-correlation (similarity of two signals)
[x1 x2 x3] [x1 x2 x3] [x1 x2 x3] [x1 x2 x3] [x1 x2 x3]
[y1 y2 y3] [y1 y2 y3] [y1 y2 y3] [y1 y2 y3] [y1 y2 y3]
Lag -2 -1 0 1 2
[Figure: two random sequences and their cross-correlation, for Delay = 0 and for Delay = 30 samples]
Signals are correlated but delayed
Uncorrelated signals
[Figure: two random sequences and their cross-correlation]
No dominant peak in cross-correlation
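Both cases can be reproduced with a few lines of code. The sketch below computes a normalized cross-correlation for a random sequence against a copy delayed by 30 samples, and against an unrelated random sequence. The sequence length, lag range and seed are assumed values, and `normalized_xcorr` is a hypothetical helper.

```python
import random

def normalized_xcorr(x, y, max_lag):
    """Cross-correlation for lags -max_lag..max_lag, normalized by signal energy."""
    n = len(x)
    norm = (sum(v * v for v in x) * sum(v * v for v in y)) ** 0.5
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(n, n - lag)   # overlap region for this lag
        result[lag] = sum(x[i] * y[i + lag] for i in range(lo, hi)) / norm
    return result

rng = random.Random(3)
N, DELAY = 200, 30
x = [rng.gauss(0.0, 1.0) for _ in range(N)]
delayed = [0.0] * DELAY + x[:N - DELAY]            # x delayed by 30 samples
other = [rng.gauss(0.0, 1.0) for _ in range(N)]    # unrelated sequence

r_delayed = normalized_xcorr(x, delayed, 100)
r_other = normalized_xcorr(x, other, 100)

peak_delayed = max(r_delayed, key=r_delayed.get)   # lag of the largest value
print(peak_delayed, round(r_delayed[peak_delayed], 2))
print(round(max(r_other.values()), 2))
```

For the delayed copy the largest value sits at the imposed delay; for the unrelated sequence every lag gives only a small value, so no single lag stands out.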
Precedence effect
• Law of the first wave-front …
• Direction is inferred from 1st wave-front (up to about 30-40 msec)
• Haas effect – short delays enhance “spaciousness”
[Audio demos: 0 – 2 msec delay, 0 – 40 msec delay, 0 – 200 msec delay (each in 20 steps)]
Explained by saturation and recovery time of hair cell response.
Directional impulse responses
• Track both the time of arrival and the direction of each room reflection
(Matlab Demo: Imp_Resp_w_Angle_3.m)
[Figure: Directional Impulse Response, 3-D plot over x, y, z (axes scaled by 10^-3)]