The Paraspax method applied on loudspeaker arrays
Loudspeaker array-based synthesis of varying spaces including an
investigation on how the sound field changes at different position
within the array

Master’s thesis in Master Program Sound and Vibration

HANNA PERSSON

Department of Architecture and Civil Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
Master’s thesis ACEX30
Gothenburg, Sweden 2022





© HANNA PERSSON, 2022.

Supervisor: Jens Ahrens, Division of Applied Acoustics
Examiner: Jens Ahrens, Division of Applied Acoustics

Department of Architecture and Civil Engineering
Division of Applied Acoustics
Audio Technology Group
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000



Abstract
Binaural room impulse responses (BRIRs) describe the transmission from a sound
source to a listener’s left and right ears, unlike monaural room impulse responses,
which contain only one channel and therefore sound the same to both ears. The
conventional way to obtain BRIRs of a space is to record a sound source using a
dummy head with a microphone in each ear for different head orientations and
positions in the space. This can be both time-consuming and costly, and research is
therefore looking for more practical alternatives that rely on signal processing.

The Paraspax method is a method for parametric spatial audio rendering with 6
degrees of freedom (DoF) based on a single monaural room impulse response. The
method encodes monaural and spatial parameters offline into a parametric spatial
sound field for arbitrary head orientations and room positions. The most important
parameters are the amplitudes of the direct sound and up to 10 early reflections
with corresponding times and directions of arrival (TOAs, DOAs). The TOAs are
simulated from a reflection detection algorithm and the image source model provides
the DOAs. These, together with the rest of the parameters, form the basis of BRIRs
synthesized for audio reproduction using headphones. This thesis extends the BRIR
synthesis to a loudspeaker array-based synthesis in which the parametrized direct
sound and early reflections are distributed over a number of loudspeakers arranged
on a sphere. The resulting sound field is estimated for a listener positioned at
different positions inside the loudspeaker array. The authors of the Paraspax method
have presented it for a shoebox-shaped room, but it is still unknown how it works
for other environments, and therefore a handful of room impulse responses are
tested.

The thesis answers what the minimum number of loudspeakers in the loudspeaker
array is and how the sound field at different listening positions differs from the
sound field created at the center of the array. It also shows how some parameters
of the loudspeaker array influence the sound. Convolving an anechoic drum
recording with the synthesized sound field created by the loudspeaker array
virtually places the listener at different positions in the array, and the resulting
sound represents how the drums are perceived in the different test environments.
The results obtained with a loudspeaker array containing a reduced number of
loudspeakers are analyzed and compared with those of an 84-loudspeaker array.
It is shown that the loudspeaker array is highly dependent on the simulated DOAs,
and especially on the azimuth angles, as it turns out that the loudspeakers should
be placed around the listener. If the DOAs of the early reflections vary enough in
azimuth, three loudspeakers are shown to be enough. The sound behaves differently
depending on the direction in which the listener moves, but by increasing the
number of loudspeakers or the radius of the array, the listener can generally move
more freely, except when the TOA differences between the loudspeakers become
too large.



Acknowledgements
I would like to express my gratitude to my supervisor and examiner Jens Ahrens
at the Division of Applied Acoustics at Chalmers. Without you this project would
not have been possible. Thank you for proposing this project as a master’s thesis,
for your wise and humble words in guiding me through this work, and for your
quick responses to emails, which meant that it was no hindrance at all that you
were in the US for almost the entire course of the project. I would also like to
thank Wolfgang Kropp at the Division of Applied Acoustics, who was kind enough
to lend me his headphones, and the other employees of the Division, who shared
their knowledge of acoustics and have been very helpful to me during my two years
as a master’s student.

I’m thankful to my classmates, and my dear partner and roommate Daniel Hall
for acting as a sounding board during the course of the project. I also want to thank
Daniel for the great support he gave me during my study period. I want to give my
last thanks to my friend Christine Jeppsson just for being there.

Hanna Persson, Gothenburg, October 2022



Contents

List of Figures

List of Tables

1 Introduction
1.1 Background
1.2 Related works

2 Theory
2.1 Echo density profile
2.2 Image source model
2.3 Head related impulse responses

3 Methods
3.1 Mixing time
3.2 Reverberation level
3.3 Spectral components
3.3.1 Reflection detection
3.3.2 Directions of arrival
3.4 Late reverberation
3.5 Early diffuse sound
3.6 Extension on loudspeaker arrays
3.6.1 Listener at the center
3.6.2 Virtual loudspeaker array
3.6.3 Change listener position

4 Results
4.1 At the center of the loudspeaker array
4.2 Investigation of the sweet spot

5 Discussion
5.1 Methods
5.2 At the center of the loudspeaker array
5.3 Investigation of the sweet spot
5.4 Influence of different parameters

6 Future research


7 Conclusion

Bibliography

A Tested room impulse responses
A.1 Genesis 6 studio
A.2 Trollers gill
A.3 Maes howe
A.4 Arthur sykes rymer auditorium
A.5 Koli national park
A.6 Stairway
A.7 Hoffman lime kiln
A.8 Central hall
A.9 Helsington church
A.10 Promenadikeskus concert hall
A.11 Innocent railway tunnel
A.12 Falkland palace royal tennis court
A.13 Shrine and parish church of all saints
A.14 Hamilton mausoleum
A.15 Terrys factory warehouse
A.16 Table

B Results



List of Figures

2.1 Path of a first-order reflected sound ray from a sound source S to a
receiver R using the image source S′.

2.2 Path of a second-order reflected sound ray from a sound source S to
a receiver R using the image source S″.

3.1 Calculated echo density profiles using the Paraspax method of three
types of RIRs; a tunnel, a semi outside environment and a warehouse
of big volume.

3.2 Reverberation level of the Genesis 6 studio estimated by the RMS
method.

3.3 Reverberation level of the Shrine and parish church of all saints
estimated by the EDC method.

3.4 Direct sound and early reflections found by the reflection detection
algorithm of the Paraspax method in the RIR of the Helsington church.

3.5 Direct sound and early reflections found by the reflection detection
algorithm of the Paraspax method in the RIR of the Hamilton
mausoleum.

3.6 Direct sound and early reflections found by the reflection detection
algorithm of the Paraspax method in the RIR of the Koli national park.

3.7 DOAs of the direct sound and early reflections in azimuth and
elevation of Arthur sykes rymer auditorium, found by the Paraspax
method.

3.8 The inside (left) and the floor plan marked with different source and
receiver positions (right) of the Hoffman lime kiln chamber.

3.9 DOAs of the direct sound and early reflections in azimuth and
elevation of the Stairway, found by the Paraspax method.

3.10 DOAs of the direct sound and early reflections in azimuth and
elevation of Hoffman lime kiln, found by the Paraspax method.

3.11 The early part of the measured RIR of the Helsington church plotted
together with the detected direct sound and early reflections
(directional part) and the corresponding weighting function. The
directional part and weighting function is found by the Paraspax method.

3.12 The binaural diffuse reverberation built from binaural white noise and
based on the measured RIR of the Helsington church plotted together
with the inverse weighting function.


3.13 The directional and diffuse components that forms the early part of
the synthesized BRIR of the Helsington church.

3.14 A spherical loudspeaker array of 84 loudspeakers positioned at the
green dots.

3.15 Early reflections passed to two different loudspeakers of the array.
One loudspeaker is assigned one reflection (left) while another is
assigned three reflections (right).

3.16 Measured HRIRs from a Neumann KU100 artificial head showing
how sound is reaching the left and right ear when the sound source
is positioned at 90° to the left (left) and at 180° right behind (right).

3.17 Sound travelling from the loudspeakers to a listener positioned at
the center of the array (black dot) and to a listener at an arbitrary
position (red dot) within a simplified loudspeaker array of four
loudspeakers in the horizontal plane.

4.1 A loudspeaker array in the vertical plane containing 7 loudspeakers
positioned at the green dots right in front of the listener.

4.2 A loudspeaker array in the horizontal plane containing 3 loudspeakers
positioned at the green dots around the listener.

4.3 Measured monaural room impulse response of Maes Howe.

4.4 Loudspeaker array synthesis of Maes howe from a loudspeaker array
in the horizontal plane containing 3 loudspeakers (left) and from the
84-loudspeaker array (right).

4.5 Interaural coherence of Maes howe for a loudspeaker array in the
horizontal plane containing 3 loudspeakers (left) and for the
84-loudspeaker array (right).

4.6 Loudspeaker array synthesis of the Helsington church for a
loudspeaker array only changing in the horizontal plane using 3
loudspeakers (left) and its monaural RIR (right).

4.7 Measured monaural room impulse response of the Helsington church.

4.8 Interaural coherence of the Helsington church for a loudspeaker array
in the horizontal plane containing 3 loudspeakers (left) and for the
84-loudspeaker array (right).

4.9 Loudspeaker array synthesis of the Promenadikeskus concert hall for
a loudspeaker array in the horizontal plane containing 3 loudspeakers
(left) and the 84-loudspeaker array (right).

4.10 Interaural coherence of the Promenadikeskus concert hall for a
loudspeaker array in the horizontal plane containing 3 loudspeakers (left)
and for the 84-loudspeaker array (right).

4.11 DOAs of the direct sound and early reflection of Trollers gill.

4.12 DOAs of the direct sound and early reflection of the Koli national park.

4.13 DOAs of the direct sound and early reflection of the Central Hall.

4.14 Directions of the listener’s movements in the 3-loudspeaker array.

4.15 Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array
in the horizontal plane at the listener position 1 meter in front of the
center (left) and 2 meters in front of the center (right).


4.16 Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array
in the horizontal plane at the listener position 3.5 meter in front of
the center (left) and 5.5 meters in front of the center (right).

4.17 Interaural coherence of Maes howe for a 3-loudspeaker array in the
horizontal plane at the listener position 1 meter in front of the center
(left) and 3.5 meters in front of the center (right).

4.18 Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array
in the horizontal plane at the listener position 1 meter behind the
center (left) and 2 meters behind the center (right).

4.19 Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker
array in the horizontal plane at the sweet spot.

4.20 Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker
array in the horizontal plane at the listener position 1.5 meters aside
to the left of the center (left) and 3.5 meters aside to the left (right).

4.21 Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker
array in the horizontal plane at the listener position 1.5 meters aside
to the right of the center (left) and 3.5 meters aside to the right (right).

4.22 HRIR from loudspeaker 3 to the listener at the new listener positions
1.5 m aside to the right (left) and 3.5 m aside to the right (right).

4.23 Loudspeaker array synthesis of Maes howe at the listening position 2
m diagonally in front of the center to the left (left) and at the listening
position 2 m diagonally in front of the center to the right (right).

4.24 Loudspeaker array synthesis for the listener position 3.5 m diagonally
forward to the left of the Promenadikeskus concert hall (left) and of
Maes howe (right).

4.25 Loudspeaker array synthesis for the listener position 3.5 m aside to
the left of the Genesis 6 studio when the radius is decreased to 5 m
(left) and increased to 20 m (right).

A.1 Floor plan with source and receiver positions (left) and photo taken
from the control room (right) of the Genesis 6 studio.

A.2 The measured valley of Trollers gill (left) and floor plan with source
and receiver positions (right).

A.3 The outside location of Maes Howe (left) with floor plan (right).

A.4 The interior of Maes Howe.

A.5 Floor plan with measurement positions of source and receiver at the
Arthur sykes rymer auditorium.

A.6 The interior of the Arthur sykes rymer auditorium.

A.7 The Koli national park at summer.

A.8 The floor at which measurements were made at the Stairway (left).
The floors below the measurement floor (right).

A.9 Floor plan of Hoffman lime kiln with source and receiver positions.

A.10 The exterior (left) and interior (right) of the Hoffman lime kiln.

A.11 Floor plan of the Central hall with source and receiver positions.


A.12 The interior of Central hall. The speaker used in the measurements
is visible on stage (right). The hall is equipped with bleachers at the
back and a bunch of chairs at the front (left).

A.13 Floor plan of the Helsington church with source and receiver positions.

A.14 The interior of Helsington church. The loudspeaker at the altar
(right) and the microphone at position "R6" (left).

A.15 Floorplan of the Promenadikeskus concert hall with source and
receiver positions.

A.16 The interior of the Innocent railway tunnel.

A.17 Source and receiver positions of the Innocent railway tunnel.

A.18 The interior of the Falkland palace royal tennis court.

A.19 Floor plan of the Shrine and parish church of all saints together with
source and receiver positions.

A.20 The interior of the Shrine and parish church of all saints.

A.21 The exterior (left) and the interior, showing the microphone and
loudspeaker position (right) of Hamilton mausoleum.

A.22 The interior of Terrys factory warehouse including microphone
position (left) and loudspeaker position (right).



List of Tables

3.1 Window lengths in echo density profile when calculating mixing time
for three different measured RIRs.

4.1 Numbers of loudspeakers required in the loudspeaker array for
different impulse responses.

A.1 Information about the tested room impulse responses.

B.1 Various results for all tested room impulse responses.



1 Introduction

Spatial audio can be obtained by convolving audio signals with spatial room impulse
responses (SRIRs). SRIRs consist of a spatial description of the sound field in a
given room in addition to its monaural parameters. The goal is to create a
perceptually plausible virtual environment that is coherent with the real
environment. Audio in virtual environments is often used together with other
components that place a high load on the computer, such as visuals, and it is
therefore important to keep the computation time low. This can be achieved by
encoding the monaural and spatial parameters offline into a parametric spatial
description of the sound field. An example of a method that does this is the
Paraspax method [1]. The Paraspax method derives monaural and spatial parameters
for a virtual environment with 6 degrees of freedom (DoF), which corresponds to
arbitrary head orientations and room positions of the listener. This parametric
description forms the basis of binaural room impulse responses (BRIRs), which are
synthesized by a synthesis algorithm of the Paraspax method.

The work of this thesis is to use the Paraspax method to create parametric
SRIRs for a number of monaural room impulse responses (RIRs) of different
environments. The BRIR synthesis in the Paraspax method creates 2-channel (left
and right) BRIRs for any head orientation and listener translation; these are
intended for headphones and can be used for real-time rendering. Within this
work, the BRIR synthesis is instead extended to a loudspeaker array-based
synthesis such that the monaural and spatial parameters of the RIR are distributed
over a number of loudspeakers arranged on a sphere that forms the array. The
loudspeaker array synthesis is first generated for a listener positioned at the
center of the array, where the work investigates the smallest number of
loudspeakers required in the array without reducing the quality of the sound field
created by the spherical loudspeaker array used as the starting point. In a next
step, the listener changes position inside the array, and it is examined how far
the listener can move from the center before the sound image changes and the
perceived quality degrades.

1.1 Background

The Paraspax method [1] has so far only been tested with one monaural
omnidirectional room impulse response for a shoebox-shaped room of dimensions
11.73 m × 4.74 m × 4.62 m (length × width × height). The floor of the room is
concrete and the walls are plain, with one of the two long sides consisting of large
glass panes. The reverberation time was measured to be 0.9 s. However, to examine
any limitations of the method, it is of interest to know how the method behaves in
varying environments, for example outdoor environments with low reverberation
times that contribute only a small number of reflections, as well as warehouses or
churches with high reverberation times. Descriptions of the tested monaural RIRs
together with the measurement positions of loudspeakers and microphones can be
found in Appendix A, and the RIRs are available online at [2] and [3].

In addition, the method has only been used for headphone reproduction, and how
it behaves with loudspeaker arrays has yet to be investigated.

1.2 Related works
As the method of this thesis builds upon the Paraspax method [1], the theory and
the parametrization of the monaural RIRs presented in Methods, up to the
extension to loudspeaker arrays, are taken from the Paraspax method.

The Paraspax method is inspired by the Binauralization of omnidirectional room
impulse responses (BinRIR) algorithm, which is a method for parametric spatial
audio rendering with 6 DoF [4]. Just as for the Paraspax method, only a single
measured omnidirectional RIR is required to obtain a set of BRIRs of the room of
interest. The process of synthesizing the late reverberation in BinRIR is reused in
the Paraspax method and will be explained in detail in Methods. However, listening
tests in which synthesized BRIRs were compared with measured BRIRs revealed
uncertainties in the BinRIR method, especially regarding the incidence direction of
the early reflections, and the Paraspax method therefore extends and improves on
the BinRIR algorithm.



2 Theory

2.1 Echo density profile
The mixing time of a room impulse response is the moment in time where the
specular part meets the diffuse part. The Paraspax method calculates the mixing
time according to Abel et al. [5], where the echo density from a reverberation
impulse response is measured.

Objects and reflective surfaces in a reverberant environment interact with sound
to create reflections. These reflections in turn interact with the environment to
create even more reflections. In a measured RIR, the number of reflections
increases over time until the echo density can be described statistically; more
specifically, the sound pressure amplitudes of the impulse response are assumed to
follow a zero-mean Gaussian distribution with evolving color and level.

The echo density profile η(t) is measured over time and has the property that once
an acoustic space is fully mixed, the impulse response taps take on a Gaussian
distribution. Over a sliding impulse response window of length 2δ + 1 samples, it
is defined as the fraction of impulse response taps lying outside one standard
deviation of the window, divided by the fraction erfc(1/√2) = 0.3173 expected to
lie outside one standard deviation for a Gaussian distribution:

$$
\eta(t) = \frac{1/\operatorname{erfc}(1/\sqrt{2})}{2\delta + 1} \sum_{\tau = t-\delta}^{t+\delta} \mathbf{1}\{|h(\tau)| > \sigma\}. \tag{2.1}
$$

Here 1{·} is the indicator function, which returns one if its argument is true and
zero otherwise, h(t) denotes the reverberant impulse response, and the window
standard deviation is

$$
\sigma = \sqrt{\frac{1}{2\delta + 1} \sum_{\tau = t-\delta}^{t+\delta} h^2(\tau)}. \tag{2.2}
$$

Because of the normalization by the fraction of taps expected outside one standard
deviation for Gaussian noise, the echo density lies roughly between 0 and 1. A few
prominent reflections separated in time and level contribute to a larger standard
deviation, resulting in an echo density close to 0. As the reflections occur more
frequently and decrease in amplitude over time, the echo density increases. The
mixing time is then defined as the time at which the echo density first reaches 1.
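
As a rough illustration of Equations (2.1) and (2.2), the following Python sketch
computes an echo density profile with a constant rectangular window and reads off
the mixing time as the first time the profile reaches 1. The function names, the
use of NumPy, and taking σ as the root mean square of the taps in the window are
assumptions made for this example; the code is not taken from the Paraspax
implementation.

```python
import math

import numpy as np


def echo_density_profile(h, half_width):
    """Echo density eta(t) for a rectangular window of 2*half_width + 1 taps.

    Hypothetical helper: positions where the window does not fit are NaN.
    """
    h = np.asarray(h, dtype=float)
    # Fraction of samples outside one standard deviation for Gaussian noise.
    expected = math.erfc(1.0 / math.sqrt(2.0))  # about 0.3173
    eta = np.full(h.size, np.nan)
    for t in range(half_width, h.size - half_width):
        window = h[t - half_width : t + half_width + 1]
        sigma = math.sqrt(np.mean(window ** 2))  # window standard deviation
        fraction_outside = np.mean(np.abs(window) > sigma)
        eta[t] = fraction_outside / expected
    return eta


def mixing_time(eta, fs):
    """Time in seconds at which eta first reaches 1, or None if it never does."""
    reached = np.flatnonzero(np.nan_to_num(eta, nan=0.0) >= 1.0)
    return reached[0] / fs if reached.size else None
```

With Gaussian noise as input the profile fluctuates around 1, whereas a single
isolated impulse gives values close to 0, in line with the description above.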

The choice of sliding window length affects the echo density profile. Shorter
windows are expected to have a high variance about their local mean, as they
include fewer impulse response taps. The window should be long enough to cover
a few reflections but short enough to provide good statistics. Impulse responses
with closely overlapping reflections should have longer windows so that the
variation between windows does not become too great. Shorter windows are a good
choice for impulse responses with only a few prominent reflections. However,
too short a window can cause jumps in the echo density profile when no reflection
falls within the window. According to Abel et al. [5], a good choice of window
length is between 20 and 30 ms. Moreover, Abel et al. present time-varying window
lengths, with the idea of using shorter windows at the beginning of the impulse
response, where the echo density has not yet had time to increase, and then
letting the window grow with the increasing echo density. Within this work,
however, only constant window lengths are considered.

2.2 Image source model
The image source model is used to find the specular reflection pattern from a sound
source to a receiver within an enclosure, i.e. a room; here a shoebox-shaped room
is considered. The model makes some simplifications about the sound field. The
sound waves are idealized as sound rays which, in a homogeneous medium, travel
along straight lines, so diffraction is neglected. In addition, interference is not
considered; instead, the intensities of the sound field components are added. Sound
rays are perfectly reflected at a boundary, which is not the case in the real world,
where some of the energy is scattered in an omnidirectional pattern. As the
reflections are repeated, the scattered energy increases until the majority of the
sound energy is diffuse. For this reason, the image source model should only be
used for predicting the early reflections in a room and not the diffuse sound.

Let S be a sound source and R be a receiver, both located in a room with plane,
smooth walls. The sound energy travels from S along the rays at the constant speed
of sound and decreases with the distance attenuation 1/r², where r is the traveled
distance. Except for the direct sound ray, the rays are reflected by the walls
before they reach R. A reflected ray can be seen as originating behind the wall
from a virtual sound source S′ called an image source. The image source is mirrored
along the line perpendicular to the wall such that the distance between the wall
and S is the same as the distance between the wall and S′, as illustrated in
Figure 2.1. In this way, the path length of the first-order reflected ray from S to
R equals the length of the straight path from S′ to R. The intensity of the
reflected ray is reduced by a factor 1 − α, where α is the absorption coefficient
of the wall [6].

If the first-order reflected ray hits a wall a second time, the process above is
repeated, creating a second-order image source S″. The second-order image source
S″ is in turn obtained by mirroring the first-order image source S′ in the second
wall, as seen in Figure 2.2. The intensity of the resulting second-order reflection
is then reduced by a factor (1 − α1)(1 − α2), where α1 and α2 denote the absorption
coefficients of the first and second wall, respectively. This process can be
repeated to obtain a third-order reflection and so on. The impulse response is then
obtained by summing the signals from the source S and each image source S′, S″,
and so on.
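
The first-order construction can be sketched in Python as follows. The sketch
assumes a shoebox room with one corner at the origin, a speed of sound of 343 m/s,
a single absorption coefficient for the reflecting wall, and an elevation angle
measured from the zenith; the function names are hypothetical and the code is not
the Paraspax implementation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def first_order_image_sources(source, room):
    """Mirror the source in each of the six walls of the room.

    The room is assumed to span [0, Lx] x [0, Ly] x [0, Lz] with
    room = (Lx, Ly, Lz). Returns a list of six image positions.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            # Same distance to the wall as the source, on the other side.
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def reflection_parameters(image, receiver, alpha):
    """TOA (s), azimuth and elevation (degrees) and relative intensity."""
    dx, dy, dz = (i - r for i, r in zip(image, receiver))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    toa = distance / SPEED_OF_SOUND
    azimuth = math.degrees(math.atan2(dy, dx)) % 360.0
    elevation = math.degrees(math.acos(dz / distance))  # 0 deg = zenith
    intensity = (1.0 - alpha) / distance ** 2  # absorption and 1/r^2 spreading
    return toa, azimuth, elevation, intensity
```

For example, a source at (1, 2, 1.5) m in a 5 m × 4 m × 3 m room has an image at
(−1, 2, 1.5) m behind the wall at x = 0, and a receiver at (4, 2, 1.5) m sees
that reflection arrive from a distance of 5 m.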


Figure 2.1: Path of a first-order reflected sound ray from a sound source S to a
receiver R using the image source S ′.

Figure 2.2: Path of a second-order reflected sound ray from a sound source S to a
receiver R using the image source S ′′.


2.3 Head related impulse responses
Humans do not pick up sound equally from all directions, as an omnidirectional
microphone does. How sound reaches us depends on the size and shape of our
pinnae, head and upper torso, among other things. Moreover, the distance from the
source to each ear can differ, the head can shadow the ears differently, and the
sound waves can either enter the ear canal directly or first be reflected by the
torso or pinnae. The resulting modified signal received from a sound source is then
processed by our auditory system and gives us the ability to localize the source.
The location is estimated in our brain by comparing binaural cues, i.e. cues
received by both ears, such as interaural time and level differences. Humans can
locate sounds in three dimensions (distance, azimuth and elevation). Azimuth and
elevation are angles of a spherical coordinate system. The elevation angle
describes the vertical angle, measured from a fixed zenith at 0° up to 180°, while
the azimuth is the angle, defined up to 360°, of the orthogonal projection onto the
horizontal plane that is orthogonal to the zenith direction and passes through the
origin. Head related impulse responses (HRIRs) thus relate the location of the
source to the location of the ears and can be used to create virtual sound sources.

The frequency domain counterpart of the HRIR is called the head related transfer
function (HRTF). Sets of HRIRs and HRTFs always come in pairs, corresponding to
the left and right ear. BRIRs are obtained either by convolving a RIR with a
set of HRIRs in the time domain or by filtering the RIR in the frequency domain with
the corresponding HRTF set.
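
The time-domain variant can be sketched as follows. This is a minimal illustration that assumes a single HRIR pair, i.e. one source direction for the whole RIR. The function names and the toy signals are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def brir_from_rir(rir, hrir_left, hrir_right):
    """Return the left and right ear signals for a monaural RIR and one HRIR pair."""
    return convolve(rir, hrir_left), convolve(rir, hrir_right)

# Toy example: a direct sound and one reflection, with made-up 2-tap HRIRs
rir = [1.0, 0.0, 0.5]
hrir_left = [1.0, 0.5]
hrir_right = [0.5, 0.25]
left, right = brir_from_rir(rir, hrir_left, hrir_right)
```

Filtering in the frequency domain gives the same result when both signals are zero-padded to the length of the linear convolution before the transform.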

HRIRs are commonly measured with a dummy head in an anechoic chamber. A
dummy head has the shape and size of a human head, including pinnae and ear
canals, where the microphones are placed. In this way, the recording is made from
the dummy head’s perspective, which approximates that of a human. There are a
handful of dummy heads that differ in design, for example KEMAR and Neumann KU100.
The measurements are performed by rotating the dummy head in the horizontal
plane in fine angular steps in front of a loudspeaker that is kept at a constant
distance throughout the measurement. The result is a set of HRIRs that corresponds
to how humans pick up sound when the source is placed at various positions around us.

In the far field, when the distance r between the source and the dummy head is
greater than 1 m, the HRIRs are attenuated by a factor 1/r. For distances smaller
than 1 m, the measured differences between the left and right ear increase, and it
is therefore more common to perform the measurements with at least 1 m between
the source and the dummy head.



3 Methods

To run the Paraspax method [1] with any monaural RIR, the distance from source
to receiver as well as the source direction in azimuth and elevation are required. The
method got its name from the three keywords parametrization, spatialization and
extrapolation that form the basis of the method. In the first part of the
parametrization, the following standard monaural room acoustic parameters according
to ISO 3382-2 [8] are calculated: reverberation time (RT60, RT30, RT20), early decay
time (EDT), clarity (C80, C50), definition (D50, D80), energy decay curve (EDC) and
direct-to-reverberant ratio (DRR). The parameters are calculated both in octave bands
and for the broadband spectrum in the frequency range of human hearing (20 Hz - 20 kHz
[9]). Then the amplitude and time of arrival (TOA) of the direct sound and early
reflections are estimated. As a last step in the parametrization, the reverberation
level is calculated, defined as the level of the diffuse sound field at the TOA of the
first early reflection.
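
Some of the listed parameters can be illustrated with a short broadband sketch. It assumes the standard definitions (backward integration of the squared RIR for the EDC, early-to-late energy ratio for clarity, early-to-total energy ratio for definition) and omits the octave-band filtering. The function names and the synthetic decay are hypothetical and not part of the Paraspax implementation.

```python
import math

def energy_decay_curve_db(rir):
    """Backward-integrated energy of the RIR in dB, normalized to 0 dB at t = 0."""
    energy = [x * x for x in rir]
    total = sum(energy)
    remaining = total
    edc = []
    for e in energy:
        edc.append(10 * math.log10(remaining / total) if remaining > 0 else float("-inf"))
        remaining -= e
    return edc

def clarity_db(rir, fs, t_ms):
    """Clarity: ratio of energy before and after t_ms, in dB."""
    n = int(round(t_ms * 1e-3 * fs))
    early = sum(x * x for x in rir[:n])
    late = sum(x * x for x in rir[n:])
    return 10 * math.log10(early / late)

def definition(rir, fs, t_ms):
    """Definition: energy before t_ms divided by the total energy."""
    n = int(round(t_ms * 1e-3 * fs))
    return sum(x * x for x in rir[:n]) / sum(x * x for x in rir)

# Synthetic exponential decay as a stand-in for a measured RIR
fs = 1000
rir = [math.exp(-n / 100) for n in range(1000)]
edc = energy_decay_curve_db(rir)
c50 = clarity_db(rir, fs, 50)
d50 = definition(rir, fs, 50)
```

Clarity and definition with the same split time carry the same information, since C = 10 log10(D / (1 - D)).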

The spatialization assigns directions of arrival (DOAs) to the direct sound and early
reflections in spherical coordinates (azimuth and elevation). This yields a 3 DoF
sound field for arbitrary head orientations of the listener. The method can
be extended to create a 6 DoF sound field for arbitrary head orientations as the
listener moves through the room. This is done in the extrapolation part, where the
amplitudes, TOAs and DOAs of the direct sound and early reflections are modified
to correspond to a virtual space. The result is a BRIR synthesized for headphone
reproduction. When the method is extended to loudspeaker arrays, the extrapolation
part is replaced by a loudspeaker array setup, where the parameterized directional
components of the measured RIR are distributed between the loudspeakers according
to their respective DOAs. Each loudspeaker signal is then assigned synthesized early
diffuse sound and late reverberation.

The parts of the method described in sections 3.1-3.5 are taken directly from the
Paraspax method and show how a RIR is parameterized and spatialized, followed
by how the diffuse reverberation is synthesized. It is also reported how the
Paraspax method behaves for a variety of rooms, which are listed in the Appendix. In
section 3.6 the extension of the method is explained in detail, and the parts
that are re-used from the Paraspax method are noted. Except for these parts, the
extension to loudspeaker arrays is created from scratch.


3.1 Mixing time
A RIR is composed of direct sound, early reflections and late reverberation. The
direct sound and early reflections are spectral components and appear in the
beginning of the RIR. The late reverberation, which consists only of diffuse sound,
belongs to the latter part, and its start in time is determined by the mixing time.

The Paraspax method processes the early part and the late part separately and
connects them once all respective components are found. It is therefore
important that the mixing time is estimated exactly at the transition between the
directional and diffuse sound field of the RIR. The mixing time is a good predictor if
it is estimated after all early reflections. If it instead appears too early in time,
some of the early reflections will be part of the diffuse sound. Nor must it be
estimated too late in time, since the diffuse sound in the early part of the
synthesized BRIR depends on when the mixing time starts, and the same applies to each
loudspeaker signal of the loudspeaker array.

The echo density profile is calculated to estimate the mixing time and, by
default, the Paraspax method uses a window length of approximately 21.3 ms.
Adjusting the window length shifts the mixing time: shorter windows give earlier
mixing times and longer windows give later mixing times.
Table 3.1 shows the window length and mixing time in milliseconds of three selected
monaural RIRs. The first is called the Innocent railway tunnel and is a tunnel
that previously accommodated two railway tracks. It is now used by pedestrians
and cyclists, as the tracks have been replaced by paving. The tunnel has a semicircular
cross section of dimensions 4.5 m × 6 m (height × width) and extends as far
as 517 m [2]. The Falkland palace royal tennis court is a semi-outdoor environment
with a volume of 2300 m3 and no roof. Its walls and floor are made of a concrete-like
material. The Terrys factory warehouse is an empty industrial building of 4500 m3.

Name                                 Window length in ms   Mixing time in ms
Innocent railway tunnel              42.67                 114.88
Falkland Palace Royal tennis court   25                    168.56
Terrys factory warehouse             21.3                  176

Table 3.1: Window lengths in echo density profile when calculating mixing time
for three different measured RIRs.

The full list containing the mixing times with respective window lengths of all
RIRs is presented in the Appendix, showing that the default window length is ideal
for only 5 out of 15 tested RIRs. There is no clear pattern between suitable
window length and type of RIR, meaning that there is no rule that works in every
case. The window lengths used in the end were selected by trial and error so that the
estimated mixing time appears at the transition between the early and late part of
the RIR. Furthermore, the window lengths change drastically independent of
reverberation time, indicating that the echo density profile measure of a RIR is
insensitive to reverberation time, just as stated by Abel and Huang [5].


The echo density profiles of the three RIRs listed in Table 3.1 are presented in
Figure 3.1, where the respective mixing times are marked with circles. Each mixing
time is set to the time at which the echo density first reaches 1, according to Abel
and Huang [5].
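
The estimation can be sketched as follows. The sketch assumes the common formulation of the echo density measure: the fraction of samples in a sliding window whose magnitude exceeds the standard deviation of that window, normalized by the value expected for Gaussian noise, erfc(1/√2) ≈ 0.317. A rectangular window is used for simplicity, and the function names and test signal are hypothetical. At an assumed sampling rate of 48 kHz, a window of 1024 samples would correspond to about 21.3 ms.

```python
import math
import random

def echo_density_profile(rir, win_len):
    """Normalized echo density using a rectangular sliding window of about win_len samples."""
    norm = math.erfc(1 / math.sqrt(2))  # expected outlier fraction for Gaussian noise
    half = win_len // 2
    profile = []
    for n in range(len(rir)):
        seg = rir[max(0, n - half): n + half + 1]
        sigma = math.sqrt(sum(x * x for x in seg) / len(seg))
        outliers = sum(1 for x in seg if abs(x) > sigma)
        profile.append(outliers / len(seg) / norm)
    return profile

def mixing_time_index(profile):
    """Sample index at which the profile first reaches 1, or None if it never does."""
    for n, value in enumerate(profile):
        if value >= 1.0:
            return n
    return None

# Toy signal: one isolated impulse followed by Gaussian noise (fully diffuse part)
rng = random.Random(0)
rir = [1.0] + [0.0] * 199 + [rng.gauss(0.0, 0.1) for _ in range(2000)]
profile = echo_density_profile(rir, win_len=100)
t_mix = mixing_time_index(profile)
```

A longer window smooths the profile, which tends to delay the first crossing of 1. This is in line with the later mixing times reported for longer windows.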

Figure 3.1: Echo density profiles of three types of RIRs, calculated using the Paraspax
method: a tunnel, a semi-outdoor environment and a warehouse of large volume.

Typical for a reverberant impulse response is an echo density profile starting
around 0 and then increasing towards 1. As seen in Figure 3.1, this is the case for
all three RIRs, and also for all 15 tested RIRs. Furthermore, the narrow shape of
the tunnel allows the sound to be reflected quickly, which results in an echo density
profile starting above 0, in contrast to the tennis court and especially
the warehouse. The big, empty space of the warehouse causes the sound to travel
longer before it is reflected, but as soon as it is, the space is quickly fully mixed.
This is seen in Figure 3.1 as a profile that grows fast once it begins to rise. Both
the tunnel and the tennis court have openings where the sound disappears without
being reflected back. This results in a space that never gets fully mixed, which can
be seen in their respective echo density profiles, which jump up and down. This is
especially the case for the tennis court, which has a larger open surface than
the tunnel.

The mixing time for the tunnel is estimated earliest in time, followed by the tennis
court and then the warehouse. This estimation holds for any measurement setup
as the echo density profile measure is independent of measurement setup within the
same room. The increase in echo density profile only depends on the room’s shape
and volume.


3.2 Reverberation level

The reverberation level contains information about both directional and diffuse
reverberation. Within the Paraspax method there are three different methods for
estimating the reverberation level, called the MAX, RMS and EDC method. What
differentiates the methods is how the envelope of the absolute pressure response |p|
is estimated. As the names suggest, the MAX and RMS methods use a sliding window of
1 ms, in which the maximum and the root-mean-square of the window, respectively, are
calculated. The EDC method uses the previously calculated energy decay curve and
transforms it into a level curve.

The following steps are the same for all three methods. A first-order polynomial
is fitted to the envelope from two to three times the mixing time in order to
guarantee that no early reflections distort the fit. The remaining values of
the decay curve are estimated by linear extrapolation. The reverberation level is
defined as the level of the diffuse sound field at the TOA of the first early reflection,
but the amplitude of the reverberation can be found from the decay curve at any
time, for example at the mixing time.
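
The MAX variant of these steps can be sketched as follows. As a simplification, the envelope is computed in consecutive (non-overlapping) 1 ms blocks rather than with a truly sliding window. The function names, the default values and the synthetic decay are assumptions made for the example.

```python
import math

def linear_fit(t, y):
    """Least-squares first-order polynomial y = slope * t + intercept."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    sxy = sum((a - mean_t) * (b - mean_y) for a, b in zip(t, y))
    sxx = sum((a - mean_t) ** 2 for a in t)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t

def reverberation_level_db(rir, fs, t_mix, t_first_reflection, win_ms=1.0):
    """Level of the fitted decay line, extrapolated to the TOA of the first reflection."""
    win = max(1, int(round(win_ms * 1e-3 * fs)))
    times, env_db = [], []
    for start in range(0, len(rir) - win + 1, win):
        peak = max(abs(x) for x in rir[start:start + win])  # MAX envelope
        times.append(start / fs)
        env_db.append(20 * math.log10(max(peak, 1e-12)))
    # Fit only between two and three times the mixing time
    fit = [(t, e) for t, e in zip(times, env_db) if 2 * t_mix <= t <= 3 * t_mix]
    slope, intercept = linear_fit([t for t, _ in fit], [e for _, e in fit])
    return slope * t_first_reflection + intercept

# Synthetic decay of 60 dB per second as a stand-in for a measured RIR
fs = 1000
rir = [10 ** (-3 * n / fs) for n in range(1000)]
level = reverberation_level_db(rir, fs, t_mix=0.1, t_first_reflection=0.01)
```

For this synthetic decay the fitted line has a slope of -60 dB/s, so the extrapolated level at 10 ms is about -0.6 dB.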

According to the authors of the Paraspax method, the reverberation level can be
estimated using any of the three methods, but the MAX method is the one that provides
the best estimates. This conclusion also holds for the tested RIRs, where the MAX
method was used for all of them except the Genesis 6 studio and the Shrine and
parish church of all saints north street, which got the best reverberation level
estimates using the RMS and EDC method, respectively. The reverberation level of
these spaces as well as the sliding window and polynomial fit can be seen in Figures
3.2 and 3.3. These plots show that the respective polynomial fit follows the decay of
the amplitude in dB and is thereby a good representation of the diffuse sound in the
spectral components. The reverberation level is calculated as -6.7 dB for the Genesis
6 studio and -9.4 dB for the Shrine and parish church of all saints north street. The
reverberation levels for the other RIRs are listed in the Appendix.

3.3 Spectral components

The Paraspax method divides the early part of the RIR into spectral components
and early diffuse sound. The spectral components of the direct sound and early
reflections are estimated in time, defined by the respective TOAs, with
corresponding amplitudes. They are then assigned DOAs in spherical coordinates, which
describe the angle at which they reach the listener. The parts of the processing
that produce these parameters are described below, followed by the processing that
produces the synthesized early diffuse sound. The directional components
parameterized by the Paraspax method are used for the loudspeaker array, and a number
of decorrelated copies of the early diffuse sound are synthesized, one for each
loudspeaker.

The direct sound is one single event in the RIR and is easily found by applying
a 1 ms long window to the onset. Then the TOA is defined as the time index of
the absolute maximum of the pressure response within this window. To get the


Figure 3.2: Reverberation level of the Genesis 6 studio estimated by the RMS
method.

Figure 3.3: Reverberation level of the Shrine and parish church of all saints esti-
mated by the EDC method.


corresponding RMS amplitude, a new asymmetric window is applied around the
TOA. The window is of length 1.5 ms, starting 0.5 ms before the TOA and ending 1
ms after it, due to summing localization, i.e. if two or more sound waves arrive
within a time interval of 1 ms or smaller, then all sound sources contribute to the
direction of the perceived total sound [10]. The amplitude is then defined as the RMS
average of the window. The method succeeded in finding these components of the direct
sound for all tested RIRs. Moreover, the method finds up to 10 early reflections for
each RIR, which makes the approach a bit more complex. The TOAs and amplitudes of the
reflections are tracked by a reflection detection algorithm explained in the
following section.
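
The two windowing steps for the direct sound can be sketched as follows. The sketch assumes that the onset index is already known from a separate onset detector. The function name and the toy signal are hypothetical.

```python
import math

def direct_sound(rir, fs, onset_idx):
    """Return the TOA index and RMS amplitude of the direct sound.

    onset_idx is assumed to be provided by a separate onset detection step.
    """
    n_1ms = int(round(1e-3 * fs))
    n_half_ms = int(round(0.5e-3 * fs))
    # TOA: index of the absolute maximum within 1 ms after the onset
    search = rir[onset_idx: onset_idx + n_1ms]
    toa = onset_idx + max(range(len(search)), key=lambda i: abs(search[i]))
    # RMS amplitude in an asymmetric window: 0.5 ms before to 1 ms after the TOA
    seg = rir[max(0, toa - n_half_ms): min(len(rir), toa + n_1ms)]
    rms = math.sqrt(sum(x * x for x in seg) / len(seg))
    return toa, rms

# Toy RIR: a weak onset sample followed by the main peak 10 samples later
fs = 48000
rir = [0.0] * 2000
rir[100] = 0.3
rir[110] = -1.0
toa, rms = direct_sound(rir, fs, onset_idx=100)
```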

3.3.1 Reflection detection

When the TOA and amplitude of the direct sound are found, a reflection detection
algorithm is used to find the TOAs and amplitudes of the early reflections. First,
the TOAs are found by applying a sliding window of 1 ms to the whole RIR or up
to two times the estimated mixing time. If the energy at a time index in the window
is three times higher than the median energy of the whole window, then a reflection
is defined at this time index, which is the TOA of the reflection. The RMS amplitudes
of the early reflections at the TOAs are calculated in an asymmetrical window in
the same way as for the direct sound. The high resolution given by the short window
is needed in order to capture the ground reflection, which is important especially
for outside environments where the ground reflection might be the only reflection.
Furthermore, the high resolution normally provides more than 10 early reflections,
which are first thinned out according to summing localization: if more than one TOA
is found within a time span of 1 ms, then the one with the highest RMS amplitude is
defined as a reflection while the other(s) are removed from the early reflections.
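
A literal reading of the detection and thinning steps can be sketched as follows. The sketch is not the Paraspax implementation: the window is centered on each sample, and the absolute sample value is used as a stand-in for the RMS amplitude. All names are hypothetical.

```python
import statistics

def detect_candidates(rir, fs, factor=3.0):
    """Indices whose energy exceeds `factor` times the median energy of the
    surrounding window of about 1 ms."""
    half = max(1, int(round(0.5e-3 * fs)))
    energy = [x * x for x in rir]
    hits = []
    for n in range(len(energy)):
        window = energy[max(0, n - half): n + half + 1]
        if energy[n] > factor * statistics.median(window):
            hits.append(n)
    return hits

def thin_by_summing_localization(hits, rir, fs, span_ms=1.0):
    """Among candidates closer than span_ms, keep only the strongest one.

    abs(rir[n]) is used here as a simple stand-in for the RMS amplitude.
    """
    span = int(round(span_ms * 1e-3 * fs))
    kept = []
    for n in sorted(hits, key=lambda i: abs(rir[i]), reverse=True):
        if all(abs(n - k) > span for k in kept):
            kept.append(n)
    return sorted(kept)

# Toy RIR: two arrivals within 1 ms of each other and one separate arrival
fs = 48000
rir = [0.0] * 4800
rir[480], rir[490], rir[960] = 1.0, 0.6, 0.5
hits = detect_candidates(rir, fs)
toas = thin_by_summing_localization(hits, rir, fs)
```

In the toy signal the weaker arrival at index 490 is removed, because it lies within 1 ms of the stronger arrival at index 480.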

As there may still be many reflections in the selection list, they are sorted by
their amplitudes in descending order such that the early reflections selected in the
end correspond to the loudest reflections. The Paraspax method selects between
6 and 10 of the most prominent reflections in order to lower the computational
load. The goal is to avoid an unnecessary number of reflections that slows down
the processing but at the same time to use as many as necessary to fully recreate
a space. A previous study, where the aim was to use a minimal set of salient early
reflections, showed by listening experiments that 6 reflections are enough for
parametric spatial audio rendering of speech content, based on the image source
model, that is indiscernible from a fully-rendered reference [10]. The image
source model is one out of three different approaches for calculating the DOAs of
the early reflections in the Paraspax method, and it is the image source model
as implemented by the Paraspax method that is used within this thesis in the
following section.

The absolute pressure response of the RIR is plotted together with the TOAs
and RMS amplitudes, detected by the Paraspax method, of the direct sound and
early reflections of three tested RIRs in Figures 3.4-3.6. For these RIRs the reflection
detection algorithm found a different number of reflections.


Figure 3.4: Direct sound and early reflections found by the reflection detection
algorithm of the Paraspax method in the RIR of the Helsington church.

Figure 3.5: Direct sound and early reflections found by the reflection detection
algorithm of the Paraspax method in the RIR of the Hamilton mausoleum.


Figure 3.6: Direct sound and early reflections found by the reflection detection
algorithm of the Paraspax method in the RIR of the Koli national park.

Figures 3.4-3.6 show that the mixing time appears after all early reflections, which
means it is a good predictor. The reflection algorithm detected 10 early reflections
for the Helsington church while it only found 9 for the Hamilton mausoleum. The
Koli national park is an outside environment that contributes only a single reflection.

3.3.2 Directions of arrival

The Paraspax method offers three different approaches for estimating the DOAs of the
selected early reflections whose TOAs and amplitudes have been previously found.
The approach used here is the image source model (see Section 2.2); readers
interested in the other approaches, based on pseudo-randomized or precomputed
DOAs, are referred to [1]. As mentioned before, the method requires a predefined
source direction and source distance, which can be used to estimate the DOA of the
direct sound. Alternatively, the direct sound DOA can be found from the image
source simulation together with the DOAs of the reflections.

Geometrical data (source and receiver positions and room dimensions) are
necessary for spatialization using the image source model. The room dimensions are an
approximation of a shoebox-shaped room, an approximation that may differ greatly
from reality for some of the tested rooms that, for example, have arched sides,
contain small passages or lack one or more of the walls. The order of image sources is
set to 2, which should be enough since the first-order image sources in an empty
shoebox-shaped room already contribute six dynamically reproduced early reflections
from the four walls, floor and ceiling. The Paraspax method also allows for preferring
first-order


reflections, but this option was not used in this study.
The image source simulation of the Paraspax method derives all second-order
reflections from the simulated room, each with a corresponding TOA and DOA.
These TOAs are compared with the TOAs obtained from the reflection detection
such that those with the smallest TOA differences are defined as the same
reflection. The azimuth and elevation of the corresponding DOA then describe from
which direction the reflection arrives at the listener. In the following, the
environments of three tested RIRs are described and their respective DOA patterns
found from the simulated image source model of the Paraspax method are shown.
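
The TOA comparison can be sketched as a nearest-neighbour search. The sketch assumes that each simulated reflection is used at most once and that there are at least as many simulated as detected reflections. Whether the Paraspax implementation resolves ties in this way is not known here, and the function name and values are hypothetical.

```python
def match_doas(detected_toas, simulated):
    """Assign to each detected TOA the DOA of the simulated reflection with the
    closest TOA.

    simulated: list of (toa_s, azimuth_deg, elevation_deg) tuples.
    Assumption: each simulated reflection is used at most once.
    """
    remaining = list(simulated)
    matches = []
    for toa in detected_toas:
        best = min(remaining, key=lambda s: abs(s[0] - toa))
        remaining.remove(best)
        matches.append((toa, best[1], best[2]))
    return matches

# Hypothetical detected TOAs (seconds) and simulated image source reflections
detected = [0.010, 0.015]
simulated = [(0.0148, 90, 0), (0.0102, 0, -30), (0.020, 180, 0)]
matches = match_doas(detected, simulated)
```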

Arthur sykes rymer auditorium is said to reproduce sound of high quality thanks
to its unique acoustics. Its preferred noise criterion (PNC) is better than the PNC
15 standard, which means it shuts out outside noise. It is a rectangular-shaped
auditorium so the space itself is shoebox-shaped. The simulated DOAs of the direct
sound (red) and early reflections (fuchsia) in the auditorium are presented in Figure
3.7, showing that reflections hit the receiver at only two azimuth angles but at
various elevation angles. Due to its rectangular shape, the result is more accurate
than for the following examples, but the image source model does not take into account
the inclined medical floor and its interior. However, the image source model is a
popular approach for generating early reflections due to its efficient simulation [11].

Figure 3.7: DOAs of the direct sound and early reflections in azimuth and elevation
of Arthur sykes rymer auditorium, found by the Paraspax method.

The Stairway of a university is located in a shoebox-shaped room that rises
in height. The impulse response is assumed to be measured at one of the mid floors so


that the source and receiver are positioned in the middle of the room.
Note that reflections from the steps are not captured by the empty shoebox-shaped
room of the image source model. The tunnel-like chamber Hoffman lime kiln has
arched side walls and roof. It is a large U-shaped stone construction that differs
markedly from a rectangular room. Its impulse response was measured at position
"R1", seen in Figure 3.8. The curved area, seen in the floor plan, is used in the image
source simulation, approximated as a shoebox-shaped room of dimensions 25 m ×
4.72 m × 2.3 m (length × width × height).

Figure 3.8: The inside (left) and the floor plan marked with different source and
receiver positions (right) of the Hoffman lime kiln chamber.

The DOAs of the direct sound and early reflections of the Stairway and Hoffman
lime kiln can be seen in Figure 3.9 and Figure 3.10, respectively. Due to where
the source and receiver are positioned in the Stairway, the sound waves are more
likely to be reflected by the side walls than by the floor and ceiling. Therefore,
almost all reflections have an elevation angle of 0◦ but a greater spread in
azimuth. However, the stairs form floor-like and ceiling-like surfaces where some
sound waves would be reflected in reality, which is not included here. The
approximation of the space of Hoffman lime kiln allows some of the sound waves to
be reflected instead of disappearing through the long parallel corridors. This means
that the image source simulation produces more reflections than occur in the real
world, whereas for the Stairway the opposite is the case.

3.4 Late reverberation
The diffuse reverberation of the Paraspax method is synthesized in the same way
as in the BinRIR algorithm [4]. Just as in the Paraspax method, the measured RIR is


Figure 3.9: DOAs of the direct sound and early reflections in azimuth and elevation
of the Stairway, found by the Paraspax method.

Figure 3.10: DOAs of the direct sound and early reflections in azimuth and eleva-
tion of Hoffman lime kiln, found by the Paraspax method.


divided into direct sound, early reflections and late reverberation,
which are treated separately before being put together.

The aim is to create 2-channel late reverberation for a pair of ears that is spatially
equally distributed. This means that the late reverberation is omnidirectional, having
on average the same proportion of sound energy from every direction. To obtain
this, the signals are decorrelated, i.e. the cross-correlation is reduced. Interaural
coherence is the measure of similarity between the reverberation received by each
of the two ears. A low value is desirable and creates a more pleasant sound in
comparison to correlated diffuse sound, which sounds strange and not very diffuse.

In a first step, binaural white noise is generated and filtered with an interaural
coherence filter, which gives each channel slightly different parameters. The binaural
noise is then split into time segments of 2.67 ms [4] and convolved with small chunks
of the measured RIR of 0.67 ms to adapt the noise to the energy decay curve. The
length of the windows was determined using a listening test. All time sections
are windowed with raised-cosine ramps and finally added together with the
overlap-add method, in which the time sections overlap so that the cosine ramps of
neighbouring sections cross-fade into each other when they are added. This gives a
smooth and coherent result.

The late reverberation for the loudspeaker array is synthesized in the same way
as in the Paraspax method, but more decorrelated copies are generated, one
for each loudspeaker signal. So instead of creating a 2-channel decorrelated late
reverberation for headphone reproduction as in the Paraspax method, an N-channel
decorrelated late reverberation is synthesized, where N is the number of loudspeakers
in the array.
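
The idea of N mutually decorrelated channels that follow the decay of the measured RIR can be illustrated with a strongly simplified sketch. It does not reproduce the BinRIR processing: instead of convolving noise segments with chunks of the RIR, each channel is independent Gaussian noise whose gain follows the segment-wise RMS of the RIR, cross-faded with raised-cosine windows and overlap-add. All names and parameter values are assumptions.

```python
import math
import random

def hann(n_len):
    """Periodic raised-cosine window; copies shifted by n_len // 2 sum to one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]

def shaped_noise(rir, seg_len, rng):
    """One noise channel whose gain follows the segment-wise RMS of the RIR."""
    hop = seg_len // 2
    win = hann(seg_len)
    noise = [rng.gauss(0.0, 1.0) for _ in rir]
    out = [0.0] * len(rir)
    for start in range(0, len(rir) - seg_len + 1, hop):
        seg = rir[start:start + seg_len]
        rms = math.sqrt(sum(x * x for x in seg) / seg_len)
        for n in range(seg_len):  # overlap-add of the windowed, scaled segment
            out[start + n] += win[n] * rms * noise[start + n]
    return out

def decorrelated_channels(rir, n_channels, seg_len=128, seed=0):
    """N channels built from independent noise, hence mutually decorrelated."""
    rng = random.Random(seed)
    return [shaped_noise(rir, seg_len, rng) for _ in range(n_channels)]

# Synthetic decay as a stand-in for the measured RIR
rir = [math.exp(-n / 2000) for n in range(4096)]
channels = decorrelated_channels(rir, n_channels=4)
```

Because the raised-cosine windows of neighbouring segments sum to one, the gain applied to the noise changes smoothly from one segment to the next.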

3.5 Early diffuse sound
The late reverberation synthesized in the Paraspax method is based on the measured
RIR in the whole time range, so it extends over both the early and the latter part.
The diffuse sound in the latter part is used as late reverberation. However, diffuse
sound is also used in the early part of the synthesized BRIR and added to the
parameterized directional components. The motivation comes from a study based
on listening experiments [12], which has shown that adding diffuse sound to the
spectral components in the early part of a BRIR contributes to higher perceptual
quality than if only spectral components are used. Therefore, early diffuse sound will
also be present in each loudspeaker signal of the loudspeaker array. In the Paraspax
method, it is synthesized by taking the early part (up to 2-3 times the mixing
time) of the synthesized late reverberation from the previous step and performing
some further processing described below. Since the loudspeaker array requires an
N-channel late reverberation where each channel is decorrelated, the early diffuse
sound will also consist of N decorrelated channels, one for each
loudspeaker in the array, built upon the N-channel late reverberation. The
following description of how the early diffuse sound is obtained by the Paraspax
method is the same for the loudspeaker array, except that more copies are
synthesized.

Together with the late reverberation, the early diffuse sound field is estimated
from a weighting function based on the selected early reflections obtained from the


reflection detection. It is created by applying a sliding window of 1 ms to the
absolute pressure response and convolving the result with a Hanning window of 3 ms.
Its values at the TOAs of the direct sound and selected early reflections are windowed
with a 1.5 ms window and set to 1. The sharp edges that arise are smoothed out with a
1 ms window. The weighting function of the Helsington church, whose direct sound and
early reflections are presented in Figure 3.4, can be seen in Figure 3.11, illustrated
by the yellow curve.

Figure 3.11: The early part of the measured RIR of the Helsington church plotted
together with the detected direct sound and early reflections (directional part) and
the corresponding weighting function. The directional part and weighting function
are found by the Paraspax method.

To get the early diffuse sound, the weighting function is inverted by taking the
square root of its complement. Let $w_f$ be the weighting function. The inverse
weighting function is then

$$(w_f)^{-1} = \sqrt{1 - w_f}. \qquad (3.1)$$

The reverberation level is also used in the BRIR synthesis when estimating the early
diffuse sound, as it preserves the diffuse sound in the spectral components. In order
for the directional parts to still be prominent and not masked by the diffuse sound,
the inverse weighting function is limited so that it does not exceed the value of the
reverberation level. The inverse weighting function of the Helsington church and its
diffuse reverberation in the early part of one of the N channels are plotted in Figure
3.12. The binaural diffuse sound in the early part extends up to 2 times the mixing
time.
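
Equation (3.1) and the limiting step can be sketched as follows. The sketch assumes that the limit is applied as a simple clipping of the inverse weighting function at the reverberation level converted to a linear amplitude. The function names and example values are hypothetical.

```python
import math

def inverse_weighting(wf, limit_lin):
    """sqrt(1 - wf) per sample, clipped so that it never exceeds limit_lin."""
    return [min(math.sqrt(max(0.0, 1.0 - w)), limit_lin) for w in wf]

def early_diffuse(diffuse, wf, reverb_level_db):
    """Weight one channel of diffuse reverberation with the limited inverse
    weighting function."""
    limit_lin = 10 ** (reverb_level_db / 20)  # dB to linear amplitude
    inv = inverse_weighting(wf, limit_lin)
    return [g * d for g, d in zip(inv, diffuse)]

# Hypothetical weighting function: 1 at a reflection, smaller values elsewhere
wf = [1.0, 0.75, 0.0]
inv_limited = inverse_weighting(wf, 0.5)
inv_unlimited = inverse_weighting(wf, 1.0)
```

Where the weighting function equals 1, i.e. at a detected reflection, the inverse weighting is 0 and no diffuse sound is added at that instant.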


Figure 3.12: The binaural diffuse reverberation built from binaural white noise
and based on the measured RIR of the Helsington church plotted together with the
inverse weighting function.

The early diffuse sound is obtained by multiplying the two functions in Figure
3.12, and the result can be seen in Figure 3.13, where it is plotted together
with the directional part obtained by multiplying the weighting function with the
measured RIR, in order to show clearly that the early diffuse sound never exceeds the
amplitudes of the direct sound and the early reflections.

3.6 Extension on loudspeaker arrays

The monaural and spatial parameters calculated in the previous steps can be used
for spatial audio reproduction. The extension on loudspeaker arrays uses a number
of loudspeakers as the source, instead of headphones that are used in the Paras-
pax method. Instead of using a physical loudspeaker array, head-related impulse
responses can be used to create virtual sound sources which makes it possible to
virtually place a listener inside the array using headphones. The listener is then
able to virtually move within the array by adjusting each loudspeaker signal in rela-
tion to the new distance and angle between the listener and the loudspeakers. The
simulated sound field at the center and how it is constructed will be explained first
followed by the construction of the sound field as the listener changes position.


Figure 3.13: The directional and diffuse components that form the early part of
the synthesized BRIR of the Helsington church.

3.6.1 Listener at the center
The loudspeaker array is defined as a number of loudspeakers placed over a sphere.
The listener is placed at the origin of the sphere so the position of each loudspeaker
relative to the listener can be represented by spherical coordinates (azimuth and
elevation). In total, the loudspeaker array consists of 84 loudspeakers, and their
positions are illustrated in Figure 3.14 by the green dots. The loudspeaker array
contains 7 elevation angles with a resolution of 25◦, starting at 75◦ above and ending
at -75◦ below the listener. The azimuth angle extends around the listener from 0◦ to
330◦ with a resolution of 30◦, which gives 12 azimuth angles per elevation and thus
7 × 12 = 84 positions. The loudspeakers lying on the circle in the horizontal plane
orthogonal to the zenith, at the fourth elevation row, are in line with the ears of
the listener. The north and south poles of the sphere are not equipped with any
loudspeaker. The paper presented by Müller and Ahrens [13] shows that listeners
who performed a listening test could not hear any clear differences between SRIRs
with and without elevated early reflections. Although the perceived spatial
differences are larger in loudspeaker-based reproduction than in reproduction using
headphones, the elevated reflection has to be strong in order for a listener to hear
clear differences when this reflection is projected onto the horizontal plane.
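
The grid of loudspeaker positions described above can be generated with a few lines. The function name is hypothetical, and the elevation is counted from the horizontal plane, as in the description of the array.

```python
def loudspeaker_grid():
    """Return the (azimuth, elevation) pairs in degrees for the described array."""
    elevations = range(75, -76, -25)   # 75, 50, 25, 0, -25, -50, -75
    azimuths = range(0, 331, 30)       # 0, 30, ..., 330
    return [(az, el) for el in elevations for az in azimuths]

grid = loudspeaker_grid()
```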

In a first step, the sound field is rotated such that the DOAs of the direct sound
and early reflections are converted into the global coordinate system relative to the
listener in which the loudspeaker array is defined. By doing this, the direct sound
will always be played back from the loudspeaker positioned right in front of the
listener, which corresponds to 0◦ in azimuth and elevation. Then the asymmetrical
windows of 1.5 ms containing the amplitudes of the early reflections at the respective


Figure 3.14: A spherical loudspeaker array of 84 loudspeakers positioned at the
green dots.

TOAs are distributed over the loudspeakers. The loudspeaker whose position in
spherical coordinates best matches the DOA of a reflection is the loudspeaker that
will play back that reflection. Figure 3.15 shows the loudspeaker signals of two
of the loudspeakers in the array for the Helsington church when all reflections have
been distributed over the loudspeaker array. One early reflection is passed to one of
the loudspeakers as its position matches the DOA of that reflection. For the other
loudspeaker, the incidence angle of three reflections matches its position.
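
One way to find the best-matching loudspeaker is to minimize the great-circle angle between the DOA and the loudspeaker direction. This criterion is an assumption made for the sketch, since the matching criterion is not specified further here. The elevation is counted from the horizontal plane, and all names and values are hypothetical.

```python
import math

def angular_distance(az1, el1, az2, el2):
    """Great-circle angle in degrees between two directions given in degrees."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_d = (math.sin(e1) * math.sin(e2)
             + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

def nearest_loudspeaker(doa, grid):
    """Index of the loudspeaker in `grid` closest to doa = (azimuth, elevation)."""
    return min(range(len(grid)), key=lambda i: angular_distance(*doa, *grid[i]))

# Small hypothetical subset of loudspeaker positions (azimuth, elevation)
grid = [(0, 0), (30, 0), (60, 0), (30, 25)]
idx = nearest_loudspeaker((40, 10), grid)
```

The cosine of the azimuth difference handles the wrap-around at 360◦, so a DOA at 350◦ azimuth is closer to a loudspeaker at 0◦ than to one at 330◦.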

The time it takes for each loudspeaker signal to reach the listener (the TOA of
the loudspeaker array) is determined by the radius of the loudspeaker array, $r_{LA}$,
which is set to 10 meters,

$$\mathrm{TOA}_{LA} = \frac{r_{LA}}{c} \approx 29.2\ \mathrm{ms}, \qquad (3.2)$$

where $c = 343$ m/s is the speed of sound in air. Each loudspeaker signal will therefore
be shifted according to this radius so that the time of arrival of the direct sound is
the same for all tested RIRs, as the same loudspeaker array is used to reproduce all
rooms.
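
Equation (3.2) can be checked numerically, together with the corresponding shift in samples. The sampling rate of 48 kHz is an assumption made for the example and the function name is hypothetical.

```python
def array_delay(radius_m=10.0, c=343.0, fs=48000):
    """Return the propagation delay in ms and the nearest whole number of samples."""
    toa_s = radius_m / c
    return toa_s * 1e3, int(round(toa_s * fs))

toa_ms, delay_samples = array_delay()
```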

Both the late reverberation and the early diffuse sound are synthesized as described
in sections 3.4 and 3.5, such that each consists of 84 decorrelated channels, one for
each loudspeaker in the array. The method has now simulated a set of spectral
components, early diffuse sound and late reverberation, represented by each
loudspeaker. The listener can either physically be placed inside a loudspeaker array
whose respective

Figure 3.15: Early reflections passed to two different loudspeakers of the array.
One loudspeaker is assigned one reflection (left) while another is assigned three
reflections (right).

loudspeakers each play their assigned loudspeaker signal, or the loudspeaker array can
be created virtually by using the HRIRs corresponding to each loudspeaker position in
the array to create virtual sources.

3.6.2 Virtual loudspeaker array
A virtual loudspeaker array can be played back over headphones. The aim is therefore
to synthesize a 2-channel BRIR that corresponds to the total contribution of all
loudspeaker signals in the array.

In the Paraspax method, HRIRs are represented as spherical harmonics coefficients
at a spatial order of M ≤ 35, which makes HRIRs available for arbitrary head
orientations of the listener. The HRIR set used was measured on a Neumann KU100
artificial head. This set of HRIRs can be used for creating virtual loudspeakers,
where only the HRIRs for the directions that correspond to the loudspeaker
positions relative to the listener are used. The HRIRs describe the transmission of
the loudspeaker signals from each loudspeaker to the listener. The HRIRs for two
different loudspeaker positions (90° in azimuth to the left, and 180° in azimuth,
right behind the listener) are shown in Figure 3.16. The elevation angle of both
HRIRs is 0°. As seen in Figure 3.16, the sound coming from the loudspeaker
positioned to the left of the listener is louder at the left ear for almost the entire
time range. The sound reaching the right ear is attenuated as the sound path is
obstructed by the listener’s head. However, for the loudspeaker placed at 180° right
behind the listener, both ears receive approximately the same amount of the signal


Figure 3.16: Measured HRIRs from a Neumann KU100 artificial head showing
how sound reaches the left and right ear when the sound source is positioned at
90° to the left (left) and at 180° right behind (right).

at the same time.
Each loudspeaker signal, containing its assigned spectral components, early
diffuse sound and late reverberation, is convolved with its respective HRIR, which
gives each loudspeaker signal a left and a right channel. As the desired output of
the virtual loudspeaker array is the combined contribution from all loudspeakers,
the loudspeaker signals are added to form a 2-channel loudspeaker array-based
synthesis representing how the sound is perceived by a listener who is virtually
placed inside the array. As the part of the loudspeaker signals that consists of the
directional components only contains energy at the TOAs of the direct sound and
early reflections, the contributions from all loudspeakers can simply be added. For
the early diffuse sound and late reverberation, on the other hand, a further step is
required when summing the loudspeaker signals.
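
The convolve-and-sum step for the directional components can be sketched as follows. The toy signals and two-tap HRIRs are hypothetical values chosen only to show the mechanics, and the plain time-domain convolution stands in for whatever convolution routine is actually used.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(loudspeaker_signals, hrirs):
    """Sum all loudspeaker signals, each filtered by its (left, right) HRIR pair."""
    length = max(len(sig) + max(len(h_l), len(h_r)) - 1
                 for sig, (h_l, h_r) in zip(loudspeaker_signals, hrirs))
    left = [0.0] * length
    right = [0.0] * length
    for sig, (h_left, h_right) in zip(loudspeaker_signals, hrirs):
        for k, value in enumerate(convolve(sig, h_left)):
            left[k] += value
        for k, value in enumerate(convolve(sig, h_right)):
            right[k] += value
    return left, right

# Hypothetical toy data: two loudspeakers, three-sample signals, two-tap HRIRs.
signals = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]]
hrirs = [([1.0, 0.2], [0.3, 0.1]),   # loudspeaker to the left: left ear louder
         ([0.6, 0.1], [0.6, 0.1])]   # loudspeaker behind: both ears alike
left, right = render_binaural(signals, hrirs)
```

The sketch covers only the plain summation; the diffuse parts additionally need the normalization described next.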

The early diffuse sound signals played back from the individual loudspeakers are
incoherent and have equal RMS. Adding two such signals increases the level by 3 dB.
To counteract this, the sum of the early diffuse sound of all loudspeakers is divided
by the square root of the number of loudspeakers used in the array,

$$\tilde{p}(t) = \frac{\sum_{n=1}^{N} \tilde{p}_n(t)}{\sqrt{N}}, \tag{3.3}$$

where p̃(t) is the RMS sound pressure amplitude summed over all loudspeakers and
N denotes the number of loudspeakers. The same applies to the late reverberation.
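
The effect of the normalization in Equation (3.3) can be illustrated numerically. In this sketch, independent Gaussian noise stands in for the decorrelated diffuse signals; the signal length, the seed and the function names are assumptions made for the example.

```python
import math
import random

def rms(signal):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def sum_incoherent(signals):
    """Sum equal-RMS incoherent signals and divide by sqrt(N), as in Eq. (3.3)."""
    n = len(signals)
    return [sum(samples) / math.sqrt(n) for samples in zip(*signals)]

random.seed(0)  # fixed seed so the sketch is repeatable
N, LENGTH = 84, 2000
# Independent unit-variance noise as a stand-in for the diffuse signals.
diffuse = [[random.gauss(0.0, 1.0) for _ in range(LENGTH)] for _ in range(N)]

combined = sum_incoherent(diffuse)
plain_sum = [sum(samples) for samples in zip(*diffuse)]
level_gain_db = 20 * math.log10(rms(plain_sum) / rms(diffuse[0]))
print(f"normalized RMS: {rms(combined):.2f}, plain-sum gain: {level_gain_db:.1f} dB")
```

Without the division, the level of the plain sum grows by about 10 log10(N) dB relative to a single loudspeaker, which is the 3 dB per doubling mentioned above; with it, the RMS stays close to that of one loudspeaker signal.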

To move the listener inside the loudspeaker array, the listener can either
physically move, or the loudspeaker signals in the virtual loudspeaker array can be
modified according to the new loudspeaker positions relative to the listener for
each new listener position.

3.6.3 Change listener position

In the Paraspax method, only the direct sound and early reflections are modified
as the listener turns the head or changes position. The early diffuse sound
and the late reverberation are kept constant during the audio reproduction, but the
reproduction is still accurate because changing the directional components in the
method also results in a change in the DRR [1]. However, for a listener moving
within a loudspeaker array, the distance to each loudspeaker changes with each
new listener position, and the whole signal of each loudspeaker should therefore
change.

The first step is to define the new loudspeaker positions in spherical coordinates
relative to the new listening position. Each loudspeaker signal still contains the
same components, but the incidence angle of the sound from each loudspeaker to the
listener changes, so that certain loudspeaker signals are strengthened in some areas
while they are weakened in others. Figure 3.17 shows an example of how the sound
rays of the loudspeaker signals reach a listener at the new position, marked with
a red dot, and how this differs from when the listener is positioned at the center,
marked with a black dot. For simplicity, the loudspeaker array is illustrated as 4
loudspeakers in the horizontal plane, positioned at varying azimuth angles at an
elevation angle of 0°.
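
The first step, expressing each loudspeaker position relative to the new listening position, can be sketched in the horizontal plane as below. The loudspeaker azimuths of 0°, 90°, 180° and 270°, the listener position (2, 1) m and the function name are hypothetical; the sketch also assumes that the listener keeps facing the same direction while moving.

```python
import math

def relative_position(loudspeaker_xy, listener_xy):
    """Return (azimuth in degrees, distance in metres) as seen from the listener.

    Assumes the listener keeps facing the +x direction and that azimuth is
    measured counterclockwise, so positive angles are to the left.
    """
    dx = loudspeaker_xy[0] - listener_xy[0]
    dy = loudspeaker_xy[1] - listener_xy[1]
    azimuth = math.degrees(math.atan2(dy, dx)) % 360.0
    return azimuth, math.hypot(dx, dy)

RADIUS = 10.0  # m, array radius from Section 3.6.1
# Four loudspeakers in the horizontal plane (hypothetical azimuths).
array = [(RADIUS * math.cos(math.radians(a)), RADIUS * math.sin(math.radians(a)))
         for a in (0, 90, 180, 270)]

centre = [relative_position(ls, (0.0, 0.0)) for ls in array]
moved = [relative_position(ls, (2.0, 1.0)) for ls in array]  # hypothetical position
for (az0, r0), (az1, r1) in zip(centre, moved):
    print(f"{az0:6.1f} deg, {r0:5.2f} m  ->  {az1:6.1f} deg, {r1:5.2f} m")
```

The new azimuths select the HRIRs, and the new distances feed the time shift and level change described in the remainder of this section.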

Figure 3.17: Sound travelling from the loudspeakers to a listener positioned at the
center of the array (black dot) and to a listener at an arbitrary position (red dot)
within a simplified loudspeaker array of four loudspeakers in the horizontal plane.


As seen in Figure 3.17, when the listener moves inside the array, the distance
between the listener and each loudspeaker changes and is no longer the same for all
loudspeakers, as it was when the listener was positioned at the center. The TOA of
each loudspeaker signal therefore has to be modified according to the new distance
to the listener. The new distance from each loudspeaker to the listener is calculated
and the loudspeaker signals are shifted accordingly.

A variable d is defined as the difference between the distance at the new position,
dnp, and the distance at the sweet spot, dss,

$$d_n = d_{np,n} - d_{ss,n}, \quad \text{for } n = 1, 2, \ldots, N. \tag{3.4}$$

Negative values of dn denote that the new position is closer to loudspeaker n than
before, while greater distances give positive values. The respective loudspeaker
signals are amplified or attenuated, depending on dn. The Paraspax method uses
the inverse-square law, which states that the sound energy radiating from a point
source decreases in proportion to the square of the distance. This distance
attenuation, however, is often too strong for a loudspeaker, which cannot really be
equated to a point source. A distance attenuation factor that works in most cases
for loudspeakers is the square root of the distance rn between loudspeaker n and the
new listening position, such that

$$\tilde{p}_n(t) \propto \frac{1}{\sqrt{r_n}}, \quad \text{if } d_n > 0, \tag{3.5}$$

$$\tilde{p}_n(t) \propto \sqrt{r_n}, \quad \text{if } d_n < 0, \tag{3.6}$$

where p̃n(t) denotes the RMS sound pressure from loudspeaker n containing the
directional as well as the diffuse sound.
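
One possible reading of the time shift and level change is sketched below. The proportionalities above do not fix an absolute gain, so the normalization used here, a gain of sqrt(r_centre / r_new) that equals 1 at the sweet spot, is an assumption of this sketch and not the thesis implementation. The sign of d_n follows the convention stated in the text, where negative values mean that the listener is closer than before.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def distance_update(r_centre, r_new):
    """Return (d_n, delay change in seconds, amplitude gain) for one loudspeaker.

    d_n is taken as r_new - r_centre, so negative values mean the listener is
    now closer. The gain sqrt(r_centre / r_new) is an assumed normalisation of
    a 1/sqrt(r) law that equals 1 at the sweet spot.
    """
    d_n = r_new - r_centre
    delay_change = d_n / SPEED_OF_SOUND
    gain = math.sqrt(r_centre / r_new)
    return d_n, delay_change, gain

closer = distance_update(10.0, 8.0)
farther = distance_update(10.0, 12.0)
for label, (d_n, dt, gain) in (("closer", closer), ("farther", farther)):
    print(f"{label}: d_n = {d_n:+.1f} m, shift = {dt * 1000:+.2f} ms, "
          f"gain = {20 * math.log10(gain):+.2f} dB")
```

Under this reading, a loudspeaker that is closer arrives earlier and is amplified, while one that is farther away arrives later and is attenuated, which is the qualitative behaviour described above.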

For the virtual loudspeaker array, the loudspeaker signals are convolved with a
new set of HRIRs that correspond to the loudspeaker positions relative to the new
listener position, as seen in Figure 3.17.



4
Results

The method is tested for the 15 selected monaural RIRs that can be found in [2] and
[3]. The aim is to reduce the number of loudspeakers used in the loudspeaker array
while maintaining the sound quality. The number of loudspeakers is determined
with the listener positioned at the center of the array. Using the resulting number
of loudspeakers, the listener is then moved within the loudspeaker array to examine
how far from the center the listener can move without changing the sound image.
Finally, the influence of different parameters of the loudspeaker array is
investigated, and it is examined whether these parameters also change how far the
listener can move from the center.

The results are obtained from the virtual loudspeaker array, such that the listener
is virtually placed inside the loudspeaker array. The resulting loudspeaker array
synthesis, which represents the total contribution from all loudspeakers, is convolved
with an anechoic drums audio file. The resulting audio is analyzed by listening to
the auralization, and conclusions are drawn from the listening together with an
analysis of the corresponding plots.

4.1 At the center of the loudspeaker array
When the listener is positioned at the center of the array, the aim is to reduce the
number of loudspeakers while maintaining the sound quality, since it can be both
costly and time consuming to assemble a loudspeaker array containing many
loudspeakers. The sound field created by the loudspeaker array of 84 loudspeakers
presented in Figure 3.14 is used as a reference and compared with two simplified
versions of a loudspeaker array. The number of loudspeakers in the simplified
versions is reduced until the sound quality differs from that of the 84-loudspeaker
array. The two simplified loudspeaker arrays tested here contain loudspeakers whose
positions vary only in azimuth or only in elevation, unlike the 84-loudspeaker
array, whose loudspeaker positions vary in both azimuth and elevation. The
simplified array with varying azimuth lies in the horizontal plane, with the
loudspeakers positioned at the listener’s ear level on the circle at the fourth row
of the 84-loudspeaker array in Figure 3.14, where the elevation angle is 0°. In the
other simplified version, the loudspeakers are positioned right in front of the
listener at an azimuth angle of 0° and at different elevation angles, such that the
loudspeaker array lies in the vertical plane.

The auralization from the 84-loudspeaker array compared with the auralization



4. Results

from the simplified loudspeaker arrays showed that the quality of the sound field
is maintained with the numbers of loudspeakers presented in Table 4.1 for the 15
tested environments. For the environments where the number of loudspeakers is not
specified in Table 4.1, the simplified loudspeaker arrays could not be used.

                                          Number of loudspeakers   Number of loudspeakers
Name                                      (varying azimuth)        (varying elevation)

Genesis 6 studio                          3                        -
Trollers gill                             -                        -
Maes howe                                 3                        -
Arthur sykes rymer auditorium             -                        -
Koli national park                        -                        -
Stairway                                  3                        -
Hoffman lime kiln                         3                        -
Central hall                              -                        -
Helsington church                         3                        -
Promenadikeskus concert hall              3                        -
Innocent railway tunnel                   3                        -
Falkland palace royal tennis court        3                        -
Shrine and parish church of all saints    3                        -
Hamilton mausoleum                        3                        -
Terrys factory warehouse                  4                        -

Table 4.1: Numbers of loudspeakers required in the loudspeaker array for different
impulse responses.

For the loudspeaker array in the vertical plane, up to seven loudspeakers were used
at different elevation angles, but this loudspeaker array setup could not achieve
binaural sound at all. The 7-loudspeaker array setup is presented from the side in
Figure 4.1, where the listener is placed at the origin of the sphere and the middle
loudspeaker at 0° in elevation is at the listener’s ear level. The loudspeakers are
positioned from 60° above the listener to 60° below the listener (-60°) with a 20°
resolution. An array with fewer loudspeakers in the vertical plane looks as in
Figure 4.1, but with greater spacing between the loudspeakers due to the coarser
resolution. The reason why the simplified loudspeaker array in the vertical plane
cannot be used can be seen from Figure 4.1. The loudspeakers are positioned right in
front of the listener, and each loudspeaker signal will therefore reach the left and
right ear of the listener equally and create a mono sound. In order for the
loudspeaker array with varying elevation to create binaural sound, there should be
variations in azimuth as well, so that sound also reaches the listener from behind
and from the sides.
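
The geometric reason for the mono impression can be illustrated with a simple free-field model in which the ears are two points on the left-right axis. This is only a sketch: the ear offset of 8.75 cm is an assumed value, head shadowing is ignored, and the function name is hypothetical.

```python
import math

EAR_OFFSET = 0.0875  # m, assumed half-distance between the ears
RADIUS = 10.0        # m, array radius from Section 3.6.1

def ear_path_difference(azimuth_deg, elevation_deg):
    """Right-ear minus left-ear path length (m) in a free-field two-point model."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    source = (RADIUS * math.cos(el) * math.cos(az),
              RADIUS * math.cos(el) * math.sin(az),
              RADIUS * math.sin(el))
    left_ear = (0.0, EAR_OFFSET, 0.0)
    right_ear = (0.0, -EAR_OFFSET, 0.0)
    return math.dist(source, right_ear) - math.dist(source, left_ear)

# Vertical array: azimuth 0 degrees, elevations from -60 to 60 in 20-degree steps.
vertical = [ear_path_difference(0, el) for el in range(-60, 61, 20)]
# Horizontal 3-loudspeaker array at 0, 120 and 240 degrees.
horizontal = [ear_path_difference(az, 0) for az in (0, 120, 240)]
print(vertical)
print(horizontal)
```

In this model every loudspeaker at azimuth 0° is equally far from both ears regardless of elevation, whereas the loudspeakers at 120° and 240° produce path differences of opposite sign.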

As seen in Table 4.1, three loudspeakers are enough in 10 cases for the simplified
loudspeaker array in the horizontal plane. The corresponding 3-loudspeaker array


Figure 4.1: A loudspeaker array in the vertical plane containing 7 loudspeakers
positioned at the green dots right in front of the listener.

is presented from above in Figure 4.2, where the loudspeakers are represented by
green dots placed with equal azimuth spacing at 0°, 120° and 240°. Henceforth
these loudspeakers will be called "L1", "L2" and "L3", respectively.
The listener is positioned at the origin of the circle, facing the rightmost
loudspeaker at an azimuth angle of 0°.

For all the environments that required three loudspeakers in the loudspeaker
array in the horizontal plane, presented in Table 4.1, the 3-loudspeaker array created
a sound field that can be equated to the sound field created by the 84-loudspeaker
array. The sound field sounds spacious and dynamic, and the different drums in the
audio appear to come from different directions. It also sounds wide in comparison
to a 2-loudspeaker array, with which the sound quality drops drastically. In the
2-loudspeaker array, the loudspeakers are positioned right in front of and behind
the listener at 0° and 180°, so that the spaciousness of the sound decreases and
it sounds flatter and narrower. It is also harder to hear from which direction the
sound of the different drums comes. With the loudspeakers in these positions, the
loudspeaker signals reach both ears equally, and the result therefore sounds more
monaural than binaural.

The Terrys factory warehouse is the only environment where a 4-loudspeaker array
in the horizontal plane is preferred over a 3-loudspeaker array. The loudspeakers
are then placed around the listener with equal azimuth spacing as in Figure 4.2,
but with the loudspeaker positions at 0°, 90°, 180° and 270° in azimuth. However,
the reflections are distributed only over the three loudspeakers positioned at 0°,
180° and 270°, while the loudspeaker signal at 90° only contains diffuse sound.
When the 3-loudspeaker array is used, only the two loudspeakers at


Figure 4.2: A loudspeaker array in the horizontal plane containing 3 loudspeakers
positioned at the green dots around the listener.

0° and 120° are assigned reflections. What distinguishes the two arrays is that the
3-loudspeaker array creates a flatter and more mono sound than the 4-loudspeaker
array, which sounds richer and more binaural. In addition, the 3-loudspeaker array
creates a disturbing echo that could not be heard with the arrays of 84 and 4
loudspeakers. Moreover, the DOAs of the reflections are more in line with the
loudspeaker positions of the 4-loudspeaker array than with those of the
3-loudspeaker array.
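
How well a set of DOAs lines up with a given layout can be quantified as the mean angle between each DOA and its nearest loudspeaker. The sketch below uses hypothetical reflection azimuths chosen only to illustrate the comparison; they are not the measured DOAs of the Terrys factory warehouse, and the function names are assumptions.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two azimuth angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def uniform_layout(count):
    """Azimuths of `count` loudspeakers equally spaced around the listener."""
    return [i * 360.0 / count for i in range(count)]

def mean_mismatch(doa_azimuths, layout):
    """Mean angle between each DOA and its nearest loudspeaker."""
    errors = [min(angular_distance(doa, ls) for ls in layout)
              for doa in doa_azimuths]
    return sum(errors) / len(errors)

# Hypothetical reflection azimuths in degrees, for illustration only.
doas = [5.0, 175.0, 185.0, 260.0, 275.0]
three = mean_mismatch(doas, uniform_layout(3))
four = mean_mismatch(doas, uniform_layout(4))
print(f"3 loudspeakers: {three:.1f} deg, 4 loudspeakers: {four:.1f} deg")
```

For reflections clustered near 180° and 270°, as in this made-up example, the layout with loudspeakers at those angles gives a much smaller mismatch than the layout at 0°, 120° and 240°.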

The measured monaural RIR of Maes howe is shown in Figure 4.3, and the synthesis
of Maes howe created from the 3-loudspeaker array in the horizontal plane is shown
in Figure 4.4 together with the synthesis from the 84-loudspeaker array for
comparison. These plots show that the method successfully recreates the structure
of the monaural RIR, and that a reduced number of loudspeakers gives rise to plots
similar to those of the 84-loudspeaker array. In the 3-loudspeaker array, the direct
sound is played from loudspeaker L1. The DOAs in azimuth of the 10 early reflections
are approximated by the angles of the loudspeaker positions at 120° and 240°, such
that seven reflections are played back from loudspeaker L2 and three reflections
are played back from loudspeaker L3. This can be seen in the plot, which is
dominated by the left ear because L2 is positioned on the left-hand side of the
listener. When using an array of 4 loudspeakers instead of 3 for this particular
environment, the azimuths of the early reflections are more in line with the azimuths
of the loudspeaker positions. The syntheses from both the 4-loudspeaker array and
the 3-loudspeaker array sound similar to the synthesis from the array of 84
loudspeakers. However, since the goal is to reduce the number of loudspeakers


Figure 4.3: Measured monaural room impulse response of Maes Howe.

Figure 4.4: Loudspeaker array synthesis of Maes howe from a loudspeaker array
in the horizontal plane containing 3 loudspeakers (left) and from the 84-loudspeaker
array (right).


Figure 4.5: Interaural coherence of Maes howe for a loudspeaker array in the
horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker array
(right).

in the array, the loudspeaker array containing three loudspeakers is the one
chosen in the end.

The frequency-dependent interaural coherence (IC) can be seen in Figure 4.5 for
the 3- and 84-loudspeaker arrays. The IC of the 3-loudspeaker array shows values
close to 1 for frequencies up to 100 Hz and decreases at higher frequencies.
The IC then fluctuates and is higher than 0.5 at frequencies around
2 500 Hz, between 3 200 Hz and 3 900 Hz, from 5 200 Hz to 5 600 Hz and
above 12 700 Hz, indicating that the signals in the synthesis are more correlated
than decorrelated at these frequencies. The IC of the 84-loudspeaker array shows
that the signals are in general decorrelated to a higher degree than when
a reduced number of loudspeakers is used, especially in the
mid-frequency region.

For the Helsington church, three reflections are assigned to loudspeaker L2 and
one reflection to L3, while the direct sound and six reflections are played back
from loudspeaker L1. The syntheses from the 3- and 84-loudspeaker arrays are shown
in Figure 4.6. The synthesis is dominated by the left ear because more reflections
are assigned to L2 than to L3. However, it is not as left ear dominated as the
synthesis of Maes howe, because most reflections are played back from the
loudspeaker in front of the listener. Its monaural RIR is presented in Figure 4.7,
which shows that the space produces more and denser reflections than the synthesis,
which contains only 10 early reflections. However, when comparing the synthesis with
the measured RIR by listening, it is clearly audible that it is the same space.

Its IC is presented in Figure 4.8 together with the IC from the 84-loudspeaker
array, which shows that the loudspeaker signals have incoherent directional compo-


Figure 4.6: Loudspeaker array synthesis of the Helsington church for a loudspeaker
array only changing in the horizontal plane using 3 loudspeakers (left) and its monau-
ral RIR (right).

Figure 4.7: Measured monaural room impulse response of the Helsington church.


Figure 4.8: Interaural coherence of the Helsington church for a loudspeaker array
in the horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker
array (right).

nents in the frequency range between 300 Hz and 14 500 Hz. It can also be seen
from the plots in Figure 4.8 that the signals get more coherent when the number
of loudspeakers in the array is reduced.

The result of having 3 loudspeakers in the array in the horizontal plane is clear
for the Genesis 6 studio, Maes howe, Stairway, Hoffman lime kiln, the Helsington
church, the Innocent railway tunnel, Falkland palace royal tennis court and Hamilton
mausoleum, which all have at least four reflections distributed over loudspeakers
L2 and L3, with the rest of the reflections assigned to loudspeaker L1. The
Promenadikeskus concert hall and the Shrine and parish church of all saints north
street have only three reflections assigned to L2 and L3 together, and here the
result is less obvious. The syntheses of the Promenadikeskus concert hall from the
3- and 84-loudspeaker arrays are presented in Figure 4.9. No clear differences can
be pointed out from the plots, but the differences are more apparent when listening
to the results. With the 3-loudspeaker array it is harder to hear the spaciousness,
since the sound arrives mostly from the loudspeaker at the front, which dominates
the total output, while the other two loudspeakers, which are necessary for creating
binaural sound, contribute at a lower level as only three reflections are
distributed over these loudspeakers.

The corresponding IC can be seen in Figure 4.10, showing incoherent directional
events in the frequency range from 400 Hz to approximately 10 000 Hz. The signals
are more correlated in the mid frequencies for the 3-loudspeaker array than for
the 84-loudspeaker array, but they are still more decorrelated than correlated.

The syntheses in Figures 4.4-4.9 show that, overall, the method performs well.


Figure 4.9: Loudspeaker array synthesis of the Promenadikeskus concert hall for
a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and the
84-loudspeaker array (right).

Figure 4.10: Interaural coherence of the Promenadikeskus concert hall for a loud-
speaker array in the horizontal plane containing 3 loudspeakers (left) and for the
84-loudspeaker array (right).


The TOAs of the direct sound and early reflections in the loudspeaker array
syntheses are in line with the TOAs in the monaural RIRs, apart from the syntheses
being shifted according to the radius of the loudspeaker array. However, the
amplitudes of the direct sound and early reflections in the syntheses are lower
than in the monaural RIR for some environments and higher for others. Even though
a maximum of ten early reflections are detected by the method, the RIRs retain their
shape when they are recreated for loudspeaker array synthesis, and the resulting
binaural audio from the convolution process recreates what it would have sounded
like to be in the original location. Depending on how many reflections are assigned
to each loudspeaker in the array, the syntheses can be left or right ear dominated,
which is the case for all tested RIRs as most reflections tend to come from one
side of the room.

The interaural coherence of the tested environments shows that the signals at
the left and right ears are coherent at low frequencies below around 300 Hz and at
high frequencies above 10 kHz. Between these frequencies, the signals are incoherent.
For some of the tested RIRs, a reduced number of loudspeakers in the array makes
the signals more coherent than the signals of the 84-loudspeaker array. Also, for
spaces that are not rectangular in shape, but whose geometry has been approximated
by a shoe-box room, the interaural coherence is higher over the whole frequency
range compared to the spaces where the true dimensions could be used.

The DOAs of the direct sound and early reflections of the RIRs that did not
perform well on the loudspeaker array are presented in Figures 4.11-4.13, while the
DOAs of the Arthur sykes rymer auditorium were already presented in Figure 3.7
in the Methods chapter. What these environments have in common is that the
DOAs of the early reflections have little or no variation in azimuth. Therefore,
a loudspeaker array that only varies in azimuth will not create binaural sound, as
this requires some reflections to reach the listener from behind. The early
reflections of these environments do vary in elevation, but binaural sound cannot be
created with a loudspeaker array that varies only in elevation because of the
positions of the loudspeakers, which are all placed in front of the listener.

4.2 Investigation of the sweet spot
The investigation of the sweet spot described in this section assumes a loudspeaker
array in the horizontal plane containing 3 loudspeakers, which is the required number
of loudspeakers for most of the environments tested. The sound field created by the
loudspeaker array is synthesized every 0.5 meters as the listener moves forward,
backwards and straight to the sides, as well as diagonally forward and backwards,
both to the left and to the right. The different directions of the listener’s
movement in the 3-loudspeaker array can be seen in Figure 4.14. The environments in
which the method could not reproduce any binaural sound using the loudspeaker array
are not included in this investigation. In addition, different parameters of the
loudspeaker array are changed in order to examine their influence on how the sound
image changes at different positions.
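
The set of listener positions can be generated as in the sketch below. It assumes that the 0.5 m step is measured along the direction of movement, that the listener faces loudspeaker L1 at 0°, and that positions are evaluated up to 5.5 m from the center; the direction names, the maximum distance and the function name are choices made for this example.

```python
import math

STEP = 0.5  # m, spacing between synthesized listener positions

# Movement directions as azimuth in degrees (0 = forward, 90 = left).
DIRECTIONS = {
    "forward": 0, "forward-left": 45, "left": 90, "backward-left": 135,
    "backward": 180, "backward-right": 225, "right": 270, "forward-right": 315,
}

def listener_positions(direction_deg, max_distance, step=STEP):
    """Positions (x, y) every `step` metres along one direction, centre excluded."""
    angle = math.radians(direction_deg)
    count = int(round(max_distance / step))
    return [(k * step * math.cos(angle), k * step * math.sin(angle))
            for k in range(1, count + 1)]

grid = {name: listener_positions(angle, max_distance=5.5)
        for name, angle in DIRECTIONS.items()}
print(len(grid), "directions,", len(grid["forward"]), "positions each")
```

Each position in `grid` would then be used to recompute the loudspeaker distances and angles before the synthesis is repeated.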

As the listener moves inside the loudspeaker array, the distance between the


Figure 4.11: DOAs of the direct sound and early reflections of Trollers gill.

Figure 4.12: DOAs of the direct sound and early reflections of the Koli national
park.


Figure 4.13: DOAs of the direct sound and early reflections of the Central Hall.

Figure 4.14: Directions of the listener’s movements in the 3-loudspeaker array.


listener and each loudspeaker changes. A reduced distance to a particular
loudspeaker makes this loudspeaker signal dominate the total output from all
loudspeakers, as the TOA of that loudspeaker signal decreases and its amplitude
increases. For each listener position in the loudspeaker array, the angle between
listener and loudspeaker changes, and thus a new set of HRIRs is used for every
step. At certain positions, the signal from a particular loudspeaker reaches the
listener at a lower level due to how it is angled towards the listener.

When the listener moves straight forward towards loudspeaker 1, the directional
components assigned to this loudspeaker are amplified and, as the TOA of these
directional components decreases, they become separated from the rest of the
directional components. This can be seen in Figure 4.15, where the loudspeaker array
synthesis of Maes howe is presented for the listener positioned 1 and 2 meters in
front of the center, respectively.
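
As a worked example of this separation, assume the 10 m array radius from Section 3.6.1, loudspeaker 1 at 0° and loudspeaker 2 at 120°, and a listener 2 m in front of the center. The coordinates follow from these assumptions and are not values read from the figures:

```latex
\begin{align*}
r_1 &= 10 - 2 = 8\ \text{m}, &
\mathrm{TOA}_1 &= \frac{8}{343} \approx 23.3\ \text{ms},\\
r_2 &= \sqrt{(-5 - 2)^2 + (5\sqrt{3})^2} = \sqrt{124} \approx 11.1\ \text{m}, &
\mathrm{TOA}_2 &= \frac{\sqrt{124}}{343} \approx 32.5\ \text{ms}.
\end{align*}
```

At the center, all loudspeaker signals arrive after about 29.2 ms. Under these assumptions, the signal from loudspeaker 1 arrives roughly 9 ms before the signal from loudspeaker 2 and, by symmetry, loudspeaker 3.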

Figure 4.15: Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array
in the horizontal plane at the listener position 1 meter in front of the center (left)
and 2 meters in front of the center (right).

Only the direct sound is played back from loudspeaker 1, and all early reflections
are assigned to loudspeakers 2 and 3. As seen in Figure 4.15, the direct sound
becomes more and more amplified, as well as more separated from the reflections,
as the listener gets closer to loudspeaker 1. At the same time, the amplitude of the
reflections and diffuse sound played back from the loudspeakers behind the listener
decreases and they arrive at the listener later in time. The quality of the sound
image deteriorates already 0.5 meters from the center, where it sounds flatter
and less rich than at the center, and the sound color does not sound as wide
as before. It continues to sound worse until the listener is 3.5 meters in front of
the center, where it starts to sound as it does at the sweet spot again. The
syntheses at


Figure 4.16: Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array
in the horizontal plane at the listener position 3.5 meters in front of the center (left)
and 5.5 meters in front of the center (right).

3.5 and 5.5 meters in front of the center are shown in Figure 4.16, where it can be
seen that the reflections become more and more attenuated until they are part of the
diffuse sound at 5.5 meters from the center.

The IC of Maes howe when the listener has moved 1 m and 3.5 m in front of the
center, respectively, is presented in Figure 4.17, where it can be seen that the
left and right signals become more coherent the further from the center the listener
moves.

The opposite happens when the listener instead moves backwards from the center.
The direct sound reaches the listener after the reflections do, and the further
from the center the listener is, the higher the reflections become in amplitude while
the direct sound is attenuated. This is illustrated in Figure 4.18, where it can be
seen that at 2 meters from the center, the direct sound is already strongly attenuated.
Furthermore, the amplitude of the reflections increases for listening positions up
to and including 3.5 meters from the center before it starts to decrease. This is
because the distance between the listener and loudspeakers 2 and 3 starts to increase
again for listening positions further than 3.5 meters backwards from the center. The
reflections contribute to a richer sound compared to how it sounded when the
listener moved forward, but the direct sound does not sound as clear as it did when
the listener was positioned at the center, and this affects the sound quality
negatively.

For the Genesis 6 studio, six early reflections are distributed over loudspeakers
2 and 3, of which only one is played back from loudspeaker 2. The other
4 reflections and the direct sound are assigned to loudspeaker 1. This creates a
loudspeaker array synthesis that is dominated by the right side, as seen in Figure
4.19. When the listener moves straight to the left, loudspeaker 2 will dominate the total


Figure 4.17: Interaural coherence of Maes howe for a 3-loudspeaker array in the
horizontal plane at the listener position 1 meter in front of the center (left) and 3.5
meters in front of the center (right).

Figure 4.18: Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array
in the horizontal plane at the listener position 1 meter behind the center (left) and
2 meters behind the center (right).


Figure 4.19: Loudspeaker array synthesis of the Genesis 6 studio for a 3-
loudspeaker array in the horizontal plane at the sweet spot.

output from all loudspeakers and, as this loudspeaker is closer to the left ear, its
signal will be amplified at the left ear. Loudspeakers 1 and 3, however, will
dominate at the right ear because the listener’s head shadows these loudspeaker
signals at the left ear. This in turn amplifies the sound on this side of the
listener. This can be seen in Figure 4.20, where the one reflection played back from
loudspeaker 2 is only heard by the left ear and appears earlier in time than the
direct sound and the rest of the reflections. Moreover, the whole loudspeaker array
synthesis is dominated by the right ear as most reflections reach the listener from
this side. The amplitude of the signals from loudspeakers 1 and 3 decreases the
further to the left the listener is, while it increases for loudspeaker 2. The sound
quality is good as far as 1.5 meters from the center, after which it deteriorates
drastically the further from the center the listener moves. The separation in time
of the reflections contributes to a flatter sound and it sounds like the previously
wide sound color has tapered off, but the spaciousness can still be heard.

The opposite applies when moving to the right, where loudspeaker 3 will be the
dominant loudspeaker. This is illustrated in Figure 4.21. It sounds just as when
moving to the left, but the listener can move further to the right than to the
left before the quality goes down. The HRIRs from loudspeaker 3 to the listener
at the new listener positions 1.5 m and 3.5 m to the right are presented
in Figure 4.22, where it can be seen how the listener’s head, upper torso and pinnae
filter the loudspeaker signal at the two ears.

When the listener moves diagonally forward, both to the left and right, loud-


Figure 4.20: Loudspeaker array synthesis of the Genesis 6 studio for a 3-
loudspeaker array in the horizontal plane at the listener position 1.5 meters aside to
the left of the center (left) and 3.5 meters aside to the left (right).

speaker 1 dominates the total output from all loudspeakers and reaches the listener
before the other loudspeaker signals do. The listener can move further from the
center in the loudspeaker array of Maes howe than in that of the Promenadikeskus
concert hall, which only has two reflections assigned to loudspeaker 2 and one to
loudspeaker 3, and where the quality deteriorates already when moving 7 cm
diagonally forward. The quality deteriorates in that the sound becomes flat and the
spaciousness decreases. This is also the case for Maes howe, with the exception that
it sounds just as at the sweet spot up to 3.5 meters from the center.

Due to the symmetry of the loudspeaker array on each side of the listener, the
same occurs when the listener moves diagonally forward to the left as to the right,
but the results are mirrored. The further away from the center the listener is,
the less the loudspeakers are angled towards the listener, which contributes to the
sound being reduced in amplitude at both ears. For listening positions up to 2.8
m diagonally in front of the center of the array, the loudspeaker signals behind the
listener are attenuated, not only due to how the loudspeakers are angled relative
to the listener but also due to the increase in distance between the loudspeakers and
the listener. This applies especially to the loudspeaker positioned on the same side
as the one the listener moves towards. This can be seen in Figure 4.23, where the
syntheses of Maes howe at the listening positions 2 m diagonally to the left and to the
right are presented.
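
The statement that the loudspeakers are angled less towards the listener further from
the center can be sketched as an off-axis angle calculation. This is only a geometric
illustration: the loudspeaker positions, the assumption that every loudspeaker is aimed
at the array center, the 45 degree diagonal and the helper names are all hypothetical,
and no directivity pattern is modeled.

```python
import math

# Hypothetical layout (NOT the thesis geometry): x is left(-)/right(+), y is forward.
SPEAKERS = {1: (0.0, 5.0), 2: (-5.0, 0.0), 3: (5.0, 0.0)}


def off_axis_angle_deg(speaker, listener, aim=(0.0, 0.0)):
    """Angle between the loudspeaker's aiming axis and the direction to the listener."""
    ax, ay = aim[0] - speaker[0], aim[1] - speaker[1]
    lx, ly = listener[0] - speaker[0], listener[1] - speaker[1]
    cos_angle = (ax * lx + ay * ly) / (math.hypot(ax, ay) * math.hypot(lx, ly))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def diagonal_position(distance, side):
    """Position `distance` m from the center, 45 degrees forward; side=-1 left, +1 right."""
    d = distance / math.sqrt(2.0)
    return (side * d, d)


# Off-axis angles for a listener 2 m diagonally forward to the left.
left = diagonal_position(2.0, -1)
for k, pos in SPEAKERS.items():
    print(k, round(off_axis_angle_deg(pos, left), 1))
```

Under these assumptions the off-axis angle is zero at the center and grows with the
distance from it, and the values for loudspeakers 2 and 3 swap when the listener moves
to the right instead of the left.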

Figure 4.21: Loudspeaker array synthesis of the Genesis 6 studio for a 3-
loudspeaker array in the horizontal plane at the listener position 1.5 meters aside to
the right of the center (left) and 3.5 meters aside to the right (right).

Figure 4.22: HRIR from loudspeaker 3 to the listener at the new listener positions
1.5 m aside to the right (left) and 3.5 m aside to the right (right).

Figure 4.23: Loudspeaker array synthesis of Maes howe at the listening position
2 m diagonally in front of the center to the left (left) and at the listening position 2
m diagonally in front of the center to the right (right).

As seen in Figure 4.23, the direct sound is amplified equally in both cases, but
dominates at different ears. Moreover, as the listener moves to the left, loudspeaker
2 becomes closer to the listener than loudspeaker 3. The opposite happens when
the listener moves to the right, and this can be seen in Figure 4.23 by looking at
the reflections. The reflections assigned to loudspeaker 2 are amplified when the
listener moves to the left, while the reflections assigned to loudspeaker 3 are
attenuated. When the listener instead moves to the right, the reflections played
back from loudspeaker 3 are amplified while the loudspeaker signal of loudspeaker
2 is attenuated. Moreover, the reflections of loudspeaker 3 are higher in amplitude
when the listener moves to the right than the reflections of loudspeaker
2 are when the listener moves to the left.

At the position 3.5 m diagonally in front of the center of the array to the left, the
distance between the listener and loudspeaker 2 reaches its minimum, which strongly
amplifies this loudspeaker signal. The distance to loudspeaker 3, on the other hand,
increases, and the reflections played from this loudspeaker are strongly attenuated.
This can be seen in Figure 4.24, which shows the syntheses of the Promenadikeskus
concert hall and Maes howe at this listener position: the reflections played back from
loudspeaker 2 are drastically amplified, while the loudspeaker signal of loudspeaker 3 is
attenuated to the extent that its reflections are lower in amplitude
than the diffuse sound coming from the other loudspeakers.
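
The distance-related part of this amplification and attenuation can be approximated
with the inverse distance law. The sketch below is an assumption-laden illustration:
it treats each loudspeaker as a point source in free field, uses a hypothetical layout
that is not the one measured in this thesis, and ignores loudspeaker directivity and
head shadowing.

```python
import math

# Hypothetical layout (NOT the thesis geometry): x is left(-)/right(+), y is forward.
SPEAKERS = {1: (0.0, 5.0), 2: (-5.0, 0.0), 3: (5.0, 0.0)}


def level_change_db(speaker, listener, reference=(0.0, 0.0)):
    """Level change in dB under the 1/r law when moving from `reference` to `listener`."""
    r_ref = math.dist(speaker, reference)
    r_new = math.dist(speaker, listener)
    return 20.0 * math.log10(r_ref / r_new)


# Listener 3.5 m from the center, 45 degrees forward to the left (assumed diagonal).
d = 3.5 / math.sqrt(2.0)
left = (-d, d)
changes = {k: level_change_db(pos, left) for k, pos in SPEAKERS.items()}
for k, db in changes.items():
    print(f"loudspeaker {k}: {db:+.2f} dB")
```

With this assumed geometry the level of loudspeaker 2 rises and that of loudspeaker 3
falls, which matches the direction of the changes visible in Figure 4.24; the actual
magnitudes depend on the real array geometry and on the loudspeaker directivity.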

Figure 4.24: Loudspeaker array synthesis for the listener position 3.5 m diagonally
forward to the left of the Promenadikeskus concert hall (left) and of Maes howe
(right).

The same applies, mirrored, to loudspeakers 2 and 3 when the listener makes the
same move to the right. As the listener continues to move further away from
the center of the array, the loudspeaker signals of loudspeakers 2 and 3 get more
and more attenuated as the distance between the listener and these loudspeakers
increases.

When the listener moves diagonally forward to the left, the signals f