Unified Auralization of Room Simulations
Investigation of sampling grids and other parameters

Degree project report in the Department of Architecture and Civil Engineering

Xuanye Liu

Division of Applied Acoustics

CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2024
www.chalmers.se

www.chalmers.se


Degree project report 2024

Unified Auralization of Room Simulations

Investigation of sampling grids and other parameters

Xuanye Liu

Department of Architecture and Civil Engineering
Chalmers University of Technology

Gothenburg, Sweden 2024


Unified Auralization of Room Simulations
Investigation of sampling grids and other parameters
Xuanye Liu

© Xuanye Liu, 2024.

Supervisor: Jens Ahrens, Department of Architecture and Civil Engineering, Teknisk
akustik
Examiner: Jens Ahrens, Department of Architecture and Civil Engineering,Teknisk
akustik

Degree project report 2024
Department of Architecture and Civil Engineering
Chalmers University of Technology
SE-412 96 Gothenburg
Sweden
Telephone +46 31 772 1000

Cover: Create binaura1 signal by PWD-based method(Provided by Jens Ahrens)

Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria
Gothenburg, Sweden 2024

iv


Unified Auralization of Room Simulations
Investigation of sampling grids and other parameters
Xuanye Liu
Department of Architecture and Civil Engineering, Applied Acoustic
Chalmers University of Technology

Abstract
With the development of computer science and the increase in computing power,
the finite-difference time-domain method has gradually become a viable option for
solving wideband, transient problems in acoustics simulation. In this method, the
binaural response can be extracted by local sampling of the finite-difference grid
using an array of receivers. This thesis focuses on the effects of the grid size, the
spacing of grid points, and the distribution of grid points in the case of a volumetric
sampling. This thesis also investigates the effect of the change in impulse response
length and others. These effects include variations in numerical robustness and the
frequency properties of the generated audio signal. In order to achieve this study,
this thesis simulates the sound field formed by a single plane wave propagation. By
sampling this sound field with a volumetric grid, the plane wave density function
is calculated using the plane wave decomposition method and integrated with free-
field head-related transfer functions in the spherical harmonics domain to get the
binaural impulse response. The results of the valuable work show that the numerical
stability of a volumetric grid with uniformly distributed sampling points is greatly
affected at high frequencies due to spatial aliasing. In addition, the results also give
some valuable conclusions for the selection of the impulse response length and the
selection of the number of sampling points.

Keywords: FDTD, Plane wave decomposition, Sampling grid, Spatial audio.

v


Acknowledgements
I would like to express my deepest appreciation to all those who provided me with
the possibility to complete this thesis.
I am very grateful to my supervisor, Jens Ahrens, whose advice helped me clarify
my thoughts. He leads my academic research in the direction of spatial audio and
gives me a lot of opportunities to think about things far beyond the master’s thesis
itself.
I would like to thank my parents for their wisdom, unconditional love and support
throughout my life; this thesis would not have been possible without them. Thank
you.

Xuanye Liu, Gothenburg, March 2024

vii


List of Acronyms

Below is the list of acronyms that have been used throughout this thesis listed in
alphabetical order:

FDTD The Finite-Difference Time-Domain
PWD Plane Wave decomposition
IR Impulse Responce
HRTF Head-related transfer function
FFT Fast Fourier Transform
IFFT Inverse Fast Fourier Transform

ix


Nomenclature

Below is the nomenclature of indices, sets, parameters, and variables that have been
used throughout this thesis.

Indices

n Index for the spherical harmonic function order
m Index for the spherical harmonic function degree
d, f, g Index for position in discrete space
u Index for position in discrete time
i Index for grid nodes number

Sets

Z Set of Integer

Parameters

c Speed of sound
λ Courant number when using FDTD method
C1, C2 Constants when using FDTD method
a(k, θ, ϕ) Plane wave density function at wave number k, (θ, ϕ) direction
H l(k, θ, ϕ) HRTF of left ear at wave number k, (θ, ϕ) direction
jn(kr) Spherical Bessel functions of the first kind of nth order
hn(kr) Spherical Bessel functions of the second kind of nth order
hn(kr) Spherical Hankel functions of the first kind of nth order
h2

n(kr) Spherical Hankel functions of the second kind of nth order

xi


Y m
n (θk, ϕk) Spherical Harmonic at order n, degree m, wave number k, direction

(θk, ϕk)
bn(kr) Radial function
anm Plane wave density coefficients in frequency domian
anm(t) Plane wave density coefficients in time domain(Ambisonics coeffi-

cients)
Q Number of grid nodes
C Quadrature Matrix in frequency domain
cnm Quadrature Matrix in time domain
Cv Coefficient of variation
NIR Impulse response length of cnm

Variables

t Time
r Vectors in the spherical coordinate system
p(r, t) Pressure of pointr at time t

p|ud,f,g Pressure of point(d, f, g) at time u in discrete space and time
pl(k) Pressure of left ear at wave number k

pl(t) Pressure of left ear at time t

p Pressure matrix in frequency domain
p(t) Pressure matrix in time domain

xii


Contents

List of Acronyms ix

Nomenclature xi

List of Figures xv

List of Tables xvii

1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Purpose and Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Theory 3
2.1 Wave equation and FDTD method . . . . . . . . . . . . . . . . . . . 3
2.2 Plane wave density function . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Representation sound field in Spherical Harmonics . . . . . . . . . . . 4

2.3.1 Spherical Harmonics . . . . . . . . . . . . . . . . . . . . . . . 4
2.3.2 Spherical Fourier Transform . . . . . . . . . . . . . . . . . . . 5
2.3.3 Spherical Bessel and Hankel Functions . . . . . . . . . . . . . 6
2.3.4 Using Spherical Harmonics to express wave . . . . . . . . . . . 7

2.4 How to get Plane wave density function by microphone Array . . . . 9

3 Methods 13
3.1 General procedure for generating spatial audio signals . . . . . . . . . 13
3.2 Adjustment of the shape of the sampling grid . . . . . . . . . . . . . 14
3.3 The change in matrix stability expressed through Condition Number 19
3.4 Compare results generated by different methods . . . . . . . . . . . . 20

4 Results 23
4.1 Impact on the Condition Number . . . . . . . . . . . . . . . . . . . . 23

4.1.1 Influence caused by different grid size . . . . . . . . . . . . . . 23
4.1.2 Influence caused by different number of points inside the grid . 24
4.1.3 Influence caused by different distribution of points inside the

grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Results generated by different parameters . . . . . . . . . . . . . . . . 27

4.2.1 Difference between ground truth output and the binaural out-
put of three Anatomy planes . . . . . . . . . . . . . . . . . . . 27

xiii


Contents

4.2.2 Influence caused by different distribution of points inside the
grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2.3 Influence caused by different grid nodes number . . . . . . . . 31
4.2.4 Influence caused by the different impulse response length . . . 32
4.2.5 Influence caused by different grid size . . . . . . . . . . . . . . 34

5 Conclusion 35

xiv


List of Figures

2.1 Balloon plot of the spherical harmonics for n = 0 (top row) to n = 2
(bottom row) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Magnitude of the radial function bn(kr) = 4πinjn(kr), for kr = 8, 16 . 9

3.1 Flowcharts used to show the steps of procedure for generating spatial
audio signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.2 Uniform distribution Cubic Array with side length L = 0.1 m, dis-
tance between two adjacent points d = 0.0167 m, number of grid
points N = 343 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.3 Uniform distribution Cubic Array with side length L = 0.1 m, number
of points on axis Naxis = 7, number of grid points N = 343 . . . . . . 16

3.4 An example of the Condition number of a Uniform distribution Cubic
Array with side length L = 0.1 m, number of points on axis Naxis = 7,
number of grid points N = 343. . . . . . . . . . . . . . . . . . . . . . 19

3.5 An example of the Condition number plot in normalized frequency
of a Uniform distribution Cubic Array with side length L = 0.1 m,
number of points on axis Naxis = 7, number of grid points N = 343. 20

4.1 Condition number of Different Grid Size Cubic Arrays with number
of points on axis Naxis = 7, number of grid points N = 343. . . . . . 23

4.2 Condition number(Normalized Frequency) of Different Grid Size Cu-
bic Arrays with number of points on axis Naxis = 7, number of grid
points N = 343. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.3 Condition number of Different number of point on axis Cubic Arrays
with grid side length L = 0.1 m . . . . . . . . . . . . . . . . . . . . . 25

4.4 Condition number of Different distribution Cubic Arrays with grid
side length L = 0.1 m, number of points on axis Naxis = 7, number of
grid points N = 343. . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.5 Average of Condition Number between 1000 Hz and 5000 Hz of Cubic
Arrays with grid side length L = 0.1 m . . . . . . . . . . . . . . . . . 26

4.6 Human anatomy planes . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.7 Deviation of the output depends on the incidence direction (dB)(Transverse,

right ear) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.8 Deviation of the output depends on the incidence direction (dB)(Frontal,

right ear) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.9 Deviation of the output depends on the incidence direction (dB)(Sagittal,

right ear) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

xv


List of Figures

4.10 Relatively deviation of the output depends on the incidence direction
Cv = −0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.11 Absolutely deviation of the output depends on the incidence direction
Cv = −0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.12 Relatively deviation of the output depends on the incidence direction
Cv = 0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.13 Absolutely deviation of the output depends on the incidence direction
Cv = 0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.14 Deviation speed of the output depends on the coefficient of variation 31
4.15 Deviation of the output depends on the number of grid points (dB)

with grid side length L = 0.1 m . . . . . . . . . . . . . . . . . . . . . 32
4.16 Deviation of the output depends on the impulse length. Grid info:

Side length L = 0.1 m, Number of points on axis Naxis = 7, Number
of grid points N = 343. . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.17 Deviation speed of the output depends on the impulse length. Grid
info: Side length L = 0.1 m, Number of points on axis Naxis = 7,
Number of grid points N = 343. . . . . . . . . . . . . . . . . . . . . 33

4.18 Deviation of the output depends on the grid size (dB). Grid info:
Number of points on axis Naxis = 7, Number of grid points N = 343. 34

4.19 Deviation speed of the output depends on the grid size (dB). Grid
info: Number of points on axis Naxis = 7, Number of grid points
N = 343. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

xvi


List of Tables

4.1 Computaion time t for NIR = 2048, 1024, . . . 8. . . . . . . . . . . . . . 32

xvii


List of Tables

xviii


1
Introduction

1.1 Background

Acoustic simulation has a wide range of uses, such as fixing acoustic defects in a
room or for virtual reality use. There are many ways to perform acoustic simulations.
Diffused field equations (Sabine/Eyring), Ray/particle tracing, and Wave simulation
are more commonly used. The diffuse field equations give only a fairly rough result
for the sound field. Ray/particle tracing is more accurate than diffuse field equations,
but it is a high-frequency approximation of a sound field. Besides, diffraction is not
easy to simulate using this method. Wave simulation is more accurate than the
previous two, as it takes wave diffraction into account. It validates in the entire
frequency range, but its computational cost is enormous, which means it usually
takes a lot of time. With the development in processor performance and parallel
computing, it has become feasible to use wave simulation.

The Finite Difference Time Domain (FDTD) algorithm is a useful tool for our wave
simulations[1]. Since the FDTD method produces results for each time step in the
computation process, it enables the real-time auralization of the sound field. In the
actual simulation, we need to specify the shape of the house and the location of some
of the main reflective surfaces, and for auralization, we should specify the location
of the listener. We have many different strategies to achieve auralization. One can
choose an audio signal to be used as the source for the sound field simulation so
that what is obtained is the result of this audio signal after it has propagated in the
sound field. Another alternative is to measure and calculate the binaural impulse
response and convolve it at a later stage with the original audio signal, which was
measured in the free field.

A human head model can be placed directly at the listener’s position during the
modeling phase to obtain this binaural response. One detail to consider when em-
bedding a geometric model of a listener in the grid[2][3][4], however, is the amount of
detail required to model a mannequin compared to the walls or furniture of a house.
A large amount of detail can lead to a significant increase in computing power. An-
other option to obtain this binaural response is to extract the plane-wave component
of the sound field and spatially integrate it with the pre-measured HRTF[5], which
is also the approach used in this thesis.

1


1. Introduction

1.2 Purpose and Goals
The main objective of this study is to investigate the effects of different sampling
grid sizes and other parameters on the generation of spatial signals using plane wave
decomposition. This study also aims to evaluate how parameters change influence
the accuracy, efficiency, and overall performance of binaural signal generation. This
study aims to discover potential optimization strategies by systematically investi-
gating these parameters.
To achieve the purposes mentioned above, this paper sets forth the following specific
goals:

1. Systematically vary the sampling grid size and assess its direct impact on the
condition number and quality of the generated spatial signals. This involves a
comparative analysis of different grid sizes to identify optimal configurations.

2. Investigate the role of parameters such as impulse response length and number
of grid points. This goal focuses on understanding how these parameters can
be tuned to improve signal generation outcomes.

3. Propose potential optimization strategies based on the findings of the study.
4. Summarize the general principles of signal generation using plane wave de-

composition methods. This is to understand the details of signal generation.

2


2
Theory

In this section, this thesis briefly describes the process of auralization in FDTD using
the plane wave decomposition method and summarizes the required formulas. Also,
since we are sampling the sound field using a volumetric grid, which is a special
case of a spherical grid, it is convenient to use spherical harmonics to express the
sampled data and in this way approximate a plane wave density function. Therefore,
the knowledge about spherical harmonics will be mainly introduced in this section.
The main theoretical basis of this thesis comes from the papers of Jonathan Sheaffer
et al.[6] and the book written by Boaz Rafaely[7].

2.1 Wave equation and FDTD method
Before proceeding with a sound field simulation, we should first understand how
to properly describe a sound field. In a homogeneous space, a sound field can be
formulated by the acoustic wave equation[6]:

∇2p(r, t) − 1
c2

∂2

∂t2 p(r, t) = 0 (2.1)

where ∇ is the Laplace operator, p(r, t) is the sound pressure and c is the speed
of sound. The wave equation can be expressed in a Cartesian coordinate or in
a spherical coordinate. When expressed in the spherical coordinate, what r in the
equation represents r = (r, θ, ϕ), where r denotes radial distance, θ denotes elevation
and ϕ is azimuth.
We usually utilize the FDTD method for the simulation of sound propagation. To
utilize the FDTD method, we need to mesh the space and specify the time step for
the solution. Continuous Cartesian coordinate system is discretized as (x, y, z, t) →
[dX, fX, gX, uT ], u here is index position in discrete time and [d, f, g] are index
positions in discrete space. Accordingly, the wave equation can be formulated under
the FDTD method as[6]:[

δ2
t − λ2

(
δ2

x + δ2
y + δ2

z + C1δ
2
xδ2

y

+C1δ
2
xδ2

z + C1δ
2
yδ2

z + C2δ
2
xδ2

yδ2
z

)]
p|ud,f,g = 0

(2.2)

where δ2
D denotes a second-order finite difference operator over the dimension D, λ

here is the Courant number and constants C1 and C2, are chosen according to the
desired finite difference scheme.
In the above, we have briefly described the basic methods for implementing sound
field simulation.

3


2. Theory

2.2 Plane wave density function

The solution of Eq. (2.1) is the distribution of pressure in space over time; after
the wave has traveled through space for a sufficiently long period of time, the sound
field can be expressed as the continuum of plane waves. Therefore, we can intro-
duce the Plane wave density function a (k, θ, ϕ) to represent the sound field, where
k is the wavenumber. However, the sound field obtained here does not consider
the human body’s influence on the sound, so we can use the Head-related transfer
function (HRTF) to post-process this sound field to introduce the influence of the
human body. With the Plane wave density function known, the binaural response
considering the HRTF can be obtained by the following equation[6]:

pl(k) =
∫

(θ,ϕ)∈S2
a(k, θ, ϕ)H l(k, θ, ϕ)dθdϕ (2.3)

where H l(k, θ, ϕ) is an HRTF catalog for the left-ear, pl(k) is the sound pressure at
the left ear. The essence of this formula is a kind of beamforming synthesis. The
question that remains at this point is how to get the Plane wave density function.
However, before describing how to get the plane wave density function, I will go into
more detail about spherical harmonics, as they are an important tool for us to get
the plane wave density function through the microphone array.

2.3 Representation sound field in Spherical Har-
monics

2.3.1 Spherical Harmonics
The nature of the spherical harmonics is a set of basis functions, or more precisely,
the spherical harmonic functions are a set of complete orthogonal functions, and
hence, it is a set of orthogonal basis. Basis function is not an unfamiliar concept;
the sine and cosine functions (equivalently, the complex exponential functions) form
a set of basis known as Fourier basis functions in the function space. Using Fourier
basis functions, we can transform a signal between the time and frequency domains
via the Fourier transform, or alternatively, we can use a linear combination of Fourier
basis functions to represent any other function. Fourier basis functions serve a
very important role in signal processing, and while Fourier basis functions are well
suited for working with time-series data and frequency analysis, they do not directly
provide a mechanism for working with the three-dimensional spatial distribution of
sound. The Fourier transform is primarily concerned with the frequency content
of a signal, not its spatial directionality or distribution. Therefore, the spherical
harmonic function has a unique advantage when dealing with spatial audio signals.
The mathematical expression for the spherical harmonic function is as follows[7]:

Y m
n (θ, ϕ) ≡

√√√√2n + 1
4π

(n − m)!
(n + m)!P

m
n (cos θ)eimϕ, (2.4)

4


2. Theory

where ! represents the factorial function, P m
n (cos θ) are the associated Legendre

functions, m ∈ Z is an integer denoting the function degree and n ∈ Z is a nat-
ural number denoting the function order. The following figure shows the spherical
harmonic functions of orders 0 to 2.

Figure 2.1: Balloon plot of the spherical harmonics for n = 0 (top row) to n = 2
(bottom row)

We introduce a new concept here, the Associated Legendre polynomials. Associ-
ated Legendre Polynomials are a special class of functions of particular importance
in mathematical physics that are derived from Legendre Polynomials, which were
discovered when dealing with the effect of ellipsoids on gravitational potentials.
The Associated Legendre polynomials are derived by differentiating the Legendre
polynomials. Its mathematical expression is as follows[7]:

P m
n (x) = (−1)m

(
1 − x2

)m/2 dm

dxm
Pn(x), x ∈ [−1, 1], (2.5)

where Pn(x) denotes Legendre polynomials, which can be obtained directly by using
the following differential formula[7]:

Pn(x) = 1
2nn!

dn

dxn

(
x2 − 1

)n
, (2.6)

The Legendre polynomials form a complete and orthogonal set of basis functions
over the line section x ∈ [−1, 1].

2.3.2 Spherical Fourier Transform
As mentioned in the previous section, any signal can be expressed in terms of Fourier
basis functions, an operation also known as the Fourier transform. The Spherical

5


2. Theory

Fourier Transform is the analog of the Fourier transform on the sphere, which means
the Spherical Fourier Transform decomposes a function defined on a sphere into a
sum of spherical harmonics, each of which is multiplied by a coefficient that repre-
sents the amplitude of the function at a particular frequency. It is important to note
here that the spherical harmonic function is actually the set of all square-integrable
functions on the unit sphere, which can also be expressed as Y m

n (θ, ϕ) form a basis in
Hilbert space L2(S2). Spherical Fourier Transform works in the following way: given
a function satisfying f(θ, ϕ) ∈ L2(S2), it can be represented by a set of weighted
spherical harmonic functions as[7]:

f(θ, ϕ) =
∞∑

n=0

n∑
m=−n

fnmY m
n (θ, ϕ), (2.7)

where fnm are the weights function. Similar to the Fourier transform, the spherical
Fourier transform has its inverse form, and this inverse form can be used to represent
this weighting function[7]:

fnm =
∫ 2π

0

∫ π

0
f(θ, ϕ) [Y m

n (θ, ϕ)]∗ sin θdθdϕ. (2.8)

2.3.3 Spherical Bessel and Hankel Functions
One piece of the mathematical puzzle that is still missing before we use spherical har-
monics to represent waves, is Spherical Bessel and Hankel Functions. It is actually
a part of the solution of the wave equation in the spherical coordinate system. After
solving the wave equation in the spherical coordinate system using the separated
variable method, the following solution can be obtained[7]:

p(r, t) = jn(kr)Y m
n (θ, ϕ)eiωt (2.9)

or

p(r, t) = hn(kr)Y m
n (θ, ϕ)eiωt, (2.10)

where jn(kr) denotes spherical Bessel functions of the first kind, hn(kr) denotes
spherical Hankel functions of the first kind. The solutions of these two forms(Eq.
(2.9) and Eq. (2.10)) are equivalent.
The spherical Bessel function of the first kind, jn(x), and of the second kind, yn(x)
can be written as [8]:

jn(k) = (−1)nxn

(
1
x

d

dx

)n sin(x)
x

(2.11)

and

yn(x) = −(−1)nxn

(
1
x

d

dx

)n cos(x)
x

. (2.12)

Similarly, it is possible to write out the spherical Hankel functions of the first kind,
hn(x), and the second kind h(2)

n (x),

6


2. Theory

hn(x) = −i(−1)nxn

(
1
x

d

dx

)n
eix

x
(2.13)

and

h(2)
n (x) = i(−1)nxn

(
1
x

d

dx

)n
e−ix

x
. (2.14)

Spherical Bessel function and Spherical Hankel function are related to each other
by

hn(x) = jn(x) + iyn(x) (2.15)

h(2)
n (x) = jn(x) − iyn(x). (2.16)

Spherical Bessel and Hankel functions have two important properties: Orthogonality
and Recurrence Relations. Orthogonality, which means spherical Bessel and Hankel
functions are orthogonal with respect to their order under appropriate weighting.
Recurrence Relations, which means these functions satisfy specific recurrence rela-
tions, which can be used for efficient computation of function values for different
orders.

2.3.4 Using Spherical Harmonics to express wave
Consider a plane wave of unit amplitude of single frequency propagating from the
direction (θk, ϕk), with a wave vector k̃ = −k = (k, θk, ϕk). In a spherical coordinate,
this plane wave can be written as[7]:

p(k, r, θ, ϕ) = e−ik·r = eik̃·r, (2.17)

where r = (r, θ, ϕ), with Spherical Fourier Transform, this wave can be written as
the sum of the spherical harmonics and the spherical Bessel function[7]:

p(k, r, θ, ϕ) =
∞∑

n=0

n∑
m=−n

4πinjn(kr) [Y m
n (θk, ϕk)]∗ Y m

n (θ, ϕ). (2.18)

In Eq. (2.18), a set of spherical harmonic functions of infinite order is used to express
this plane wave, which is clearly impossible in practice. A set of finite-order spherical
harmonics can actually be used to model a plane wave propagating in space[7]:

p(k, r, θ, ϕ) ≈
N∑

n=0

n∑
m=−n

4πinjn(kr) [Y m
n (θk, ϕk)]∗ Y m

n (θ, ϕ), (2.19)

which also introduces truncation errors.
It is now possible to express a single plane wave propagating in space using spherical
harmonics, but as presented in Section 2.2, after the wave has traveled through
space for a sufficiently long period of time, the sound field can be expressed as the
continuum of plane waves. Therefore, we can introduce the Plane wave density
function a(k, θk, ϕk) to represent the sound field. With the plane wave density

7


2. Theory

function, several spherical harmonics can be combined to express the wave formed
by combining multiple plane waves[7].

p(k, r, θ, ϕ) =
∫ 2π

0

∫ π

0
a (k, θk, ϕk) eik̃·r sin θkdθkdϕk

=
∞∑

n=0

n∑
m=−n

4πinanm(k)jn(kr)Y m
n (θ, ϕ)

(2.20)

Just as it is possible to express a single plane wave by a set of finite-order spherical
harmonics, it is also possible to express the case of multiple plane waves synthesized
by a set of finite-order spherical harmonics. The formula is as follows[7]:

p(k, r, θ, ϕ) ≈
N∑

n=0

n∑
m=−n

4πinanm(k)jn(kr)Y m
n (θ, ϕ)

=bn(kr)
N∑

n=0

n∑
m=−n

anm(k)Y m
n (θ, ϕ),

(2.21)

where radial function is given by bn(kr) = 4πinjn(kr). It is important to note that
the plane wave composed by the above equation does not take into account the
scattering of the sound wave after encountering an object, i.e., the scattering of the
sound wave after encountering a human body is not taken into account in this case
of ours. However, we will use HRTF in post-processing to add the effect of the
human body to our binaural response, so this absence is reasonable.

As mentioned earlier, finite-order harmonics can simulate plane waves propagating
in space with truncation errors, which is mainly caused by the high-frequency decay
of the Bessel term in the radial function bn(kr). A figure(Figure.2.2) from Boaz
Rafaely’s book [7] is cited here to illustrate this point.

8


2. Theory

Figure 2.2: Magnitude of the radial function bn(kr) = 4πinjn(kr), for kr = 8, 16

2.4 How to get Plane wave density function by
microphone Array

In the previous section, we briefly described the plane wave density function and how
to use it to synthesize the binaural response with the HRTF, but we still don’t know
how to get it. By looking at Eq. (2.21), we can see that we can perform plane wave
synthesis given the plane wave density function and spherical harmonics. Knowing
that the spherical harmonics are a set of functions that can be pre-calculated if
the distribution of pressure in space is measured by a microphone array, the plane
wave density function can also be calculated accordingly. This operation is known
as plane wave decomposition (PWD).

9


2. Theory

Imagine that Q recievers are arbitrarily placed in a spherical coordinate system,
where the positions of these microphones can be represented as (ri, θi, ϕi), i =
1, ..., Q, i ∈ Z, and the pressure at each receiver is captured and transformed to
the frequency domain using Fast Fourier Transform(FFT), resulting in a frequency-
dependent vector[6],

p = [p1(k), p2(k), · · · , pQ(k)]T , (2.22)
where []T denotes transpose of matrix. According to the previous section, combining
a set of spherical harmonics can represent an arbitrary function. In particular, when
we consider that the sound in the sound field consists of a series of plane waves,
according to Eq. (2.21), this formula can be rewritten in this case as[6]:

p (k, ri, θi, ϕi) ≈
N∑

n=0

n∑
m=−n

anm(k)bn (kri) Y m
n (θi, ϕi) . (2.23)

Eq. (2.23) can be further written in the form of a matrix as[6]:

p = Banm. (2.24)
Since the N th order spherical harmonic has 2N + 1 plane wave coefficients corre-
sponding to it, the first N th order spherical harmonic has a total of (N + 1)2 plane
wave density coefficients corresponding to it, anm is a (N + 1)2 vector representing
the plane wave coefficients of the sound field, which can be written as[6]:

anm =
[
a00, a1(−1), a10, a11, . . . , aNN

]T
. (2.25)

Similarly, combined with the fact that we have Q sampling points the B is a Q ×
(N + 1)2 matrix, which can be written as[6]:

BT

=


b0 (kr1) Y 0

0 (θ1, ϕ1) · · · b0 (krQ) Y 0
0 (θQ, ϕQ)

b1 (kr1) Y −1
1 (θ1, ϕ1) · · · b1 (krQ) Y −1

1 (θQ, ϕQ)
... . . . ...

bN (kr1) Y N
N (θ1, ϕ1) · · · bN (krQ) Y N

N (θQ, ϕQ)

 . (2.26)

From Eq. (2.24) we know that the plane wave density coefficient matrix can be
obtained by simply multiplying the inverse of the B matrix over the left of p matrix.
However, in this thesis, the relationship between the number of sampling points Q
and the order of the spherical harmonic N is as follows:

Q > (N + 1)2 . (2.27)
In this case, we can only look for the inverse of the B matrix in the least squares
sense, thus there is[6]:

anm ≈ Cp = B†p, (2.28)
where C denotes the quadrature coefficients matrix, which transforms sound pres-
sure into plane wave density coefficients matrix. The matrix B† is the Moore-Penrose
pseudoinverse of B, written as[6]:

10


2. Theory

B† =
(
BHB

)−1
BH , (2.29)

where []H denotes the conjugate transpose. It is worth noting that all equations
in this subsection are in the frequency domain. However, this thesis requires the
generation of signals in the time domain, so it is necessary to convert C into impulse
responses of the corresponding order of spherical harmonics by Inverse Fast Fourier
Transform(IFFT) and denoted by cnm. In the time domain, Eq. (2.28) can be
written as:

anm(t) ≈ cnmp(t). (2.30)

Where the p denotes the time-domain signal received by each receiver, the anm(t)
denotes the ambisonic coefficient matrix.
In addition to this, accordingly, Eq. (2.3) can be written in discretized form as[6]:

pl(t) =
∞∑

n=0

n∑
m=−n

ã∗
nm(t) ∗ H l

nm(t), (2.31)

where

ãnm(t) = (−1)m
[
an(−m)(t)

]∗
. (2.32)

an(−m)(t) and H l
nm(t) are the ambisonic coefficients and HRTF of the left ear rep-

resented in the spherical harmonic domain, respectively. ∗ denotes the convolution
operator. The pl(t) is the impulse response at the left ear. An audible binaural sig-
nal can be obtained by introducing raw mono signals at a suitable workflow location,
which is described in the Methods part.
We now have the complete tool for calculating plane wave density coefficients and
a basic understanding of the theoretical foundations needed for this thesis.

11


2. Theory

12


3
Methods

The entire content of this thesis is carried out in Matlab. In this section, I will
describe the implementation of my thesis, the various plotting implementations,
comparison methods, plotting methods in detail, etc.

3.1 General procedure for generating spatial au-
dio signals

To perform the auralization in this thesis, I mainly use the auralization toolkit
written by Jens Ahrens [9]. In this section, I’ll explain the workflow in general.
There are two methods applied in this paper; one of the simpler methods is to
convolve the sound signal directly with the HRTF of the direction from which we
preset the sound signal to come from. Through this method, we can generate a
signal which is considered as the reference result or so-called ground truth.
Another method is to use the approach expressed in the theory section to process
and obtain the signal, which will be used to compare with the reference signal. The
rough workflow is shown in the following flowchart(Figure.3.1):

Figure 3.1: Flowcharts used to show the steps of procedure for generating spatial
audio signals

13


3. Methods

The first step in this method is to generate a spatial grid to determine the sampling
nodes. In fact, the shape of the sampling grid and the grid nodes’ distribution affects
the algorithm’s stability and the quality of the generated results, but this will be
discussed in detail in the subsequent sections.
The second step of this method is to generate the impulse response data at each
sampling grid. In real applications, this data should be simulated using the FDTD
method. Due to the complex modeling and the large number of computations in-
volved in direct sound field simulation using the FDTD method, apart from this,
the primary purpose of this thesis is to investigate the effect of different sampling
grids and parameters not related to the FDTD method on the computational speed
and the quality of the results; as a result, this thesis will not directly use FDTD
method. Without using the FDTD method, this thesis determines the impulse re-
sponse data at each grid node by simulating a plane wave coming from a specific
direction directly.
The third step in this thesis is to compute the quadrature matrix, as the quadrature
matrix is only related to the sample grid itself, which can be known from Eq. (2.26);
this step can actually be done in parallel with the second step.
The fourth step is calculating ambisonic coefficients, which can be obtained by using
the IIFT on plane wave density coefficients. The wave density coefficients can be
obtained by multiplying the Moore-Penrose pseudoinverse of the quadrature matrix
on the left side of the matrix of the sampled signals (in this case, a set of impulse
responses) at each grid node.
The fifth part introduces an audio signal and converts it into an ambisonic signal.
For auralization, this thesis introduces an audio sound signal as the original audio.
The original signal needs to be convolved with ambisonic coefficients to produce an
ambisonic signal.
In step 6, we introduce the HRTF, as described in Eq. (2.31). This step involves
re-expressing the HRTF in terms of spherical harmonics; the HRTF measurement
is achieved through a process where a loudspeaker is placed in various positions
on a sphere, and sound is played, followed by measuring the signal received by
the listener’s binaural at the center of the sphere. The impulse response is then
calculated based on this data. And these points where the speakers are placed are
predefined, which follows Lebedev grid. Thus, the weights of the HRTFs obtained
by sampling in this way are known conditions when transformed into the spherical
harmonic domain.
Above is the general process for generating spatial audio signals. Although gener-
ating audio signals using the above process is not complicated, there are still many
details worth discussing.

3.2 Adjustment of the shape of the sampling grid
The grid-choosing scheme influences the calculations’ stability and the results’ qual-
ity. The following figure demonstrates one of the main grid types used in this thesis,
the cubic grid array(Volumetric grid). The default grid type used in this paper is a
uniformly distributed cubic grid.

14


3. Methods

Figure 3.2: Uniform distribution Cubic Array with side length L = 0.1 m, distance
between two adjacent points d = 0.0167 m, number of grid points N = 343 .

The different colors of the points in the figure are to show the distance of the different
grid points from the origin.

The code to create a uniformly distributed cubic grid is relatively simple:

1 L = 0.1; %side length
2 dx = 0.1/6; % distance between two adjacent points
3 spatial_axis = -L/2 : dx : L/2; % m
4 [x, y, z] = meshgrid(spatial_axis, spatial_axis,...
5 spatial_axis);

During subsequent tests, I tried to move the distribution of nodes towards the surface
of the cube based on the uniformly distributed cubic array.

15


3. Methods

Figure 3.3: Uniform distribution Cubic Array with side length L = 0.1 m, number
of points on axis Naxis = 7, number of grid points N = 343 .

As you can clearly see in this figure, the gird nodes are distributed closer to the
surface. To accomplish this, I wrote a function:

1

2 function [x, y, z] = creat_grid(Number_of_points_on_the_axis, L ,
coefficient_of_variation)

3

4 switch nargin
5

6 case 3
7

8 Number_of_gaps_on_half_axis = (Number_of_points_on_the_axis-1)/2;
9

10 dx_mean = L/(Number_of_points_on_the_axis-1); % leads to 343
points

11

12 N_desired = 8; % for prediciting requiered no. of points
13

14 % sampling intervall: inverse of aliasing frequency x c (N = omega
/c r)

15 dx_max = (2*pi * sqrt((L/2)^2 + (L/2)^2) + (L/2)^2) / N_desired;
16 dx_min = dx_max ./ 100;% make sure distance is not zero
17

18 if coefficient_of_variation == 0
19

20 dx = L / (Number_of_points_on_the_axis-1);
21

22 spatial_axis = -L/2 : dx : L/2; % m
23 [x, y, z] = meshgrid(spatial_axis, spatial_axis, spatial_axis);
24

25 % reshape
26 x = x(:).’;
27 y = y(:).’;
28 z = z(:).’;
29 else
30

31 %coefficient_of_variation= -1~+1
32

33 if mod(Number_of_gaps_on_half_axis,2) == 1
34

16


3. Methods

35 variation_of_dx = ((dx_mean-dx_min)./((
Number_of_gaps_on_half_axis-1)./2)).*
coefficient_of_variation;

36

37 else
38

39 variation_of_dx = ((dx_mean-dx_min)./((
Number_of_gaps_on_half_axis)./2)).*coefficient_of_variation
;

40

41 end
42

43 variation_of_dx_vector = (dx_mean-dx_min).*
coefficient_of_variation* (-1) : variation_of_dx : (dx_mean-
dx_min) *coefficient_of_variation;

44 dx = variation_of_dx_vector + dx_mean;
45 spatial_axis = zeros(size(variation_of_dx_vector));
46

47 for i = 1:1:length(variation_of_dx_vector)
48

49 spatial_axis(i) = sum(dx(1:i)); % m
50

51 end
52

53 x = [];
54 y = [];
55 z = [];
56

57 for i = 1:1:length(spatial_axis)
58

59 % Define the side length and resolution of the cube
60 sideLength = spatial_axis(i);
61 resolution = 2*i+1;
62

63

64 % Initialize the coordinates array
65 cubeShell = [];
66

67 % Generate grid points
68 [X, Y, Z] = meshgrid(linspace(-sideLength, sideLength,

resolution));
69

70 % Add points for each face of the cube
71 % Top and bottom faces (excluding the already added boundary

points)
72 cubeShell = [cubeShell; [reshape(X(:,:,1), [], 1), reshape(Y

(:,:,1), [], 1), reshape(Z(:,:,1), [], 1)]];
73 cubeShell = [cubeShell; [reshape(X(:,:,end), [], 1), reshape(Y

(:,:,end), [], 1), reshape(Z(:,:,end), [], 1)]];
74

75 % Left and right faces (excluding the already added boundary
points)

76 cubeShell = [cubeShell; [reshape(X(1:end,1,2:end-1), [], 1),
reshape(Y(1:end,1,2:end-1), [], 1), reshape(Z(1:end,1,2:end
-1), [], 1)]];

77 cubeShell = [cubeShell; [reshape(X(1:end,end,2:end-1), [], 1),

17


3. Methods

reshape(Y(1:end,end,2:end-1), [], 1), reshape(Z(1:end,end
,2:end-1), [], 1)]];

78

79 % Front and back faces
80 cubeShell = [cubeShell; [reshape(X(1,2:end-1,2:end-1), [], 1),

reshape(Y(1,2:end-1,2:end-1), [], 1), reshape(Z(1,2:end
-1,2:end-1), [], 1)]];

81 cubeShell = [cubeShell; [reshape(X(end,2:end-1,2:end-1), [], 1)
, reshape(Y(end,2:end-1,2:end-1), [], 1), reshape(Z(end,2:
end-1,2:end-1), [], 1)]];

82

83 % reshape
84 x = [x cubeShell(:,1).’];
85 y = [y cubeShell(:,2).’];
86 z = [z cubeShell(:,3).’];
87

88

89 end
90

91 % reshape
92 x = [x 0];
93 y = [y 0];
94 z = [z 0];
95

96 end
97

98 case 2
99

100 dx = L / (Number_of_points_on_the_axis-1);
101

102 spatial_axis = -L/2 : dx : L/2; % m
103 [x, y, z] = meshgrid(spatial_axis, spatial_axis, spatial_axis);
104

105 % reshape
106 x = x(:).’;
107 y = y(:).’;
108 z = z(:).’;
109

110

111 otherwise
112

113 fprintf(’Wrong␣Input␣Parameter␣Number’)
114

115 end
116

117

118

119 end

A cubic grid can be viewed as a nested set of cubic shells. creat_grid function does
not generate the mesh nodes directly but instead generates the cubic shells one layer
at a time; by adjusting the distance between the shells, creat_grid can make grid
nodes closer to the surface or closer to the center. I use a parameter, the coefficient
of variation Cv, in this code to measure grid variation. The coefficient varies between
−1 and 1; as the coefficient gets closer to 1, the grid points are closer to the center;

18


3. Methods

as the coefficient gets closer to -1, the grid points are closer to the surface.

3.3 The change in matrix stability expressed through
Condition Number

Different gird shape choices directly affect the computational stability of the matrix
in Eq. (2.26); this stability is demonstrated by the condition number. The condition
number of a matrix is an important concept in numerical linear algebra, used as a
measure of the sensitivity of the inverse of a matrix. In brief, the condition number
measures the extent to which the output result may change when the change in a
matrix is small. A higher condition number indicates that the matrix is closer to
singular, and the numerical stability of its inverse is poorer, i.e., a small input error
may lead to a large error in the output. When the condition number is low, the
matrix is considered well-conditioned, meaning its inverse is numerically stable.
The following figure illustrates the condition number of a uniformly distributed cubic
array with edge length L = 0.1 m, with Naxis = 7 points on the axis and N = 343
grid points.

2000 4000 6000 8000 10000 12000 14000

Frequency (Hz)

100

101

102

103

C
o
n
d
it
io

n
 n

u
m

b
e
r

Condition Number (Evenly distributed)

N
axis

 = 7

Figure 3.4: An example of the Condition number of a Uniform distribution Cubic
Array with side length L = 0.1 m, number of points on axis Naxis = 7, number of
grid points N = 343.

Another valuable way to show condition numbers is to normalize the frequency axis
in Fig. 3.4. This normalization is done in the following way:

kNormalized = kr = πfL

c
. (3.1)

An example figure is given here.

19


3. Methods

2 4 6 8 10 12 14 16

Normalized Frequency

100

101

102

C
o
n
d
it
io

n
 n

u
m

b
e
r

Condition Number (Evenly distributed)

N
axis

 = 7

Figure 3.5: An example of the Condition number plot in normalized frequency of
a Uniform distribution Cubic Array with side length L = 0.1 m, number of points
on axis Naxis = 7, number of grid points N = 343.

The above two Figures (Fig.3.4 and Fig.3.5) have expanded the frequency range of
the frequency axis for demonstration purposes. In the results section, the frequency
range of the frequency axis is set based on the following:

1. The lowest frequency depends on the size of the sampling grid, which is stan-
dardized to be at least half a wavelength that should fit inside the grid, cor-
responding to my default sampling grid with a 0.1 m side length, which has a
minimum frequency of 118 Hz. For simplicity, 100 Hz is used in the result part
as the lowest frequency in the plots.

2. The highest frequency depends on the order N of the spherical harmonics used
to calculate the quadrature matrix. Since the Bessel term decays sharply at
kr > N , the sound field reconstruction from the spherical harmonics becomes
inaccurate at kr > N . With the geometry of the sampling grid known, we can
calculate the corresponding maximum frequency, which in this case is 8734 Hz.
For simplicity, 10 000 Hz is used in the result part as the highest frequency in
the plots.

3.4 Compare results generated by different meth-
ods

As mentioned earlier, this thesis takes two approaches to generate binaural signals;
the one obtained by directly convolving the original audio signal with the corre-
sponding directional HRTF is the reference signal, which we consider to be the
expected signal, while the signal obtained by the other approach is the experimen-
tal signal. In this thesis, I will simultaneously compare the experimental signals
of several directions with the corresponding reference signals, and this comparison
occurs in the frequency domain. I used a scheme similar to spectrogram for this
comparison, where I replaced the time domain with an angle-based spatial domain
compared to spectrogram.

20


3. Methods

In this comparison, two terms are used in this paper to express differences, with the
following meanings:

1. Deviation of output(DO): Used to indicate the difference between two signals
under the same conditions. In general, if not otherwise specified(Relatively,
Absolutely), the term is calculated as follows:

DO = 20 log10
SigPWD (k)
SigHRTF (k)dB. (3.2)

2. Deviation speed of output(DSO): Used to indicate the difference between
two signals before and after a parameter change when a parameter changes
gradually. In general, if not otherwise specified, the term is calculated as
follows:

DSO = 20 log10
Sig1PWD (k)
Sig2PWD (k)dB. (3.3)

It is easy to see that in both of the above comparisons, when the result is closer to
0 dB, the more similar the two signals involved in the comparison.

21


3. Methods

22


4
Results

This paper tests the effect of changes in various parameters on the algorithm, but
there are two leading indicators in it; one is the change in the condition number
of the matrix mentioned in Eq. (2.26), which, as the most critical matrix in this
thesis, has a crucial impact on the overall stability of the algorithm. The other is
the effect of parameter changes on the binaural signals, which is directly related to
the quality of the auralization. Therefore, this paper will describe the results from
two aspects mentioned above.

4.1 Impact on the Condition Number

4.1.1 Influence caused by different grid size
In this paper, I have tested the condition number for different grid sizes to under-
stand the principle behind it. The size of the grid is gradually increased from 0.1 m
to 0.15 m in steps of 0.01 m. The result is shown below:

1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

Frequency (Hz)

100

101

102

103

C
on

di
tio

n 
nu

m
be

r

Condition Number (Evenly distributed)

L = 0.1
L = 0.11
L = 0.12
L = 0.13
L = 0.14
L = 0.15

Figure 4.1: Condition number of Different Grid Size Cubic Arrays with number
of points on axis Naxis = 7, number of grid points N = 343.

It is clear from the Figure.4.1 that the condition number decreases gradually from
low to high frequencies at each grid size and becomes unstable from about 6000 Hz
onwards and fluctuates but keeps at a low level. The high condition number at low

23


4. Results

frequencies can be explained by the fact that the size of the sampling grid is relatively
smaller than the wavelength of the low-frequency waves, leading to inaccuracies in
the sampling. The fluctuations in the condition number at high-frequency can also
be explained in terms of relative wavelength, i.e., at high frequencies, the grid node
spacing is significantly larger than the wavelength, leading to the appearance of
spatial aliasing effects. As a result, we can observe regular fluctuations at high
frequencies, and the spatial aliasing effect is most pronounced when the grid nodes
are spaced just enough to accommodate an integer number of high-frequency waves.
This is also mentioned in the paper by Gilles Chardon[10] et al.
The plot obtained under regular frequency scaling does not intuitively observe the
relation between the behavior of the condition numbers and the various grid sizes,
so this Figure.4.1 was replotted at normalized frequencies as follows.

1 2 3 4 5 6 7 8 9 10 11 12

Normalized Frequency

100

101

102

103

C
on

di
tio

n 
nu

m
be

r

Condition Number (Evenly distributed)

L = 0.1
L = 0.11
L = 0.12
L = 0.13
L = 0.14
L = 0.15

Figure 4.2: Condition number(Normalized Frequency) of Different Grid Size Cubic
Arrays with number of points on axis Naxis = 7, number of grid points N = 343.

At normalized frequencies, it appears that only one line is plotted in the Figure.4.2,
but in fact, the condition number curves of different grid sizes overlap almost per-
fectly. The almost perfect overlap of the multiple curves illustrates that the pro-
portional scaling of the sampling grid has a proportional effect on the condition
number. Another piece of information to note here is that multiple curves reach
their minimum at the same time when the normalized frequency is around 8, which
is a good indication that the Bessel term is only able to reconstruct the sound field
accurately at kr < N , as it decays significantly at kr > N .

4.1.2 Influence caused by different number of points inside
the grid

In this paper, I test the effect of different numbers of grid points on the condition
number; here, I denote the number of nodes located on the center axis of the sam-
pling grid by Naxis, and the total number of sampling points for this grid is N3

axis.
In performing this test step, the total number of nodes in the grid follows Eq. (2.27)

24


4. Results

to ensure that the data can support at least a spherical harmonic transform of order
8.
Unlike the effect of different grid sizes on the condition number, the number of
grid nodes not only affects the performance of the condition number in the high-
frequency part but also greatly influences its behavior in the low-frequency part.
The following Figure.4.3 shows the effect of different numbers of grid nodes on the
condition number for ordinary frequency scaling (left) and normalized frequency
scaling (right).

1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

Frequency (Hz)

100

101

102

103

104

C
on

di
tio

n 
nu

m
be

r

Condition Number (Evenly distributed)

N
axis

=5

N
axis

=6

N
axis

=7

N
axis

=8

N
axis

=9

1 2 3 4 5 6 7 8 9 10 11 12

Normalized Frequency

100

101

102

103

104

C
on

di
tio

n 
nu

m
be

r

Condition Number (Evenly distributed)

N
axis

=5

N
axis

=6

N
axis

=7

N
axis

=8

N
axis

=9

Figure 4.3: Condition number of Different number of point on axis Cubic Arrays
with grid side length L = 0.1 m

From the Figure.4.3, it can be seen that when Naxis = 5, i.e., N = 125, its corre-
sponding condition number curve is significantly higher than the other curves at all
frequencies. At high frequencies, this phenomenon can be interpreted as severe spa-
tial aliasing due to sampling points that are too sparse. The high condition number
at Naxis = 5 suggests that it is unsafe to follow Eq. (2.27) for grid point selection
strictly. When Naxis ≥ 6, the curves maintain a similar behavior at low frequencies
but vary at high frequencies. However, it can still be conjectured that their lowest
point is associated with the Bessel term. Overall, more sampling points ensure the
matrix’s stability at high frequencies.
Also, under the normalized frequencies scaled axis, it is clear that the conditional
number curves for different grid node numbers have different behavioral patterns.

4.1.3 Influence caused by different distribution of points in-
side the grid

In this thesis, I design a method to change the distribution pattern of the nodes
inside the mesh, the specific principle of which has already been stated in the Method
section, so I will not go into too much detail here. However, it is still important to
emphasize here that the coefficient of variation Cv ∈ [−1, 1]; the closer Cv is to 1,
means that the nodes inside the grid are denser in the central part. Vice versa, the
closer Cv is to -1, which means that the nodes inside the grid are closer to the grid
surface.

25


4. Results

1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

Frequency (Hz)

100

101

102

103

C
on

di
tio

n 
nu

m
be

r

Condition Number of different coefficient of variation

C
v
=-0.8

C
v
=-0.4

C
v
=0

C
v
=0.4

C
v
=0.8

1 2 3 4 5 6 7 8 9 10 11

Normalized Frequency

100

101

102

103

C
on

di
tio

n 
nu

m
be

r

Condition Number of different coefficient of variation

C
v
=-0.8

C
v
=-0.4

C
v
=0

C
v
=0.4

C
v
=0.8

Figure 4.4: Condition number of Different distribution Cubic Arrays with grid
side length L = 0.1 m, number of points on axis Naxis = 7, number of grid points
N = 343.

It can be seen that when the sampling points are closer to the surface of the grid,
there is a slight improvement in the performance of the condition number at low
frequencies, but it is tiny. As shown in the right picture of Figure.4.4, it can be seen
that at high frequencies, the uneven sampling approach has some improvement on
the fluctuations in the condition number, which can be interpreted as an effective
reduction of spatial aliasing by the uneven spacing of the grid points. Overall, a
sampling approach in which the distribution of grid points is close to the grid’s
surface is more favorable than a distribution in which the grid points are closer to
the center. This is also demonstrated by the following figure, which plots the change
in the average of the condition numbers over frequencies from 1000 Hz to 5000 Hz,
which Naxis goes from 5 to 9 for the three distributions of grid: the grid points closer
to the surface Cv = −0.8 and uniformly distributed, the grid points closer to the
center Cv = 0.8.

5 5.5 6 6.5 7 7.5 8 8.5 9
40

60

80

100

120

140

160

180

200

220

240
Average of Condition Number between 1000 Hz and 5000 Hz

Center
Edge
Normal

Figure 4.5: Average of Condition Number between 1000 Hz and 5000 Hz of Cubic
Arrays with grid side length L = 0.1 m

Figure.4.5 shows that the distribution of grid points closer to the surface of the cubic
grid is somewhat better for the condition number when Naxis > 5.

26


4. Results

4.2 Results generated by different parameters

In this section, I will show the differences between the audio signals. These differ-
ences are not limited to the differences between the audio signals generated by the
PWD-based method and those generated by the HRTF-based method; they also in-
clude the difference between the audio signals generated by the PWD-based method
under different parameter conditions and the rate of change of these differences.

4.2.1 Difference between ground truth output and the bin-
aural output of three Anatomy planes

In this section, this paper first investigates the difference between ground truth out-
put and the binaural output of three Anatomy planes. The three so-called anatom-
ical planes of the human body are the Transverse Plane, the Sagittal Plane, and the
Frontal Plane. The study of this subsection is conducted to understand the spatial
performance characteristics of the audio signals generated by the PWD method; the
three planes were selected based on the fact that they are intuitive as the charac-
teristic planes of the human body. The three planes assumed in this thesis can be
seen in the following Figure.4.6.

Figure 4.6: Human anatomy planes

The three sets of comparisons shown in this subsection all take place with the grid
size of 0.1 m and grid points number of 343 points uniformly distributed. The signals
compared are binaural signals of the right ear. I first compared the signal differences
in the transverse plane, i.e., the horizontal direction, as shown below in Figure.4.7.

27


4. Results

Figure 4.7: Deviation of the output depends on the incidence direction
(dB)(Transverse, right ear)

Figure.4.7 shows that the Difference between ground truth output and the binau-
ral output is not significant in the low frequencies range of about 50 Hz to 2000 Hz
and gradually increases at higher frequencies. From about 2000 Hz, this difference
demonstrates a clear directional dependence. The figure shows that when the sound
comes from the left side of the body, the difference of the signal at the high frequency
is that the high-frequency component of the signal generated by the PWD-based
method is significantly more than that of the HRTF ground truth signal. When
the sound comes from the right side of the body, the signals generated by the two
methods are very similar in frequency distribution, which indicates that the signal
generated by the PWD-based method is very close to the real situation. In other
directions, the PWD-based method’s high-frequency components of the signals are
significantly less than those of the HRTF ground truth signal. A possible reason
is that in the real world, low-frequency waves can propagate well through diffrac-
tion to the back of the occlusion, but high-frequency components cannot effectively
propagate through diffraction to the back of the occlusion. In my thesis, although
HRTF is involved in signal generation based on the PWD method, it still cannot
reproduce the diffraction well in the actual situation. Thus, the high-frequency com-
ponent can only be well simulated if the sound comes from the right and does not
travel to the right ear in a diffractive manner. However, this difference may not be
audibly significant since the human ear is not as sensitive to high-frequency sounds
as around 1000 Hz. It is still important to increase the frequency of the boundaries
at which high-frequency differences begin to occur.
Next, I calculated and plotted the difference in the sound signals as they came from
each direction in the Frontal plane.

Figure 4.8: Deviation of the output depends on the incidence direction
(dB)(Frontal, right ear)

28


4. Results

In the Figure.4.8, we can see something similar to the Figure.4.7 occurs at high
frequencies when the right ear becomes the occluded ear. At the same time, I can
reasonably assume that when sound comes from directions of the Sagittal plane, the
high-frequency components of the signal generated by the PWD-based method is
less than those of the ground truth signal because the right ear is not completely
obscured. When the right ear is not completely occluded (e.g., in direction 225◦ of
Figure.4.7 and direction 45◦ top of Figure.4.8), the high-frequency components of
the signal generated by the PWD-based method is less than those of the ground
truth signal is a fixed pattern.
Finally, I plotted the differences in the Sagittal plane of different directions and
displayed them in Figure.4.9 The general pattern is consistent with my predictions.
At the same time, I observed that the reduction of the high-frequency component
is not stable when the sound comes from different directions. A possible reason is
that the frequency range we studied here is close to or even beyond the frequency
range that can be simulated by our preset 8th-order spherical harmonics.
This subsection gives us a general idea of the distribution of the difference between
ground truth output and the binaural output over space. The distribution of this
difference in the Transverse plane is highly representative, and therefore, in the
subsequent sections, I will choose only this one plane to study.

Figure 4.9: Deviation of the output depends on the incidence direction
(dB)(Sagittal, right ear)

4.2.2 Influence caused by different distribution of points in-
side the grid

In the section on the condition number, we investigated the effect of different grid
point distributions on the condition number based on the changing coefficient of
variation. The conclusion is that by making the distribution of grid points closer
to the surface, we can improve the performance of the condition number. In this
subsection, I will investigate the effect of change in the coefficient of variation on
the generated binaural signals.
I tested the performance when the coefficient of variation Cv = −0.5 and plotted
the following figure. Here, it is necessary to explain what the words ’relatively’ and
’absolutely’ represent in the figure.

1. The term ’relatively’ indicates that this figure compares signals generated by
the PWD-based method, but one set of signals is generated when the grid
points are non-uniformly distributed, and the other set used as a reference is

29


4. Results

generated when the grid points are uniformly distributed.

2. The term ’absolutely’ indicates that the figure compares the difference between
the signal generated by the PWD-based method and the HRTF ground truth
signals.

Figure 4.10: Relatively deviation of the output depends on the incidence direction Cv =
−0.5

Figure 4.11: Absolutely deviation of the output depends on the incidence direction Cv =
−0.5

As shown in the Figure.4.10, changing the coefficient of variation does not signifi-
cantly affect the results. Notably, there is a visible reduction in the high-frequency
component of the sound signal at an angle of sound incidence of 90◦, i.e., when the
right ear becomes the occluded ear, which is a good phenomenon. This is also shown
in the article written by Jens Ahrens[11]: when using spherical surface sampling and
cubical surface sampling, the signal of the occluded ear is closer to the ground truth
signal at high frequencies. In short, volumetric sampling with internal point uniform
distribution is not a good sampling scheme.

As can be seen from the Figure.4.11, even if there is some improvement in the high-
frequency part of the signal of the occluded ear, this improvement is not significant.

I also investigated the case when Cv = 0.5 and got results similar to those in Fig-
ure.4.10 and Figure.4.11. The details are shown in Figure.4.12 and Figure.4.13.

30


4. Results

Figure 4.12: Relatively deviation of the output depends on the incidence direction Cv = 0.5

Figure 4.13: Absolutely deviation of the output depends on the incidence direction Cv = 0.5

In order to further investigate the effect of the changing coefficient of variation on
the signal, I chose the 90◦ as the sound incidence angle, investigated the sound
signals with different coefficients of variation, and obtained the rate of change of the
signal by dividing the signals. The result is shown below.

Figure 4.14: Deviation speed of the output depends on the coefficient of variation

Figure.4.14 illustrates almost no change in the signal by varying the coefficient of
variation when using 90◦ as the sound incidence angle. However, if one looks closely
at the high-frequency part of the figure, one can still see that the variation shows a
uniformly increasing or decreasing trend(deviation speed constantly greater than or
less than 0) in some very narrow frequency bands. This means that when the grid’s
points distribution is very extreme (too close to the surface or too concentrated in
the center), the change in distribution still significantly affects the high-frequency
part of the spectral characteristics.

4.2.3 Influence caused by different grid nodes number
In this section, I will investigate the effect of the number of grid nodes while keeping
the mesh size constant. I compared the difference between signals generated by the

31


4. Results

PWD-based method and the ground truth when the number of grid nodes gradually
increased from 53 = 125 to 103 = 1000. Here, 45◦ is chosen as the sound incident
angle. The results are as follows.

Figure 4.15: Deviation of the output depends on the number of grid points (dB)
with grid side length L = 0.1 m

Earlier, I investigated its effect on the condition number and came to the conclusion
that one should avoid the number of grid nodes being too close to the theoretical
lower bound given by the formula Q > (N +1)2, where N stands for the order of the
spherical harmonic we choose when try to calculate quadrature matrix. However, in
Figure.4.15, from a frequency point of view, the signal is still acceptable when the
number of grid nodes is 125, and the high-frequency improvement from increasing
the number of grid nodes is very limited.

4.2.4 Influence caused by the different impulse response
length

While performing earlier tests, I realized that the quadrature matrix is a set of
impulse responses when converted to the time domain. We can preset the length
of that impulse response cnm to greatly affect the computation time to generate a
signal. A shorter impulse response will make a shorter operation time. However,
intuitively, a shorter impulse response will also lead to information loss at low-
frequency, which is determined by the sampling principle. In this thesis, NIR denotes
the impulse response length of cnm. In the table below, I have listed the impulse
response length with its required computation time, which is obtained by averaging
the computational time of the program running.

Table 4.1: Computaion time t for NIR = 2048, 1024, . . . 8.

NIR 2048 1024 512 256 128 64 32 16 8
t(s) 33.22 17.62 10.17 6.2 4.48 3.39 2.88 2.75 2.73

From Table.4.1, one can know that the computational time decreases as the impulse
response length decreases. However, this relationship is not linear; when the impulse
response cnm is already short enough, the time used to call other functions is already
significantly greater than the time reduction resulting from shortening the impulse
response. In This thesis, I choose a very small value(NIR = 16) as the lowest limit

32


4. Results

of the impulse response length to explore the extreme cases; the sound incident
direction is 45◦. The results of the comparison between the signals are shown below.

Figure 4.16: Deviation of the output depends on the impulse length. Grid info:
Side length L = 0.1 m, Number of points on axis Naxis = 7, Number of grid points
N = 343.

When looking at the Figure.4.16 from right to left, it is interesting to note that when
the impulse response length is less than 128, the audio signal generated by the PWD-
based method differs from the ground truth significantly in all frequency bands. This
phenomenon is easy to explain; with a known sampling rate of fs = 48 000 Hz, it can
be estimated that the lowest frequency that can be sampled with an impulse response
length of 128 is about fs = 400 Hz, i.e., the impulse response does not express
information below this frequency, so the low-frequency information is missing. In
addition, if we consider the problem of spatial aliasing at high frequencies, the signal
itself becomes unpredictable in the frequency range where low-frequency loss and
spatial aliasing occur simultaneously.
To show at what point the signal stabilizes, I plotted the deviation speed of the
output depending on the impulse length.

Figure 4.17: Deviation speed of the output depends on the impulse length. Grid
info: Side length L = 0.1 m, Number of points on axis Naxis = 7, Number of grid
points N = 343.

From Figure.4.17, it can be told that the signal is stable only when the impulse
response length is greater than 256. This subsection demonstrates that it is feasible
to shorten the computational time by decreasing the impulse response length. An
excessively long impulse response length does not provide any advantage. Therefore,
it is important to choose the appropriate impulse response length.

33


4. Results

4.2.5 Influence caused by different grid size
In this section, I will discuss the effect of different grid sizes. To show the effect of
grid size on the results, I picked a very aggressive range of mesh sizes, which is from
0.1 m to 4.3 m and sound incident direction is 45 degree.

Figure 4.18: Deviation of the output depends on the grid size (dB). Grid info:
Number of points on axis Naxis = 7, Number of grid points N = 343.

The most conspicuous information in Figure.4.18 is that spatial aliasing occurs at
lower frequently as the grid size increases, creating a very distinct boundary in
the figure. In order to eliminate the effect of other information, I calculated the
deviation rate by dividing the signal produced by the PDW-based method in the
frequency domain. The result is shown below.

Figure 4.19: Deviation speed of the output depends on the grid size (dB). Grid
info: Number of points on axis Naxis = 7, Number of grid points N = 343.

In Figure.4.19, we can clearly observe the pattern in which spatial aliasing occurs.
Except for the lowest frequency at which spatial aliasing occurs, the other frequencies
at which spatial aliasing occurs form several stripes almost parallel to the boundary
formed by the lowest frequency at which spatial aliasing occurs.

34


5
Conclusion

This thesis performs a series of tests to explore the characteristics of the binaural sig-
nals generated by the PWD-based method and explores possible ways to reduce the
amount of calculation. This paper has generated the following conclusions through
the investigations in the above sections:

1. The way the grid is selected affects the results. Volumetric sampling is a
simple, easy-to-use grid that can be changed slightly to produce better results.
For example, spatial aliasing can be greatly avoided by breaking the uniformity
of the mesh nodes. Although this approach may result in a relatively high
condition number at high frequencies, it effectively avoids fluctuations in the
condition number at high frequencies. At the same time, making the grid
nodes closer to the surface of the sampling grid can effectively reduce the
condition number and produce signals with better spectral quality.

2. Reducing the impulse response length used throughout the process is a very ef-
fective way to increase computational speed. Shortening the impulse response
time reduces the time required for computation to one-fifth or even less when
only the spectral quality of the generated signal is evaluated.

3. Although increasing the number of grid nodes while keeping the grid size
constant stabilizes the condition number curve and very effectively improves
the performance of the condition number at high frequencies, a larger number
of grid nodes does not significantly improve the frequency characteristics of
the generated signal. Therefore, it is not advisable to try to improve the signal
quality by increasing the number of grid nodes while ensuring that the number
of grid nodes is higher than the minimum requirement.

However, all the conclusions generated in this paper are based on two criteria: con-
dition number and spectral information. In fact, many audible audio signals were
generated while performing the various tests. Listening experiments are bound to
give more valuable information.

35


5. Conclusion

36


Bibliography

[1] K. Kowalczyk and M. Van Walstijn, “Room acoustics simulation using 3-d
compact explicit fdtd schemes,” Audio, Speech, and Language Processing, IEEE
Transactions on, vol. 19, pp. 34 – 46, 02 2011.

[2] T. Xiao and Q. Huo Liu, “Finite difference computation of head-related transfer
function for human hearing,” The Journal of the Acoustical Society of America,
vol. 113, no. 5, pp. 2434–2441, 2003.

[3] J. Sheaffer, C. Webb, and B. M. Fazenda, “Modelling binaural receivers in
finite difference simulation of room acoustics,” in Proceedings of Meetings on
Acoustics, vol. 19, AIP Publishing, 2013.

[4] P. Mokhtari, H. Takemoto, R. Nishimura, and H. Kato, “Computer simulation
of hrtfs for personalization of 3d audio,” in 2008 Second International Sympo-
sium on Universal Communication, pp. 435–440, IEEE, 2008.

[5] B. Støfringsdal and P. Svensson, “Conversion of discretely sampled sound field
data to auralization formats,” Journal of the Audio Engineering Society, vol. 54,
no. 5, pp. 380–400, 2006.

[6] J. Sheaffer, M. Van Walstijn, B. Rafaely, and K. Kowalczyk, “Binaural re-
production of finite difference simulations using spherical array processing,”
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23,
no. 12, pp. 2125–2135, 2015.

[7] B. Rafaely, Fundamentals of spherical array processing, vol. 8. Springer, 2015.
[8] G. B. Arfken and H. J. Weber, “Mathematical methods for physicists 6th ed.,”

Mathematical methods for physicists 6th ed. by George B. Arfken and Hans J.
Weber. Published: Amsterdam; Boston: Elsevier, 2005.

[9] J. Ahrens, “auralization-toolbox,” 2023. https://github.com/
AppliedAcousticsChalmers/auralization-toolbox, last accessed on
February 25, 2024.

[10] G. Chardon, W. Kreuzer, and M. Noisternig, “Design of spatial microphone
arrays for sound field interpolation,” IEEE Journal of Selected Topics in Signal
Processing, vol. 9, no. 5, pp. 780–790, 2015.

[11] J. Ahrens, “A software tool for auralization of simulated sound fields,” Proc. of
the Institute of Acoustics, vol. 45, Sep. 2023.

37

https://github.com/AppliedAcousticsChalmers/auralization-toolbox
https://github.com/AppliedAcousticsChalmers/auralization-toolbox


Bibliography

38


Department of Architecture and Civil Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden
www.chalmers.se

www.chalmers.se

	List of Acronyms
	Nomenclature
	List of Figures
	List of Tables
	Introduction
	Background
	Purpose and Goals

	Theory
	Wave equation and FDTD method
	Plane wave density function
	Representation sound field in Spherical Harmonics
	Spherical Harmonics
	Spherical Fourier Transform
	Spherical Bessel and Hankel Functions
	Using Spherical Harmonics to express wave

	How to get Plane wave density function by microphone Array

	Methods
	General procedure for generating spatial audio signals
	Adjustment of the shape of the sampling grid
	The change in matrix stability expressed through Condition Number
	Compare results generated by different methods

	Results
	Impact on the Condition Number
	Influence caused by different grid size
	Influence caused by different number of points inside the grid
	Influence caused by different distribution of points inside the grid

	Results generated by different parameters
	Difference between ground truth output and the binaural output of three Anatomy planes
	Influence caused by different distribution of points inside the grid
	Influence caused by different grid nodes number
	Influence caused by the different impulse response length
	Influence caused by different grid size


	Conclusion