The Paraspax method applied on loudspeaker arrays
Loudspeaker array-based synthesis of varying spaces including an investigation on how the sound field changes at different position within the array
Master’s thesis in Master Program Sound and Vibration
HANNA PERSSON
Department of Architecture and Civil Engineering
Division of Applied Acoustics
Audio Technology Group
CHALMERS UNIVERSITY OF TECHNOLOGY
Master’s thesis ACEX30
Gothenburg, Sweden 2022

© HANNA PERSSON, 2022.
Supervisor: Jens Ahrens, Division of Applied Acoustics
Examiner: Jens Ahrens, Division of Applied Acoustics
Chalmers University of Technology, SE-412 96 Gothenburg, Telephone +46 31 772 1000

Abstract
Binaural room impulse responses (BRIRs) describe the transmission from a sound source to a listener’s left and right ears, unlike monaural room impulse responses, which contain only one channel and therefore sound the same to both ears.
The conventional way to obtain BRIRs of a space is to record a sound source using a dummy head with a microphone in each ear, for different head orientations and positions in the space. This can be both time consuming and costly, and research is therefore trying to find more practical approaches that rely on signal processing. The Paraspax method is a method for parametric spatial audio rendering with 6 DoF based on one monaural room impulse response. The method encodes monaural and spatial parameters offline into a parametric spatial sound field for arbitrary head orientations and room positions. The most important parameters are the amplitudes of the direct sound and up to 10 early reflections with corresponding times and directions of arrival (TOAs, DOAs). The TOAs are obtained from a reflection detection algorithm and the image source model provides the DOAs. These, together with the rest of the parameters, form the basis of BRIRs synthesized for audio reproduction using headphones.
The work of this thesis extends the BRIR synthesis into a loudspeaker array-based synthesis, where the parametrized direct sound and early reflections are distributed over a number of loudspeakers arranged on a sphere. The resulting sound field is estimated for a listener placed at different positions inside the loudspeaker array. The authors of the Paraspax method have presented the method for a shoebox-shaped room, but it is still unknown how it works for other environments, and therefore a handful of room impulse responses are tested. The thesis answers what the minimum number of loudspeakers in the loudspeaker array is and how the sound field at different listening positions differs from the sound field created at the center of the array. It is also shown how some parameters of the loudspeaker array influence the sound.
Convolving an anechoic drums audio file with the synthesized sound field created by the loudspeaker array virtually places the listener at different positions in the loudspeaker array, and the resulting sound represents how the drums are perceived in the different test environments. The results obtained with a reduced number of loudspeakers are analyzed and compared with an 84-loudspeaker array.
It is shown that the loudspeaker array is highly dependent on the simulated DOAs, and especially on the azimuth angles, as it appears that the loudspeakers should be placed around the listener. If the DOAs of the early reflections vary enough in azimuth, three loudspeakers turn out to be enough. The sound behaves differently depending on the direction in which the listener moves, but by increasing the number of loudspeakers or the radius of the array, the listener can generally move more freely, except when the TOA differences between the loudspeakers become too large.

Acknowledgements
I would like to express my gratitude to my supervisor and examiner Jens Ahrens at the Division of Applied Acoustics at Chalmers. Without you this project would not have been possible. Thank you for proposing this project as a master’s thesis, for your wise and humble words in guiding me through this work, and for your quick responses to emails, which meant it was no hindrance at all that you were in the US for almost the entire course of the project. I would also like to send my thanks to Wolfgang Kropp at the Division of Applied Acoustics, who was kind enough to lend me his headphones, and also to the other employees of the Division, who shared their knowledge of acoustics and have been very helpful to me during my two years as a master student. I’m thankful to my classmates, and to my dear partner and roommate Daniel Hall, for acting as a sounding board during the course of the project.
I also want to thank Daniel for the great support he gave me during my study period. I want to give my last thanks to my friend Christine Jeppsson just for being there.
Hanna Persson, Gothenburg, October 2022

Contents
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Related works
2 Theory
  2.1 Echo density profile
  2.2 Image source model
  2.3 Head related impulse responses
3 Methods
  3.1 Mixing time
  3.2 Reverberation level
  3.3 Spectral components
    3.3.1 Reflection detection
    3.3.2 Directions of arrival
  3.4 Late reverberation
  3.5 Early diffuse sound
  3.6 Extension on loudspeaker arrays
    3.6.1 Listener at the center
    3.6.2 Virtual loudspeaker array
    3.6.3 Change listener position
4 Results
  4.1 At the center of the loudspeaker array
  4.2 Investigation of the sweet spot
5 Discussion
  5.1 Methods
  5.2 At the center of the loudspeaker array
  5.3 Investigation of the sweet spot
  5.4 Influence of different parameters
6 Future research
7 Conclusion
Bibliography
A Tested room impulse responses
  A.1 Genesis 6 studio
  A.2 Trollers gill
  A.3 Maes howe
  A.4 Arthur sykes rymer auditorium
  A.5 Koli national park
  A.6 Stairway
  A.7 Hoffman lime kiln
  A.8 Central hall
  A.9 Helsington church
  A.10 Promenadikeskus concert hall
  A.11 Innocent railway tunnel
  A.12 Falkland palace royal tennis court
  A.13 Shrine and parish church of all saints
  A.14 Hamilton mausoleum
  A.15 Terrys factory warehouse
  A.16 Table
B Results

List of Figures
2.1 Path of a first-order reflected sound ray from a sound source S to a receiver R using the image source S′.
2.2 Path of a second-order reflected sound ray from a sound source S to a receiver R using the image source S″.
3.1 Calculated echo density profiles using the Paraspax method of three types of RIRs; a tunnel, a semi outside environment and a warehouse of big volume.
3.2 Reverberation level of the Genesis 6 studio estimated by the RMS method.
3.3 Reverberation level of the Shrine and parish church of all saints estimated by the EDC method.
3.4 Direct sound and early reflections found by the reflection detection algorithm of the Paraspax method in the RIR of the Helsington church.
3.5 Direct sound and early reflections found by the reflection detection algorithm of the Paraspax method in the RIR of the Hamilton mausoleum.
3.6 Direct sound and early reflections found by the reflection detection algorithm of the Paraspax method in the RIR of the Koli national park.
3.7 DOAs of the direct sound and early reflections in azimuth and elevation of Arthur sykes rymer auditorium, found by the Paraspax method.
3.8 The inside (left) and the floor plan marked with different source and receiver positions (right) of the Hoffman lime kiln chamber.
3.9 DOAs of the direct sound and early reflections in azimuth and elevation of the Stairway, found by the Paraspax method.
3.10 DOAs of the direct sound and early reflections in azimuth and elevation of Hoffman lime kiln, found by the Paraspax method.
3.11 The early part of the measured RIR of the Helsington church plotted together with the detected direct sound and early reflections (directional part) and the corresponding weighting function. The directional part and weighting function is found by the Paraspax method.
3.12 The binaural diffuse reverberation built from binaural white noise and based on the measured RIR of the Helsington church plotted together with the inverse weighting function.
3.13 The directional and diffuse components that forms the early part of the synthesized BRIR of the Helsington church.
3.14 A spherical loudspeaker array of 84 loudspeakers positioned at the green dots.
3.15 Early reflections passed to two different loudspeakers of the array. One loudspeaker is assigned one reflection (left) while another is assigned three reflections (right).
3.16 Measured HRIRs from a Neumann KU100 artificial head showing how sound is reaching the left and right ear when the sound source is positioned at 90° to the left (left) and at 180° right behind (right).
3.17 Sound travelling from the loudspeakers to a listener positioned at the center of the array (black dot) and to a listener at an arbitrary position (red dot) within a simplified loudspeaker array of four loudspeakers in the horizontal plane.
4.1 A loudspeaker array in the vertical plane containing 7 loudspeakers positioned at the green dots right in front of the listener.
4.2 A loudspeaker array in the horizontal plane containing 3 loudspeakers positioned at the green dots around the listener.
4.3 Measured monaural room impulse response of Maes Howe.
4.4 Loudspeaker array synthesis of Maes howe from a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and from the 84-loudspeaker array (right).
4.5 Interaural coherence of Maes howe for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker array (right).
4.6 Loudspeaker array synthesis of the Helsington church for a loudspeaker array only changing in the horizontal plane using 3 loudspeakers (left) and its monaural RIR (right).
4.7 Measured monaural room impulse response of the Helsington church.
4.8 Interaural coherence of the Helsington church for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker array (right).
4.9 Loudspeaker array synthesis of the Promenadikeskus concert hall for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and the 84-loudspeaker array (right).
4.10 Interaural coherence of the Promenadikeskus concert hall for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker array (right).
4.11 DOAs of the direct sound and early reflection of Trollers gill.
4.12 DOAs of the direct sound and early reflection of the Koli national park.
4.13 DOAs of the direct sound and early reflection of the Central Hall.
4.14 Directions of the listener’s movements in the 3-loudspeaker array.
4.15 Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 1 meter in front of the center (left) and 2 meters in front of the center (right).
4.16 Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 3.5 meter in front of the center (left) and 5.5 meters in front of the center (right).
4.17 Interaural coherence of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 1 meter in front of the center (left) and 3.5 meters in front of the center (right).
4.18 Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 1 meter behind the center (left) and 2 meters behind the center (right).
4.19 Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker array in the horizontal plane at the sweet spot.
4.20 Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker array in the horizontal plane at the listener position 1.5 meters aside to the left of the center (left) and 3.5 meters aside to the left (right).
4.21 Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker array in the horizontal plane at the listener position 1.5 meters aside to the right of the center (left) and 3.5 meters aside to the right (right).
4.22 HRIR from loudspeaker 3 to the listener at the new listener positions 1.5 m aside to the right (left) and 3.5 m aside to the right (right).
4.23 Loudspeaker array synthesis of Maes howe at the listening position 2 m diagonally in front of the center to the left (left) and at the listening position 2 m diagonally in front of the center to the right (right).
4.24 Loudspeaker array synthesis for the listener position 3.5 m diagonally forward to the left of the Promenadikeskus concert hall (left) and of Maes howe (right).
4.25 Loudspeaker array synthesis for the listener position 3.5 m aside to the left of the Genesis 6 studio when the radius is decreased to 5 m (left) and increased to 20 m (right).
A.1 Floor plan with source and receiver positions (left) and photo taken from the control room (right) of the Genesis 6 studio.
A.2 The measured valley of Trollers gill (left) and floor plan with source and receiver positions (right).
A.3 The outside location of Maes Howe (left) with floor plan (right).
A.4 The interior of Maes Howe.
A.5 Floor plan with measurement positions of source and receiver at the Arthur sykes rymer auditorium.
A.6 The interior of the Arthur sykes rymer auditorium.
A.7 The Koli national park at summer.
A.8 The floor at which measurements were made at the Stairway (left). The floors below the measurement floor (right).
A.9 Floor plan of Hoffman lime kiln with source and receiver positions.
A.10 The exterior (left) and interior (right) of the Hoffman lime kiln.
A.11 Floor plan of the Central hall with source and receiver positions.
A.12 The interior of Central hall. The speaker used in the measurements is visible on stage (right). The hall is equipped with bleachers at the back and a bunch of chairs at the front (left).
A.13 Floor plan of the Helsington church with source and receiver positions.
A.14 The interior of Helsington church. The loudspeaker at the altar (right) and the microphone at position "R6" (left).
A.15 Floorplan of the Promenadikeskus concert hall with source and receiver positions.
A.16 The interior of the Innocent railway tunnel.
A.17 Source and receiver positions of the Innocent railway tunnel.
A.18 The interior of the Falkland palace royal tennis court.
A.19 Floor plan of the Shrine and parish church of all saints together with source and receiver positions.
A.20 The interior of the Shrine and parish church of all saints.
A.21 The exterior (left) and the interior, showing the microphone and loudspeaker position (right) of Hamilton mausoleum.
A.22 The interior of Terrys factory warehouse including microphone position (left) and loudspeaker position (right).

List of Tables
3.1 Window lengths in echo density profile when calculating mixing time for three different measured RIRs.
4.1 Numbers of loudspeakers required in the loudspeaker array for different impulse responses.
A.1 Information about the tested room impulse responses.
B.1 Various results for all tested room impulse responses.

1 Introduction
Spatial audio can be obtained by convolving audio signals with spatial room impulse responses (SRIRs). An SRIR consists of a spatial description of the sound field in a given room in addition to its monaural parameters. The goal is to create a perceptually plausible virtual environment that is coherent with the real environment. Audio in virtual environments is often used together with other components that place a high load on the computer, such as visuals, and it is therefore important to keep the computation time low. This can be achieved by encoding the monaural and spatial parameters offline into a parametric spatial description of the sound field. An example of a method that does this is the Paraspax method [1].
The Paraspax method derives monaural and spatial parameters for a virtual environment with 6 degrees of freedom (DoF), which corresponds to arbitrary head orientations and room positions of the listener. This parametric description forms the basis of binaural room impulse responses (BRIRs), which are synthesized by a synthesis algorithm of the Paraspax method.
The work of this thesis is to use the Paraspax method to create parametric SRIRs for a number of monaural room impulse responses (RIRs) of different environments. Furthermore, the BRIR synthesis in the Paraspax method creates 2-channel (left and right) BRIRs for any head orientation and listener translation, intended for headphones, and can be used for real-time rendering. Within this work, the BRIR synthesis is instead extended to a loudspeaker array-based synthesis such that the monaural and spatial parameters of the RIR are distributed over a number of loudspeakers arranged on a sphere that forms the array. The loudspeaker array synthesis is first generated for a listener positioned at the center of the array, where the work investigates the smallest number of loudspeakers required in the array in order not to reduce the quality of the sound field created by the spherical loudspeaker array used as the starting point. In a next step, the listener changes position inside the array, and it is examined how far the listener can move from the center before the sound image changes and no longer sounds as good.
1.1 Background
The Paraspax method [1] has so far only been tested with one monaural omnidirectional room impulse response for a shoebox-shaped room of dimensions 11.73 m × 4.74 m × 4.62 m (length × width × height). The floor of the room is made of concrete and the walls are plain, where one of the two long sides consists of large glass panes. The reverberation time was measured to be 0.9 s.
However, to examine any limitations of the method, it is of interest to know how it behaves in varying environments, for example outdoor environments with low reverberation times that contribute only a small number of reflections, as well as warehouses or churches with high reverberation times. Descriptions of the tested monaural RIRs, together with the measurement positions of loudspeakers and microphones, can be found in Appendix A, and the RIRs are available online at [2] and [3]. In addition, the method has only been used for headphone reproduction; how it behaves with loudspeaker arrays has not yet been investigated.
1.2 Related works
As the method of this thesis builds upon the Paraspax method [1], the theory and the parametrization of the monaural RIRs presented in Methods, before the extension to loudspeaker arrays takes place, are taken from the Paraspax method.
The Paraspax method is inspired by the Binauralization of omnidirectional room impulse responses (BinRIR) algorithm, which is a method for parametric spatial audio rendering with 6 DoF [4]. Just as for the Paraspax method, only a single measured omnidirectional RIR is required to obtain a set of BRIRs of the room of interest. The process of synthesizing the late reverberation in BinRIR is reused in the Paraspax method and will be explained in detail in Methods. However, listening tests in which synthesized BRIRs were compared with measured BRIRs revealed uncertainties in the method, especially regarding the incidence direction of the early reflections, and the Paraspax method therefore extends and improves on the BinRIR algorithm.

2 Theory
2.1 Echo density profile
The mixing time of a room impulse response is the moment in time where the specular part meets the diffuse part. The Paraspax method calculates the mixing time according to Abel et al. [5], where the echo density of a reverberation impulse response is measured.
Objects and reflective surfaces in a reverberant environment interact with sound to create reflections. These reflections in turn interact with the environment to create even more reflections. In a measured RIR, the number of reflections increases over time until the echoes can be treated statistically; more specifically, the sound pressure amplitudes of the impulse response are assumed to follow a zero-mean Gaussian distribution with evolving color and level.
The echo density profile η(t) is measured over time and builds on the property that once an acoustic space is fully mixed, the impulse response taps take on a Gaussian distribution. Over a sliding window of the reverberation impulse response with a length of 2δ + 1 samples, it is defined as the fraction of impulse response taps lying outside the window standard deviation, divided by the fraction expected to lie outside one standard deviation for a Gaussian distribution, erfc(1/√2) = 0.3173:

$$\eta(t) = \frac{1/\operatorname{erfc}(1/\sqrt{2})}{2\delta + 1} \sum_{\tau = t-\delta}^{t+\delta} \mathbf{1}\{|h(\tau)| > \sigma\}. \tag{2.1}$$

The indicator function 1{·} returns one if its argument is true and zero otherwise, h(t) denotes the reverberant impulse response, and the window standard deviation is

$$\sigma = \sqrt{\frac{1}{2\delta + 1} \sum_{\tau = t-\delta}^{t+\delta} h^{2}(\tau)}. \tag{2.2}$$

Because of the normalization by the fraction expected for Gaussian noise, the echo density profile ranges from close to 0 for sparse reflections to about 1 for a fully mixed response. A few prominent reflections separated in time and level contribute to a larger standard deviation, resulting in an echo density profile close to 0. As the reflections occur more frequently and decrease in amplitude over time, the echo density increases. The mixing time is then defined as the time at which the echo density measure reaches 1 for the first time.
The choice of sliding window length affects the echo density profile. Shorter windows are expected to have a high variance about their local mean, as they include fewer impulse response taps.
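To make the definition and the role of the window length concrete, the following is a minimal sketch of Eqs. (2.1) and (2.2), assuming NumPy. The function names `echo_density_profile` and `mixing_time`, the 25 ms default window and the choice to evaluate only windows that fit entirely inside the response are illustrative assumptions; they are not taken from the Paraspax implementation.

```python
import math
import numpy as np


def echo_density_profile(h, fs, window_ms=25.0):
    """Echo density eta(t) of an impulse response h sampled at fs (Hz).

    Returns (times, eta), where times are the window centres in seconds.
    Only windows lying fully inside h are evaluated (an assumption of this sketch).
    """
    h = np.asarray(h, dtype=float)
    delta = int(round(0.5 * window_ms * 1e-3 * fs))   # half window length in samples
    expected = math.erfc(1.0 / math.sqrt(2.0))        # ~0.3173 for Gaussian noise
    centres = np.arange(delta, len(h) - delta)
    eta = np.empty(len(centres))
    for i, t in enumerate(centres):
        w = h[t - delta : t + delta + 1]              # window of 2*delta + 1 taps
        sigma = math.sqrt(np.mean(w ** 2))            # Eq. (2.2)
        eta[i] = np.mean(np.abs(w) > sigma) / expected  # Eq. (2.1)
    return centres / fs, eta


def mixing_time(times, eta):
    """Time at which eta first reaches 1, or None if it never does."""
    hits = np.flatnonzero(eta >= 1.0)
    return float(times[hits[0]]) if hits.size else None


# Toy input: exponentially decaying Gaussian noise standing in for a measured RIR.
rng = np.random.default_rng(1)
fs_demo = 8000
n_demo = np.arange(fs_demo)                           # one second of samples
toy_rir = rng.standard_normal(fs_demo) * np.exp(-n_demo / (0.2 * fs_demo))
toy_times, toy_eta = echo_density_profile(toy_rir, fs_demo, window_ms=25.0)
toy_t_mix = mixing_time(toy_times, toy_eta)
```

Calling `echo_density_profile` with different values of `window_ms` on the same response is one way to observe the trade-off between variance and time resolution discussed in this section.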
The window should be long enough to cover a few reflections but short enough to resolve how the echo density evolves over time. Impulse responses with closely overlapping reflections should have longer windows so that the variation between the different windows does not become too large. Shorter windows are a good choice for impulse responses having only a few prominent reflections. However, too short window lengths can cause jumps in the echo density profile when no reflection falls within the window.
According to Abel et al. [5], a good choice for the window length is between 20 and 30 ms. Moreover, Abel et al. present time-varying window lengths, with the idea of having shorter windows at the beginning of the impulse response, where the echo density has not had time to increase yet, and then letting the window grow with the increasing echo density. Within this work, however, only constant window lengths are considered.
2.2 Image source model
The image source model is used to find the specular reflection pattern from a sound source to a receiver within an enclosure, i.e. a room, and here a shoebox-shaped room is considered. The method contains some simplifications about the sound field. The sound waves are idealized as sound rays which, in a homogeneous medium, travel along straight lines, so diffraction is neglected. In addition, interference is not considered; instead, the intensities of the sound field components are added. Sound rays are perfectly reflected at a boundary, which is not the case in the real world, where some of the energy is scattered in an omnidirectional pattern. As the reflections are repeated, the scattered energy increases until the majority of the sound energy is diffuse. For this reason, the image source model should only be used for predicting the early reflections in a room and not the diffuse sound.
Let S be a sound source and R be a receiver, both located in a room with plane and smooth walls.
The sound energy travels from S at a constant speed of sound along the rays and decreases with the distance attenuation of 1/r², where r is the traveled distance. Except for the direct sound ray, the rays are reflected at the walls before they hit R. A reflected ray can be seen as originating behind the wall from a virtual sound source S′ called an image source. The image source is mirrored along the line perpendicular to the wall such that the distance between the wall and S is the same as the distance between the wall and S′, as illustrated in Figure 2.1. In this way, the path of the first-order reflected ray from S to R corresponds to the path from S′ to R. The intensity of the reflected ray is reduced by a factor 1-α, where α is the absorption coefficient of the wall [6].
If the first-order reflected ray hits a wall a second time, the process above is repeated, creating a second-order image source S″. The second-order image source S″ is in turn obtained by mirroring the first-order image source S′ in the second wall, as seen in Figure 2.2. The intensity of the resulting second-order reflection is then reduced by a factor (1-α1)(1-α2), where α1 and α2 denote the absorption coefficients of the first and second wall, respectively. This process can be repeated to obtain a third-order reflection and so on. The impulse response is then obtained by summing up the signals from the source S and each image source S′, S″, ...
Figure 2.1: Path of a first-order reflected sound ray from a sound source S to a receiver R using the image source S′.
Figure 2.2: Path of a second-order reflected sound ray from a sound source S to a receiver R using the image source S″.
2.3 Head related impulse responses
Humans cannot pick up sound equally from all directions as an omnidirectional microphone can. How sound reaches us depends on the size and shape of our pinnae, head and upper torso, among other things.
Moreover, the distance from the source to each ear can differ, the head can shadow the ears differently, and the sound waves can either enter the ear canal directly or first be reflected at the torso or pinnae. The resulting modified signal received from a sound source is then processed by our auditory system and gives us the ability to localize the source. The location is estimated in our brain by comparing binaural cues, i.e. cues received by both ears, such as interaural time and level differences.
Humans can locate sounds in three dimensions (distance, azimuth and elevation). Azimuth and elevation are angles originating from a spherical coordinate system. The elevation angle describes the vertical angle of a sphere, defined from a fixed zenith at 0° and extending to 180°, while the azimuth is the angle, defined up to 360°, of the orthogonal projection onto a horizontal plane that is orthogonal to the zenith and goes through the origin. Head related impulse responses (HRIRs) thus relate the location of the source to the location of the ears and can be used to create virtual sound sources. The frequency domain counterpart of the HRIR is called the head related transfer function (HRTF). HRIRs and HRTFs always come in pairs, corresponding to the left and right ear. BRIRs are obtained either by convolving a RIR with a set of HRIRs in the time domain or by filtering the RIR in the frequency domain with the analogous set of HRTFs.
HRIRs are commonly measured with a dummy head in an anechoic chamber. A dummy head has the shape and size of a human head, including pinnae and ear canals, where the microphones are placed. In this way the recording is made from the perspective of the dummy head and thus approximately from that of a human listener. There are a handful of dummy heads which differ in design, for example KEMAR and Neumann KU100. The measurements are performed by rotating the dummy head in the horizontal plane with a high angular resolution in front of a loudspeaker that is kept at a constant distance throughout the measurement.
The result is a set of HRIRs that corresponds to how humans pick up sound when the source is placed at various positions around us. In the far field, when the distance r between the source and the dummy head is greater than 1 m, the HRIRs are attenuated by a factor 1/r. For distances smaller than 1 m, the measured differences between the left and right ear increase, and it is therefore more common to perform the measurements with at least 1 m between the source and the dummy head.

3 Methods

To run the Paraspax method [1] with any monaural RIR, the distance from source to receiver as well as the source direction in azimuth and elevation are required. The method got its name from the three keywords parametrization, spatialization and extrapolation, which form the basis of the method. In the first part of the parametrization, the following standard monaural room acoustic parameters according to ISO 3382-2 [8] are calculated: reverberation time (RT60, RT30, RT20), early decay time (EDT), clarity (C80, C50), definition (D50, D80), energy decay curve (EDC) and direct-to-reverberant ratio (DRR). The parameters are calculated both in octave bands and for the broadband spectrum in the frequency range of human hearing (20 Hz - 20 kHz [9]). Then the amplitude and time of arrival (TOA) of the direct sound and early reflections are estimated. As a last step in the parametrization, the reverberation level is calculated, defined as the level of the diffuse sound field at the TOA of the first early reflection.

The spatialization assigns directions of arrival (DOAs) to the direct sound and early reflections in spherical coordinates (azimuth and elevation). This yields a 3 DoF sound field for arbitrary head orientations of the listener. The method can be extended to create a 6 DoF sound field for arbitrary head orientations as the listener moves through the room.
This is done in the extrapolation part, where the amplitudes, TOAs and DOAs of the direct sound and early reflections are modified according to a virtual space. The result is a BRIR synthesized for headphone reproduction.

When the method is extended to loudspeaker arrays, the extrapolation part is replaced by a loudspeaker array setup, where the parameterized directional components of the measured RIR are distributed between the loudspeakers according to their respective DOAs. Each loudspeaker signal is then assigned synthesized early diffuse sound and late reverberation. The parts of the method described in Sections 3.1-3.5 are taken directly from the Paraspax method and show how a RIR is parameterized and spatialized, followed by how the diffuse reverberation is synthesized. It is also reported how the Paraspax method behaves for a variety of rooms, which are listed in the Appendix. Section 3.6 explains the extension of the method in detail and notes the parts that are reused from the Paraspax method. Except for these parts, the extension to loudspeaker arrays is created from scratch.

3.1 Mixing time

A RIR is composed of direct sound, early reflections and late reverberation. The direct sound and early reflections are spectral components and appear at the beginning of the RIR. The late reverberation, which consists only of diffuse sound, belongs to the latter part, and its start in time is determined by the mixing time. The Paraspax method separates the early part from the late part, processes them individually and connects them once all respective components are found. It is therefore important that the mixing time is predicted exactly at the transition between the directional and the diffuse sound field of the RIR. The mixing time is a good predictor if it is estimated after all early reflections. If it instead appears too early in time, some of the early reflections become part of the diffuse sound.
Nor must it be estimated too late in time, since the diffuse sound in the early part of the synthesized BRIR depends on when the mixing time starts, and the same applies to each loudspeaker signal of the loudspeaker array. The echo density profile is calculated to estimate the mixing time and, by default, the Paraspax method uses a window length of approximately 21.3 ms. Adjusting the window length shifts the mixing time: shorter windows give earlier mixing times and longer windows give later mixing times.

Table 3.1 shows the window length and mixing time in milliseconds of three selected monaural RIRs. The first is called the Innocent railway tunnel and is a tunnel that previously accommodated two railway tracks. It is now used by pedestrians and cyclists, as the tracks have been replaced by paving. The tunnel has a semicircular cross section of dimensions 4.5 m × 6 m (height × width) and extends as far as 517 m [2]. The Falkland palace royal tennis court is a semi-outdoor environment with a volume of 2300 m³ and no roof. Its walls and floor are made of a concrete-like material. The Terrys factory warehouse is an empty industrial building of 4500 m³.

Name                                  Window length in ms   Mixing time in ms
Innocent railway tunnel               42.67                 114.88
Falkland Palace Royal tennis court    25                    168.56
Terrys factory warehouse              21.3                  176

Table 3.1: Window lengths in echo density profile when calculating mixing time for three different measured RIRs.

The full list of mixing times with the respective window lengths of all RIRs is presented in the Appendix, showing that the default window length is ideal for only 5 out of 15 tested RIRs. There is no clear pattern between suitable window length and type of RIR, meaning that there is no rule that works in every case. The window lengths finally used were selected by trial and error so that the estimated mixing time appears at the transition between the early and late part of the RIR.
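As an illustration of how a mixing time follows from an echo density profile, the sketch below assumes the definition by Abel and Huang [5]: the fraction of samples in a sliding window whose magnitude exceeds the standard deviation of that window, normalized by erfc(1/√2) ≈ 0.3173, which is the expected fraction for Gaussian noise. The function names, the rectangular window and the hop size are assumptions of this sketch and are not taken from the Paraspax implementation.

```python
from math import erfc, sqrt

import numpy as np


def echo_density_profile(h, fs, win_ms=21.3, hop=64):
    """Echo density of the RIR h in a sliding rectangular window (assumed shape)."""
    n_win = int(round(win_ms * 1e-3 * fs))
    expected = erfc(1 / sqrt(2))  # about 0.3173 for Gaussian noise
    starts = np.arange(0, len(h) - n_win + 1, hop)
    profile = np.empty(len(starts))
    for i, s in enumerate(starts):
        seg = h[s:s + n_win]
        sigma = np.sqrt(np.mean(seg ** 2))  # standard deviation of the window
        profile[i] = np.mean(np.abs(seg) > sigma) / expected
    times = (starts + n_win / 2) / fs  # time at the window centre in seconds
    return times, profile


def mixing_time(times, profile):
    """First time the profile reaches 1, or None if it never does."""
    idx = np.flatnonzero(profile >= 1.0)
    return float(times[idx[0]]) if idx.size else None
```

Changing `win_ms` in such a sketch moves the first crossing of 1, which is the effect behind the manual choice of window lengths.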
Furthermore, the suitable window lengths vary considerably and independently of reverberation time, indicating that the echo density profile of a RIR is insensitive to reverberation time, as stated by Abel and Huang [5].

The echo density profiles of the three RIRs listed in Table 3.1 are presented in Figure 3.1, where the respective mixing times are marked with circles. The mixing time is set to the time at which the echo density reaches 1 for the first time, according to Abel and Huang [5].

Figure 3.1: Calculated echo density profiles using the Paraspax method of three types of RIRs; a tunnel, a semi-outdoor environment and a warehouse of big volume.

Typical for a reverberant impulse response is an echo density profile that starts around 0 and then increases towards 1. As seen in Figure 3.1, this is the case for all three RIRs, and also for all 15 tested RIRs. The narrow shape of the tunnel allows the sound to be reflected quickly, which results in an echo density profile starting above 0, in contrast to the tennis court and especially the warehouse. The large, empty space of the warehouse causes the sound to travel longer before it is reflected, but as soon as it is, the space quickly becomes fully mixed. This is seen in Figure 3.1 as the profile begins to grow fast. Both the tunnel and the tennis court have openings where the sound disappears without being reflected back. This results in a space that never becomes fully mixed, which can be seen in their respective echo density profiles that jump up and down. This is especially the case for the tennis court, which has a larger open surface than the tunnel. The mixing time of the tunnel is estimated earliest in time, followed by the tennis court and then the warehouse. This estimation holds for any measurement setup, as the echo density profile is independent of the measurement setup within the same room.
The increase in the echo density profile depends only on the room's shape and volume.

3.2 Reverberation level

The reverberation level contains information about both directional and diffuse reverberation. Within the Paraspax method there are three different methods for estimating the reverberation level, called the MAX, RMS and EDC method. What differentiates the methods is how the envelope of the absolute pressure response |p| is estimated. As the names suggest, the MAX and RMS methods use a sliding window of 1 ms, over which the maximum or the root-mean-square, respectively, is calculated. The EDC method uses the previously calculated energy decay curve and transforms it into a level curve. The following steps are the same for all three methods. A first-order polynomial is fitted to the envelope from two to three times the mixing time, in order to guarantee that no early reflections distort the envelope. The remaining values of the decay curve are estimated by linear extrapolation. The reverberation level is defined as the level of the diffuse sound field at the TOA of the first early reflection, but the amplitude of the reverberation can be found from the decay curve at any time, for example at the mixing time.

According to the authors of the Paraspax method, the reverberation level can be estimated using any of the three methods, but the MAX method is the one that provides the best estimates. This conclusion also holds for the tested RIRs, where the MAX method was used for all of them except the Genesis 6 studio and the Shrine and parish church of all saints north street, which got the best reverberation level estimates using the RMS and EDC method, respectively. The reverberation level of these spaces, as well as the sliding window and polynomial fit, can be seen in Figures 3.2 and 3.3.
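The MAX variant of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the Paraspax code: the function names are hypothetical, the envelope is expressed in dB relative to the peak of the RIR, and each envelope value is placed at the start of its 1 ms window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def decay_fit_db(h, fs, t_mix, win_ms=1.0):
    """Fit a straight line (in dB) to the MAX envelope between 2 and 3 times t_mix."""
    n = max(1, int(round(win_ms * 1e-3 * fs)))
    env = sliding_window_view(np.abs(h), n).max(axis=1)  # sliding maximum of |p|
    env_db = 20 * np.log10(np.maximum(env, 1e-12) / np.max(np.abs(h)))
    t = np.arange(len(env_db)) / fs
    sel = (t >= 2 * t_mix) & (t <= 3 * t_mix)  # region free of early reflections
    slope, intercept = np.polyfit(t[sel], env_db[sel], 1)
    return slope, intercept


def reverberation_level_db(slope, intercept, toa_first_reflection):
    """Extrapolate the fitted line back to the TOA of the first reflection."""
    return slope * toa_first_reflection + intercept
```

Replacing `.max(axis=1)` with a root-mean-square over the same windows would give the corresponding sketch of the RMS variant.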
The plots in Figures 3.2 and 3.3 show that the respective polynomial fit follows the decay of the amplitude in dB and is thereby a good representation of the diffuse sound in the spectral components. The reverberation level is calculated as -6.7 dB for the Genesis 6 studio and -9.4 dB for the Shrine and parish church of all saints north street. The reverberation levels of the other RIRs are listed in the Appendix.

Figure 3.2: Reverberation level of the Genesis 6 studio estimated by the RMS method.

Figure 3.3: Reverberation level of the Shrine and parish church of all saints estimated by the EDC method.

3.3 Spectral components

The Paraspax method divides the early part of the RIR into spectral components and early diffuse sound. The spectral components of the direct sound and early reflections are estimated in time, defined by the respective TOAs, with corresponding amplitudes. They are then assigned DOAs in spherical coordinates, which describe the angle at which they reach the listener. The parts of the processing that produce these parameters are described below, followed by the processing that produces the synthesized early diffuse sound. The directional components parameterized by the Paraspax method are used for the loudspeaker array, and a number of decorrelated copies of the early diffuse sound are synthesized, one for each loudspeaker.

The direct sound is one single event in the RIR and is easily found by applying a 1 ms long window to the onset. The TOA is then defined as the time index of the absolute maximum of the pressure response within this window. To get the corresponding RMS amplitude, a new asymmetric window is applied around the TOA. The window has a length of 1.5 ms, starting 0.5 ms before the TOA and ending 1 ms after it. This choice is motivated by summing localization, i.e. if two or more sound waves arrive within a time interval of 1 ms or less, all sound sources contribute to the direction of the perceived total sound [10].
The amplitude is then defined as the RMS average of the window. The method succeeded in finding these components of the direct sound for all tested RIRs. Moreover, the method finds up to 10 early reflections for each RIR, which makes the approach somewhat more complex. The TOAs and amplitudes are tracked by a reflection detection algorithm explained in the following section.

3.3.1 Reflection detection

When the TOA and amplitude of the direct sound have been found, a reflection detection algorithm is used to find the TOAs and amplitudes of the early reflections. First, the TOAs are found by applying a sliding window of 1 ms to the whole RIR or up to two times the estimated mixing time. If the energy at a time index in the window is three times higher than the median energy of the whole window, a reflection is defined at this time index, which is then the TOA of the reflection. The RMS amplitudes of the early reflections at the TOAs are calculated in an asymmetric window in the same way as for the direct sound. The high time resolution of the window is needed in order to capture the ground reflection, which is important especially for outdoor environments where the ground reflection might be the only reflection. Furthermore, the high resolution normally provides more than 10 early reflections. These are first reduced according to summing localization: if more than one TOA is found within a time span of 1 ms, the one with the highest RMS amplitude is defined as a reflection while the others are removed from the early reflections. As there may still be many reflections in the selection list, they are sorted by amplitude in descending order so that the early reflections selected in the end correspond to the loudest reflections. The Paraspax method selects the 6 to 10 most prominent reflections in the interest of lowering the computational load.
The goal is to avoid an unnecessary number of reflections that slows down the processing, while still using as many as necessary to fully recreate a space. A previous study that aimed to use a minimal set of salient early reflections showed in listening experiments that 6 reflections are enough to produce parametric spatial audio rendering that is indiscernible from a fully rendered reference for speech content based on the image source model [10]. The image source model is one of three different approaches for calculating the DOAs of the early reflections in the Paraspax method, and it is the image source model implemented by the Paraspax method that is used in this thesis, as described in the following section.

The absolute pressure response of the RIR is plotted together with the TOAs and RMS amplitudes, detected by the Paraspax method, of the direct sound and early reflections of three tested RIRs in Figures 3.4-3.6. For these RIRs the reflection detection algorithm found a different number of reflections.

Figure 3.4: Direct sound and early reflections found by the reflection detection algorithm of the Paraspax method in the RIR of the Helsington church.

Figure 3.5: Direct sound and early reflections found by the reflection detection algorithm of the Paraspax method in the RIR of the Hamilton mausoleum.

Figure 3.6: Direct sound and early reflections found by the reflection detection algorithm of the Paraspax method in the RIR of the Koli national park.

Figures 3.4-3.6 show that the mixing time appears after all early reflections, which means it is a good predictor. The reflection detection algorithm found 10 early reflections for the Helsington church, while it only found 9 for the Hamilton mausoleum. The Koli national park is an outdoor environment that contributes only a single reflection.
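The detection steps of this section can be sketched as below. This is a simplified illustration with hypothetical function names, not the Paraspax implementation. Two simplifications are assumed: the median is taken over a 1 ms window centred on each sample, and the merge within 1 ms keeps the sample with the highest energy, after which the RMS amplitude in the asymmetric window is used only for ranking.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_rms(h, fs, idx, pre_ms=0.5, post_ms=1.0):
    """RMS amplitude in the asymmetric 1.5 ms window around sample idx."""
    a = max(0, idx - int(round(pre_ms * 1e-3 * fs)))
    b = min(len(h), idx + int(round(post_ms * 1e-3 * fs)))
    return float(np.sqrt(np.mean(h[a:b] ** 2)))


def detect_reflections(h, fs, direct_idx, t_mix, max_reflections=10, factor=3.0):
    """Return a time-sorted list of (sample index, RMS amplitude) pairs."""
    n = int(round(1e-3 * fs))                       # 1 ms in samples
    stop = min(len(h), int(round(2 * t_mix * fs)))  # search up to 2x mixing time
    energy = h[:stop] ** 2
    # Median energy of the 1 ms window centred on every sample.
    padded = np.pad(energy, (n // 2, n - 1 - n // 2), mode="edge")
    median = np.median(sliding_window_view(padded, n), axis=1)
    cand = np.flatnonzero(energy > factor * median)
    cand = cand[cand > direct_idx + n]              # skip the direct sound
    # Summing localization: keep one TOA per 1 ms, strongest sample first.
    kept = []
    for idx in cand[np.argsort(energy[cand])[::-1]]:
        if all(abs(int(idx) - k) > n for k in kept):
            kept.append(int(idx))
    # Rank by RMS amplitude and keep only the loudest reflections.
    amps = [window_rms(h, fs, k) for k in kept]
    loudest = np.argsort(amps)[::-1][:max_reflections]
    return sorted((kept[i], amps[i]) for i in loudest)
```

With `max_reflections=10` the sketch mirrors the upper limit used here, and in a noisy RIR the weakest entries of the returned list may be noise peaks rather than true reflections.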
3.3.2 Directions of arrival

The Paraspax method offers three different approaches for estimating the DOAs of the selected early reflections whose TOAs and amplitudes have been found previously. The approach used here is the image source model (see Section 2.2). Readers interested in the other approaches, based on pseudo-randomized or precomputed DOAs, are referred to [1]. As mentioned before, the method requires a predefined source direction and source distance, which can be used to estimate the DOA of the direct sound. Alternatively, the direct sound DOA can be found from the image source simulation together with the DOAs of the reflections.

Geometrical data (source and receiver positions and room dimensions) are necessary for spatialization using the image source model. The room dimensions are an approximation of a shoebox-shaped room, an approximation that may differ greatly from reality for some of the tested rooms that, for example, have arched sides, contain small passages or lack one or more walls. The order of image sources is set to 2, which should be enough since the first-order image sources in an empty shoebox-shaped room contribute six dynamically reproduced early reflections from the four walls, floor and ceiling. The Paraspax method also allows first-order reflections to be preferred, but this option was not used in this study.

The image source simulation of the Paraspax method derives all reflections up to second order from the simulated room, each with a corresponding TOA and DOA. These TOAs are compared with the TOAs obtained from the reflection detection, and the pairs with the smallest TOA differences are defined as the same reflection. The azimuth and elevation of the corresponding DOA then describe the direction from which the reflection arrives at the listener.
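The matching of detected and simulated TOAs can be illustrated with a small shoebox image source sketch. It assumes a room with one corner at the origin, azimuth measured in the room coordinate system and elevation measured from the horizontal plane. The function names and the nearest-TOA rule without uniqueness check are assumptions of this sketch, not the Paraspax implementation.

```python
import itertools

import numpy as np

C = 343.0  # speed of sound in m/s


def image_sources(room, src, rec, max_order=2):
    """List (order, toa_s, azimuth_deg, elevation_deg) of shoebox image sources."""
    room, src, rec = (np.asarray(v, float) for v in (room, src, rec))
    images = []
    indices = range(-max_order, max_order + 1)
    for n in itertools.product(indices, repeat=3):
        order = sum(abs(k) for k in n)  # number of wall reflections
        if order > max_order:
            continue
        n = np.array(n)
        # Even index: shifted copy of the source. Odd index: mirrored copy.
        pos = n * room + np.where(n % 2 == 0, src, room - src)
        d = pos - rec
        dist = np.linalg.norm(d)
        az = np.degrees(np.arctan2(d[1], d[0])) % 360
        el = np.degrees(np.arcsin(d[2] / dist))
        images.append((order, dist / C, az, el))
    return images


def match_toas(detected_toas, images):
    """For each detected TOA, index of the image source with the closest TOA."""
    simulated = np.array([img[1] for img in images])
    return [int(np.argmin(np.abs(simulated - t))) for t in detected_toas]
```

The DOA assigned to a detected reflection is then the azimuth and elevation stored at the matched index.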
In the following, the environments of three tested RIRs are described and their respective DOA patterns found from the simulated image source model of the Paraspax method are shown. Arthur sykes rymer auditorium is said to reproduce sound of high quality thanks to its unique acoustics. Its preferred noise criterion (PNC) is better than the PNC 15 standard, which means it shuts out outside noise. It is a rectangular auditorium, so the space itself is shoebox-shaped. The simulated DOAs of the direct sound (red) and early reflections (fuchsia) in the auditorium are presented in Figure 3.7, showing that reflections hit the receiver from only two azimuth angles but from various elevation angles. Due to the rectangular shape, the result is more accurate than in the following examples, but the image source model does not take into account the inclined medical floor and the interior. However, the image source model is a popular approach for generating early reflections due to its efficient simulation [11].

Figure 3.7: DOAs of the direct sound and early reflections in azimuth and elevation of Arthur sykes rymer auditorium, found by the Paraspax method.

The Stairway of a university is located in a shoebox-shaped room that extends in height. The impulse response is assumed to have been measured at one of the middle floors, so that the source and receiver are positioned in the middle of the room. Note that reflections from the steps are not estimated in the empty shoebox-shaped room of the image source model.

The tunnel-like chamber Hoffman lime kiln has arched side walls and roof. It is a large U-shaped stone construction that differs markedly from a rectangular room. Its impulse response was measured at position "R1", seen in Figure 3.8. The curved area seen in the floor plan is used in the image source simulation, approximated as a shoebox-shaped room of dimensions 25 m × 4.72 m × 2.3 m (length × width × height).
Figure 3.8: The inside (left) and the floor plan marked with different source and receiver positions (right) of the Hoffman lime kiln chamber.

The DOAs of the direct sound and early reflections of the Stairway and Hoffman lime kiln can be seen in Figure 3.9 and Figure 3.10, respectively. Due to where the source and receiver are positioned in the Stairway, the sound waves are more likely to be reflected by the side walls than by the floor and ceiling. Almost all reflections therefore have an elevation angle of 0° but a greater spread in azimuth. However, the stairs form floor and ceiling structures that would reflect some sound waves in reality, and these reflections are not included here. The approximation of the space of Hoffman lime kiln lets some of the sound waves be reflected instead of disappearing through the long parallel corridors. The image source simulation therefore produces more reflections than the real space, whereas the opposite holds for the Stairway.

Figure 3.9: DOAs of the direct sound and early reflections in azimuth and elevation of the Stairway, found by the Paraspax method.

Figure 3.10: DOAs of the direct sound and early reflections in azimuth and elevation of Hoffman lime kiln, found by the Paraspax method.

3.4 Late reverberation

The diffuse reverberation of the Paraspax method is synthesized in the same way as in the BinRIR algorithm [4]. Just as in the Paraspax method, the measured RIR is separated into direct sound, early reflections and late reverberation, which are treated separately before being put together. The aim is to create 2-channel late reverberation for a pair of ears that is spatially equally distributed. This means that the late reverberation is omnidirectional, having on average the same proportion of sound energy from every direction. To obtain this, the signals are decorrelated, i.e. the cross-correlation is reduced.
Interaural coherence is the measure of similarity between the reverberation received by each of the two ears. A low value is desirable and creates a more pleasant sound than correlated diffuse sound, which sounds strange and not very diffuse. In a first step, binaural white noise is generated and filtered with an interaural coherence filter, which gives each channel slightly different parameters. The binaural noise is then split into time segments of 2.67 ms [4] and convolved with small chunks of the measured RIR of 0.67 ms to adapt the noise to the energy decay curve. The length of the windows was determined using a listening test. All time sections are windowed with raised-cosine ramps and finally added together with the overlap-add method, which overlaps adjacent time sections so that their cosine ramps cross-fade before they are added. This gives a smooth and coherent result.

The late reverberation for the loudspeaker array is synthesized in the same way as in the Paraspax method, but more decorrelated copies are generated, one for each loudspeaker signal. So instead of creating 2-channel decorrelated late reverberation for headphone reproduction as in the Paraspax method, N-channel decorrelated late reverberation is synthesized, where N is the number of loudspeakers in the array.

3.5 Early diffuse sound

The late reverberation synthesized in the Paraspax method is based on the measured RIR over the whole time range, so it extends over both the early and the latter part. The diffuse sound in the latter part is used as late reverberation. However, diffuse sound is also used in the early part of the synthesized BRIR and added to the parameterized directional components. The motivation comes from a study based on listening experiments [12], which has shown that adding diffuse sound to the spectral components in the early part of a BRIR contributes to higher perceptual quality than if only spectral components are used.
Therefore, early diffuse sound will also be present in each loudspeaker signal of the loudspeaker array. In the Paraspax method, it is synthesized by taking the early part (up to 2-3 times the mixing time) of the synthesized late reverberation from the previous step and performing some further processing described below. Since the loudspeaker array requires N-channel late reverberation in which each channel is decorrelated, the early diffuse sound likewise consists of N decorrelated channels, one for each loudspeaker in the array, built upon the N-channel late reverberation. The following description of how the early diffuse sound is obtained by the Paraspax method also applies to the loudspeaker array, except that more copies are synthesized.

Together with the late reverberation, the early diffuse sound field is estimated from a weighting function based on the selected early reflections obtained from the reflection detection. It is created by applying a sliding window of 1 ms to the absolute pressure response and convolving the result with a Hanning window of 3 ms. Its values at the TOAs of the direct sound and selected early reflections are windowed with a 1.5 ms window and set to 1. The sharp edges that arise are smoothed with a 1 ms window. The weighting function of the Helsington church, whose direct sound and early reflections are presented in Figure 3.4, can be seen in Figure 3.11, illustrated by the yellow curve.

Figure 3.11: The early part of the measured RIR of the Helsington church plotted together with the detected direct sound and early reflections (directional part) and the corresponding weighting function. The directional part and weighting function are found by the Paraspax method.

To get the early diffuse sound, the weighting function is inverted as the square root. Let wf be the weighting function. The inverse weighting function is then

    (wf)^{-1} = \sqrt{1 - wf}.    (3.1)

The reverberation level is also used in the BRIR synthesis when estimating the early diffuse sound, as it preserves the diffuse sound in the spectral components. In order for the directional parts to remain prominent and not be masked by the diffuse sound, the inverse weighting function is limited so that it does not exceed the value of the reverberation level.

The inverse weighting function of the Helsington church and its diffuse reverberation in the early part of one of the N channels are plotted in Figure 3.12. The binaural diffuse sound in the early part extends up to 2 times the mixing time.

Figure 3.12: The binaural diffuse reverberation built from binaural white noise and based on the measured RIR of the Helsington church plotted together with the inverse weighting function.

The early diffuse sound is obtained by multiplying the two functions in Figure 3.12. The result can be seen in Figure 3.13, where it is plotted together with the directional part, obtained by multiplying the weighting function with the measured RIR, in order to show clearly that the early diffuse sound never exceeds the amplitudes of the direct sound and the early reflections.

3.6 Extension on loudspeaker arrays

The monaural and spatial parameters calculated in the previous steps can be used for spatial audio reproduction. The extension to loudspeaker arrays uses a number of loudspeakers as the source, instead of the headphones that are used in the Paraspax method. Instead of using a physical loudspeaker array, head-related impulse responses can be used to create virtual sound sources, which makes it possible to virtually place a listener inside the array using headphones. The listener is then able to move virtually within the array by adjusting each loudspeaker signal in relation to the new distance and angle between the listener and the loudspeakers.
The simulated sound field at the center, and how it is constructed, is explained first, followed by the construction of the sound field as the listener changes position.

Figure 3.13: The directional and diffuse components that form the early part of the synthesized BRIR of the Helsington church.

3.6.1 Listener at the center

The loudspeaker array is defined as a number of loudspeakers placed over a sphere. The listener is placed at the origin of the sphere, so the position of each loudspeaker relative to the listener can be represented by spherical coordinates (azimuth and elevation). In total, the loudspeaker array consists of 84 loudspeakers, and their positions are illustrated in Figure 3.14 by the green dots. The loudspeaker array contains 7 elevation angles with a resolution of 25°, starting at 75° above and ending at 75° below the listener (-75°). The azimuth angle extends around the listener from 0° to 330° with a resolution of 30°. The loudspeakers lying on the circle in the horizontal plane orthogonal to the zenith, at the fourth elevation row, are in line with the ears of the listener. The north and south poles of the sphere are not equipped with any loudspeaker. The paper presented by Müller and Ahrens [13] shows that listeners who performed a listening test could not hear any clear differences between SRIRs with and without elevated early reflections. Although the perceived spatial differences are larger in loudspeaker-based reproduction than in reproduction using headphones, an elevated reflection has to be strong for a listener to hear a clear difference when this reflection is projected onto the horizontal plane.

In a first step, the sound field is rotated such that the DOAs of the direct sound and early reflections are converted into the global coordinate system relative to the listener, in which the loudspeaker array is defined.
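The array layout described above can be written down as a small grid of azimuth and elevation pairs. The sketch below only illustrates the geometry. The function names are hypothetical, and elevation is measured from the horizontal plane as in the description of the array.

```python
import numpy as np


def loudspeaker_grid():
    """Return an (84, 2) array of (azimuth, elevation) pairs in degrees."""
    elevations = np.arange(75, -76, -25)  # 75, 50, 25, 0, -25, -50, -75
    azimuths = np.arange(0, 360, 30)      # 0, 30, ..., 330
    az, el = np.meshgrid(azimuths, elevations)
    return np.column_stack([az.ravel(), el.ravel()])


def to_unit_vectors(az_el_deg):
    """Convert (azimuth, elevation) pairs in degrees to Cartesian unit vectors."""
    az, el = np.radians(az_el_deg).T
    return np.column_stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)]
    )
```

Rows 36 to 47 of the returned grid form the fourth elevation row at 0°, the ring that is level with the listener's ears.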
By doing this the direct sound will always be played-back from the loudspeaker positioned right in front of the listener that corresponds to 0◦ in azimuth and elevation. Then the asymmetrical windows of 1.5 ms containing the amplitudes of the early reflections at the respective 21 3. Methods Figure 3.14: A spherical loudspeaker array of 84 loudspeakers positioned at the green dots. TOAs are distributed over the loudspeakers. The loudspeaker position in spherical coordinates that matches the DOA of a reflection the best is the loudspeaker that will play-back that reflection. Figure 3.15 shows the loudspeaker signals of two of the loudspeakers in the array of the Helsington church when all reflections has been distributed over the loudspeaker array. One early reflection is passed to one of the loudspeakers as its position matches the DOA of that reflection. For the other loudspeaker, the incidence angle of three reflections matches its position. The time it takes for each loudspeaker signal to reach the listener (the TOA of the loudspeaker array) is determined by the radius of the loudspeaker array, rLA, which is set to 10 meters, TOALA = rLA c ≈ 29.2 ms, (3.2) where c = 343 m/s is the speed of sound in air. Each loudspeaker signal will therefore be shifted according to this radius so that the time of arrival of the direct sound is the same for all tested RIR as the same loudspeaker array is used to reproduce all rooms. Both the late reverberation and early diffuse sound are synthesized (and the de- scription of how it is synthesized can be read in section 3.4 and 3.5) such that they consists each of a 84-channel of decorrelated signals that corresponds to each loud- speaker in the array. The method has now simulated a set of spectral components, early diffuse sound and late reverberation, represented by each loudspeaker. The listener can either physically be placed inside a loudspeaker array whose respective 22 3. 
Methods Figure 3.15: Early reflections passed to two different loudspeakers of the array. One loudspeaker is assigned one reflection (left) while another is assigned three reflections (right). loudspeakers plays its assigned loudspeaker signal, or the loudspeaker array can be created virtually by using HRIRs corresponding to each loudspeaker position in the array for creation of virtual sources. 3.6.2 Virtual loudspeaker array When simulating a virtual loudspeaker array it can be play-backed using headphones and the aim is therefore to synthesize a 2-channel BRIR that corresponds to the total contribution of all loudspeaker signals in the array. In the Paraspax method, HRIRs are simulated as spherical harmonics coefficients at the spatial order of M ≤ 35 which corresponds to arbitrary head orientations of the listener. The HRIR set used is measured from a Neumann KU100 artificial head. This set of HRIRs can be used for creating virtual loudspeakers, where only HRIRs of the head orientations that corresponds to the loudspeaker positions relative the listener are used. The HRIRs are used to get the transmission of the loudspeaker signals from each loudspeaker to the listener. The HRIR for two different loud- speaker positions (90◦ in azimuth to the left, and 180◦ in azimuth, right behind the listener) are shown in Figure 3.16. The elevation angle of the two HRIRs is at 0◦. As seen in Figure 3.16, the sound coming from the loudspeaker positioned to the left of the listener will be perceived as louder by the left ear for almost the entire time range. The sound reaching the right ear is attenuated as the sound path is obstructed by the listener’s head. However, for the loudspeaker placed at 180◦ right behind the listener, both ears will hear approximate the same amount of the signal 23 3. 
at the same time.

Figure 3.16: Measured HRIRs from a Neumann KU100 artificial head showing how sound reaches the left and right ear when the sound source is positioned at 90° to the left (left) and at 180°, right behind the listener (right).

Each loudspeaker signal, containing the assigned spectral components, early diffuse sound and late reverberation, is convolved with its respective HRIR, so that each loudspeaker signal gets a left and a right channel. As the desired output of the virtual loudspeaker array is the combined contribution from all loudspeakers, the loudspeaker signals are added to form a 2-channel loudspeaker array-based synthesis representing how the sound is perceived by the listener who is virtually placed inside the array. As the part of the loudspeaker signals that consists of the directional components only contains energy at the TOAs of the direct sound and early reflections, the contributions from all loudspeakers can simply be added.

For the early diffuse sound and late reverberation, on the other hand, a further step is required when summing the loudspeaker signals. The early diffuse sound signals played back from the loudspeakers are incoherent with equal RMS. Adding two such signals increases the level by 3 dB, and adding N such signals increases it by $10 \log_{10} N$ dB. To counteract this, the sum of the early diffuse sound of all loudspeakers is divided by the square root of the number of loudspeakers used in the array,

\[ \tilde{p}(t) = \frac{\sum_{n=1}^{N} \tilde{p}_n(t)}{\sqrt{N}}, \tag{3.3} \]

where $\tilde{p}(t)$ is the RMS sound pressure summed over all loudspeakers and $N$ denotes the number of loudspeakers. The same applies to the late reverberation.

To move the listener inside the loudspeaker array, the listener can either physically move, or the loudspeaker signals in the virtual loudspeaker array can be modified according to the new loudspeaker positions relative to the listener for each new listener position.
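The effect of the normalisation in Eq. (3.3) can be illustrated with a minimal sketch. It is not part of the Paraspax implementation: the decorrelated diffuse signals are replaced here by independent Gaussian noise with unit RMS (an assumption made only for illustration), and the function names `rms` and `sum_incoherent` are hypothetical.

```python
import math
import random


def rms(signal):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def sum_incoherent(channels):
    """Sum equal-length channels and divide by sqrt(N), as in Eq. (3.3)."""
    n = len(channels)
    return [sum(samples) / math.sqrt(n) for samples in zip(*channels)]


# Assumption: independent Gaussian noise with unit RMS stands in for the
# decorrelated diffuse signals, one channel per loudspeaker.
rng = random.Random(0)
num_loudspeakers = 84
num_samples = 4000
channels = [[rng.gauss(0.0, 1.0) for _ in range(num_samples)]
            for _ in range(num_loudspeakers)]

plain_sum = [sum(samples) for samples in zip(*channels)]
normalised = sum_incoherent(channels)

# Level of the plain sum relative to a single channel, in dB.
gain_db = 20.0 * math.log10(rms(plain_sum) / rms(channels[0]))
print(f"plain sum: {gain_db:+.1f} dB, normalised RMS: {rms(normalised):.2f}")
```

Without the division, the level of the sum grows by roughly $10 \log_{10} N$ dB relative to a single channel. With the division, the RMS of the sum stays close to that of one loudspeaker signal, independent of the number of loudspeakers.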
3.6.3 Change listener position

In the Paraspax method, only the direct sound and early reflections are modified as the listener moves the head or changes position. The early diffuse sound and the late reverberation are kept constant during the audio reproduction, but the reproduction remains accurate because changing the directional components also changes the DRR [1]. However, for a listener moving within a loudspeaker array, the distance to the different sources changes with each new listener position, and the whole signal of each loudspeaker should therefore change.

The first step is to define the new loudspeaker positions in spherical coordinates relative to the new listening position. Each loudspeaker signal still contains the same components, but the incidence angle of the sound from each loudspeaker to the listener changes, and certain loudspeaker signals are strengthened in some areas while they are weakened in others. Figure 3.17 shows an example of how the sound rays of the loudspeaker signals reach a listener at the new position, marked with a red dot, and how this differs from when the listener is positioned at the center, marked with a black dot. For simplicity, the loudspeaker array is illustrated as four loudspeakers in the horizontal plane, positioned at varying azimuth angles at an elevation angle of 0°.

Figure 3.17: Sound travelling from the loudspeakers to a listener positioned at the center of the array (black dot) and to a listener at an arbitrary position (red dot) within a simplified loudspeaker array of four loudspeakers in the horizontal plane.

As seen in Figure 3.17, when the listener moves inside the array, the distance between the listener and every source changes and is no longer the same for all loudspeakers, as it was when the listener was positioned at the center. The TOA of each loudspeaker signal therefore has to be modified according to the new distance to the listener.
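The geometric step described above can be sketched as follows for a horizontal array. This is an illustration under stated assumptions rather than the thesis implementation: the function name `loudspeaker_geometry`, the four azimuth angles, and the coordinate convention (x forward, y to the left, azimuth counted counter-clockwise from the front, listener orientation unchanged) are all chosen here for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def loudspeaker_geometry(listener_xy, azimuths_deg, radius):
    """Return (distance in m, azimuth in deg, travel time in s) per loudspeaker.

    Assumed convention: x points forward, y to the left, azimuth is counted
    counter-clockwise from the front, and the listener keeps facing +x.
    """
    lx, ly = listener_xy
    result = []
    for az in azimuths_deg:
        # Loudspeaker position on the circle around the array center.
        sx = radius * math.cos(math.radians(az))
        sy = radius * math.sin(math.radians(az))
        # Vector from the listener to the loudspeaker.
        dx, dy = sx - lx, sy - ly
        distance = math.hypot(dx, dy)
        new_az = math.degrees(math.atan2(dy, dx)) % 360.0
        result.append((distance, new_az, distance / SPEED_OF_SOUND))
    return result


# Hypothetical layout: four loudspeakers in the horizontal plane.
azimuths = [0.0, 90.0, 180.0, 270.0]
centre = loudspeaker_geometry((0.0, 0.0), azimuths, radius=10.0)
moved = loudspeaker_geometry((1.0, 0.0), azimuths, radius=10.0)  # 1 m forward

for (d0, a0, t0), (d1, a1, t1) in zip(centre, moved):
    print(f"{a0:6.1f} deg: {d0:5.2f} m -> {d1:5.2f} m, "
          f"azimuth {a1:6.1f} deg, delay {1000 * t1:5.2f} ms")
```

At the center, all distances equal the array radius and all delays equal $r_{LA}/c$. After moving 1 m forward, the frontal loudspeaker is 9 m away, the rear one 11 m, and the lateral loudspeakers appear slightly behind the listener. The new azimuths are the angles at which HRIRs would need to be selected for a virtual array.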
The new distance from each loudspeaker to the listener is calculated and the loudspeaker signals are shifted accordingly. A variable $d_n$ is defined as the difference between the distance at the new position, $d_{np,n}$, and the distance at the sweet spot, $d_{ss,n}$,

\[ d_n = d_{np,n} - d_{ss,n}, \quad n = 1, 2, \ldots, N. \tag{3.4} \]

Negative values of $d_n$ denote that the new position is closer to loudspeaker $n$ than before, while greater distances give positive values. The respective loudspeaker signals are amplified or attenuated, depending on $d_n$. The Paraspax method uses the inverse-square law, which states that the sound energy radiating from a point source decreases in proportion to the square of the distance. This distance attenuation, however, is often too extreme for a loudspeaker, which cannot really be equated to a point source. A factor for the distance attenuation that works in most cases for loudspeakers is the square root of the distance $r_n$ between loudspeaker $n$ and the new listening position, such that

\[ \tilde{p}_n(t) \propto \frac{1}{\sqrt{r_n}}, \quad \text{if } d_n > 0, \tag{3.5} \]

\[ \tilde{p}_n(t) \propto \sqrt{r_n}, \quad \text{if } d_n < 0, \tag{3.6} \]

where $\tilde{p}_n(t)$ denotes the RMS sound pressure from loudspeaker $n$, containing the directional as well as the diffuse sound. For the virtual loudspeaker array, the loudspeaker signals are convolved with a new set of HRIRs that corresponds to the loudspeaker positions relative to the new listener position, as seen in Figure 3.17.

4 Results

The method is tested for the 15 selected monaural RIRs that can be found in [2] and [3]. The aim is to reduce the number of loudspeakers used in the loudspeaker array while maintaining the sound quality. The number of loudspeakers is determined when the listener is positioned at the center of the array. Using the resulting number of loudspeakers, the listener is then moved within the loudspeaker array and it is examined how far from the center the listener can move without changing the sound image.
Then the influence of different parameters of the loudspeaker array is investigated, and it is examined whether these parameters also change how far the listener can move from the center. The results are obtained from the virtual loudspeaker array, such that the listener is virtually placed inside the loudspeaker array. The resulting loudspeaker array synthesis, which represents the total contribution from all loudspeakers, is convolved with an anechoic drum recording. The resulting audio is analyzed by listening to the auralization, and conclusions are drawn from the listening together with an analysis of the corresponding plots.

4.1 At the center of the loudspeaker array

When the listener is positioned at the center of the array, the aim is to reduce the number of loudspeakers while maintaining the sound quality, since it can be both costly and time consuming to assemble a loudspeaker array containing many loudspeakers. The sound field created by the loudspeaker array of 84 loudspeakers presented in Figure 3.14 is used as a reference and compared with two simplified versions of a loudspeaker array. The number of loudspeakers in the simplified versions is reduced until the sound quality differs from the sound quality created by the 84-loudspeaker array.

The two simplified loudspeaker arrays tested here contain loudspeakers whose positions vary only in azimuth or only in elevation, unlike the 84-loudspeaker array, whose loudspeaker positions vary in both azimuth and elevation. The simplified loudspeaker array with varying azimuth angle lies in the horizontal plane, with the loudspeakers positioned at the listener's ear level on the circle at the fourth row of the 84-loudspeaker array in Figure 3.14, where the elevation angle is 0°.
In the other simplified version, the loudspeakers are positioned right in front of the listener at an azimuth angle of 0° and at different elevation angles, such that the loudspeaker array lies in the vertical plane.

Comparing the auralization from the 84-loudspeaker array with the auralizations from the simplified loudspeaker arrays showed that the quality of the sound field is maintained for the numbers of loudspeakers presented in Table 4.1 for each of the 15 tested environments. For the environments where the number of loudspeakers is not specified in Table 4.1, the simplified loudspeaker arrays could not be used.

Name                                       Number of loudspeakers   Number of loudspeakers
                                           (varying azimuth)        (varying elevation)
Genesis 6 studio                           3                        -
Trollers gill                              -                        -
Maes howe                                  3                        -
Arthur sykes rymer auditorium              -                        -
Koli national park                         -                        -
Stairway                                   3                        -
Hoffman lime kiln                          3                        -
Central hall                               -                        -
Helsington church                          3                        -
Promenadikeskus concert hall               3                        -
Innocent railway tunnel                    3                        -
Falkland palace royal tennis court         3                        -
Shrine and parish church of all saints     3                        -
Hamilton mausoleum                         3                        -
Terrys factory warehouse                   4                        -

Table 4.1: Numbers of loudspeakers required in the loudspeaker array for different impulse responses.

For the loudspeaker array in the vertical plane, up to seven loudspeakers were used at different elevation angles, but this loudspeaker array setup could not achieve binaural sound at all. The 7-loudspeaker array setup is shown from the side in Figure 4.1, where the listener is placed at the origin of the sphere and the middle loudspeaker, at 0° in elevation, is at the listener's ear level. The loudspeakers are positioned from 60° above the listener to 60° below the listener (elevation angles from +60° to −60°) with a 20° resolution. A vertical loudspeaker array with fewer loudspeakers looks like the one in Figure 4.1, but with greater spacing between the loudspeakers due to the coarser angular resolution.
The reason why the simplified version of the loudspeaker array in the vertical plane cannot be used can be seen from Figure 4.1. The loudspeakers are positioned right in front of the listener, and each loudspeaker signal will therefore reach the left and right ear of the listener equally and create a mono sound. For the loudspeaker array with varying elevation to create binaural sound, there would have to be variations in azimuth as well, so that the sound also reaches the listener from behind and from the sides.

Figure 4.1: A loudspeaker array in the vertical plane containing 7 loudspeakers positioned at the green dots right in front of the listener.

As seen in Table 4.1, three loudspeakers are enough in 10 cases for the simplified loudspeaker array in the horizontal plane. The corresponding 3-loudspeaker array is shown from above in Figure 4.2, where the loudspeakers are represented by green dots placed at equal azimuth spacing, at 0°, 120° and 240°. Henceforth these loudspeakers will be called "L1", "L2" and "L3", respectively. The listener is positioned at the origin of the circle, facing the rightmost loudspeaker in the figure, at an azimuth angle of 0°.

For all the environments that required three loudspeakers in the horizontal loudspeaker array, presented in Table 4.1, the 3-loudspeaker array created a sound field that can be equated to the sound field created by the 84-loudspeaker array. The sound field sounds spacious and dynamic, and it sounds as if the different drums in the audio come from different directions. It also sounds wide in comparison to when a 2-loudspeaker array is used, in which case the sound quality drops drastically. The two loudspeakers are then positioned right in front of and right behind the listener, at 0° and 180°, so that the spaciousness of the sound decreases and it sounds flatter and narrower. It is also harder to hear which direction the sound of the different drums comes from.
With the loudspeakers in these positions, the loudspeaker signals reach both ears of the listener equally, and the result therefore sounds more monaural than binaural.

The Terrys factory warehouse is the only environment where a 4-loudspeaker array in the horizontal plane is preferred over a 3-loudspeaker array. The loudspeakers are then placed around the listener with equal azimuth spacing as in Figure 4.2, but at 0°, 90°, 180° and 270° in azimuth. However, the reflections are distributed only over the three loudspeakers positioned at 0°, 180° and 270°, while the loudspeaker signal at 90° contains only diffuse sound. When the 3-loudspeaker array is used, only the two loudspeakers at 0° and 120° are assigned reflections. What distinguishes the two arrays is that the 3-loudspeaker array creates a flatter and more mono sound than the 4-loudspeaker array, which sounds richer and more binaural. In addition, the 3-loudspeaker array creates a disturbing echo that could not be heard with the 84-loudspeaker or 4-loudspeaker array. Moreover, the DOAs of the reflections are more in line with the loudspeaker positions of the 4-loudspeaker array than with those of the 3-loudspeaker array.

Figure 4.2: A loudspeaker array in the horizontal plane containing 3 loudspeakers positioned at the green dots around the listener.

The measured monaural RIR of Maes howe is shown in Figure 4.3, and the synthesis created from the 3-loudspeaker array in the horizontal plane for Maes howe is shown in Figure 4.4 together with the synthesis from the 84-loudspeaker array for comparison. These plots show that the method successfully recreated the structure of the monaural RIR, and that a reduced number of loudspeakers gives rise to plots similar to those of the 84-loudspeaker array. In the 3-loudspeaker array the direct sound is played from loudspeaker L1.
The azimuth DOAs of the 10 early reflections are approximated by the angles of the loudspeaker positions at 120° and 240°, such that seven reflections are played back from loudspeaker L2 and three reflections are played back from loudspeaker L3. This can be seen in the plot, which is dominated by the left ear because L2 is positioned on the left-hand side of the listener. When using an array of four loudspeakers instead of three for this particular environment, the azimuth angles of the early reflections are more in line with the azimuth angles of the loudspeaker positions. The syntheses from both the 4-loudspeaker array and the 3-loudspeaker array sound similar to the synthesis of the 84-loudspeaker array. However, since the goal is to reduce the number of loudspeakers in the array, the loudspeaker array containing three loudspeakers is the one chosen in the end.

Figure 4.3: Measured monaural room impulse response of Maes howe.

Figure 4.4: Loudspeaker array synthesis of Maes howe from a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and from the 84-loudspeaker array (right).

Figure 4.5: Interaural coherence of Maes howe for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker array (right).

The frequency-dependent interaural coherence (IC) is shown in Figure 4.5 for the 3- and 84-loudspeaker arrays. The IC of the 3-loudspeaker array shows values close to 1 for frequencies up to 100 Hz before it decreases at higher frequencies. The IC then fluctuates and is higher than 0.5 at frequencies around 2 500 Hz, between 3 200 Hz and 3 900 Hz, from 5 200 Hz to 5 600 Hz and above 12 700 Hz, indicating that the signals in the synthesis are more correlated than decorrelated at these frequencies.
The IC of the 84-loudspeaker array shows that the signals are in general decorrelated to a higher degree than when a reduced number of loudspeakers is used, especially in the mid-frequency region.

For the Helsington church, three reflections are assigned to loudspeaker L2 and one to L3, while the direct sound and six reflections are played back from loudspeaker L1. The syntheses from the 3- and 84-loudspeaker arrays are shown in Figure 4.6. The synthesis is dominated by the left ear because more of the lateral reflections are assigned to L2 than to L3. However, it is not as left-ear dominated as the synthesis of Maes howe, because most reflections are played back from the loudspeaker in front of the listener. Its monaural RIR is presented in Figure 4.7, where it can be seen that the space gives rise to more and denser reflections than the synthesis, which only contains 10 early reflections. However, when comparing the synthesis with the measured RIR, it is clearly audible that it is the same space.

Figure 4.6: Loudspeaker array synthesis of the Helsington church for a loudspeaker array only changing in the horizontal plane using 3 loudspeakers (left) and its monaural RIR (right).

Figure 4.7: Measured monaural room impulse response of the Helsington church.

Figure 4.8: Interaural coherence of the Helsington church for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker array (right).

Its IC is presented in Figure 4.8 together with the IC from the 84-loudspeaker array, which shows that the loudspeaker signals have incoherent directional components in the frequency range between 300 Hz and 14 500 Hz. It can also be seen from the plots in Figure 4.8 that the signals get more coherent when the number of loudspeakers in the array is reduced.
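This section does not state which estimator was used for the frequency-dependent interaural coherence. As a rough illustration only, the sketch below uses one common definition, the magnitude of the cross-spectrum between the left and right ear signals normalised by the two auto-spectra, averaged over short blocks. The function names, the block length of 32 samples and the rectangular (no) window are assumptions made for this example, and Gaussian noise stands in for the two channels of a synthesized BRIR.

```python
import cmath
import math
import random


def dft(block):
    """Naive DFT of a real block (non-negative frequency bins only)."""
    n = len(block)
    return [sum(block[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n // 2 + 1)]


def interaural_coherence(left, right, block_len=32):
    """Magnitude coherence per frequency bin, averaged over non-overlapping blocks."""
    bins = block_len // 2 + 1
    s_ll = [0.0] * bins  # auto-spectrum, left ear
    s_rr = [0.0] * bins  # auto-spectrum, right ear
    s_lr = [0j] * bins   # cross-spectrum
    for start in range(0, len(left) - block_len + 1, block_len):
        spec_l = dft(left[start:start + block_len])
        spec_r = dft(right[start:start + block_len])
        for f in range(bins):
            s_ll[f] += abs(spec_l[f]) ** 2
            s_rr[f] += abs(spec_r[f]) ** 2
            s_lr[f] += spec_l[f] * spec_r[f].conjugate()
    coherence = []
    for f in range(bins):
        denom = math.sqrt(s_ll[f] * s_rr[f])
        coherence.append(abs(s_lr[f]) / denom if denom > 0 else 0.0)
    return coherence


# Assumption: noise signals stand in for the left and right BRIR channels.
rng = random.Random(1)
num_samples = 32 * 64
left = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
right_independent = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

ic_identical = interaural_coherence(left, left)
ic_independent = interaural_coherence(left, right_independent)
print("identical ears:   mean IC =", round(sum(ic_identical) / len(ic_identical), 2))
print("independent ears: mean IC =", round(sum(ic_independent) / len(ic_independent), 2))
```

Identical ear signals give a coherence of 1 in every bin, which corresponds to the mono-like case, while independent signals give values well below 0.5. Bin `f` corresponds to the frequency `f * fs / block_len` for a sampling rate `fs`. A practical analysis would use longer, windowed and overlapping blocks.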
The result of having three loudspeakers in the horizontal array is clear for the Genesis 6 studio, Maes howe, Stairway, Hoffman lime kiln, the Helsington church, the Innocent railway tunnel, Falkland palace royal tennis court and Hamilton mausoleum, which all have at least four reflections distributed over loudspeakers L2 and L3, while the rest of the reflections are assigned to loudspeaker L1. The Promenadikeskus concert hall and the Shrine and parish church of all saints north street have only three reflections in total assigned to L2 and L3, and here the result is less obvious. The syntheses of the Promenadikeskus concert hall from the 3- and 84-loudspeaker arrays are presented in Figure 4.9. No clear differences can be pointed out from the plots, but the results differ more from each other when listened to. With the 3-loudspeaker array it is harder to hear the spaciousness, since the sound arrives mostly from the loudspeaker at the front, which dominates the total output, while the other two loudspeakers, which are necessary for creating binaural sound, contribute at a lower level as only three reflections are distributed over them. The corresponding IC can be seen in Figure 4.10, showing incoherent directional events in the frequency range from 400 Hz to approximately 10 000 Hz. The signals are more correlated in the mid frequencies for the 3-loudspeaker array than for the 84-loudspeaker array, but they are still more decorrelated than correlated.

Figure 4.9: Loudspeaker array synthesis of the Promenadikeskus concert hall for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and the 84-loudspeaker array (right).

Figure 4.10: Interaural coherence of the Promenadikeskus concert hall for a loudspeaker array in the horizontal plane containing 3 loudspeakers (left) and for the 84-loudspeaker array (right).

The syntheses in Figures 4.4 to 4.9 show that, overall, the method performs well.
The TOAs of the direct sound and early reflections in the loudspeaker array syntheses are in line with the TOAs in the monaural RIRs, apart from the fact that the syntheses are shifted according to the radius of the loudspeaker array. However, the amplitudes of the direct sound and early reflections in the syntheses are lower than in the monaural RIR for some environments, while they are higher in the syntheses for other environments. Even though the method finds at most ten early reflections, the RIRs retain their shape when they are recreated for loudspeaker array synthesis, and the binaural audio resulting from the convolution recreates what it would have sounded like to be in the original location. Depending on how many reflections are assigned to each loudspeaker in the array, the syntheses can be left- or right-ear dominated, which is the case for all tested RIRs as most reflections tend to come from one side of the room.

The interaural coherence of the tested environments shows that the signals at the left and right ears are coherent at low frequencies below around 300 Hz and at high frequencies above 10 kHz. In the frequency range in between, the signals are incoherent. For some of the tested RIRs, a reduced number of loudspeakers in the array makes the signals more coherent than the signals of the 84-loudspeaker array. Also, for those spaces that are not rectangular in shape, but whose geometry has been approximated by a shoebox room, the interaural coherence is higher over the whole frequency range than for the spaces where the true dimensions could be used.

The DOAs of the direct sound and early reflections of the RIRs that did not perform well on the loudspeaker array are presented in Figures 4.11 to 4.13, while the DOAs of the Arthur sykes rymer auditorium were already presented in Figure 3.7 in the Methods chapter.
What these environments have in common is that the DOAs of the early reflections have little or no variation in azimuth. A loudspeaker array that only varies in azimuth will therefore not create binaural sound, as this requires some reflections to reach the listener from behind. The early reflections of these environments do vary in elevation, but binaural sound cannot be created with a loudspeaker array that varies only in elevation, because its loudspeakers are all placed in front of the listener.

Figure 4.11: DOAs of the direct sound and early reflections of Trollers gill.

Figure 4.12: DOAs of the direct sound and early reflections of the Koli national park.

Figure 4.13: DOAs of the direct sound and early reflections of the Central hall.

4.2 Investigation of the sweet spot

The investigation of the sweet spot described in this section assumes a loudspeaker array in the horizontal plane containing three loudspeakers, which is the required number of loudspeakers for most of the environments tested. The sound field created by the loudspeaker array is synthesized every 0.5 meters as the listener moves forward, backwards and straight to the sides, as well as diagonally forward and backwards, both to the left and to the right. The different directions of the listener's movement in the 3-loudspeaker array can be seen in Figure 4.14. The environments in which the method could not reproduce any binaural sound using the loudspeaker array are not included in this investigation. In addition, different parameters of the loudspeaker array are changed in order to examine their influence on how the sound image changes at different positions.

Figure 4.14: Directions of the listener's movements in the 3-loudspeaker array.

As the listener moves inside the loudspeaker array, the distance between the listener and each loudspeaker will change.
A reduced distance to a particular loudspeaker makes that loudspeaker signal dominate the total output from all loudspeakers, as its TOA decreases and its amplitude increases. For each listener position in the loudspeaker array, the angle between listener and loudspeaker changes, and new HRIRs are therefore selected at every step. At certain positions, the signal from a particular loudspeaker reaches the listener at a lower level due to how it is angled towards the listener.

When the listener moves straight forward towards loudspeaker 1, the directional components assigned to this loudspeaker are amplified, and as the TOA of these directional components decreases, they become separated from the rest of the directional components. This can be seen in Figure 4.15, where the loudspeaker array synthesis of Maes howe is presented for the listener positioned 1 and 2 meters in front of the center, respectively.

Figure 4.15: Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 1 meter in front of the center (left) and 2 meters in front of the center (right). Only the direct sound is played back from loudspeaker 1 and all early reflections are assigned to loudspeakers 2 and 3.

As seen in Figure 4.15, the direct sound gets more and more amplified as well as more separated from the reflections as the listener gets closer to loudspeaker 1. At the same time, the amplitude of the reflections and diffuse sound played back from the loudspeakers behind the listener decreases, and they arrive at the listener later in time. The quality of the sound image deteriorates already 0.5 meters from the center, as it now sounds flatter and less rich than at the center, and the sound color does not sound as wide as before. It continues to sound worse until the listener is 3.5 meters in front of the center, where it starts to sound as it does at the sweet spot again. The syntheses at
3.5 and 5.5 meters in front of the center are shown in Figure 4.16, where it can be seen that the reflections get more and more attenuated until they become part of the diffuse sound at 5.5 meters from the center. The IC of Maes howe when the listener has moved 1 m and 3.5 m in front of the center, respectively, is presented in Figure 4.17, where it can be seen that the left and right signals become more coherent the further the listener moves from the center.

Figure 4.16: Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 3.5 meters in front of the center (left) and 5.5 meters in front of the center (right).

The opposite happens when the listener instead moves backwards from the center. The direct sound reaches the listener after the reflections do, and the further from the center the listener is, the higher the reflections become in amplitude while the direct sound is attenuated. This is illustrated in Figure 4.18, where it can be seen that at 2 meters from the center, the direct sound is already strongly attenuated. Furthermore, the amplitude of the reflections increases for listening positions up to and including 3.5 meters from the center, before they start to attenuate. This is because the distance between the listener and loudspeakers 2 and 3 starts to increase again for listening positions further than 3.5 meters behind the center. The reflections contribute to a richer sound compared to when the listener moved forward, but the direct sound does not sound as clear as it did when the listener was positioned at the center, and this affects the sound quality negatively.

For the Genesis 6 studio, six early reflections are distributed over loudspeakers 2 and 3, of which only one is played back from loudspeaker 2. The other four reflections and the direct sound are assigned to loudspeaker 1.
This creates a loudspeaker array synthesis which is dominated by the right side, as seen in Figure 4.19.

Figure 4.17: Interaural coherence of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 1 meter in front of the center (left) and 3.5 meters in front of the center (right).

Figure 4.18: Loudspeaker array synthesis of Maes howe for a 3-loudspeaker array in the horizontal plane at the listener position 1 meter behind the center (left) and 2 meters behind the center (right).

Figure 4.19: Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker array in the horizontal plane at the sweet spot.

When the listener moves straight to the left, loudspeaker 2 will dominate the total output from all loudspeakers, and as this loudspeaker is closer to the left ear, its signal will be amplified at the left ear. Loudspeakers 1 and 3, however, will both dominate the right ear, because the listener's head shadows these loudspeaker signals at the left ear. This in turn amplifies the sound on the right side of the listener. This can be seen in Figure 4.20, where the one reflection played back from loudspeaker 2 is only heard by the left ear and appears earlier in time than the direct sound and the rest of the reflections. Moreover, the whole loudspeaker array synthesis is dominated by the right ear, as most reflections reach the listener from this side. The amplitude of the loudspeaker signals of loudspeakers 1 and 3 is attenuated more and more the further to the left the listener is, while it increases for loudspeaker 2. The sound quality is good up to 1.5 meters from the center, after which it deteriorates drastically the further from the center the listener moves. The separation in time of the reflections contributes to a flatter sound, and it sounds as if the previously wide sound color has tapered off, although the spaciousness can still be heard.
The opposite applies when moving to the right, where loudspeaker 3 becomes the dominant loudspeaker. This is illustrated in Figure 4.21. It sounds just as when moving to the left, but the listener can move further to the right than to the left before the quality goes down. The HRIRs from loudspeaker 3 to the listener at the new listener positions 1.5 m and 3.5 m to the right are presented in Figure 4.22, where it can be seen how the listener's head, upper torso and pinnae filter the loudspeaker signal at the two ears.

Figure 4.20: Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker array in the horizontal plane at the listener position 1.5 meters aside to the left of the center (left) and 3.5 meters aside to the left (right).

When the listener moves diagonally forward, to the left or to the right, loudspeaker 1 dominates the total output from all loudspeakers and its signal reaches the listener before the other loudspeaker signals do. The listener can move further from the center in the loudspeaker array of Maes howe than in that of the Promenadikeskus concert hall, which has only two reflections assigned to loudspeaker 2 and one to loudspeaker 3, and where the quality deteriorates already after moving 7 cm diagonally forward. The quality deteriorates in the sense that the sound becomes flat and the spaciousness decreases. This is also the case for Maes howe, with the exception that it sounds just as at the sweet spot up to 3.5 meters from the center. Due to the symmetry of the loudspeaker array on each side of the listener, the same occurs when the listener moves diagonally forward to the left as to the right, but with the sides reversed. The further away from the center the listener is, the less the loudspeakers are angled towards the listener, which reduces the amplitude of the sound at both ears.
For listening positions up to 2.8 m diagonally in front of the center of the array, the loudspeaker signals behind the listener are attenuated, not only due to how the loudspeakers are angled relative to the listener but also due to the increase in distance between the loudspeakers and the listener. This applies especially to the loudspeaker positioned on the side opposite to the one the listener moves towards. This can be seen in Figure 4.23, where the syntheses of Maes howe at the listening positions 2 m diagonally to the left and to the right are presented.

Figure 4.21: Loudspeaker array synthesis of the Genesis 6 studio for a 3-loudspeaker array in the horizontal plane at the listener position 1.5 meters aside to the right of the center (left) and 3.5 meters aside to the right (right).

Figure 4.22: HRIR from loudspeaker 3 to the listener at the new listener positions 1.5 m aside to the right (left) and 3.5 m aside to the right (right).

Figure 4.23: Loudspeaker array synthesis of Maes howe at the listening position 2 m diagonally in front of the center to the left (left) and at the listening position 2 m diagonally in front of the center to the right (right).

As seen in Figure 4.23, the direct sound is amplified equally in both cases, but it dominates at different ears. Moreover, as the listener moves to the left, loudspeaker 2 becomes closer to the listener than loudspeaker 3. The opposite happens when the listener moves to the right, and this can be seen in Figure 4.23 by looking at the reflections. The reflections assigned to loudspeaker 2 are amplified when the listener moves to the left, while the reflections assigned to loudspeaker 3 are attenuated. When the listener instead moves to the right, the reflections played back from loudspeaker 3 are amplified while the loudspeaker signal of loudspeaker 2 is attenuated.
Moreover, the reflections of loudspeaker 3 are higher in amplitude when the listener moves to the right than the reflections of loudspeaker 2 are when the listener moves to the left.

At the position 3.5 m diagonally in front of the center of the array to the left, the distance between the listener and loudspeaker 2 reaches its minimum, which strongly amplifies this loudspeaker signal. The distance to loudspeaker 3, on the other hand, increases, and the reflections played from this loudspeaker are strongly attenuated. This can be seen in Figure 4.24, which shows the syntheses of the Promenadikeskus concert hall and of Maes howe at this listener position. The reflections played back from loudspeaker 2 are drastically amplified, while the loudspeaker signal of loudspeaker 3 is attenuated to the extent that its reflections are lower in amplitude than the diffuse sound coming from the other loudspeakers.

Figure 4.24: Loudspeaker array synthesis for the listener position 3.5 m diagonally forward to the left of the Promenadikeskus concert hall (left) and of Maes howe (right).

The same applies, with the roles of loudspeakers 2 and 3 reversed, when the listener makes the same move to the right. As the listener continues to move further away from the center of the array, the loudspeaker signals of loudspeakers 2 and 3 get more and more attenuated as the distance between the listener and these loudspeakers increases. When the listener moves diagonally forward to the left, the signals f