Stochastic Methods for Beam Weight Optimization for Large Antenna Arrays A Time Efficient and Robust Method for Beam Weight Optimization Using Non-ideal Data Master’s thesis in Data Science and AI Master’s thesis in Engineering Mathematics and Computational Science MATILDA SELVARAJ TIVESTEN JOHAN ÖDESJÖ DEPARTMENT OF MATHEMATICAL SCIENCES CHALMERS UNIVERSITY OF TECHNOLOGY Gothenburg, Sweden 2025 www.chalmers.se www.chalmers.se Master’s thesis 2025 Stochastic Methods for Beam Weight Optimization for Large Antenna Arrays A Time Efficient and Robust Method for Beam Weight Optimization Using Non-ideal Data MATILDA SELVARAJ TIVESTEN JOHAN ÖDESJÖ Department of Mathematical Sciences Division of Applied Mathematics and Statistics Chalmers University of Technology Gothenburg, Sweden 2025 Stochastic Methods for Beam Weight Optimization for Large Antenna Arrays A Time Efficient and Robust Method for Beam Weight Optimization Using Non- ideal Data MATILDA SELVARAJ TIVESTEN JOHAN ÖDESJÖ © MATILDA SELVARAJ TIVESTEN & JOHAN ÖDESJÖ, 2025. Supervisor: Dr. Artem Roev, Ericsson AB Examiner: Prof. Rebecka Jörnsten, Department of Mathematical Sciences Master’s Thesis 2025 Department of Mathematical Sciences Division of Applied Mathematics and Statistics Chalmers University of Technology SE-412 96 Gothenburg Telephone +46 31 772 1000 Cover: Radiation pattern of a reference field used during beam weight optimization. Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria Printed by Chalmers Reproservice Gothenburg, Sweden 2025 iv Stochastic Methods for Beam Weight Optimization for Large Antenna Arrays A Time Efficient and Robust Method for Beam Weight Optimization Using Non- ideal Data Matilda Selvaraj Tivesten, Johan Ödesjö Department of Mathematical Sciences Chalmers University of Technology Abstract Traditional methods for beam weight optimization suffer from poor convergence and long run times for large antenna arrays and are limited to ideal antenna ele- ment models. To overcome these limitations, more efficient optimization algorithms are needed. Previous research has demonstrated that stochastic approaches such as genetic algorithms and particle swarm optimization are promising alternatives for solving this problem. This thesis investigates the tailoring of these stochastic opti- mization algorithms for beam weight optimization using high-dimensional, realistic, simulated data. We propose the use of an absolute radiation pattern scale as opposed to a rela- tive one, which allows us to simplify the objective function and promote a more efficient energy usage. We decrease runtimes by restricting the optimization to a low-rank search space, implementing warm starts, and leveraging GPU acceleration. We find that these tailored methods show robust performance and converge signif- icantly faster compared to current methods. The resulting beams also consistently outperform those currently in use. We demonstrate the effectiveness of our approach on antenna arrays of up to 1,152 subarrays, where antennas of 288 subarrays or more are considered large. Keywords: antenna arrays, beamforming, advanced antenna systems, stochastic optimization, genetic algorithms, particle swarm optimization. v Acknowledgements This thesis would not have been possible without the guidance, support, and en- couragement from many different people. Thank you to our academic supervisor, Prof. Rebecka Jörnsten, for being a leading light in the worlds of statistics and optimization. Thank you to our Ericsson supervisor, Dr. Artem Roev, for sharing your expertise on antenna arrays. Thank you to Ericsson for hosting us, to our line manager Dr. Bhushan Billade for having us, and to Håkan Karlsson for answering all of our questions, even the bad ones. Thank you to our proofreaders Alexander Mayer, Dr. Helena Ödesjö, Ulf Ödesjö, and Alma Cavallin for your time and feedback. Thank you to Bella for the endless coffee and enthusiasm. Matilda Selvaraj Tivesten & Johan Ödesjö, Gothenburg, June 2025 vii List of Acronyms Below is the list of acronyms that have been used throughout this thesis listed in alphabetical order: BW Beam Weights CAR Coverage Angular Range CM Current Method EIRP Effective Isotropic Radiated Power ESAP Embedded Subarray Patterns EI Expected Improvement GA Genetic Algorithms GoB Grid of Beams MSE Mean Square Error PSO Particle Swarm Optimization ix Nomenclature Below is the nomenclature of indices, sets, parameters, and variables that have been used throughout this thesis. Indeces m Number of rows of antenna array n Number of columns of antenna array Coordinates { x, y, z} Cartesian coordinate system { θ, φ } Far field coordinate system, θ = π 2 , φ = 0 corresponds to propaga- tion at normal incidence { r, θ, φ } Spherical coordinates Antennas E(r, θ, φ) Electrical field G(θ, φ) Far field function GA(θ, φ) Resulting far field function of an antenna array U(θ, φ) Radiation intensity or Radiation pattern D(θ, φ) Directivity G(θ, φ) Gain GR(θ, φ) Realized gain EIRP(θ, φ) Effective Isotropic Radiated Power PR Total radiated power PO Total accepted power xi PM2 Power available from the matched transmission line feeding the an- tenna ESAPj(θ, φ) Embedded subarray pattern j bwn Beam weight n N Number of beam weights Optimization Fmain Set of angles (θ, φ) that defines the main lobe area Fside Set of angles (θ, φ) that defines the side lobe area L(bw) Objective function F(bw) Fitness function xii Contents List of Acronyms ix Nomenclature xi List of Figures xv List of Tables xix List of Algorithms xxi 1 Introduction 1 1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Electromagnetic Field Theory 3 2.1 Azimuth and Elevation Coordinates . . . . . . . . . . . . . . . . . . . 3 2.2 The Far Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.3 Directivity and Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.4 EIRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.5 Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.6 Polarisation of Plane Waves . . . . . . . . . . . . . . . . . . . . . . . 7 2.7 Aperture Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3 Antenna Arrays 9 3.1 Beam Types and Uses . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 Antenna Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3 The Superposition Principle . . . . . . . . . . . . . . . . . . . . . . . 14 3.4 Beamforming with Antenna Arrays . . . . . . . . . . . . . . . . . . . 15 3.5 Single- and Dual-Polarized Beamforming . . . . . . . . . . . . . . . . 16 3.6 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4 Optimization 21 4.1 Classical Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 21 4.2 Stochastic Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 21 4.3 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 26 5 Method 31 5.1 Specification of the Issue Being Investigated . . . . . . . . . . . . . . 31 xiii Contents 5.2 Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 5.3 Design of Objective Function . . . . . . . . . . . . . . . . . . . . . . . 32 5.4 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 34 5.5 Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 5.6 t-Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 5.7 Hyperparameter Tuning . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.8 Extension to Dual-Polarized Beamforming . . . . . . . . . . . . . . . 41 6 Results 43 6.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 6.2 Comparison to Current Method . . . . . . . . . . . . . . . . . . . . . 46 6.3 Dimensionality Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 6.4 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 6.5 Runtime in Real Application . . . . . . . . . . . . . . . . . . . . . . . 56 7 Discussion 59 7.1 Fulfilment of Desiderata . . . . . . . . . . . . . . . . . . . . . . . . . 59 7.2 Search Space Dimension . . . . . . . . . . . . . . . . . . . . . . . . . 61 7.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 8 Conclusion 65 8.1 Fulfilment of the Desiderata . . . . . . . . . . . . . . . . . . . . . . . 65 8.2 Further Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Bibliography 69 A Method Appendix I A.1 Paired t-Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I B Result Appendix XI B.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XI B.2 Comparison to Current Method . . . . . . . . . . . . . . . . . . . . . XII B.3 p-Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XIV xiv List of Figures 2.1 Coordinate system used for antennas, with a Cartesian coordinate system for reference. The antenna array is situated in the yz-plane and faces the positive x-direction. . . . . . . . . . . . . . . . . . . . . 3 2.2 In the far field region, the electrical field can approximated as a plane wave. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.2 Wide beams are used to identify the optimal traffic beam with which to serve a user. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.3 Wide beams are used to identify the optimal traffic beam with which to serve a user. The beams are arranged in a GoB that cover the CAR. 11 3.4 Creation of an envelope over a set of traffic beams. . . . . . . . . . . 11 3.5 One antenna element consisting of two antennas rotated 45◦ in oppo- site directions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.6 A 3 × 1-subarray consists of three antenna elements arranged into three rows and one column. . . . . . . . . . . . . . . . . . . . . . . . 13 3.7 A 3 × 8 antenna array consists of 24 subarrays arranged into 3 rows and 8 columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.8 Constructive and destructive interference between two vectors (green and red arrows respectively). The black arrow represents the resulting vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.9 Representation of how the electrical field is calculated from the ESAPs and BWs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.10 Cut of an example beam. The pink line is the reference field in the main lobe area and the blue line is the optimized field. The black and grey dotted lines are the side lobe and main lobe thresholds, respectively. The red area represents the main lobe loss. . . . . . . . 17 3.11 An example of the radiation pattern of a wide beam and its main lobe. 18 3.12 Cut of an example beam. The pink line is the reference field in the main lobe area and the blue line is the optimized field. The black and grey dotted lines are the side lobe and main lobe thresholds, respectively. The red area represents the side lobe loss. . . . . . . . . 19 4.1 Visualization of a generation of GA with its main operations. The blue ovals are populations, and the smaller circles are individuals. . . 22 4.2 Visualization of a swarm of particles and their velocities. The blue star represents the current best position of the blue particle and the pink star represents the current best position of the entire swarm. . . 25 xv List of Figures 4.3 The mean of the surrogate function is shown as a dashed green line, and the area within 1.96 standard deviations from the mean of the surrogate model is shown shaded in green. The mean of the true function is shown as a dashed red line, and all sampled points are shown as red dots. The acquisition function is shown in blue, with a blue dot denoting the next query point. . . . . . . . . . . . . . . . . . 28 5.1 A flowchart of the method from data and user inputs to optimized beam weights. The dotted line only applies when GA is the algorithm used in the optimization step and represents the use of traffic beam BWs as warm starts. . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 5.2 Comparison of two ways of initializing beam weights. Each point represents one beam weight. . . . . . . . . . . . . . . . . . . . . . . . 36 5.3 Result of paired t-test for GA for initial positions that are half random and half traffic beam weights. α is the level of noise added unto the traffic beams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 6.1 Comparisons of main lobes of beams optimized by GA and PSO and the reference beam. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 6.2 Comparisons between cuts of beams (blue lines) optimized by GA and PSO, and the reference beam (pink line). Main and side lobe thresholds are shown as well (grey and black line respectively). . . . 45 6.3 Comparison of best loss of final iteration for different metrics for 20 runs of GA and PSO. . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 6.4 A wide beam optimized by CM. In b) and c) the pattern optimized by CM is blue and the radiation pattern of the reference field is pink. The grey and black dashed lines represent the side and main lobe thresholds respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . 47 6.5 GoB of wide beams optimized by different methods and a GoB of the corresponding traffic beams. . . . . . . . . . . . . . . . . . . . . . . . 48 6.6 Total loss for all beams in the GoB shown in Figure 6.5 and differ- ence in total loss between CM and both GA and PSO. Each loss corresponds to the wide beam in the same position in the GoB. . . . 49 6.7 Distributions of final loss of 20 runs of GA and PSO for search spaces of rank 1/2, 1, and 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 6.8 Comparison between time and loss for different algorithms for differ- ent ranks. Results for GA are shown in blue, and results for PSO are shown in pink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.9 Loss distributions GA and PSO for different ranks. Results for GA are shown in blue, and results for PSO are shown in pink. . . . . . . . 52 6.10 Comparison between time and loss for different algorithms for differ- ent array sizes. Results for GA are shown in blue, and results for PSO are shown in pink. . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.11 Total loss for 20 runs of GA and PSO until convergence using single- and dual-polarized beamforming. . . . . . . . . . . . . . . . . . . . . 54 xvi List of Figures 6.12 Variation of total loss for 20 beam weights with Gaussian noise added to either phase or amplitude. Red arrow indicates maximum allowed error according to internal Ericsson standards. . . . . . . . . . . . . . 55 6.13 Comparisons of beams with noise with 3σP = 22◦ added to phases and 3σA = 2dB added to amplitudes. Also a beam with no noise added for reference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 6.14 Beams marked 1, 2, and 3 can be approximately calculated using beam weights of the beam marked X by mirroring its amplitudes vertically, horizontally, or both. . . . . . . . . . . . . . . . . . . . . . 57 A.1 Results from paired t-test for optimization using genetic algorithm and side lobe thresholds relative to optimized field vs reference field. Both methods were ran 20 times. . . . . . . . . . . . . . . . . . . . . II A.2 Results from paired t-test for particle swarm optimization and side lobe thresholds relative to optimized field vs reference field. Both methods were ran 20 times. . . . . . . . . . . . . . . . . . . . . . . . III A.3 Results from paired t-test for optimization using constrained vs un- constrained GA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IV A.4 Results from paired t-test for constrained vs unconstrained PSO. . . . V A.5 Results from paired t-test for PSO with initial positions completely random vs half random and half traffic beam beam weights. . . . . . VI A.6 Results from paired t-test for GA with initial positions completely random vs half random and half traffic beam beam weights. . . . . . VII A.7 Result of paired t-test for GA for initial positions that are half random and half traffic beam weights. α is the level of noise added unto the traffic beams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VIII B.1 Comparison of best taper loss of final iteration for 20 runs of GA and PSO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XI B.2 Taper loss for all beams in the GoB shown in Figure 6.5, and difference in total loss between CM and both GA and PSO. . . . . . . . . . . . XII B.3 Main lobe loss for all beams in the GoB shown in Figure 6.5, and difference in main lobe loss between CM and both GA and PSO. . . . XIII B.4 Side lobe loss for all beams in the GoB shown in Figure 6.5, and difference in main lobe loss between CM and both GA and PSO. . . . XIV xvii List of Figures xviii List of Tables 6.1 Mean loss and runtime over 20 runs of GA and PSO. . . . . . . . . . 45 6.2 Standard deviation of final loss for all loss metrics for GA and PSO. . 46 6.3 Mean loss and runtime over 20 runs of GA and PSO when terminated using convergence criterion. . . . . . . . . . . . . . . . . . . . . . . . 47 6.4 Mean difference in total loss and taper loss between GoBs optimized by CM compared to GA and PSO. For reference, taper loss is usually in the range two to three. . . . . . . . . . . . . . . . . . . . . . . . . 49 6.5 Mean runtime over 20 runs for different search space ranks. . . . . . . 51 6.6 Mean total loss and runtime for 20 runs of GA and PSO for an antenna array with 32 subarrays. . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.7 Mean total loss and runtime for 20 runs of GA and PSO for an antenna array with 1152 subarrays. . . . . . . . . . . . . . . . . . . . . . . . . 53 6.8 Mean runtime over 20 runs for single- and dual-polarisation. . . . . . 54 6.9 Mean total loss per beam over GoB for different kinds of warm starts. 57 6.10 Mean runtime per beam over GoB for different kinds of warm starts. 58 6.11 Estimated times needed to optimize the entire GoB of an antenna with 288 subarrays for five frequencies using different warm start methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 A.1 p-values for all paired t-tests, rounded to two significant figures. . . . IX B.1 Mean taper loss over 20 runs of GA and PSO. . . . . . . . . . . . . . XI B.2 Standard deviation of final taper loss for GA and PSO. . . . . . . . . XI B.3 Mean difference in total loss and taper loss between all 24 wide beam optimized by current method compared to GA and PSO. . . . . . . . XIV B.4 p-values for all two-sample t-tests, rounded to two significant figures. XV xix List of Tables xx List of Algorithms 1 Genetic Algorithm for High Dimensional Problems . . . . . . . . . . . 23 2 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 4 Particle Swarm Optimization with Inertia . . . . . . . . . . . . . . . 26 5 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 27 xxi List of Algorithms xxii 1 Introduction Ericsson is one of the leading providers of information and communication technolo- gies to service providers, while also investing in technical development. One area under research at Ericsson is antenna arrays. These can conduct traffic with user equipment by creating peaks in the intensity of the electrical field, known as beams, in the direction of the user by manipulating signals called beam weights. Beam weights that result in simple beams can be calculated analytically, while more com- plicated beams require more complex tools, such as optimization methods. A larger antenna array can create stronger beams and better specify direction, and therefore, the current trend appears to be toward larger antenna arrays for 6G. However, current methods for beam weight optimization used at Ericsson suffer from long run times and poor convergence for larger products. Producing beam weights for large antenna arrays in an efficient and precise manner is a challenge that Ericsson needs to tackle to unlock further development opportunities within the area of large antenna arrays. The performed initial studies show that the use of tailored algorithms is a promising alternative to the generic optimizers used thus far, for example through the use of CNNs to predict optimal beam weight values, such as by Frilund and Olofsson [1], although this method was only explored for small antenna arrays. The aim of this thesis is to create an optimization algorithm tailored to the problem of beam weight optimization for large antenna arrays. The algorithm needs to be able to consis- tently converge in a reasonable time while keeping the desired spatial performance of Ericsson’s Active Antenna System products. 1.1 Problem Statement The objective of this thesis is to design a time-efficient optimization algorithm that can be used to optimize beam weights for large antenna arrays such that the result- ing electrical field is fitted to a given reference electrical field. Previously used methods are unable to use real embedded subarray pattern data in the optimization process and can only handle ideal data. This is a drawback, as the physical characteristics of the antenna array affect the resulting electrical field. Ericsson uses a number of different metrics to evaluate the quality of a beam weight 1 1. Introduction choice for a given reference field. For the purposes of this master thesis, a subset of these metrics will be selected to be used as components of the objective function, but all of them will be used when evaluating the quality of the solutions. These metrics are introduced in Section 3.6. The total list of desiderata for the final algorithm is stated here. The algorithm should: • Consistently produce beam weights that correspond to an electrical field with desired metric values with respect to a given reference field. • Run and converge in a shorter time than the current methods. • Produce results that are robust against small variations resulting from the manufacturing of antenna and measurement components. • Optimize beam weights using measured and simulated realistic ESAPs. 1.2 Constraints In this thesis, there are a number of constraints set from the beginning to limit the scope of the project. One such limitation is that we do not take into account emissions outside the frequency bandwidth of the antenna, i.e., all antennas are as- sumed to have no out-of-band emissions. We further assume that all electrical fields propagate through free space and do not consider other mediums. Finally, we use realistic simulated data, and no data measured from real antennas since no such measurements exist for the specific antenna size we mainly investigate. Due to the computational cost of producing realistic simulated embedded subarray patterns, we only have one such set available. This data is generated using the first two assump- tions. However, data could be simulated without these two assumptions, though they have a negligible effect and are therefore generally included as it simplifies the process. 2 2 Electromagnetic Field Theory In this chapter, we introduce the majority of the electromagnetic field theory that is necessary to understand the method and results of this thesis. We explain concepts such as relevant coordinate systems, scales, units, and polarisations. 2.1 Azimuth and Elevation Coordinates When working with antennas, it is common to use a coordinate system that is similar to spherical coordinates. We use the angle in azimuth φ to refer to the angle between the x-axis and the projection of a point onto the xy-plane. The angle in elevation θ refers to the angle between the z-axis and the point. This means that φ is in the range [−180, 180] and θ is in the range [0, 180]. The coordinate system is illustrated in Figure 2.1. Figure 2.1: Coordinate system used for antennas, with a Cartesian coordinate system for reference. The antenna array is situated in the yz-plane and faces the positive x-direction. 2.2 The Far Field When working with antennas, the focus is often on the far field, which is defined as the region where 3 2. Electromagnetic Field Theory r ≥ 2D2 λ , where λ is the wavelength, r is the distance from the source to the measuring point, and D is the largest diameter of the antenna [2, p. 33]. Figure 2.2: In the far field region, the electrical field can approximated as a plane wave. The far field is of interest because the electrical field can be approximated as a plane wave [3], as shown in Figure 2.2. There, the electrical field’s dependency on r becomes nearly perfectly separable from the θ- and φ-dependency and the field can be approximately expressed as E(r, θ, φ) = 1 r e−j 2π λ rG(θ, φ), (2.1) where G(θ, φ) is the directional dependency of the field, known as the far field func- tion. This is the reason that the azimuth/elevation coordinate system omits a radial coordinate. The far field function is complex valued and therefore also affects the phase of E [2, p. 35]. Given the far field function G(θ, φ) the radiated power per unit solid angle can be calculated as U(θ, φ) = 1 2ζ |G(θ, φ)|2, where U(θ, φ) is known as the radiation intensity and ζ is the wave impedance [2, p. 36]. When we refer to the radiation intensity for a set of angles, we call it the radiation pattern. 2.3 Directivity and Gain The directivity D(θ, φ) of an antenna in a given direction is defined as the ratio between the radiation intensity in that direction and the radiation intensity averaged 4 2. Electromagnetic Field Theory over all directions [4, p. 12]. This is written as, D(θ, φ) = 4π U(θ, φ) PR , where PR is the total radiated power of the antenna and is defined as, PR = ∫ 2π 0 ∫ π 0 U(θ, φ) sin(θ)dθdφ. When directivity is mentioned without a directional dependency, it is usually as- sumed that it is in the direction that maximizes the radiation intensity [5, p. 39]. However, in this thesis, we always refer to directivity with a directional dependence. Directivity does not measure the antennas’ efficiency, only its directional capabilities, and therefore gain, G(θ, φ) is often used as accounts for both. The gain of an antenna in a given direction is defined by IEEE as the ratio between the radiation intensity in that direction and the radiation intensity produced if the power accepted by the antenna was radiated isotropically, i.e., equally in all directions [4, p. 17]. Thus, gain is defined as G(θ, φ) = 4π U(θ, φ) PO , (2.2) where PO is the total accepted power of the antenna. According to [4, p. 30] PO ·η = PR, where η ∈ [0, 1] is the radiation efficiency, which is a measure of how large part of the accepted power is radiated by the antenna. Thus, equation (2.2) can be rewritten as, G(θ, φ) = 4πη U(θ, φ) PR = ηD(θ, φ). In the same way as for directivity, when gain is mentioned without a direction, it is usually assumed that it is the direction that maximizes the radiation intensity [5, p. 59]. Likewise, in this thesis, we always refer to gain with a directional dependence. Gain is not enough when we want to know the performance of our antenna in relation to the supplied power. It accounts for the losses inside the antenna, but to account for the energy loss between the energy source and the antenna, we use realized gain. It is defined as the gain of an antenna multiplied with an impedance mismatch factor, M2 ∈ [0, 1], which is the ratio between the total accepted power of the antenna, PO and the power available from the matched transmission line feeding the antenna, PM [4, pp. 19, 31]. Thus, realized gain is GR(θ, φ) = M2G(θ, φ). In the same way as for gain and directivity, when realized gain is mentioned without a direction, it is usually assumed that it is the direction that maximizes the radia- tion intensity. As with directivity, we always refer to realized gain with a directional dependence in this thesis. 5 2. Electromagnetic Field Theory 2.4 EIRP Effective Isotropic Radiated Power (EIRP) is a measure of how much energy an antenna is transmitting in a given direction and is often used when studying the performance of an active antenna. According to “IEEE Standard for Definitions of Terms for Antennas” [4, p. 15] it can be calculated in the following way, EIRP(θ, φ) = POG(θ, φ). (2.3) Since gain can be rewritten as G(θ, φ) = 1 M2 GR(θ, φ) and PO = M2PM , the relation can also be written as EIRP(θ, φ) = PMGR(θ, φ). (2.4) Furthermore, since gain can also be rewritten as G(θ, φ) = ηD(θ, φ) and PO = 1 η PR, we can also write this relation as EIRP(θ, φ) = PRD(θ, φ). (2.5) Equation (2.5) is used more often than (2.3) as PR can be measured directly, whereas PO is very difficult to measure. However, Equation (2.4) is the easiest to use, as the power in the matched transmission line, PM , is the easiest to measure. 2.5 Units In this thesis, the primary focus will be on the far field, where we measure directivity and gain in dBi. The i in dBi stands for isotropic because we measure relative an isotropic antenna, i.e., an antenna that radiates equally in all directions. This is according to convention defined as DdBi = 10 · log10 (D) [dBi] GdBi = 10 · log10 (G/Giso) [dBi], where Giso is the gain of an isotropic antenna. The unit dBm is used to express power, for example U(θ, φ), PO, or PR, in dB relative to 1 mW. That is, it is used to express a power P , measured in mW, in logarithmic scale [6, p. 30]. It is converted according to PdBm = 10 · log10 (P ) [dBm]. If the power, P , is in W we convert it to dBm according to PdBm = 10 · log10 (P ) + 30 [dBm]. EIRP is also often expressed in dBm, and if it is measured in W it is converted according to 6 2. Electromagnetic Field Theory EIRPdBm = 10 · log10 (EIRP ) + 30 [dBm]. Every time directivity, gain, or EIRP is mentioned in the context of this thesis, it is in units of dBi and dBm, respectively. 2.6 Polarisation of Plane Waves As mentioned in Section 2.2, all electrical fields can be approximated as a plane wave if considered in a sufficiently small area in the far field. The electrical field of a plane wave propagating in the ẑ-direction can according to [2, p. 23] be described as E = Ete −j 2π λ z = [Exx̂ + Eyŷ]e−j 2π λ z where Et is the E-field with the propagation vector e−j 2π λ z omitted. It is possible to decompose Et into its two polarisations, x̂ and ŷ. The components Ex and Ey can be expressed as the projection of E onto the orthogonal unit vectors x̂ and ŷ respectively. We use x̂ and ŷ as an example here, but this is applicable to any set of orthogonal unit vectors. If the polarisations can be individually controlled, then it is possible to send twice the amount of information by sending different signals along the different polarisa- tions. This is true for any pair of orthogonal polarisations, however, for this to work well the polarisations must be well separated, i.e. one must be able to detect one of them without the other interfering. Each antenna aims to send only one polarisation, which we refer to as the co-polar polarisation for that antenna. In the non-ideal case it is almost certain that some signals will be sent in the other polarisation, which is referred to as cross-polar [2, p. 23]. Thus, the electrical field from an antenna can be decomposed into the co-polar and cross-polar components, Eco and Exp, where Eco is parallel to the co-polar unit vector ĉo and Exp is parallel to the cross-polar unit vector x̂p. Therefore, the electrical field can be written as E = [Ecoĉo + Expx̂p]e−j 2π λ z. (2.6) The propagation direction ẑ is orthogonal to both ĉo and x̂p which are generally orthonormal vectors [2, p. 24]. 2.7 Aperture Size The aperture of an antenna is the area where most of the radiated power passes through [2, p. 57]. In general it is difficult to exactly define the aperture of an antenna and some antenna types lack the concept completely. 7 2. Electromagnetic Field Theory The effective aperture area of an antenna in a given direction is Ae = PO Wt , where PO is the total accepted power by the antenna and Wt is the power density of a plane wave in the given direction [4, p. 13]. The max directivity of an antenna is directly proportional to the effective aperture area when the field of the antenna has a uniform distribution over the aperture [2, pp. 57, 240]. Max directivity can therefore be calculated as Dmax = Ae 4π λ2 . This means that larger antennas allow us to create narrower, more focused beams, concentrating more of the power in one chosen direction. 8 3 Antenna Arrays This chapter aims to describe the fundamentals of antenna arrays and their use in beamforming. We begin with a description of the different beam types and their uses, before outlining the architecture of an antenna array. We then detail the use of antenna arrays in beamforming. Following that, we also introduce the metrics that are used to evaluate the quality of an optimized set of beam weights. 3.1 Beam Types and Uses The main utility of antenna arrays lies in the ability to manipulate their electri- cal fields as desired through adjustment of the phases and amplitudes of the beam weights. In this way, antenna arrays allow for steering which directions they are radiating energy in. When this is used to radiate energy in a chosen direction, we refer to this peak in the radiation pattern as a beam. We refer to the manipulation of beams through, for example, the choice of beam weights, as beamforming [7, p. 2]. In 5G, antenna arrays are mainly used to create two types of beams, traffic beams and wide beams [7, pp. 2–3]. Traffic beams, or pencil beams, are beams with low side lobe levels and a single main lobe with small half-power beamwidth in both azimuth and elevation [4]. They have high max directivity, which is used to send and receive information from user equipment [7, pp. 2–3]. Figure 3.1a illustrates how a traffic beam communicates with a user device, and Figure 3.1b shows a representation of a cross-section of the radiation pattern of an ideal traffic beam. (a) Representation of a traffic beam conducting traffic to and from a user device. (b) Representation of a cross section of the radiation pattern of an ideal traffic beam. 9 3. Antenna Arrays Each antenna array has a set of traffic beams that cover the entire angular range that the antenna serves, also known as the coverage angular range (CAR). The traffic beam with the strongest connection to the user equipment is used to provide service to the user. However, to choose which traffic beam to use, we need to know which creates the strongest signal. This can be done by testing each traffic beam one at a time and then choosing the traffic beam that created the strongest signal for the user. The problem with this approach is that there are many traffic beams available to an antenna, making it inefficient to test each one. To aid in this process, a second type of beam called a wide beam [7, p. 3] is used. Wide beams are wider than traffic beams, so fewer are needed to cover the CAR. Each wide beam corresponds to a subset of the traffic beams. When searching for the best traffic beam to provide service to a user, each wide beam is tested to find the one that creates the strongest signal for the user. The traffic beams correspond- ing to the chosen wide beam are then tested to find the one with the strongest connection to the user. In this way, it is possible to find the best traffic beam to serve the user without having to check every single traffic beam. This process is illustrated in Figure 3.2. (a) The wide beams are tested to find the one with the best connection. (b) The traffic beams corresponding to the best wide beam are tested to find the one with the best connection. Figure 3.2: Wide beams are used to identify the optimal traffic beam with which to serve a user. The wide beams are arranged in a grid that covers the CAR. This grid is known as a grid of beams (GoB), which can refer to either a grid of wide beams, or to traffic beams, which are also arranged in a grid. The process of selecting a traffic beam is also illustrated in Figure 3.3. 10 3. Antenna Arrays (a) The wide beam with the best connection to the user equipment is selected, and is outlined here in yellow. (b) The traffic beams corresponding to the best wide beam are tested to find the one with the best connection to the user equipment. The selected traffic beam is outlined here in yellow. Figure 3.3: Wide beams are used to identify the optimal traffic beam with which to serve a user. The beams are arranged in a GoB that cover the CAR. The desired wide beam shape is one that represents the strength of the signal for the corresponding traffic beams as accurately as possible. In the ideal case, the strength of the signal between a user and a wide beam is directly proportional to the strength of the signal between the user and the best of the traffic beams corresponding to that wide beam. For this reason, we build our desired wide beam radiation pattern, which will be referred to as the reference field, from the radiation patterns of the corresponding traffic beams. This is done by creating an envelope around these beams, which refers to taking the maximum over all the traffic beams at each point, as represented in Figure 3.4b. (a) Representation of a cross section of several ideal traffic beams corre- sponding to a single wide beam. (b) Representation of a cross section of the envelope of several ideal traffic beams corresponding to a single wide beam. Figure 3.4: Creation of an envelope over a set of traffic beams. The radiation pattern is a representation of the amount of energy that is sent in different directions. A traffic beam utilizes all the energy available from the antenna. For this reason, an envelope of several traffic beams creates a curve that implies that the antenna is emitting more energy than is available to it. This is rather improb- able. For this reason, the envelope is divided by the square root of the number of traffic beams used to create the beam, which lowers the power to a level where the 11 3. Antenna Arrays energy output is not greater than the energy available to the antenna. This is the ideal wide beam shape. It is possible to calculate the beam weights that create a traffic beam analytically. However, this does not work for wide beams due to their complicated shape. For this reason, optimization methods are employed to find beam weights that create a radiation pattern that is similar to that of a given reference field. 3.2 Antenna Arrays As was shown in Section 2.7, larger antennas have higher directivity and for some uses, for example long-distance communication, much higher directivity than an- tennas of ordinary size can create is required. For this purpose, larger antennas or antenna arrays can be used. Antenna arrays consist of multiple antenna elements that are distributed over a surface, the aperture area, and thus antenna arrays with large apertures can be constructed. They are also useful in that they provide a large flexibility when shaping the radiation pattern [8, p. 285]. There are three different types of antenna arrays: linear, planar, and conformal. The type mainly discussed in this thesis is planar antenna array. The antenna elements of a planar array are spread out on a plane surface [2, p. 334]. The antenna elements can be any type of antenna or even several antennas per element, for example, two orthogonal half-wave dipole antennas. 3.2.1 Antenna Elements As mentioned above, the structure of antenna elements can vary, but in the scope of this thesis we will work with antenna elements which consists of two orthogonal antennas. In this case, the antennas produce fields orthogonal to each other that propagate in the same direction. In Figure 3.5 we present an example of an antenna element with two orthogonal antennas. This setup is called ±45-polarisation, since both antennas are rotated 45◦ in opposite directions from the top of the antenna element. Another very common way to build such an antenna element is called H/V-polarisation and consists of one antenna rotated 90◦ and one rotated 0◦ from the top, i.e., one is horizontal and the other is vertical. 12 3. Antenna Arrays Figure 3.5: One antenna element consisting of two antennas rotated 45◦ in opposite directions. In the ideal case, the electrical field of each antenna will consist of only one polari- sation. We illustrate this using the antenna element in Figure 3.5. In the ideal case, the E+45 component of the field will be created entirely by the +45 antenna and the E−45 component will be created entirely by the −45 antenna. However, in reality, each antenna usually creates some low-level noise on its cross-polarisation. 3.2.2 Several Antenna Elements Make a Subarray Several antenna elements can be arranged next to each other to create a subarray. Figure 3.6 shows an example of a subarray with 3 rows and 1 column. It is referred to as a 3 × 1-subarray. Each subarray has 2 ports. Each port corresponds to one polarisation, in this case either +45 or −45. Figure 3.6: A 3×1-subarray consists of three antenna elements arranged into three rows and one column. The excitations sent through the ports are called beam weights (BWs). These are the smallest steerable parameters that will be discussed in this thesis. Beam weights are complex valued, and when a beam weight is set to a non-zero value, the corre- sponding elements in that subarray create an electrical field. Multiplying the beam weight by some factor a will create an electrical field that is a times stronger, and adding a phase θ will create an electrical field that is shifted θ degrees in phase. 3.2.3 Several Subarrays Make an Antenna Array An antenna array consists of a number of subarrays arranged in a grid. An example with 3 rows and 8 columns can be seen in Figure 3.7. An antenna array with m 13 3. Antenna Arrays rows and n columns has 2 ·m · n ports, one for each polarisation for each subarray. By manipulating the beam weights, it is possible to steer the direction and modify the shape of the electrical field created by the antenna array. Each port has a corresponding beam weight, so there are 2 · m · n beam weights to choose for an m× n antenna array. Figure 3.7: A 3 × 8 antenna array consists of 24 subarrays arranged into 3 rows and 8 columns. 3.3 The Superposition Principle Electric fields are vector fields and can therefore be combined according to the superposition principle. When there are several sources creating electrical fields in the same space, the total electric field at each point is the sum of the vectors of the different electrical fields in that point [2, p. 135]. The total electrical field Etot can therefore be written as Etot (x) = ∑ j∈S Ej (x) (3.1) where S is the set of sources, Ej is the electrical field created by the source j ∈ S, and x is any point in the space. If the vectors are pointing in a similar direction, the resulting vector will be longer than either, while if the phase difference is large, the resulting vector will be shorter than both. This is called constructive and destructive interference respectively, and is illustrated in Figure 3.8. 14 3. Antenna Arrays (a) Constructive interference. (b) Destructive interference. Figure 3.8: Constructive and destructive interference between two vectors (green and red arrows respectively). The black arrow represents the resulting vector. 3.4 Beamforming with Antenna Arrays For each port on an antenna array, an electrical field is created when the corre- sponding beam weight is set to one, and all other beam weights are set to zero. The resulting far field function is called an embedded subarray pattern (ESAP). As there is one ESAP for each port, an m× n antenna array has 2 ·m · n ESAPs. For each ESAP we can calculate the resulting far field function when that same beam weight is instead set to reiθ, and all other beam weights are still set to zero, by multiplying the ESAP with reiθ. This is done by multiplying the amplitude in each point by r and adding θ to the phase of the vector in each point, as illustrated in Figure 3.9a. By summing over the products of each ESAP with its corresponding beam weight, we get the resulting far field function GA(θ, φ) of the antenna array, which we can write as GA(θ, φ) = ∑ j∈S bwj · ESAPj(θ, φ), (3.2) where bwj is the beam weight corresponding to ESAPj(θ, φ), and S is the set of ports. This calculation is illustrated in Figure 3.9b. By shifting the phases of different beam weights we can steer where we will see constructive and destructive interference in the far field, and we can affect the strength of the field in different points by adjusting the amplitudes of the different beam weights. In this way, we are able to create desired radiation patterns by manipulating the beam weights [7, p. 2]. 15 3. Antenna Arrays (a) The field created by one polari- sation of a subarray is the embedded subarray pattern multiplied with the beam weight. (b) The field created by an antenna array is the sum of the fields created by each polarisation of each subarray. Figure 3.9: Representation of how the electrical field is calculated from the ESAPs and BWs. 3.5 Single- and Dual-Polarized Beamforming In Section 3.2.2 we stated that each subarray of an antenna array has two ports, one for each polarisation. Sometimes we choose to optimize the beam weights in such a way that the two ports for a subarray are assigned the same beam weight. This is called single polarized beamforming and gives us m · n complex variables to optimize for an antenna array of size m×n. The alternative is called dual polarized beamforming which means that we optimize individual beam weights for each port. This results in 2 ·m · n complex variables to optimize. These two approaches have different strengths and weaknesses. With single polarized beamforming there are half as many beam weights that need to be optimized, which can reduce the runtime of an optimization algorithm substantially depending on the time complexity of the optimization. On the other hand, dual polarized beamforming has more degrees of freedom which might allow the algorithm to find a more exact solution. 3.6 Metrics In this section, we introduce three metrics used when evaluating the quality of a choice of beam weights. For some of these metrics, we need a reference field to compare the resulting radiation pattern against. We assume that the far field function are measured in discrete points, at every whole degree of θ and φ, as this is a standard format for measured ESAPs. 3.6.1 Taper Loss Taper loss is a measure of how much of the maximum available power of an antenna is utilized for a particular set of beam weights. This is calculated from the ampli- tudes of the beam weights. Each beam weight has a maximal amplitude constrained by the amount of energy available per port. This implies that when the amplitudes of all beam weights are set to the maximum allowed value, the antenna utilizes all available power. In this case, the taper loss is zero. It is usually assumed that we 16 3. Antenna Arrays are attempting to create the strongest possible beam with the power available to the antenna array. For this reason, we normalize the beam weights and assume that the largest amplitude is set to the largest allowed value. To account for this normalization, we divide each beam weight amplitude by the largest beam weight amplitude in the taper loss equation, effectively performing the normalization as a part of the taper loss calculation in case the beam weights have not been normalized. The taper loss is defined as LT := −10 · log10 ( N∑ n=1 ( |bwn| maxn |bwn| )2 /N ) , where N is the number of beam weights and bwn is the beam weight corresponding to the nth ESAP. 3.6.2 Main Lobe Loss One very important loss is the main lobe loss which measures the difference between the field calculated from the investigated beam weights, and the reference field, by taking the mean square error (MSE) of the radiation pattern of both fields in the EIRP scale. However, the fields are not compared in all indices, but only in the main lobe area Fmain. The main lobe loss is illustrated in cross-section in Figure 3.10. The total main lobe loss is the volume between the reference beam and op- timized beam. Note that the figure is in a logarithmic scale and that the side and main lobe thresholds are lower than they might appear in the figure. −90 −60 −30 0 30 60 90 ϕ (deg) 15 20 25 30 35 40 45 EI RP (d B ) Cut at θ=100 Opti i)ed cur(e Reference cur(e Side lobe threshold Main lobe threshold Figure 3.10: Cut of an example beam. The pink line is the reference field in the main lobe area and the blue line is the optimized field. The black and grey dotted lines are the side lobe and main lobe thresholds, respectively. The red area represents the main lobe loss. 17 3. Antenna Arrays The first step of calculating the main lobe area is finding the highest value of the ra- diation pattern of the reference field, Umax, and its corresponding index, (imax, jmax). Then let P be the set of all indices of the reference field that have a radiation inten- sity higher than Umax − 10 [dBm], which is referred to as the main lobe threshold. We then define the main lobe area Fmain to be the largest connected subset of S that contains the index (imax, jmax). For more details, see the implementation in Section 5.3.2. -90 -60 -30 0 30 60 90 ϕ (deg) 0 30 60 90 120 150 180 θ (d eg ) Epower −10 0 10 20 30 40 EI RP (d Bm ) (a) An examplatory wide beam. -90 -60 -30 0 30 60 90 ϕ (deg) 0 30 60 90 120 150 180 θ (d eg ) Epower 34 36 38 40 42 44 EI RP (d Bm ) (b) The main lobe of the examplatory wide beam. Figure 3.11: An example of the radiation pattern of a wide beam and its main lobe. The main lobe loss is then defined as LM := 1 |Fmain| ∑ θ,φ∈Fmain ([UGA (θ, φ)]dBm − [UR(θ, φ)]dBm)2 , where UGA (θ, φ) is the radiation pattern of the antenna array and UR(θ, φ) is the radiation pattern of the reference field. 3.6.3 Side Lobe Loss A side lobe is an unwanted peak in the radiation pattern of an antenna array. They are unwanted because they leak energy in undesirable directions, which wastes en- ergy, and can also interfere with other antennas. The side lobe loss is a measure of how much the side lobes exceed the side lobe threshold, Umax − 12 [dBm]. If no side lobes exceed the threshold the loss is zero. The side lobe loss is illustrated in cross- section in Figure 3.12. The total side lobe loss is the volume between the optimized beam and the side lobe threshold. Note that the figure is in a logarithmic scale and that the side and main lobe thresholds are lower than they might appear in the figure. Similar to the main lobe loss, this is not calculated for all indices, only the side lobe area Fside. We define this set as the complement set of F ′ main, which is calculated in 18 3. Antenna Arrays almost the same way as Fmain in Section 3.6.2. The only difference is that we instead use Umax − 12 [dBm] as the threshold. For more details, see the implementation in Section 5.3.2. −90 −60 −30 0 30 60 90 ϕ (deg) 15 20 25 30 35 40 45 EI RP (d B ) Cut at θ=100 Opti i)ed cur(e Reference cur(e Side lobe threshold Main lobe threshold Figure 3.12: Cut of an example beam. The pink line is the reference field in the main lobe area and the blue line is the optimized field. The black and grey dotted lines are the side lobe and main lobe thresholds, respectively. The red area represents the side lobe loss. The side lobe loss is then defined as LS := ∑ θ,φ∈Fside max{0, [UGA (θ, φ)]dBm, − [Umax − 12]dBm}. Note that since we are working in a logarithmic scale, the side lobe threshold, Umax− 12 [dBm], would be approximately 1 16Umax in a linear scale. 19 3. Antenna Arrays 20 4 Optimization In the field of optimization, one studies methods for finding the optimum of a func- tion. The optimum is the best possible value, often defined as the point within a set search space where the function achieves either its maximum or minimum value. There are many different approaches to solving optimization problems, the oldest and most well-studied branch being classical optimization [9, pp. 1–2]. However, the main focus of this thesis will be methods from a separate branch, namely stochas- tic optimization. In this chapter, we begin with a brief introduction to these two branches, before detailing the specific algorithms that will be used in this thesis. 4.1 Classical Optimization Classical optimization methods are deterministic algorithms for solving optimization problems where the objective function can be written as a well-defined mathematical function. Most classical optimization algorithms are poorly suited to non-convex problems, and problems where the objective function is highly non-linear or of high dimension. For problems of these types, it is often prudent to turn to other branches of optimization, such as stochastic optimization [9, pp. 1–2, 33]. 4.2 Stochastic Optimization Stochastic optimization methods are, in contrast to classical optimization methods, never deterministic. They have elements of randomness in them, though this does not mean they are completely without rhyme or reason. Many stochastic optimiza- tion methods are inspired by phenomena in nature, such as foraging behaviour of ant colonies, swarms of animals, and evolution. From these phenomena, principles such as cooperation and adaption, which can be regarded as a type of optimization, have been taken and converted to mathematical equations to be used in optimiza- tion. Two stochastic methods inspired by nature are genetic algorithms (GA) and particle swarm optimization (PSO) [9, pp. 2–4]. 4.2.1 Genetic Algorithms Genetic algorithms (GA) are a type of evolutionary algorithms, which are stochastic optimization methods inspired by the biological phenomena of evolution as described by Charles Darwin [9, p. 35]. Similar to how a gradual change of a population over generations can increase individuals’ chances to survive and reproduce, i.e. increase 21 4. Optimization their fitness, we want to get closer and closer to an optimum over the iterations of an optimization algorithm. Figure 4.1: Visualization of a generation of GA with its main operations. The blue ovals are populations, and the smaller circles are individuals. The first step of genetic algorithms is to define an encoding of solutions to chro- mosomes. Next, a population is created, which consists of several chromosomes. Then, a fitness function is defined. Its purpose is to evaluate each chromosome. The resulting value is referred to as a chromosome’s fitness. A common way to define the fitness function is F (x) = 1 L (x) , where L (x) is an objective function which we want to minimize. The main part of genetic algorithms consists of manip- ulating chromosomes to increase their fitness over generations. A new generation is created from the previous generation using three operations. The first operation is called crossover, which creates new chromosomes using parts of current chromo- somes. Secondly, there is mutation which makes stochastic changes to elements of the chromosomes, also known as genes. Lastly, there is elitism, which selects chro- mosomes for the new generation based on their fitness. Chromosomes with higher fitness are more likely to be selected. There are many ways the three operations; crossover, mutation, and elitism, can be implemented. For this project we used the operations as defined by M L Shahab, F Azizi, B A Sanjoyo, et al. [10] which are modified to work well in high dimensions. Algorithm 1 is a pesudocode for this genetic algorithm, and Figure 4.1 presents a simple representation of a generation of GA. The operations are described in more detail in the following sections. 22 4. Optimization Algorithm 1 Genetic Algorithm for High Dimensional Problems Define a chromosome representing the solution Define fitness function F (x) Generate an initial population consisting of m chromosomes Initialize the maximum number of generations n for t = 1,2...,n do for i = 1,2...,m do Generate child by Crossover Update obtained child by Mutation end for Elitism end for Crossover The first step of crossover is selecting two chromosomes from the population. The method used is tournament selection with size n and tournament probability ptour. Thus, n random chromosomes are chosen from the population, and the one with the highest fitness is chosen with probability ptour. If it is not chosen it is discarded, and the same process is repeated recursively until one individual is chosen or there is only one left. In that case, the last remaining one is chosen. This process is executed twice, so that two individuals are chosen in total. Genes from the two chosen chromosomes are then mixed as shown in Algorithm 2 to create a new chromosome. The new chromosome always has the same or higher fitness than its parent chromosomes due to the construction of the algorithm. Algorithm 2 Crossover x← Chromosome of length N chosen via tournament selection y ← Chromosome of length N chosen via tournament selection xc ← Copy of x for i = 1, 2, ..., N do x(1) ← xc x(2) ← xc with value at index i set to yi x(3) ← xc with value at index i set to yi+xi 2 j ← argmaxj∈{1,2,3}{F (x(j))} xc ← x(j) end for return xc Mutation We perform mutation on the chromosomes returned by crossover with mutation probability set to one, which means each index is mutated. The process of mutation is described in Algorithm 3. The mutated chromosome is always better than the 23 4. Optimization original. Mutation is important since it contributes to chromosomes moving out of local optima of the fitness function and adds genetic diversity to the population. Algorithm 3 Mutation n← The number of the current generation x← Chromosome of length N created using Crossover xc ← Copy of x for i = 1, 2, ..., N do r ← random value sampled from uniform distribution on interval [− 1 2n , 1 2n ] x(1) ← xc x(2) ← xc with value at index i set to xi + r x(3) ← xc with value at index i set to xi − r j ← argmaxj∈{1,2,3}{F (x(j))} xc ← x(j) end for return xc Elitism The version of elitism used in this project is called "Elitism replacement with fil- tration" and was introduced by M L Shahab, F Azizi, B A Sanjoyo, et al. [10]. It chooses the next generation as the m chromosomes with the highest fitness out of both the last generation and the m newly generated chromosomes. However, we first remove duplicate chromosomes and generate random chromosomes if there are in total less than m chromosomes left. This is to counteract potential “inbreeding”, i.e., if the chromosomes become too similar, by removing duplicates and introducing new genes. 4.2.2 Particle Swarm Optimization We also investigated particle swarm optimization (PSO). PSO is a type of stochastic optimization algorithm that mimics the behaviour of a swarm of animals or insects. Similar to the population in GA, PSO has a group of possible solutions, which we refer to as a swarm. Each solution in the swarm is referred to as a particle. The particles in a swarm are able to share information about where they have found the lowest loss values in the search space. The loss of a particle x is calculated using the objective function f(x). Each particle stores information about the position with the lowest loss that it has found, as well as the position with the lowest loss that any member of the swarm has found. This is visualized in Figure 4.2. 24 4. Optimization Figure 4.2: Visualization of a swarm of particles and their velocities. The blue star represents the current best position of the blue particle and the pink star represents the current best position of the entire swarm. The first step in the algorithm is to initialize the swarm with all its particles. Each particle is initialized at a random position in the search space, with a random ve- locity. The search space is assumed to be a hyperrectangle defined by the maximal and minimal values xj,max and xj,min for each dimension j. In each iteration, each particle i updates its position xi based on the position of their personal best xpb i , and the position of the swarm best xsb. How much the particles steers toward these points is determined by two positive constants: the cognition factor c1 which affects how much the particles move toward their personal bests, and the social factor c2 which affects how much the particles steer toward the swarm best. Next, the velocity vi is checked to ensure that it does not exceed a set speed limit vmax. If it does, the vector is normalized to be at the speed limit. Then the position of the particle is calculated from the current position of the particle and its updated velocity together with the time step ∆t. This new position is then evaluated, and if the loss is lower than its personal best, then the particle’s personal best is updated. If the loss is lower than the swarm best, then the swarm best is updated as well. A common version of PSO is PSO with inertia, where inertia ω is a nonnegative constant that affects how much of the previous velocity is kept in the velocity up- date. The pseudocode for this PSO variant is presented in Algorithm 4. 25 4. Optimization Algorithm 4 Particle Swarm Optimization with Inertia Initialize particles with random positions and velocities xij ← xj,min + U [0, 1](xj,max − xj,min), i = 1, ..., n, j = 1, ..., n vij ← α ∆t (−xj,max−xj,min 2 + U [0, 1](xj,max − xj,min)), i = 1, ..., n, j = 1, ..., n for k = 1, 2, ..., K do Evaluate each particle and update their personal best and the swarm best xpb i ← xi if f(xi) < f(xpb i ), i = 1, ..., n xsb ← xi if f(xi) < f(xsb), i = 1, ..., n Update each particle’s velocity vij ← ωvij + c1r1j ( xpb ij −xij ∆t ) + c2r2j ( xsb j −xij ∆t ) , i = 1, ..., n, j = 1, ..., m Restrict the velocities to the speed limit vi ← vmax |vi| vi if |vi| > vmax, i = 1, ..., n Update each particle’s position xi ← xi + vi∆t, i = 1, ..., n end for return xsb Here, r1 and r2 are random numbers between 0 and 1. 4.3 Bayesian Optimization Bayesian optimization is a non-parametric optimization method suitable for prob- lems where the objective function acts as a black-box function that is very expensive to evaluate. It attempts to find the point that minimizes the objective function using as few evaluations as possible by strategically choosing the next point to evaluate in each iteration [11]. Here we present a brief overview of the optimization method and introduce pseudocode, before discussing the main elements of the method in more detail. The method uses a Gaussian process as a surrogate model for the objective function. The surrogate model attempts to model the behaviour of the objective function as accurately as possible. The method also uses an acquisition function. For each point in the domain, the acquisition function estimates the value gained from sampling the objective function in that point. This value is usually defined in terms of the expected amount of information gained and the value of the surrogate function in that point. In each iteration, the acquisition function is maximized to find the op- timal point to sample in. Next, the objective function is evaluated in the chosen point, and the result is used to update the surrogate model using Gaussian process regression [11]. Algorithm 5 summarizes Bayesian optimization in pseudocode as presented by Frazier [11]. 26 4. Optimization Algorithm 5 Bayesian Optimization Place a Gaussian prior on f Observe f at n0 points according to an initial space-filling experimental design. Set n = n0 while n ≤ N do Update the posterior probability distribution on f using all available data. Let xn be a maximizer of the acquisition function over x, where the acquisition function is computed using the current posterior distribution. Observe yn = f(xn). Increment n end while Return a solution: either the point evaluated with the largest f(x), or the point with the largest posterior mean. Figure 4.3 illustrates an example of how one iteration of the algorithm affects the surrogate model and the acquisition function. Observe how an additional observa- tion is added in the point that maximized the acquisition function and that this reduces the uncertainty in the surrogate model and improves the estimated mean so that it is now closer to the true value. Note that this is a toy example and that in most cases we would not have access to the true shape of the function or any of its derivatives when performing Bayesian optimization. 4.3.1 Gaussian Process Regression A Gaussian process is defined as a collection of random variables, of which any finite collection has a multivariate Gaussian distribution. Each of these random variables is tied to a point in some domain D, often time or space. The Gaussian process is completely defined by its mean function µ(x) and its covariance function Σ(x, x′). We often choose µ0(x) = 0 for all x ∈ D as our prior. The prior covariance ma- trix Σ0(x, x’) is chosen to reflect our beliefs about the covariance between different points and is sometimes referred to as a kernel function [12]. The kernel function is chosen to encode our beliefs about the correlation between different points in the domain based on the distance between them. A common assumption is that points that are closer should have a higher correlation. This would mean that when we evaluate the objective function in a point x′ and use this to update the surrogate model, points that are close to x′ in the domain are affected more by the update. If we know that the function is periodic with a known period λ, then we can choose a kernel function that represents this, so that when we update the prior in a point x′, the prior is most affected in the points x′ +nλ for any n ∈ Z. 27 4. Optimization (a) Surrogate model with the true un- known model (top) and acquisition function (bottom) after updating the prior with 6 random points. (b) Surrogate model with the true unknown model (top) and acquisition function (bottom) after 1 iteration of Bayesian optimization. Figure 4.3: The mean of the surrogate function is shown as a dashed green line, and the area within 1.96 standard deviations from the mean of the surrogate model is shown shaded in green. The mean of the true function is shown as a dashed red line, and all sampled points are shown as red dots. The acquisition function is shown in blue, with a blue dot denoting the next query point. We assume that the prior placed on the mean is µ0(x) = 0 and that σ0 is constant over D. Then, given a kernel function Σ0(x1, x2) and a set of new data points (x1:n, y1:n), where x1:n denotes a set of n points and y1:n denotes the corresponding objective function values, we update the mean and variance functions according to Rasmussen and Williams [12] in the following way: µn(x) = Σ0(x, x1:n) ( Σ0(x1:n, x1:n) + σ2 0I )−1 f(x1:n) σ2 n(x) = Σ0(x, x)− Σ0(x, x1:n) ( Σ0(x1:n, x1:n) + σ2 0I )−1 Σ0(x1:n, x). 4.3.2 Acquisition Function The acquisition function usually reflects some combination of the uncertainty in the surrogate model and the expected mean. In Figure 4.3 we use expected improvement 28 4. Optimization (EI) as our acquisition function. The improvement from sampling at a point is 0 if the objective value y′ is higher than or equal to the lowest value we have found so far, ymin, and the improvement is ymin − y′ if y′ < ymin. Since we cannot know what the improvement will be at any point without sampling there, we instead calculate the expected improvement. Therefore, the expected improvement prioritizes points where the surrogate model has a low mean but a high variance, since points with high variance carry more uncertainty and could potentially be much better than the expectation [11]. 29 4. Optimization 30 5 Method In this chapter, we specify the goals of this thesis in more detail. We then proceed to give an overview of the algorithms and detail the design choices made during the development of the central beam weight optimization methods of this thesis. 5.1 Specification of the Issue Being Investigated The main objective of this thesis is to create an algorithm for optimizing beam weights such that the corresponding radiation pattern minimizes the objective func- tion, which we defined as a linear combination of the main lobe loss and the side lobe loss. The algorithm should take as input; the embedded subarray patterns of the antenna for which the beam weights should be optimized, the corresponding traffic beam beam weights, and the indices of the traffic beams to be used in the construction of the reference field. The reference field is a theoretical beam that is constructed from traffic beams as described in Section 3.1. The total list of desiderata for the final algorithm is stated in more detail here. The algorithm should: • Consistently produce beam weights that correspond to a radiation pattern with desired metric values with respect to a given reference field. – The metrics used to evaluate beam weights are main lobe loss, side lobe loss, and taper loss, as introduced in Section 3.6. – The desired metric levels are based on the results of the current method. • Run and converge in a shorter time than the current methods. – Current methods need approximately 1-2 hours per beam weight opti- mization. – The algorithm should also have a reasonable time complexity with regard to the dimension of the problem. • Produce results that are robust against small variations resulting from the manufacturing of antenna and measurement components. – These variations may include noise in the ESAPs and rounding errors for beam weights applied in real situations. • Optimize beam weights using measured and simulated realistic ESAPs. – This is in contrast to current methods which are only able to handle ideal ESAPs. 31 5. Method 5.2 Algorithm Overview In Figure 5.1 a flowchart illustrates the most important steps of the algorithm be- tween the input data (ellipses) and the output (circle). Figure 5.1: A flowchart of the method from data and user inputs to optimized beam weights. The dotted line only applies when GA is the algorithm used in the optimization step and represents the use of traffic beam BWs as warm starts. The ESAPs and traffic beam BWs are used to calculate the radiation pattern of the traffic beams. The beam mapping is a list of the indices of the traffic beams to be used in the construction of the reference field. The traffic beam radiation patterns and the beam mapping are used to calculate the reference field, as described in Sec- tion 3.1. The ESAPs and the reference field are then passed to the optimizer, which returns a set of optimized beam weights. The optimizer minimizes the objective function, which we define in Section 5.3. 5.3 Design of Objective Function The objective function evaluates the quality of a set of beam weights and is com- posed of two components, the side lobe loss and the main lobe loss, which are both calculated using a chosen reference field. Originally, the objective function consisted of the three metrics introduced in Section 3.6 which are commonly used at Ericsson for beamforming, for example in the master thesis written at Ericsson in 2024 by Frilund and Olofsson [1]. However, due to a difference in the choice of scale compared to Frilund and Olofsson [1], the taper loss component was rendered unnecessary and could be omitted, see Section 5.3.1. Thus, the final objective function L (bw), is a weighted sum of the main lobe loss and the side lobe loss, L (bw) = α1LM(bw) + α2LS(bw), where LM(bw) is the main lobe loss and LS(bw) is the side lobe loss of the beam weights bw, and α1 and α2 are positive constants. The exact values of α1 and α2 32 5. Method are omitted from this report for confidentiality reasons. L (bw) is henceforth what we refer to as the total loss. In the case of GA, the fitness function is defined as F (bw) = 1 L (bw) . 5.3.1 Taper Loss When working in a scale that is relative to the used energy, such as realized gain, it is necessary to have some separate measure of the amount of radiated energy. For this reason, previous algorithms, such as the one by Frilund and Olofsson [1], include taper loss as part of the objective function. However, our algorithms perform the optimization in EIRP scale which is an absolute scale that also measures the radiated energy, and thus the taper loss becomes a superfluous component of the objective function, and we can remove it. 5.3.2 Main and Side Lobe Areas The definitions of the main and side lobe areas mentioned in Section 3.6 have the restriction that the area must be connected, as without that restriction a side lobe of the reference field with a high amplitude might be considered part of the main lobe area. Earlier implementations, such as Frilund and Olofsson [1], did not impose such restrictions, and thus we had to devise the following method to find those areas. We assume that the radiation intensity of the antenna array, UGA (θ, φ), is measured at the points (θi, φj), i = 1, .., nθ, j = 1, .., nφ, where nθ and nφ is the size of the set of θ and φ values, respectively. Let M be a matrix of size nθ × nφ, where each index corresponds to a point at which UGA (θ, φ) is measured. We define Mij = 1, if UGA (θi, φj) > Umax − 10, 0, UGA (θi, φj) ≤ Umax − 10, (5.1) where Umax is the peak of the radiation pattern of the reference field. In other words,Mij is set to one if the corresponding point in UGA (θi, φj) is larger than the main lobe threshold and to zero otherwise. Therefore, there is a large connected area of ones around the index corresponding to Umax, i.e. where our main lobe is, and potentially other smaller areas of ones as well. We now want to remove all smaller connected areas such that only the one corresponding to the main lobe is left. Convert M to a graph where Mij is a neighbour to all values in {Mi+1,j,Mi−1,j,Mi,j+1,Mi,j−1} that exist. If (i, j) is an index at the edge of M then some of the four neighbours won’t exist. On this graph we perform a modified breadth first search starting in the index which corresponds to Umax. The breadth first search only cares about the indices which have value one, and ignores indices with value zero, and thus we never explore outside the main lobe. The process of finding the side lobe area is similar but not identical. Each element of M is set to one if the corresponding point of the radiation pattern is instead larger than the side lobe threshold, Umax−12. The breadth first search is performed 33 5. Method the same as for the main lobe area, and then the side lobe area can be found as the complement set of the chosen indices. 5.4 Optimization Algorithms Many optimization algorithms were considered for this project, but the ones chosen were genetic algorithms and particle swarm optimization. We mainly investigated stochastic optimization methods as they generally perform well for problems with complex objective functions, and there are a plethora of methods with different strengths and weaknesses that can be utilized. One reason we chose GA and PSO specifically was due to them being successfully applied to similar problems such as by Boeringer and Werner [13] who used both PSO and GA for optimizating beam weights and Duzel, Saoud, Shayea, et al. [14] who made a comprehensive review of the use of GA to optimize the phase of beam weights. The version of GA we have used was designed by M L Shahab, F Azizi, B A Sanjoyo, et al. [10] to perform better in high dimensions, which is of import since our thesis concerns specifically large antenna arrays. 5.5 Search Space The search space of an optimization problem is the subset of all feasible solutions that we allow the optimizer to explore. Therefore, the dimension of our search space is the number of parameters that we need to optimize, which depends on several factors. These include the size of the antenna, the rank of the beam weights, which we discuss in Section 5.5.1, and whether we are performing single- or dual polarized beamforming. For full rank dual polarized beamforming we need to optimize two m × n matrices of complex beam weights. Furthermore, we regard the phase and amplitude of each beam weight as separate variables during optimization, and there- fore, have a total of 4 ·m · n values to optimize. For single-polarized beamforming the dimension instead becomes 2 ·m · n. Another option is to represent them as the real and imaginary part, however, that would have complicated certain calculations, as we wanted the option to constrain the amplitudes, and in certain cases keep the amplitudes fixed during optimization. 5.5.1 Low-Rank Restrictions Let A be an m × n matrix. If A has exactly r linearly independent columns, then A is said to be of rank r. This means that it is possible to express A as A = BC (5.2) where B is an m× r matrix and C is an r × n matrix. This implies that A, which has mn variables, can be expressed as the matrix product of B and C, which have a total of r(m + n) variables. In the case where r is small, this can be a significant 34 5. Method reduction. If r = min{m, n}, we say that A is of full rank. We reduce the dimension of the problem by restricting each beam weight matrix to be of low rank. For each beam weight matrix BW this is accomplished by optimizing two matrices, BW1 and BW2, of sizes m× r and r × n respectively. We assume BW = BW1BW2, (5.3) i.e., that BW is of rank r. BW1 and BW2 are complex valued matrices, but as explained in Section 5.5, we do not optimize the complex values, but rather the phases and amplitudes separately. Therefore, the values we optimize are arg BW1 and arg BW2, and the amplitudes |BW1| and |BW2|. This means that the num- ber of optimized parameters is 2r(m + n) for each beam weight matrix. Thus, the dimension of the search space is 2r(m + n) for single-polarized beamforming and 4r(m + n) for dual-polarized beamforming. For large antenna arrays, this significantly reduces the number of variables to op- timize, and thereby speeds up the optimization algorithms. Previous work done at Ericsson has used beam weights of low rank, and the initial tests of lower ranks yielded positive results. To limit the scope of the project, we restricted ourselves to only investigating rank 1 solutions in depth, and most decisions were made based on results for the rank 1 case. 5.5.2 Initialization When generating random starting points for GA and PSO, we decided between two different methods. The alternatives were generating beam weights and encoding them to our search space or generating directly in the search space. The difference is that the generated beam weights must be encoded to the search space, which has low rank, and thus information will be lost in the encoding. We wanted to see how this affected the distribution when they are decoded compared to the values generated directly in the search space. We generated 20 sets of beam weights for an antenna array with 288 subarrays, i.e., 5760 beam weights. Their representations after they are encoded and decoded are shown in Figure 5.2a in a phase-amplitude space. We also generated the same number of beam weights directly in a rank 1 search space, and their representations after they are encoded are shown in Figure 5.2b. The values generated directly in the search space have a more spread out distribution compared to the generated beam weights, and thus we chose to generate directly in the search space, as the goal is to have the initial values cover as large area as possible to easier be able to find the optimum. We also want amplitudes to be close to the maximum allowed value in the final beam weights as this allows more of the available energy to be utilized. 35 5. Method 0.0 0.2 0.4 0.6 0.8 1.0 Amplitude −3 −2 −1 0 1 2 3 Ph as e Distribution of Generated Beam Weights (a) 5760 uniformly generated beam weights that have been encoded to rank 1 and then decoded. 0.0 0.2 0.4 0.6 0.8 1.0 Amplitude −3 −2 −1 0 1 2 3 Ph as e Distribution of Beam Weights Generated in Search Space (b) 5760 decoded beam weights gen- erated uniformly in the rank 1 search space. Figure 5.2: Comparison of two ways of initializing beam weights. Each point represents one beam weight. 5.6 t-Tests Paired t-tests are used to test if there is a significant difference in mean value between two sets of observations, X0 and X1, which are continuous and normally distributed. There are two kinds of t-tests which are applicable in different situations depending on the dependence between observations in X0 and X1 [15]. 5.6.1 Paired t-Tests Paired t-tests are used to compare the mean values of two sets of observations, X0 and X1, when we can pair each observation in X0 with one in X1 in such a way that for these two observations all controllable variables are identical except for the ones being tested. By calculating the mean difference of each pair, we obtain a more accurate measure of the effect of the changed variable and reduce the impact of other environmental variables and differences between subjects [15]. More specifically, paired t-tests are used to test the null hypothesis H0 : µ1−µ0 = 0 under the assumptions that X0 and X1 are drawn from two continuous normal dis- tributions, and each sample of X0 is paired with a sample of X1 in such a way that the pairs Xj = {X0j, X1j} j = 1, 2, ..., n are independent, but the data within each pair is correlated [15]. We calculate Xdj = X1j −X0j and Xd = ∑n j=1 Xdj n , and estimate the variance of Xd as S2 d = 1 n(n−1) ∑n j=1(Xdj −Xd)2. Thus, we can calculate t = Xd Sd , (5.4) 36 5. Method and the p-value is calculated via the student’s t-distribution with n − 1 degrees of freedom [15]. The exact calculation depends on the formulation of the alternative hypothesis, and the variants we use are presented in Section 5.6.3. 5.6.2 Two-Sample t-Tests Two-sample t-tests are used when all observations in both X0 and X1 are indepen- dent of each other. Aside from that, all other assumptions made for paired t-tests hold for two-sample t-tests as well [15]. We therefore calculate the estimated variance for both sets of observations, S2 0 and S2 1 , and if the true variances are equal we can estimate the variance of Xd as S2 d = ( 1 n0 + 1 n1 ) (n0−1)S2 0+(n1−1)S2 1 n0+n1−2 , where n0 and n1 are the number of observations in X0 and X1. Thus, we calculate the test statistic t = Xd Sd , (5.5) and the p-value is calculated via the student’s t-distribution with n0 +n1−2 degrees of freedom [15]. The exact calculation depends on the formulation of the alternative hypothesis, and the variants we use are presented in the next section. 5.6.3 Use of t-Tests We use t-tests to compare the distribution of the total loss for optimized beam weights for two methods of optimization. The observations in X0 are the total loss of beam weights optimized using the standard or current method, and they are compared against the observations in x1 from the alternative method. Using an alternative hypothesis H1 we determine whether the alternative method results in a statistically significant change in the total loss. All p-values are also presented in Table A.1 in Appendix A or Table B.4 in Appendix B depending on which chapter they were calculated in. We use two different alternative hypotheses H1. H1 : µ0 > µ1 is mostly used in Chapter 5 when we investigate alternatives to a standard method. This is because in those cases we are only interested in if the alternative method results in a de- creased loss. This is a lower-tailed test and the p-value is calculated as P (T < t), where T is the student’s t-distribution with degree’s of freedom as specified above, and t is the test statistic. H1 : µ0 ̸= µ1 is used mainly in Chapter 6 when we investigate differences between different cases of using our optimization algorithms. In those cases, it is also interesting if a case causes the loss to significantly increase. This case is two-tailed and therefore the p-value is calculated p = 2 · P (T > |t|). We always discard the null hypothesis in favour of the alternative hypothesis if the p-value is less than 0.05. 37 5. Method The results of paired t-tests are represented as boxplots of Xdj and above the boxes, we use asterisks to represent the p-values where the number of asterisks n signify a p-value less than 10−n, with a maximum of three asterisks. In some cases there are separate t-tests for each loss component as well as the total loss, i.e., side lobe loss plus main lobe loss, but it is the p-values for the total loss that are the deciding factor for which method we choose to use. The default settings for the optimization used in the following tests were rank 1 single-polarized beamforming for an antenna array with 288 subarrays. 5.6.4 Side Lobe Threshold One alternative version of calculating the side lobe loss was tested. Instead of defin- ing the side lobe area using the side lobe threshold Umax − 12 [dBm], it was defined using the threshold U ′ max − 12 [dBm], where U ′ max is the maximum value of the ra- diation pattern corresponding to the optimized beam weights and not the reference field. The reason behind this alternative method was that even if the optimized beam did not reach the same amplitudes as the reference field, we could ensure that the side lobes were still suppressed below −12 [dBm] relative to the optimized curve. However, if the optimization managed to reach the same amplitudes levels as the reference field, this change should be no different from the current method. We compared the two methods for defining the side lobe threshold using a paired t-test. The current method uses a side lobe threshold relative to the reference field, and the alternative method is relative to the optimized field. We correlated the pairs {X0j, X1j} j = 1, 2, ..., 20 by using the same initial values for the optimization. The results of the paired t-test are presented in Appendix A and neither GA nor PSO showed a significant decrease in total loss when the side lobe areas were calculated relative to the optimized curve, as opposed to relative to the reference field. An important detail is that the total loss for both tests was evaluated using the same objective function, where the side lobe area is relative to the reference field. This is because a comparison using two different objective functions would not give any meaningful results. An interesting detail is that taper loss showed a decrease for both PSO and GA, with p < 0.05. Since the side lobe defined relative to the optimized beam did not result in a better performance, we decided that it would not be used. 5.6.5 Constrained vs Unconstrained Optimization As previously mentioned, the amplitudes of the beam weights are constrained. The beam weights are therefore always normalized before being returned from the opti- mization to ensure this condition holds. However, we can choose to either constrain the amplitudes during optimization as well, never letting them exceed the maximal value, or let them be unconstrained and normalize at the end. We refer to these variants as constrained and unconstrained optimization, respectively. 38 5. Method For this paired t-test we used unconstrained optimization as the current method and constrained method as the alternative method for PSO as it is most commonly unconstrained. However, for GA the opposite is true, it is usually constrained, and thus we set the alternative method as unconstrained optimization for GA. We correlated the pairs {X0j, X1j} j = 1, 2, ..., 20 by using the same initial values for the optimization. The result of the test is shown in Appendix A and the difference was not significant for either PSO or GA, and so we chose to continue using unconstrained PSO and constrained GA. 5.6.6 Warm Starts When solving a problem, usually an optimization problem, solutions to nearby prob- lems can sometimes be used as a warm start to facilitate and speed up the opti- mization process [16]. In our case, we can use traffic beam BWs as warm starts to find BWs for wide beams. More specifically, the approach we tested was to use the traffic beam weights, which were used to create the reference fields. We compared the distance in the rank 1 search space between a given wide beam BW and the BWs of the traffic beams used to create the reference field. The Euclidean distance between these was shorter than the distance between the wide beam BWs and ran- domly generated starting points. We performed a paired t-test with random initialization as described in Section 5.5.2 as the current method and initialization where half the initial positions were random and half were traffic beams randomly selected from the traffic beams used to create the reference field as the alternative method. The pairs {X0j, X1j} j = 1, 2, ..., 20 were correlated by using the same values for half of the initial positions for the op- timization. The results of the test are shown in Appendix A. For GA, we saw that p < 0.05 and the null hypothesis was discarded in favour of the alternative hypothesis. However, for PSO, the null hypothesis could not be discarded. So, warm starts give a statis- tically significant decrease of the total loss for GA, but no significant change for PSO. Since the warm start method showed promise for GA, we decided to try adding noise onto the encoded traffic beam BWs. This was mostly to avoid GA getting stuck in local optima because too many of the initial values were too close to each other. For this purpose, we added Gaussian noise. For the amplitudes we subtracted the absolute value of the noise which had mean value zero and standard deviation α and for the phases we added noise with mean zero and standard deviation π · α for varying values of α. The noise levels we tried were α = {0.0, 0.1, 0.25, 0.5} where α = 0.0 means there is no noise, just the traffic beam weights. The results of the test are shown in Figure 5.3 and Appendix A and no tried value of α resulted in p < 0.05. We decided to let α be optimized by the hyperparameter tuning in Section 5.7 as the results of this test were inconclusive. If no noise is the best solution, then the Bayesian optimization should hopefully set α to zero. 39 5. Method α = 0.1 α = 0.25 α = 0.5 −1.0 −0.5 0.0 0.5 1.0 1.5 To ta l L os s 1e−3 Pairwise Difference of Total Loss for GA Relative α = 0.0 Figure 5.3: Result of paired t-test for GA for initial positions that are half random and half traffic beam weights. α is the level of noise added unto the traffic beams. 5.7 Hyperparameter Tuning PSO has 6 hyperparameters that need to be chosen by the user. To perform a grid search with 3 different values for each hyperparameter would require 36 runs of the algorithm. PSO takes approximately 10 minutes per run, which would mean that the total run time would be about 10 · 36 minutes, which is equal to approximately 122 hours. However, testing only three values per hyperparameter is not a very thorough search, but if we want to test 10 values per hyperparameter, the runtime becomes 19 years. This is clearly an infeasible solution. Instead, we used Bayesian optimization to tune the hyperparameters for GA and PSO, which gave us preciser results and more manageable runtimes. We used Bayesian Optimization to select all hyperparameters for each of the meth- ods, choosing the set of hyperparameter values that minimized the objective function for each method. For GA, this included the standard deviation α of the noise added to the traffic beams in the warm starts, as described in section 5.6.6. We used µ0 = 0 and σ0 constant for the prior. For each hyperparameter tuned, we allowed the algorithm to test values within the feasible interval for that hyperparameter. For hyperparameters where the feasible interval is infinite, we limited the interval to a subset of the feasible interval which we deemed reasonable. In one instance, we found that one hyperparameter was chosen that was at the very edge of the interval we had chosen. We then extended that interval and ran the optimization again to prevent confining hyperparameter values to smaller intervals than necessary. We tuned the hyperparameters specifically for rank 1 single-polarized beamform- ing for an antenna array with 288 subarrays. Despite using Bayesian optimization, hyperparameter tuning was a lengthy process, and we performed it only for these default settings. 40 5. Method The chosen hyperparameter values are omitted from this report for confidentiality reasons. 5.8 Extension to Dual-Polarized Beamforming In all previous tests, we have handled only the single-polarized case. However, we extended our models to work for dual-polarized beamforming as well through minor changes to certain key functions. We chose to make no other changes to the algorithms, and used the same optimization algorithms and objective function as for single-polarized beamforming. The cases are similar and running all tests again for dual-polarized beamforming would have been too time-consuming. 41 5. Method 42 6 Results In this chapter, we present the results from a number of tests that were run to investigate the behaviour of our developed optimization methods. We begin with results that show the general performance of the algorithms, which we then compare to the method currently in use at Ericsson. We subsequently study the behaviour of the methods when the dimension of the problem varies, before investigating the stability of the results when exposed to varying levels of noise. Finally, we in- vestigate the use of warm starts to expedite the optimization process, and present the time taken to optimize all necessary wide beams for an exemplary antenna array. The results presented in this section were all produced using the same reference field unless explicitly stated otherwise. This is to reduce the number of external factors since we are mainly interested in comparing the algorithms to each other. The data used was simulated using Ansys HFSS to represent the ESAPs of an antenna array with 288 subarrays. When we refer to GA and PSO in this chapter we mean the optimization algorithms as detailed in Chapter 5. Furthermore, when a run of either GA or PSO is referenced it will, if nothing else is stated, be single-polarized beamforming with rank 1 search space and run for 20 or 6000 iterations, respectively. These numbers were chosen to produce similar run times to enable a fair comparison between the algorithms. In some cases the results were run until convergence, meaning that we allowed the algorithm to run until the convergence criterion was fulfilled, but for a maximum of 60 or 8000 iterations for GA and PSO, respectively. These numbers were chosen to ensure that each run stopped eventually, while still allowing most runs to fulfill the convergence criterion. The convergence criterion is satisfied if the total loss improves by less than 10−5 for 2 consecutive iterations of GA, or for 1000 consecutive iterations for PSO. All tests were performed with an NVIDIA A40-48Q. 43 6. Results 6.1 Performance We ran GA and PSO 20 times each to investigate their performance and present the results here. 6.1.1 Example Beams Figure 6.1 shows example beams from the results of GA and PSO. They were chosen as the beams with loss values closest to the median. The main lobes of the resulting radiation patterns are shown along with the main lobe of the used reference field. In Figure 6.2 we show cuts in elevation and azimuth of the same beam to allow for a closer comparison of the beams and the reference beam. The results of the two algorithms are similar enough in performance that it is difficult to decide which algorithm is better from a purely visual inspection. To facilitate a comparison, we present the mean of each loss component and the total loss in Table 6.1, as well as the mean runtime. -70 -40 ϕ (deg) 80 110θ (d eg ) Optimized Pattern -70 -40 ϕ (deg) 80 110θ (d eg ) Reference Pattern 36 38 40 42 44 EI RP (d Bm ) (a) The main lobe of the beam opti- mized by GA and the reference beam. -70 -40 ϕ (deg) 80 110θ (d eg ) Optimized Pattern -70 -40 ϕ (deg) 80 110θ (d eg ) Reference Pattern 36 38 40 42 44 EI RP (d Bm ) (b) The main lobe of the beam op- timized by PSO and the reference beam. Figure 6.1: Comparisons of main lobes of beams optimized by GA and PSO and the reference beam. 44 6. Results −90 −60 −30 0 30 60 90 ϕ (deg) 0 10 20 30 40 EI RP (d Bm ) Cut at θ=100 Optimi(ed curve Refere ce curve Side lobe threshold Mai lobe threshold (a) Cut in azimuth of beam optimized by GA and reference beam. 0 30 60 90 120 150 180 θ (deg) −20 −10 0 10 20 30 40 EI RP (d Bm ) C)( a( ϕ= -45 Optimized curve Reference curve Side lobe threshold Main lobe threshold (b) Cut in elevation of beam opti- mized by GA and reference beam. −90 −60 −30 0 30 60 90 ϕ (deg) 0 10 20 30 40 EI RP (d Bm ) Cut at θ=100 Optimi(ed curve Refere ce curve Side lobe threshold Mai lobe threshold (c) Cut in azimuth of beam optimized by PSO and reference beam. 0 30 60 90 120 150 180 θ (deg) −20 −10 0 10 20 30 40 EI RP (d Bm ) C)( a( ϕ= -45 Optimized curve Reference curve Side lobe threshold Main lobe threshold (d) Cut in elevation of beam opti- mized by PSO and reference beam. Figure 6.2: Comparisons between cuts of beams (blue lines) optimized by GA and PSO, and the reference beam (pink line). Main and side lobe thresholds are shown as well (grey and black line respectively). 6.1.2 Loss Distribution and Runtimes The run times were similar between GA and PSO, and the two algorithms produced results where the mean total loss is in the same order of magnitude. Interestingly, GA appears to be better at minimizing side lobes, while PSO appears to be better at optimizing the main lobe shape. Although similar, GA has a slightly lower total loss. Taper loss is omitted from this table, but can be found in Appendix B. Table 6.1: Mean loss and runtime over 20 runs of GA and PSO. Main lobe loss Side lobe loss Total loss Runtime GA 1.15 · 10−3 2.08 · 10−4 1.35 · 10−3 10 m 54 s PSO 2.63 · 10−4 1.58 · 10−3 1.85 · 10−3 11 m 16 s To further study the performance of GA and PSO we investigated the distribution of the final loss, i.e. the best loss in the final iteration, using the standard deviation 45 6. Results of the final loss in Table 6.2 and boxplots of the final loss in Figure 6.3. Similar to the mean values, the standard deviations of the total loss are of the same order of magnitude for GA and PSO. Main