Stochastic Methods for Beam Weight
Optimization for Large Antenna Arrays
A Time Efficient and Robust Method for Beam Weight
Optimization Using Non-ideal Data

Master’s thesis in Data Science and AI
Master’s thesis in Engineering Mathematics and Computational Science

MATILDA SELVARAJ TIVESTEN
JOHAN ÖDESJÖ

DEPARTMENT OF MATHEMATICAL SCIENCES

CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2025
www.chalmers.se

www.chalmers.se


Master’s thesis 2025

Stochastic Methods for Beam Weight
Optimization for Large Antenna Arrays

A Time Efficient and Robust Method for Beam Weight
Optimization Using Non-ideal Data

MATILDA SELVARAJ TIVESTEN
JOHAN ÖDESJÖ

Department of Mathematical Sciences
Division of Applied Mathematics and Statistics

Chalmers University of Technology
Gothenburg, Sweden 2025


Stochastic Methods for Beam Weight Optimization for Large Antenna Arrays
A Time Efficient and Robust Method for Beam Weight Optimization Using Non-
ideal Data
MATILDA SELVARAJ TIVESTEN
JOHAN ÖDESJÖ

© MATILDA SELVARAJ TIVESTEN & JOHAN ÖDESJÖ, 2025.

Supervisor: Dr. Artem Roev, Ericsson AB
Examiner: Prof. Rebecka Jörnsten, Department of Mathematical Sciences

Master’s Thesis 2025
Department of Mathematical Sciences
Division of Applied Mathematics and Statistics
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: Radiation pattern of a reference field used during beam weight optimization.

Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2025

iv


Stochastic Methods for Beam Weight Optimization for Large Antenna Arrays
A Time Efficient and Robust Method for Beam Weight Optimization Using Non-
ideal Data
Matilda Selvaraj Tivesten, Johan Ödesjö
Department of Mathematical Sciences
Chalmers University of Technology

Abstract
Traditional methods for beam weight optimization suffer from poor convergence
and long run times for large antenna arrays and are limited to ideal antenna ele-
ment models. To overcome these limitations, more efficient optimization algorithms
are needed. Previous research has demonstrated that stochastic approaches such as
genetic algorithms and particle swarm optimization are promising alternatives for
solving this problem. This thesis investigates the tailoring of these stochastic opti-
mization algorithms for beam weight optimization using high-dimensional, realistic,
simulated data.

We propose the use of an absolute radiation pattern scale as opposed to a rela-
tive one, which allows us to simplify the objective function and promote a more
efficient energy usage. We decrease runtimes by restricting the optimization to a
low-rank search space, implementing warm starts, and leveraging GPU acceleration.

We find that these tailored methods show robust performance and converge signif-
icantly faster compared to current methods. The resulting beams also consistently
outperform those currently in use. We demonstrate the effectiveness of our approach
on antenna arrays of up to 1,152 subarrays, where antennas of 288 subarrays or more
are considered large.

Keywords: antenna arrays, beamforming, advanced antenna systems, stochastic
optimization, genetic algorithms, particle swarm optimization.

v


Acknowledgements

This thesis would not have been possible without the guidance, support, and en-
couragement from many different people.

Thank you to our academic supervisor, Prof. Rebecka Jörnsten, for being a leading
light in the worlds of statistics and optimization.

Thank you to our Ericsson supervisor, Dr. Artem Roev, for sharing your expertise
on antenna arrays.

Thank you to Ericsson for hosting us, to our line manager Dr. Bhushan Billade for
having us, and to Håkan Karlsson for answering all of our questions, even the bad
ones.

Thank you to our proofreaders Alexander Mayer, Dr. Helena Ödesjö, Ulf Ödesjö,
and Alma Cavallin for your time and feedback.

Thank you to Bella for the endless coffee and enthusiasm.

Matilda Selvaraj Tivesten & Johan Ödesjö, Gothenburg, June 2025

vii


List of Acronyms

Below is the list of acronyms that have been used throughout this thesis listed in
alphabetical order:

BW Beam Weights
CAR Coverage Angular Range
CM Current Method
EIRP Effective Isotropic Radiated Power
ESAP Embedded Subarray Patterns
EI Expected Improvement
GA Genetic Algorithms
GoB Grid of Beams
MSE Mean Square Error
PSO Particle Swarm Optimization

ix


Nomenclature

Below is the nomenclature of indices, sets, parameters, and variables that have been
used throughout this thesis.

Indeces

m Number of rows of antenna array
n Number of columns of antenna array

Coordinates

{ x, y, z} Cartesian coordinate system
{ θ, φ } Far field coordinate system, θ = π

2 , φ = 0 corresponds to propaga-
tion at normal incidence

{ r, θ, φ } Spherical coordinates

Antennas

E(r, θ, φ) Electrical field
G(θ, φ) Far field function
GA(θ, φ) Resulting far field function of an antenna array
U(θ, φ) Radiation intensity or Radiation pattern
D(θ, φ) Directivity
G(θ, φ) Gain
GR(θ, φ) Realized gain
EIRP(θ, φ) Effective Isotropic Radiated Power
PR Total radiated power
PO Total accepted power

xi


PM2 Power available from the matched transmission line feeding the an-
tenna

ESAPj(θ, φ) Embedded subarray pattern j

bwn Beam weight n

N Number of beam weights

Optimization

Fmain Set of angles (θ, φ) that defines the main lobe area
Fside Set of angles (θ, φ) that defines the side lobe area
L(bw) Objective function
F(bw) Fitness function

xii


Contents

List of Acronyms ix

Nomenclature xi

List of Figures xv

List of Tables xix

List of Algorithms xxi

1 Introduction 1
1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Electromagnetic Field Theory 3
2.1 Azimuth and Elevation Coordinates . . . . . . . . . . . . . . . . . . . 3
2.2 The Far Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.3 Directivity and Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 EIRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.5 Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.6 Polarisation of Plane Waves . . . . . . . . . . . . . . . . . . . . . . . 7
2.7 Aperture Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Antenna Arrays 9
3.1 Beam Types and Uses . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Antenna Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 The Superposition Principle . . . . . . . . . . . . . . . . . . . . . . . 14
3.4 Beamforming with Antenna Arrays . . . . . . . . . . . . . . . . . . . 15
3.5 Single- and Dual-Polarized Beamforming . . . . . . . . . . . . . . . . 16
3.6 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4 Optimization 21
4.1 Classical Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Stochastic Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5 Method 31
5.1 Specification of the Issue Being Investigated . . . . . . . . . . . . . . 31

xiii


Contents

5.2 Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Design of Objective Function . . . . . . . . . . . . . . . . . . . . . . . 32
5.4 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.5 Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.6 t-Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.7 Hyperparameter Tuning . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.8 Extension to Dual-Polarized Beamforming . . . . . . . . . . . . . . . 41

6 Results 43
6.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.2 Comparison to Current Method . . . . . . . . . . . . . . . . . . . . . 46
6.3 Dimensionality Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.4 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.5 Runtime in Real Application . . . . . . . . . . . . . . . . . . . . . . . 56

7 Discussion 59
7.1 Fulfilment of Desiderata . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2 Search Space Dimension . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

8 Conclusion 65
8.1 Fulfilment of the Desiderata . . . . . . . . . . . . . . . . . . . . . . . 65
8.2 Further Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Bibliography 69

A Method Appendix I
A.1 Paired t-Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I

B Result Appendix XI
B.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XI
B.2 Comparison to Current Method . . . . . . . . . . . . . . . . . . . . . XII
B.3 p-Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XIV

xiv


List of Figures

2.1 Coordinate system used for antennas, with a Cartesian coordinate
system for reference. The antenna array is situated in the yz-plane
and faces the positive x-direction. . . . . . . . . . . . . . . . . . . . . 3

2.2 In the far field region, the electrical field can approximated as a plane
wave. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3.2 Wide beams are used to identify the optimal traffic beam with which
to serve a user. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.3 Wide beams are used to identify the optimal traffic beam with which
to serve a user. The beams are arranged in a GoB that cover the CAR. 11

3.4 Creation of an envelope over a set of traffic beams. . . . . . . . . . . 11
3.5 One antenna element consisting of two antennas rotated 45◦ in oppo-

site directions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.6 A 3 × 1-subarray consists of three antenna elements arranged into

three rows and one column. . . . . . . . . . . . . . . . . . . . . . . . 13
3.7 A 3 × 8 antenna array consists of 24 subarrays arranged into 3 rows

and 8 columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.8 Constructive and destructive interference between two vectors (green

and red arrows respectively). The black arrow represents the resulting
vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.9 Representation of how the electrical field is calculated from the ESAPs
and BWs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.10 Cut of an example beam. The pink line is the reference field in the
main lobe area and the blue line is the optimized field. The black
and grey dotted lines are the side lobe and main lobe thresholds,
respectively. The red area represents the main lobe loss. . . . . . . . 17

3.11 An example of the radiation pattern of a wide beam and its main lobe. 18
3.12 Cut of an example beam. The pink line is the reference field in the

main lobe area and the blue line is the optimized field. The black
and grey dotted lines are the side lobe and main lobe thresholds,
respectively. The red area represents the side lobe loss. . . . . . . . . 19

4.1 Visualization of a generation of GA with its main operations. The
blue ovals are populations, and the smaller circles are individuals. . . 22

4.2 Visualization of a swarm of particles and their velocities. The blue
star represents the current best position of the blue particle and the
pink star represents the current best position of the entire swarm. . . 25

xv


List of Figures

4.3 The mean of the surrogate function is shown as a dashed green line,
and the area within 1.96 standard deviations from the mean of the
surrogate model is shown shaded in green. The mean of the true
function is shown as a dashed red line, and all sampled points are
shown as red dots. The acquisition function is shown in blue, with a
blue dot denoting the next query point. . . . . . . . . . . . . . . . . . 28

5.1 A flowchart of the method from data and user inputs to optimized
beam weights. The dotted line only applies when GA is the algorithm
used in the optimization step and represents the use of traffic beam
BWs as warm starts. . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5.2 Comparison of two ways of initializing beam weights. Each point
represents one beam weight. . . . . . . . . . . . . . . . . . . . . . . . 36

5.3 Result of paired t-test for GA for initial positions that are half random
and half traffic beam weights. α is the level of noise added unto the
traffic beams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6.1 Comparisons of main lobes of beams optimized by GA and PSO and
the reference beam. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6.2 Comparisons between cuts of beams (blue lines) optimized by GA
and PSO, and the reference beam (pink line). Main and side lobe
thresholds are shown as well (grey and black line respectively). . . . 45

6.3 Comparison of best loss of final iteration for different metrics for 20
runs of GA and PSO. . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6.4 A wide beam optimized by CM. In b) and c) the pattern optimized
by CM is blue and the radiation pattern of the reference field is pink.
The grey and black dashed lines represent the side and main lobe
thresholds respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6.5 GoB of wide beams optimized by different methods and a GoB of the
corresponding traffic beams. . . . . . . . . . . . . . . . . . . . . . . . 48

6.6 Total loss for all beams in the GoB shown in Figure 6.5 and differ-
ence in total loss between CM and both GA and PSO. Each loss
corresponds to the wide beam in the same position in the GoB. . . . 49

6.7 Distributions of final loss of 20 runs of GA and PSO for search spaces
of rank 1/2, 1, and 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

6.8 Comparison between time and loss for different algorithms for differ-
ent ranks. Results for GA are shown in blue, and results for PSO are
shown in pink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

6.9 Loss distributions GA and PSO for different ranks. Results for GA
are shown in blue, and results for PSO are shown in pink. . . . . . . . 52

6.10 Comparison between time and loss for different algorithms for differ-
ent array sizes. Results for GA are shown in blue, and results for
PSO are shown in pink. . . . . . . . . . . . . . . . . . . . . . . . . . 53

6.11 Total loss for 20 runs of GA and PSO until convergence using single-
and dual-polarized beamforming. . . . . . . . . . . . . . . . . . . . . 54

xvi


List of Figures

6.12 Variation of total loss for 20 beam weights with Gaussian noise added
to either phase or amplitude. Red arrow indicates maximum allowed
error according to internal Ericsson standards. . . . . . . . . . . . . . 55

6.13 Comparisons of beams with noise with 3σP = 22◦ added to phases
and 3σA = 2dB added to amplitudes. Also a beam with no noise
added for reference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6.14 Beams marked 1, 2, and 3 can be approximately calculated using
beam weights of the beam marked X by mirroring its amplitudes
vertically, horizontally, or both. . . . . . . . . . . . . . . . . . . . . . 57

A.1 Results from paired t-test for optimization using genetic algorithm
and side lobe thresholds relative to optimized field vs reference field.
Both methods were ran 20 times. . . . . . . . . . . . . . . . . . . . . II

A.2 Results from paired t-test for particle swarm optimization and side
lobe thresholds relative to optimized field vs reference field. Both
methods were ran 20 times. . . . . . . . . . . . . . . . . . . . . . . . III

A.3 Results from paired t-test for optimization using constrained vs un-
constrained GA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IV

A.4 Results from paired t-test for constrained vs unconstrained PSO. . . . V
A.5 Results from paired t-test for PSO with initial positions completely

random vs half random and half traffic beam beam weights. . . . . . VI
A.6 Results from paired t-test for GA with initial positions completely

random vs half random and half traffic beam beam weights. . . . . . VII
A.7 Result of paired t-test for GA for initial positions that are half random

and half traffic beam weights. α is the level of noise added unto the
traffic beams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VIII

B.1 Comparison of best taper loss of final iteration for 20 runs of GA and
PSO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XI

B.2 Taper loss for all beams in the GoB shown in Figure 6.5, and difference
in total loss between CM and both GA and PSO. . . . . . . . . . . . XII

B.3 Main lobe loss for all beams in the GoB shown in Figure 6.5, and
difference in main lobe loss between CM and both GA and PSO. . . . XIII

B.4 Side lobe loss for all beams in the GoB shown in Figure 6.5, and
difference in main lobe loss between CM and both GA and PSO. . . . XIV

xvii


List of Figures

xviii


List of Tables

6.1 Mean loss and runtime over 20 runs of GA and PSO. . . . . . . . . . 45
6.2 Standard deviation of final loss for all loss metrics for GA and PSO. . 46
6.3 Mean loss and runtime over 20 runs of GA and PSO when terminated

using convergence criterion. . . . . . . . . . . . . . . . . . . . . . . . 47
6.4 Mean difference in total loss and taper loss between GoBs optimized

by CM compared to GA and PSO. For reference, taper loss is usually
in the range two to three. . . . . . . . . . . . . . . . . . . . . . . . . 49

6.5 Mean runtime over 20 runs for different search space ranks. . . . . . . 51
6.6 Mean total loss and runtime for 20 runs of GA and PSO for an antenna

array with 32 subarrays. . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.7 Mean total loss and runtime for 20 runs of GA and PSO for an antenna

array with 1152 subarrays. . . . . . . . . . . . . . . . . . . . . . . . . 53
6.8 Mean runtime over 20 runs for single- and dual-polarisation. . . . . . 54
6.9 Mean total loss per beam over GoB for different kinds of warm starts. 57
6.10 Mean runtime per beam over GoB for different kinds of warm starts. 58
6.11 Estimated times needed to optimize the entire GoB of an antenna

with 288 subarrays for five frequencies using different warm start
methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

A.1 p-values for all paired t-tests, rounded to two significant figures. . . . IX

B.1 Mean taper loss over 20 runs of GA and PSO. . . . . . . . . . . . . . XI
B.2 Standard deviation of final taper loss for GA and PSO. . . . . . . . . XI
B.3 Mean difference in total loss and taper loss between all 24 wide beam

optimized by current method compared to GA and PSO. . . . . . . . XIV
B.4 p-values for all two-sample t-tests, rounded to two significant figures. XV

xix


List of Tables

xx


List of Algorithms

1 Genetic Algorithm for High Dimensional Problems . . . . . . . . . . . 23
2 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 Particle Swarm Optimization with Inertia . . . . . . . . . . . . . . . 26
5 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 27

xxi


List of Algorithms

xxii


1
Introduction

Ericsson is one of the leading providers of information and communication technolo-
gies to service providers, while also investing in technical development. One area
under research at Ericsson is antenna arrays. These can conduct traffic with user
equipment by creating peaks in the intensity of the electrical field, known as beams,
in the direction of the user by manipulating signals called beam weights. Beam
weights that result in simple beams can be calculated analytically, while more com-
plicated beams require more complex tools, such as optimization methods.

A larger antenna array can create stronger beams and better specify direction, and
therefore, the current trend appears to be toward larger antenna arrays for 6G.
However, current methods for beam weight optimization used at Ericsson suffer
from long run times and poor convergence for larger products. Producing beam
weights for large antenna arrays in an efficient and precise manner is a challenge
that Ericsson needs to tackle to unlock further development opportunities within
the area of large antenna arrays.

The performed initial studies show that the use of tailored algorithms is a promising
alternative to the generic optimizers used thus far, for example through the use of
CNNs to predict optimal beam weight values, such as by Frilund and Olofsson [1],
although this method was only explored for small antenna arrays. The aim of this
thesis is to create an optimization algorithm tailored to the problem of beam weight
optimization for large antenna arrays. The algorithm needs to be able to consis-
tently converge in a reasonable time while keeping the desired spatial performance
of Ericsson’s Active Antenna System products.

1.1 Problem Statement
The objective of this thesis is to design a time-efficient optimization algorithm that
can be used to optimize beam weights for large antenna arrays such that the result-
ing electrical field is fitted to a given reference electrical field.

Previously used methods are unable to use real embedded subarray pattern data in
the optimization process and can only handle ideal data. This is a drawback, as the
physical characteristics of the antenna array affect the resulting electrical field.

Ericsson uses a number of different metrics to evaluate the quality of a beam weight

1


1. Introduction

choice for a given reference field. For the purposes of this master thesis, a subset of
these metrics will be selected to be used as components of the objective function,
but all of them will be used when evaluating the quality of the solutions. These
metrics are introduced in Section 3.6.

The total list of desiderata for the final algorithm is stated here. The algorithm
should:

• Consistently produce beam weights that correspond to an electrical field with
desired metric values with respect to a given reference field.

• Run and converge in a shorter time than the current methods.
• Produce results that are robust against small variations resulting from the

manufacturing of antenna and measurement components.
• Optimize beam weights using measured and simulated realistic ESAPs.

1.2 Constraints
In this thesis, there are a number of constraints set from the beginning to limit
the scope of the project. One such limitation is that we do not take into account
emissions outside the frequency bandwidth of the antenna, i.e., all antennas are as-
sumed to have no out-of-band emissions. We further assume that all electrical fields
propagate through free space and do not consider other mediums. Finally, we use
realistic simulated data, and no data measured from real antennas since no such
measurements exist for the specific antenna size we mainly investigate. Due to the
computational cost of producing realistic simulated embedded subarray patterns, we
only have one such set available. This data is generated using the first two assump-
tions. However, data could be simulated without these two assumptions, though
they have a negligible effect and are therefore generally included as it simplifies the
process.

2


2
Electromagnetic Field Theory

In this chapter, we introduce the majority of the electromagnetic field theory that is
necessary to understand the method and results of this thesis. We explain concepts
such as relevant coordinate systems, scales, units, and polarisations.

2.1 Azimuth and Elevation Coordinates
When working with antennas, it is common to use a coordinate system that is similar
to spherical coordinates. We use the angle in azimuth φ to refer to the angle between
the x-axis and the projection of a point onto the xy-plane. The angle in elevation
θ refers to the angle between the z-axis and the point. This means that φ is in the
range [−180, 180] and θ is in the range [0, 180]. The coordinate system is illustrated
in Figure 2.1.

Figure 2.1: Coordinate system used for antennas, with a Cartesian coordinate
system for reference. The antenna array is situated in the yz-plane and faces the
positive x-direction.

2.2 The Far Field
When working with antennas, the focus is often on the far field, which is defined as
the region where

3


2. Electromagnetic Field Theory

r ≥ 2D2

λ
,

where λ is the wavelength, r is the distance from the source to the measuring point,
and D is the largest diameter of the antenna [2, p. 33].

Figure 2.2: In the far field region, the electrical field can approximated as a plane
wave.

The far field is of interest because the electrical field can be approximated as a
plane wave [3], as shown in Figure 2.2. There, the electrical field’s dependency on r
becomes nearly perfectly separable from the θ- and φ-dependency and the field can
be approximately expressed as

E(r, θ, φ) = 1
r

e−j 2π
λ

rG(θ, φ), (2.1)

where G(θ, φ) is the directional dependency of the field, known as the far field func-
tion. This is the reason that the azimuth/elevation coordinate system omits a radial
coordinate. The far field function is complex valued and therefore also affects the
phase of E [2, p. 35].

Given the far field function G(θ, φ) the radiated power per unit solid angle can be
calculated as

U(θ, φ) = 1
2ζ
|G(θ, φ)|2,

where U(θ, φ) is known as the radiation intensity and ζ is the wave impedance [2,
p. 36]. When we refer to the radiation intensity for a set of angles, we call it the
radiation pattern.

2.3 Directivity and Gain
The directivity D(θ, φ) of an antenna in a given direction is defined as the ratio
between the radiation intensity in that direction and the radiation intensity averaged

4


2. Electromagnetic Field Theory

over all directions [4, p. 12]. This is written as,

D(θ, φ) = 4π
U(θ, φ)

PR
,

where PR is the total radiated power of the antenna and is defined as,

PR =
∫ 2π

0

∫ π

0
U(θ, φ) sin(θ)dθdφ.

When directivity is mentioned without a directional dependency, it is usually as-
sumed that it is in the direction that maximizes the radiation intensity [5, p. 39].
However, in this thesis, we always refer to directivity with a directional dependence.

Directivity does not measure the antennas’ efficiency, only its directional capabilities,
and therefore gain, G(θ, φ) is often used as accounts for both. The gain of an antenna
in a given direction is defined by IEEE as the ratio between the radiation intensity
in that direction and the radiation intensity produced if the power accepted by the
antenna was radiated isotropically, i.e., equally in all directions [4, p. 17]. Thus,
gain is defined as

G(θ, φ) = 4π
U(θ, φ)

PO
, (2.2)

where PO is the total accepted power of the antenna. According to [4, p. 30] PO ·η =
PR, where η ∈ [0, 1] is the radiation efficiency, which is a measure of how large part
of the accepted power is radiated by the antenna. Thus, equation (2.2) can be
rewritten as,

G(θ, φ) = 4πη
U(θ, φ)

PR
= ηD(θ, φ).

In the same way as for directivity, when gain is mentioned without a direction, it is
usually assumed that it is the direction that maximizes the radiation intensity [5,
p. 59]. Likewise, in this thesis, we always refer to gain with a directional dependence.

Gain is not enough when we want to know the performance of our antenna in relation
to the supplied power. It accounts for the losses inside the antenna, but to account
for the energy loss between the energy source and the antenna, we use realized gain.
It is defined as the gain of an antenna multiplied with an impedance mismatch factor,
M2 ∈ [0, 1], which is the ratio between the total accepted power of the antenna, PO

and the power available from the matched transmission line feeding the antenna,
PM [4, pp. 19, 31]. Thus, realized gain is

GR(θ, φ) = M2G(θ, φ).

In the same way as for gain and directivity, when realized gain is mentioned without
a direction, it is usually assumed that it is the direction that maximizes the radia-
tion intensity. As with directivity, we always refer to realized gain with a directional
dependence in this thesis.

5


2. Electromagnetic Field Theory

2.4 EIRP
Effective Isotropic Radiated Power (EIRP) is a measure of how much energy an
antenna is transmitting in a given direction and is often used when studying the
performance of an active antenna. According to “IEEE Standard for Definitions of
Terms for Antennas” [4, p. 15] it can be calculated in the following way,

EIRP(θ, φ) = POG(θ, φ). (2.3)

Since gain can be rewritten as G(θ, φ) = 1
M2
GR(θ, φ) and PO = M2PM , the relation

can also be written as

EIRP(θ, φ) = PMGR(θ, φ). (2.4)

Furthermore, since gain can also be rewritten as G(θ, φ) = ηD(θ, φ) and PO = 1
η
PR,

we can also write this relation as

EIRP(θ, φ) = PRD(θ, φ). (2.5)

Equation (2.5) is used more often than (2.3) as PR can be measured directly, whereas
PO is very difficult to measure. However, Equation (2.4) is the easiest to use, as the
power in the matched transmission line, PM , is the easiest to measure.

2.5 Units
In this thesis, the primary focus will be on the far field, where we measure directivity
and gain in dBi. The i in dBi stands for isotropic because we measure relative an
isotropic antenna, i.e., an antenna that radiates equally in all directions. This is
according to convention defined as

DdBi = 10 · log10 (D) [dBi]
GdBi = 10 · log10 (G/Giso) [dBi],

where Giso is the gain of an isotropic antenna.

The unit dBm is used to express power, for example U(θ, φ), PO, or PR, in dB
relative to 1 mW. That is, it is used to express a power P , measured in mW, in
logarithmic scale [6, p. 30]. It is converted according to

PdBm = 10 · log10 (P ) [dBm].

If the power, P , is in W we convert it to dBm according to

PdBm = 10 · log10 (P ) + 30 [dBm].

EIRP is also often expressed in dBm, and if it is measured in W it is converted
according to

6


2. Electromagnetic Field Theory

EIRPdBm = 10 · log10 (EIRP ) + 30 [dBm].
Every time directivity, gain, or EIRP is mentioned in the context of this thesis, it
is in units of dBi and dBm, respectively.

2.6 Polarisation of Plane Waves
As mentioned in Section 2.2, all electrical fields can be approximated as a plane
wave if considered in a sufficiently small area in the far field. The electrical field of
a plane wave propagating in the ẑ-direction can according to [2, p. 23] be described
as

E = Ete
−j 2π

λ
z = [Exx̂ + Eyŷ]e−j 2π

λ
z

where Et is the E-field with the propagation vector e−j 2π
λ

z omitted. It is possible
to decompose Et into its two polarisations, x̂ and ŷ. The components Ex and Ey
can be expressed as the projection of E onto the orthogonal unit vectors x̂ and ŷ
respectively. We use x̂ and ŷ as an example here, but this is applicable to any set
of orthogonal unit vectors.

If the polarisations can be individually controlled, then it is possible to send twice
the amount of information by sending different signals along the different polarisa-
tions. This is true for any pair of orthogonal polarisations, however, for this to work
well the polarisations must be well separated, i.e. one must be able to detect one of
them without the other interfering.

Each antenna aims to send only one polarisation, which we refer to as the co-polar
polarisation for that antenna. In the non-ideal case it is almost certain that some
signals will be sent in the other polarisation, which is referred to as cross-polar [2,
p. 23].

Thus, the electrical field from an antenna can be decomposed into the co-polar and
cross-polar components, Eco and Exp, where Eco is parallel to the co-polar unit vector
ĉo and Exp is parallel to the cross-polar unit vector x̂p. Therefore, the electrical
field can be written as

E = [Ecoĉo + Expx̂p]e−j 2π
λ

z. (2.6)
The propagation direction ẑ is orthogonal to both ĉo and x̂p which are generally
orthonormal vectors [2, p. 24].

2.7 Aperture Size
The aperture of an antenna is the area where most of the radiated power passes
through [2, p. 57]. In general it is difficult to exactly define the aperture of an
antenna and some antenna types lack the concept completely.

7


2. Electromagnetic Field Theory

The effective aperture area of an antenna in a given direction is

Ae = PO

Wt

,

where PO is the total accepted power by the antenna and Wt is the power density
of a plane wave in the given direction [4, p. 13].

The max directivity of an antenna is directly proportional to the effective aperture
area when the field of the antenna has a uniform distribution over the aperture [2,
pp. 57, 240]. Max directivity can therefore be calculated as

Dmax = Ae
4π

λ2 .

This means that larger antennas allow us to create narrower, more focused beams,
concentrating more of the power in one chosen direction.

8


3
Antenna Arrays

This chapter aims to describe the fundamentals of antenna arrays and their use in
beamforming. We begin with a description of the different beam types and their
uses, before outlining the architecture of an antenna array. We then detail the use
of antenna arrays in beamforming. Following that, we also introduce the metrics
that are used to evaluate the quality of an optimized set of beam weights.

3.1 Beam Types and Uses
The main utility of antenna arrays lies in the ability to manipulate their electri-
cal fields as desired through adjustment of the phases and amplitudes of the beam
weights. In this way, antenna arrays allow for steering which directions they are
radiating energy in. When this is used to radiate energy in a chosen direction, we
refer to this peak in the radiation pattern as a beam. We refer to the manipulation
of beams through, for example, the choice of beam weights, as beamforming [7, p. 2].
In 5G, antenna arrays are mainly used to create two types of beams, traffic beams
and wide beams [7, pp. 2–3].

Traffic beams, or pencil beams, are beams with low side lobe levels and a single
main lobe with small half-power beamwidth in both azimuth and elevation [4]. They
have high max directivity, which is used to send and receive information from user
equipment [7, pp. 2–3]. Figure 3.1a illustrates how a traffic beam communicates
with a user device, and Figure 3.1b shows a representation of a cross-section of the
radiation pattern of an ideal traffic beam.

(a) Representation of a traffic beam
conducting traffic to and from a user
device.

(b) Representation of a cross section of
the radiation pattern of an ideal traffic
beam.

9


3. Antenna Arrays

Each antenna array has a set of traffic beams that cover the entire angular range that
the antenna serves, also known as the coverage angular range (CAR). The traffic
beam with the strongest connection to the user equipment is used to provide service
to the user. However, to choose which traffic beam to use, we need to know which
creates the strongest signal. This can be done by testing each traffic beam one at
a time and then choosing the traffic beam that created the strongest signal for the
user. The problem with this approach is that there are many traffic beams available
to an antenna, making it inefficient to test each one. To aid in this process, a second
type of beam called a wide beam [7, p. 3] is used.

Wide beams are wider than traffic beams, so fewer are needed to cover the CAR.
Each wide beam corresponds to a subset of the traffic beams. When searching for
the best traffic beam to provide service to a user, each wide beam is tested to find
the one that creates the strongest signal for the user. The traffic beams correspond-
ing to the chosen wide beam are then tested to find the one with the strongest
connection to the user. In this way, it is possible to find the best traffic beam to
serve the user without having to check every single traffic beam. This process is
illustrated in Figure 3.2.

(a) The wide beams are tested to find
the one with the best connection.

(b) The traffic beams corresponding to
the best wide beam are tested to find
the one with the best connection.

Figure 3.2: Wide beams are used to identify the optimal traffic beam with which
to serve a user.

The wide beams are arranged in a grid that covers the CAR. This grid is known as
a grid of beams (GoB), which can refer to either a grid of wide beams, or to traffic
beams, which are also arranged in a grid. The process of selecting a traffic beam is
also illustrated in Figure 3.3.

10


3. Antenna Arrays

(a) The wide beam with the best
connection to the user equipment is
selected, and is outlined here in yellow.

(b) The traffic beams corresponding to
the best wide beam are tested to find
the one with the best connection to the
user equipment. The selected traffic
beam is outlined here in yellow.

Figure 3.3: Wide beams are used to identify the optimal traffic beam with which
to serve a user. The beams are arranged in a GoB that cover the CAR.

The desired wide beam shape is one that represents the strength of the signal for the
corresponding traffic beams as accurately as possible. In the ideal case, the strength
of the signal between a user and a wide beam is directly proportional to the strength
of the signal between the user and the best of the traffic beams corresponding to
that wide beam. For this reason, we build our desired wide beam radiation pattern,
which will be referred to as the reference field, from the radiation patterns of the
corresponding traffic beams. This is done by creating an envelope around these
beams, which refers to taking the maximum over all the traffic beams at each point,
as represented in Figure 3.4b.

(a) Representation of a cross section
of several ideal traffic beams corre-
sponding to a single wide beam.

(b) Representation of a cross section
of the envelope of several ideal traffic
beams corresponding to a single wide
beam.

Figure 3.4: Creation of an envelope over a set of traffic beams.

The radiation pattern is a representation of the amount of energy that is sent in
different directions. A traffic beam utilizes all the energy available from the antenna.
For this reason, an envelope of several traffic beams creates a curve that implies that
the antenna is emitting more energy than is available to it. This is rather improb-
able. For this reason, the envelope is divided by the square root of the number of
traffic beams used to create the beam, which lowers the power to a level where the

11


3. Antenna Arrays

energy output is not greater than the energy available to the antenna. This is the
ideal wide beam shape.

It is possible to calculate the beam weights that create a traffic beam analytically.
However, this does not work for wide beams due to their complicated shape. For
this reason, optimization methods are employed to find beam weights that create a
radiation pattern that is similar to that of a given reference field.

3.2 Antenna Arrays

As was shown in Section 2.7, larger antennas have higher directivity and for some
uses, for example long-distance communication, much higher directivity than an-
tennas of ordinary size can create is required. For this purpose, larger antennas or
antenna arrays can be used. Antenna arrays consist of multiple antenna elements
that are distributed over a surface, the aperture area, and thus antenna arrays with
large apertures can be constructed. They are also useful in that they provide a large
flexibility when shaping the radiation pattern [8, p. 285].

There are three different types of antenna arrays: linear, planar, and conformal. The
type mainly discussed in this thesis is planar antenna array. The antenna elements
of a planar array are spread out on a plane surface [2, p. 334]. The antenna elements
can be any type of antenna or even several antennas per element, for example, two
orthogonal half-wave dipole antennas.

3.2.1 Antenna Elements

As mentioned above, the structure of antenna elements can vary, but in the scope
of this thesis we will work with antenna elements which consists of two orthogonal
antennas. In this case, the antennas produce fields orthogonal to each other that
propagate in the same direction.

In Figure 3.5 we present an example of an antenna element with two orthogonal
antennas. This setup is called ±45-polarisation, since both antennas are rotated 45◦

in opposite directions from the top of the antenna element. Another very common
way to build such an antenna element is called H/V-polarisation and consists of one
antenna rotated 90◦ and one rotated 0◦ from the top, i.e., one is horizontal and the
other is vertical.

12


3. Antenna Arrays

Figure 3.5: One antenna element consisting of two antennas rotated 45◦ in opposite
directions.

In the ideal case, the electrical field of each antenna will consist of only one polari-
sation. We illustrate this using the antenna element in Figure 3.5. In the ideal case,
the E+45 component of the field will be created entirely by the +45 antenna and the
E−45 component will be created entirely by the −45 antenna. However, in reality,
each antenna usually creates some low-level noise on its cross-polarisation.

3.2.2 Several Antenna Elements Make a Subarray
Several antenna elements can be arranged next to each other to create a subarray.
Figure 3.6 shows an example of a subarray with 3 rows and 1 column. It is referred
to as a 3 × 1-subarray. Each subarray has 2 ports. Each port corresponds to one
polarisation, in this case either +45 or −45.

Figure 3.6: A 3×1-subarray consists of three antenna elements arranged into three
rows and one column.

The excitations sent through the ports are called beam weights (BWs). These are
the smallest steerable parameters that will be discussed in this thesis. Beam weights
are complex valued, and when a beam weight is set to a non-zero value, the corre-
sponding elements in that subarray create an electrical field. Multiplying the beam
weight by some factor a will create an electrical field that is a times stronger, and
adding a phase θ will create an electrical field that is shifted θ degrees in phase.

3.2.3 Several Subarrays Make an Antenna Array
An antenna array consists of a number of subarrays arranged in a grid. An example
with 3 rows and 8 columns can be seen in Figure 3.7. An antenna array with m

13


3. Antenna Arrays

rows and n columns has 2 ·m · n ports, one for each polarisation for each subarray.
By manipulating the beam weights, it is possible to steer the direction and modify
the shape of the electrical field created by the antenna array. Each port has a
corresponding beam weight, so there are 2 · m · n beam weights to choose for an
m× n antenna array.

Figure 3.7: A 3 × 8 antenna array consists of 24 subarrays arranged into 3 rows
and 8 columns.

3.3 The Superposition Principle

Electric fields are vector fields and can therefore be combined according to the
superposition principle. When there are several sources creating electrical fields in
the same space, the total electric field at each point is the sum of the vectors of the
different electrical fields in that point [2, p. 135]. The total electrical field Etot can
therefore be written as

Etot (x) =
∑
j∈S

Ej (x) (3.1)

where S is the set of sources, Ej is the electrical field created by the source j ∈ S,
and x is any point in the space.

If the vectors are pointing in a similar direction, the resulting vector will be longer
than either, while if the phase difference is large, the resulting vector will be shorter
than both. This is called constructive and destructive interference respectively, and
is illustrated in Figure 3.8.

14


3. Antenna Arrays

(a) Constructive interference. (b) Destructive interference.

Figure 3.8: Constructive and destructive interference between two vectors (green
and red arrows respectively). The black arrow represents the resulting vector.

3.4 Beamforming with Antenna Arrays

For each port on an antenna array, an electrical field is created when the corre-
sponding beam weight is set to one, and all other beam weights are set to zero. The
resulting far field function is called an embedded subarray pattern (ESAP). As there
is one ESAP for each port, an m× n antenna array has 2 ·m · n ESAPs. For each
ESAP we can calculate the resulting far field function when that same beam weight
is instead set to reiθ, and all other beam weights are still set to zero, by multiplying
the ESAP with reiθ. This is done by multiplying the amplitude in each point by r
and adding θ to the phase of the vector in each point, as illustrated in Figure 3.9a.
By summing over the products of each ESAP with its corresponding beam weight,
we get the resulting far field function GA(θ, φ) of the antenna array, which we can
write as

GA(θ, φ) =
∑
j∈S

bwj · ESAPj(θ, φ), (3.2)

where bwj is the beam weight corresponding to ESAPj(θ, φ), and S is the set of
ports. This calculation is illustrated in Figure 3.9b. By shifting the phases of
different beam weights we can steer where we will see constructive and destructive
interference in the far field, and we can affect the strength of the field in different
points by adjusting the amplitudes of the different beam weights. In this way, we
are able to create desired radiation patterns by manipulating the beam weights [7,
p. 2].

15


3. Antenna Arrays

(a) The field created by one polari-
sation of a subarray is the embedded
subarray pattern multiplied with the
beam weight.

(b) The field created by an antenna
array is the sum of the fields created
by each polarisation of each subarray.

Figure 3.9: Representation of how the electrical field is calculated from the ESAPs
and BWs.

3.5 Single- and Dual-Polarized Beamforming
In Section 3.2.2 we stated that each subarray of an antenna array has two ports,
one for each polarisation. Sometimes we choose to optimize the beam weights in
such a way that the two ports for a subarray are assigned the same beam weight.
This is called single polarized beamforming and gives us m · n complex variables to
optimize for an antenna array of size m×n. The alternative is called dual polarized
beamforming which means that we optimize individual beam weights for each port.
This results in 2 ·m · n complex variables to optimize. These two approaches have
different strengths and weaknesses. With single polarized beamforming there are half
as many beam weights that need to be optimized, which can reduce the runtime of
an optimization algorithm substantially depending on the time complexity of the
optimization. On the other hand, dual polarized beamforming has more degrees of
freedom which might allow the algorithm to find a more exact solution.

3.6 Metrics
In this section, we introduce three metrics used when evaluating the quality of
a choice of beam weights. For some of these metrics, we need a reference field
to compare the resulting radiation pattern against. We assume that the far field
function are measured in discrete points, at every whole degree of θ and φ, as this
is a standard format for measured ESAPs.

3.6.1 Taper Loss
Taper loss is a measure of how much of the maximum available power of an antenna
is utilized for a particular set of beam weights. This is calculated from the ampli-
tudes of the beam weights. Each beam weight has a maximal amplitude constrained
by the amount of energy available per port. This implies that when the amplitudes
of all beam weights are set to the maximum allowed value, the antenna utilizes all
available power. In this case, the taper loss is zero. It is usually assumed that we

16


3. Antenna Arrays

are attempting to create the strongest possible beam with the power available to
the antenna array. For this reason, we normalize the beam weights and assume
that the largest amplitude is set to the largest allowed value. To account for this
normalization, we divide each beam weight amplitude by the largest beam weight
amplitude in the taper loss equation, effectively performing the normalization as a
part of the taper loss calculation in case the beam weights have not been normalized.

The taper loss is defined as

LT := −10 · log10

( N∑
n=1

( |bwn|
maxn |bwn|

)2
/N

)
,

where N is the number of beam weights and bwn is the beam weight corresponding
to the nth ESAP.

3.6.2 Main Lobe Loss
One very important loss is the main lobe loss which measures the difference between
the field calculated from the investigated beam weights, and the reference field, by
taking the mean square error (MSE) of the radiation pattern of both fields in the
EIRP scale. However, the fields are not compared in all indices, but only in the
main lobe area Fmain. The main lobe loss is illustrated in cross-section in Figure
3.10. The total main lobe loss is the volume between the reference beam and op-
timized beam. Note that the figure is in a logarithmic scale and that the side and
main lobe thresholds are lower than they might appear in the figure.

−90 −60 −30 0 30 60 90
ϕ (deg)

15

20

25

30

35

40

45

EI
RP
 (d
B 
)

Cut at θ=100
Opti i)ed cur(e
Reference cur(e
Side lobe threshold
Main lobe threshold

Figure 3.10: Cut of an example beam. The pink line is the reference field in
the main lobe area and the blue line is the optimized field. The black and grey
dotted lines are the side lobe and main lobe thresholds, respectively. The red area
represents the main lobe loss.

17


3. Antenna Arrays

The first step of calculating the main lobe area is finding the highest value of the ra-
diation pattern of the reference field, Umax, and its corresponding index, (imax, jmax).
Then let P be the set of all indices of the reference field that have a radiation inten-
sity higher than Umax − 10 [dBm], which is referred to as the main lobe threshold.
We then define the main lobe area Fmain to be the largest connected subset of S that
contains the index (imax, jmax). For more details, see the implementation in Section
5.3.2.

-90 -60 -30 0 30 60 90
ϕ (deg)

0

30

60

90

120

150

180

θ 
(d

eg
)

Epower

−10

0

10

20

30

40

EI
RP

 (d
Bm

)

(a) An examplatory wide beam.

-90 -60 -30 0 30 60 90
ϕ (deg)

0

30

60

90

120

150

180

θ 
(d

eg
)

Epower

34

36

38

40

42

44

EI
RP

 (d
Bm

)

(b) The main lobe of the examplatory
wide beam.

Figure 3.11: An example of the radiation pattern of a wide beam and its main
lobe.

The main lobe loss is then defined as

LM := 1
|Fmain|

∑
θ,φ∈Fmain

([UGA
(θ, φ)]dBm − [UR(θ, φ)]dBm)2 ,

where UGA
(θ, φ) is the radiation pattern of the antenna array and UR(θ, φ) is the

radiation pattern of the reference field.

3.6.3 Side Lobe Loss
A side lobe is an unwanted peak in the radiation pattern of an antenna array. They
are unwanted because they leak energy in undesirable directions, which wastes en-
ergy, and can also interfere with other antennas. The side lobe loss is a measure of
how much the side lobes exceed the side lobe threshold, Umax − 12 [dBm]. If no side
lobes exceed the threshold the loss is zero. The side lobe loss is illustrated in cross-
section in Figure 3.12. The total side lobe loss is the volume between the optimized
beam and the side lobe threshold. Note that the figure is in a logarithmic scale and
that the side and main lobe thresholds are lower than they might appear in the figure.

Similar to the main lobe loss, this is not calculated for all indices, only the side lobe
area Fside. We define this set as the complement set of F ′

main, which is calculated in

18


3. Antenna Arrays

almost the same way as Fmain in Section 3.6.2. The only difference is that we instead
use Umax − 12 [dBm] as the threshold. For more details, see the implementation in
Section 5.3.2.

−90 −60 −30 0 30 60 90
ϕ (deg)

15

20

25

30

35

40

45

EI
RP
 (d
B 
)

Cut at θ=100
Opti i)ed cur(e
Reference cur(e
Side lobe threshold
Main lobe threshold

Figure 3.12: Cut of an example beam. The pink line is the reference field in
the main lobe area and the blue line is the optimized field. The black and grey
dotted lines are the side lobe and main lobe thresholds, respectively. The red area
represents the side lobe loss.

The side lobe loss is then defined as

LS :=
∑

θ,φ∈Fside

max{0, [UGA
(θ, φ)]dBm, − [Umax − 12]dBm}.

Note that since we are working in a logarithmic scale, the side lobe threshold, Umax−
12 [dBm], would be approximately 1

16Umax in a linear scale.

19


3. Antenna Arrays

20


4
Optimization

In the field of optimization, one studies methods for finding the optimum of a func-
tion. The optimum is the best possible value, often defined as the point within a
set search space where the function achieves either its maximum or minimum value.
There are many different approaches to solving optimization problems, the oldest
and most well-studied branch being classical optimization [9, pp. 1–2]. However, the
main focus of this thesis will be methods from a separate branch, namely stochas-
tic optimization. In this chapter, we begin with a brief introduction to these two
branches, before detailing the specific algorithms that will be used in this thesis.

4.1 Classical Optimization
Classical optimization methods are deterministic algorithms for solving optimization
problems where the objective function can be written as a well-defined mathematical
function. Most classical optimization algorithms are poorly suited to non-convex
problems, and problems where the objective function is highly non-linear or of high
dimension. For problems of these types, it is often prudent to turn to other branches
of optimization, such as stochastic optimization [9, pp. 1–2, 33].

4.2 Stochastic Optimization
Stochastic optimization methods are, in contrast to classical optimization methods,
never deterministic. They have elements of randomness in them, though this does
not mean they are completely without rhyme or reason. Many stochastic optimiza-
tion methods are inspired by phenomena in nature, such as foraging behaviour of
ant colonies, swarms of animals, and evolution. From these phenomena, principles
such as cooperation and adaption, which can be regarded as a type of optimization,
have been taken and converted to mathematical equations to be used in optimiza-
tion. Two stochastic methods inspired by nature are genetic algorithms (GA) and
particle swarm optimization (PSO) [9, pp. 2–4].

4.2.1 Genetic Algorithms
Genetic algorithms (GA) are a type of evolutionary algorithms, which are stochastic
optimization methods inspired by the biological phenomena of evolution as described
by Charles Darwin [9, p. 35]. Similar to how a gradual change of a population over
generations can increase individuals’ chances to survive and reproduce, i.e. increase

21


4. Optimization

their fitness, we want to get closer and closer to an optimum over the iterations of
an optimization algorithm.

Figure 4.1: Visualization of a generation of GA with its main operations. The
blue ovals are populations, and the smaller circles are individuals.

The first step of genetic algorithms is to define an encoding of solutions to chro-
mosomes. Next, a population is created, which consists of several chromosomes.
Then, a fitness function is defined. Its purpose is to evaluate each chromosome.
The resulting value is referred to as a chromosome’s fitness. A common way to
define the fitness function is F (x) = 1

L (x) , where L (x) is an objective function
which we want to minimize. The main part of genetic algorithms consists of manip-
ulating chromosomes to increase their fitness over generations. A new generation
is created from the previous generation using three operations. The first operation
is called crossover, which creates new chromosomes using parts of current chromo-
somes. Secondly, there is mutation which makes stochastic changes to elements of
the chromosomes, also known as genes. Lastly, there is elitism, which selects chro-
mosomes for the new generation based on their fitness. Chromosomes with higher
fitness are more likely to be selected.

There are many ways the three operations; crossover, mutation, and elitism, can be
implemented. For this project we used the operations as defined by M L Shahab, F
Azizi, B A Sanjoyo, et al. [10] which are modified to work well in high dimensions.
Algorithm 1 is a pesudocode for this genetic algorithm, and Figure 4.1 presents a
simple representation of a generation of GA. The operations are described in more
detail in the following sections.

22


4. Optimization

Algorithm 1 Genetic Algorithm for High Dimensional Problems
Define a chromosome representing the solution
Define fitness function F (x)
Generate an initial population consisting of m chromosomes
Initialize the maximum number of generations n
for t = 1,2...,n do

for i = 1,2...,m do
Generate child by Crossover
Update obtained child by Mutation

end for
Elitism

end for

Crossover

The first step of crossover is selecting two chromosomes from the population. The
method used is tournament selection with size n and tournament probability ptour.
Thus, n random chromosomes are chosen from the population, and the one with the
highest fitness is chosen with probability ptour. If it is not chosen it is discarded, and
the same process is repeated recursively until one individual is chosen or there is
only one left. In that case, the last remaining one is chosen. This process is executed
twice, so that two individuals are chosen in total.

Genes from the two chosen chromosomes are then mixed as shown in Algorithm 2
to create a new chromosome. The new chromosome always has the same or higher
fitness than its parent chromosomes due to the construction of the algorithm.

Algorithm 2 Crossover
x← Chromosome of length N chosen via tournament selection
y ← Chromosome of length N chosen via tournament selection
xc ← Copy of x
for i = 1, 2, ..., N do

x(1) ← xc

x(2) ← xc with value at index i set to yi

x(3) ← xc with value at index i set to yi+xi

2
j ← argmaxj∈{1,2,3}{F (x(j))}
xc ← x(j)

end for
return xc

Mutation

We perform mutation on the chromosomes returned by crossover with mutation
probability set to one, which means each index is mutated. The process of mutation
is described in Algorithm 3. The mutated chromosome is always better than the

23


4. Optimization

original. Mutation is important since it contributes to chromosomes moving out of
local optima of the fitness function and adds genetic diversity to the population.

Algorithm 3 Mutation
n← The number of the current generation
x← Chromosome of length N created using Crossover
xc ← Copy of x
for i = 1, 2, ..., N do

r ← random value sampled from uniform distribution on interval [− 1
2n

, 1
2n

]
x(1) ← xc

x(2) ← xc with value at index i set to xi + r
x(3) ← xc with value at index i set to xi − r
j ← argmaxj∈{1,2,3}{F (x(j))}
xc ← x(j)

end for
return xc

Elitism

The version of elitism used in this project is called "Elitism replacement with fil-
tration" and was introduced by M L Shahab, F Azizi, B A Sanjoyo, et al. [10]. It
chooses the next generation as the m chromosomes with the highest fitness out of
both the last generation and the m newly generated chromosomes. However, we
first remove duplicate chromosomes and generate random chromosomes if there are
in total less than m chromosomes left. This is to counteract potential “inbreeding”,
i.e., if the chromosomes become too similar, by removing duplicates and introducing
new genes.

4.2.2 Particle Swarm Optimization

We also investigated particle swarm optimization (PSO). PSO is a type of stochastic
optimization algorithm that mimics the behaviour of a swarm of animals or insects.
Similar to the population in GA, PSO has a group of possible solutions, which we
refer to as a swarm. Each solution in the swarm is referred to as a particle. The
particles in a swarm are able to share information about where they have found the
lowest loss values in the search space. The loss of a particle x is calculated using the
objective function f(x). Each particle stores information about the position with
the lowest loss that it has found, as well as the position with the lowest loss that
any member of the swarm has found. This is visualized in Figure 4.2.

24


4. Optimization

Figure 4.2: Visualization of a swarm of particles and their velocities. The blue star
represents the current best position of the blue particle and the pink star represents
the current best position of the entire swarm.

The first step in the algorithm is to initialize the swarm with all its particles. Each
particle is initialized at a random position in the search space, with a random ve-
locity. The search space is assumed to be a hyperrectangle defined by the maximal
and minimal values xj,max and xj,min for each dimension j.

In each iteration, each particle i updates its position xi based on the position of their
personal best xpb

i , and the position of the swarm best xsb. How much the particles
steers toward these points is determined by two positive constants: the cognition
factor c1 which affects how much the particles move toward their personal bests, and
the social factor c2 which affects how much the particles steer toward the swarm best.

Next, the velocity vi is checked to ensure that it does not exceed a set speed limit
vmax. If it does, the vector is normalized to be at the speed limit. Then the position
of the particle is calculated from the current position of the particle and its updated
velocity together with the time step ∆t. This new position is then evaluated, and if
the loss is lower than its personal best, then the particle’s personal best is updated.
If the loss is lower than the swarm best, then the swarm best is updated as well.

A common version of PSO is PSO with inertia, where inertia ω is a nonnegative
constant that affects how much of the previous velocity is kept in the velocity up-
date. The pseudocode for this PSO variant is presented in Algorithm 4.

25


4. Optimization

Algorithm 4 Particle Swarm Optimization with Inertia
Initialize particles with random positions and velocities

xij ← xj,min + U [0, 1](xj,max − xj,min), i = 1, ..., n, j = 1, ..., n
vij ← α

∆t
(−xj,max−xj,min

2 + U [0, 1](xj,max − xj,min)), i = 1, ..., n, j = 1, ..., n

for k = 1, 2, ..., K do
Evaluate each particle and update their personal best and the swarm best

xpb
i ← xi if f(xi) < f(xpb

i ), i = 1, ..., n
xsb ← xi if f(xi) < f(xsb), i = 1, ..., n

Update each particle’s velocity
vij ← ωvij + c1r1j

(
xpb

ij −xij

∆t

)
+ c2r2j

(
xsb

j −xij

∆t

)
, i = 1, ..., n, j = 1, ..., m

Restrict the velocities to the speed limit
vi ← vmax

|vi| vi if |vi| > vmax, i = 1, ..., n

Update each particle’s position
xi ← xi + vi∆t, i = 1, ..., n

end for
return xsb

Here, r1 and r2 are random numbers between 0 and 1.

4.3 Bayesian Optimization
Bayesian optimization is a non-parametric optimization method suitable for prob-
lems where the objective function acts as a black-box function that is very expensive
to evaluate. It attempts to find the point that minimizes the objective function using
as few evaluations as possible by strategically choosing the next point to evaluate
in each iteration [11]. Here we present a brief overview of the optimization method
and introduce pseudocode, before discussing the main elements of the method in
more detail.

The method uses a Gaussian process as a surrogate model for the objective function.
The surrogate model attempts to model the behaviour of the objective function as
accurately as possible. The method also uses an acquisition function. For each point
in the domain, the acquisition function estimates the value gained from sampling
the objective function in that point. This value is usually defined in terms of the
expected amount of information gained and the value of the surrogate function in
that point. In each iteration, the acquisition function is maximized to find the op-
timal point to sample in. Next, the objective function is evaluated in the chosen
point, and the result is used to update the surrogate model using Gaussian process
regression [11]. Algorithm 5 summarizes Bayesian optimization in pseudocode as
presented by Frazier [11].

26


4. Optimization

Algorithm 5 Bayesian Optimization
Place a Gaussian prior on f
Observe f at n0 points according to an initial space-filling experimental design.
Set n = n0
while n ≤ N do

Update the posterior probability distribution on f using all available data.
Let xn be a maximizer of the acquisition function over x, where the acquisition
function is computed using the current posterior distribution.
Observe yn = f(xn).
Increment n

end while
Return a solution: either the point evaluated with the largest f(x), or the point
with the largest posterior mean.

Figure 4.3 illustrates an example of how one iteration of the algorithm affects the
surrogate model and the acquisition function. Observe how an additional observa-
tion is added in the point that maximized the acquisition function and that this
reduces the uncertainty in the surrogate model and improves the estimated mean so
that it is now closer to the true value. Note that this is a toy example and that in
most cases we would not have access to the true shape of the function or any of its
derivatives when performing Bayesian optimization.

4.3.1 Gaussian Process Regression

A Gaussian process is defined as a collection of random variables, of which any finite
collection has a multivariate Gaussian distribution. Each of these random variables
is tied to a point in some domain D, often time or space. The Gaussian process is
completely defined by its mean function µ(x) and its covariance function Σ(x, x′).
We often choose µ0(x) = 0 for all x ∈ D as our prior. The prior covariance ma-
trix Σ0(x, x’) is chosen to reflect our beliefs about the covariance between different
points and is sometimes referred to as a kernel function [12].

The kernel function is chosen to encode our beliefs about the correlation between
different points in the domain based on the distance between them. A common
assumption is that points that are closer should have a higher correlation. This
would mean that when we evaluate the objective function in a point x′ and use this
to update the surrogate model, points that are close to x′ in the domain are affected
more by the update. If we know that the function is periodic with a known period
λ, then we can choose a kernel function that represents this, so that when we update
the prior in a point x′, the prior is most affected in the points x′ +nλ for any n ∈ Z.

27


4. Optimization

(a) Surrogate model with the true un-
known model (top) and acquisition
function (bottom) after updating the
prior with 6 random points.

(b) Surrogate model with the true
unknown model (top) and acquisition
function (bottom) after 1 iteration of
Bayesian optimization.

Figure 4.3: The mean of the surrogate function is shown as a dashed green line,
and the area within 1.96 standard deviations from the mean of the surrogate model
is shown shaded in green. The mean of the true function is shown as a dashed red
line, and all sampled points are shown as red dots. The acquisition function is shown
in blue, with a blue dot denoting the next query point.

We assume that the prior placed on the mean is µ0(x) = 0 and that σ0 is constant
over D. Then, given a kernel function Σ0(x1, x2) and a set of new data points
(x1:n, y1:n), where x1:n denotes a set of n points and y1:n denotes the corresponding
objective function values, we update the mean and variance functions according to
Rasmussen and Williams [12] in the following way:

µn(x) = Σ0(x, x1:n)
(
Σ0(x1:n, x1:n) + σ2

0I
)−1

f(x1:n)

σ2
n(x) = Σ0(x, x)− Σ0(x, x1:n)

(
Σ0(x1:n, x1:n) + σ2

0I
)−1

Σ0(x1:n, x).

4.3.2 Acquisition Function
The acquisition function usually reflects some combination of the uncertainty in the
surrogate model and the expected mean. In Figure 4.3 we use expected improvement

28


4. Optimization

(EI) as our acquisition function. The improvement from sampling at a point is 0 if
the objective value y′ is higher than or equal to the lowest value we have found so
far, ymin, and the improvement is ymin − y′ if y′ < ymin. Since we cannot know what
the improvement will be at any point without sampling there, we instead calculate
the expected improvement. Therefore, the expected improvement prioritizes points
where the surrogate model has a low mean but a high variance, since points with
high variance carry more uncertainty and could potentially be much better than the
expectation [11].

29


4. Optimization

30


5
Method

In this chapter, we specify the goals of this thesis in more detail. We then proceed
to give an overview of the algorithms and detail the design choices made during the
development of the central beam weight optimization methods of this thesis.

5.1 Specification of the Issue Being Investigated

The main objective of this thesis is to create an algorithm for optimizing beam
weights such that the corresponding radiation pattern minimizes the objective func-
tion, which we defined as a linear combination of the main lobe loss and the side
lobe loss. The algorithm should take as input; the embedded subarray patterns of
the antenna for which the beam weights should be optimized, the corresponding
traffic beam beam weights, and the indices of the traffic beams to be used in the
construction of the reference field. The reference field is a theoretical beam that is
constructed from traffic beams as described in Section 3.1.

The total list of desiderata for the final algorithm is stated in more detail here. The
algorithm should:

• Consistently produce beam weights that correspond to a radiation pattern
with desired metric values with respect to a given reference field.

– The metrics used to evaluate beam weights are main lobe loss, side lobe
loss, and taper loss, as introduced in Section 3.6.

– The desired metric levels are based on the results of the current method.
• Run and converge in a shorter time than the current methods.

– Current methods need approximately 1-2 hours per beam weight opti-
mization.

– The algorithm should also have a reasonable time complexity with regard
to the dimension of the problem.

• Produce results that are robust against small variations resulting from the
manufacturing of antenna and measurement components.

– These variations may include noise in the ESAPs and rounding errors for
beam weights applied in real situations.

• Optimize beam weights using measured and simulated realistic ESAPs.
– This is in contrast to current methods which are only able to handle ideal

ESAPs.

31


5. Method

5.2 Algorithm Overview
In Figure 5.1 a flowchart illustrates the most important steps of the algorithm be-
tween the input data (ellipses) and the output (circle).

Figure 5.1: A flowchart of the method from data and user inputs to optimized
beam weights. The dotted line only applies when GA is the algorithm used in the
optimization step and represents the use of traffic beam BWs as warm starts.

The ESAPs and traffic beam BWs are used to calculate the radiation pattern of the
traffic beams. The beam mapping is a list of the indices of the traffic beams to be
used in the construction of the reference field. The traffic beam radiation patterns
and the beam mapping are used to calculate the reference field, as described in Sec-
tion 3.1. The ESAPs and the reference field are then passed to the optimizer, which
returns a set of optimized beam weights. The optimizer minimizes the objective
function, which we define in Section 5.3.

5.3 Design of Objective Function
The objective function evaluates the quality of a set of beam weights and is com-
posed of two components, the side lobe loss and the main lobe loss, which are both
calculated using a chosen reference field. Originally, the objective function consisted
of the three metrics introduced in Section 3.6 which are commonly used at Ericsson
for beamforming, for example in the master thesis written at Ericsson in 2024 by
Frilund and Olofsson [1]. However, due to a difference in the choice of scale compared
to Frilund and Olofsson [1], the taper loss component was rendered unnecessary and
could be omitted, see Section 5.3.1. Thus, the final objective function L (bw), is a
weighted sum of the main lobe loss and the side lobe loss,

L (bw) = α1LM(bw) + α2LS(bw),
where LM(bw) is the main lobe loss and LS(bw) is the side lobe loss of the beam
weights bw, and α1 and α2 are positive constants. The exact values of α1 and α2

32


5. Method

are omitted from this report for confidentiality reasons. L (bw) is henceforth what
we refer to as the total loss. In the case of GA, the fitness function is defined as
F (bw) = 1

L (bw) .

5.3.1 Taper Loss
When working in a scale that is relative to the used energy, such as realized gain,
it is necessary to have some separate measure of the amount of radiated energy.
For this reason, previous algorithms, such as the one by Frilund and Olofsson [1],
include taper loss as part of the objective function. However, our algorithms perform
the optimization in EIRP scale which is an absolute scale that also measures the
radiated energy, and thus the taper loss becomes a superfluous component of the
objective function, and we can remove it.

5.3.2 Main and Side Lobe Areas
The definitions of the main and side lobe areas mentioned in Section 3.6 have the
restriction that the area must be connected, as without that restriction a side lobe
of the reference field with a high amplitude might be considered part of the main
lobe area. Earlier implementations, such as Frilund and Olofsson [1], did not impose
such restrictions, and thus we had to devise the following method to find those areas.

We assume that the radiation intensity of the antenna array, UGA
(θ, φ), is measured

at the points (θi, φj), i = 1, .., nθ, j = 1, .., nφ, where nθ and nφ is the size of the
set of θ and φ values, respectively. Let M be a matrix of size nθ × nφ, where each
index corresponds to a point at which UGA

(θ, φ) is measured. We define

Mij =

1, if UGA
(θi, φj) > Umax − 10,

0, UGA
(θi, φj) ≤ Umax − 10,

(5.1)

where Umax is the peak of the radiation pattern of the reference field. In other
words,Mij is set to one if the corresponding point in UGA

(θi, φj) is larger than the
main lobe threshold and to zero otherwise. Therefore, there is a large connected
area of ones around the index corresponding to Umax, i.e. where our main lobe
is, and potentially other smaller areas of ones as well. We now want to remove all
smaller connected areas such that only the one corresponding to the main lobe is left.

Convert M to a graph where Mij is a neighbour to all values in
{Mi+1,j,Mi−1,j,Mi,j+1,Mi,j−1} that exist. If (i, j) is an index at the edge of M
then some of the four neighbours won’t exist. On this graph we perform a modified
breadth first search starting in the index which corresponds to Umax. The breadth
first search only cares about the indices which have value one, and ignores indices
with value zero, and thus we never explore outside the main lobe.

The process of finding the side lobe area is similar but not identical. Each element
of M is set to one if the corresponding point of the radiation pattern is instead
larger than the side lobe threshold, Umax−12. The breadth first search is performed

33


5. Method

the same as for the main lobe area, and then the side lobe area can be found as the
complement set of the chosen indices.

5.4 Optimization Algorithms
Many optimization algorithms were considered for this project, but the ones chosen
were genetic algorithms and particle swarm optimization. We mainly investigated
stochastic optimization methods as they generally perform well for problems with
complex objective functions, and there are a plethora of methods with different
strengths and weaknesses that can be utilized. One reason we chose GA and PSO
specifically was due to them being successfully applied to similar problems such as
by Boeringer and Werner [13] who used both PSO and GA for optimizating beam
weights and Duzel, Saoud, Shayea, et al. [14] who made a comprehensive review of
the use of GA to optimize the phase of beam weights. The version of GA we have
used was designed by M L Shahab, F Azizi, B A Sanjoyo, et al. [10] to perform
better in high dimensions, which is of import since our thesis concerns specifically
large antenna arrays.

5.5 Search Space
The search space of an optimization problem is the subset of all feasible solutions
that we allow the optimizer to explore. Therefore, the dimension of our search space
is the number of parameters that we need to optimize, which depends on several
factors. These include the size of the antenna, the rank of the beam weights, which
we discuss in Section 5.5.1, and whether we are performing single- or dual polarized
beamforming. For full rank dual polarized beamforming we need to optimize two
m × n matrices of complex beam weights. Furthermore, we regard the phase and
amplitude of each beam weight as separate variables during optimization, and there-
fore, have a total of 4 ·m · n values to optimize. For single-polarized beamforming
the dimension instead becomes 2 ·m · n.

Another option is to represent them as the real and imaginary part, however, that
would have complicated certain calculations, as we wanted the option to constrain
the amplitudes, and in certain cases keep the amplitudes fixed during optimization.

5.5.1 Low-Rank Restrictions
Let A be an m × n matrix. If A has exactly r linearly independent columns, then
A is said to be of rank r. This means that it is possible to express A as

A = BC (5.2)

where B is an m× r matrix and C is an r × n matrix. This implies that A, which
has mn variables, can be expressed as the matrix product of B and C, which have
a total of r(m + n) variables. In the case where r is small, this can be a significant

34


5. Method

reduction. If r = min{m, n}, we say that A is of full rank.

We reduce the dimension of the problem by restricting each beam weight matrix to
be of low rank. For each beam weight matrix BW this is accomplished by optimizing
two matrices, BW1 and BW2, of sizes m× r and r × n respectively. We assume

BW = BW1BW2, (5.3)

i.e., that BW is of rank r. BW1 and BW2 are complex valued matrices, but as
explained in Section 5.5, we do not optimize the complex values, but rather the
phases and amplitudes separately. Therefore, the values we optimize are arg BW1
and arg BW2, and the amplitudes |BW1| and |BW2|. This means that the num-
ber of optimized parameters is 2r(m + n) for each beam weight matrix. Thus, the
dimension of the search space is 2r(m + n) for single-polarized beamforming and
4r(m + n) for dual-polarized beamforming.

For large antenna arrays, this significantly reduces the number of variables to op-
timize, and thereby speeds up the optimization algorithms. Previous work done at
Ericsson has used beam weights of low rank, and the initial tests of lower ranks
yielded positive results. To limit the scope of the project, we restricted ourselves to
only investigating rank 1 solutions in depth, and most decisions were made based
on results for the rank 1 case.

5.5.2 Initialization

When generating random starting points for GA and PSO, we decided between two
different methods. The alternatives were generating beam weights and encoding
them to our search space or generating directly in the search space. The difference
is that the generated beam weights must be encoded to the search space, which
has low rank, and thus information will be lost in the encoding. We wanted to see
how this affected the distribution when they are decoded compared to the values
generated directly in the search space.

We generated 20 sets of beam weights for an antenna array with 288 subarrays,
i.e., 5760 beam weights. Their representations after they are encoded and decoded
are shown in Figure 5.2a in a phase-amplitude space. We also generated the same
number of beam weights directly in a rank 1 search space, and their representations
after they are encoded are shown in Figure 5.2b. The values generated directly in
the search space have a more spread out distribution compared to the generated
beam weights, and thus we chose to generate directly in the search space, as the
goal is to have the initial values cover as large area as possible to easier be able to
find the optimum. We also want amplitudes to be close to the maximum allowed
value in the final beam weights as this allows more of the available energy to be
utilized.

35


5. Method

0.0 0.2 0.4 0.6 0.8 1.0
Amplitude

−3

−2

−1

0

1

2

3

Ph
as
e

Distribution of Generated Beam Weights

(a) 5760 uniformly generated beam
weights that have been encoded to
rank 1 and then decoded.

0.0 0.2 0.4 0.6 0.8 1.0
Amplitude

−3

−2

−1

0

1

2

3

Ph
as

e

Distribution of Beam Weights Generated 
 in Search Space

(b) 5760 decoded beam weights gen-
erated uniformly in the rank 1 search
space.

Figure 5.2: Comparison of two ways of initializing beam weights. Each point
represents one beam weight.

5.6 t-Tests
Paired t-tests are used to test if there is a significant difference in mean value between
two sets of observations, X0 and X1, which are continuous and normally distributed.
There are two kinds of t-tests which are applicable in different situations depending
on the dependence between observations in X0 and X1 [15].

5.6.1 Paired t-Tests
Paired t-tests are used to compare the mean values of two sets of observations, X0
and X1, when we can pair each observation in X0 with one in X1 in such a way
that for these two observations all controllable variables are identical except for the
ones being tested. By calculating the mean difference of each pair, we obtain a
more accurate measure of the effect of the changed variable and reduce the impact
of other environmental variables and differences between subjects [15].

More specifically, paired t-tests are used to test the null hypothesis H0 : µ1−µ0 = 0
under the assumptions that X0 and X1 are drawn from two continuous normal dis-
tributions, and each sample of X0 is paired with a sample of X1 in such a way that
the pairs Xj = {X0j, X1j} j = 1, 2, ..., n are independent, but the data within each
pair is correlated [15].

We calculate Xdj = X1j −X0j and Xd =
∑n

j=1 Xdj

n
, and estimate the variance of Xd

as S2
d = 1

n(n−1)
∑n

j=1(Xdj −Xd)2. Thus, we can calculate

t = Xd

Sd

, (5.4)

36


5. Method

and the p-value is calculated via the student’s t-distribution with n − 1 degrees of
freedom [15]. The exact calculation depends on the formulation of the alternative
hypothesis, and the variants we use are presented in Section 5.6.3.

5.6.2 Two-Sample t-Tests
Two-sample t-tests are used when all observations in both X0 and X1 are indepen-
dent of each other. Aside from that, all other assumptions made for paired t-tests
hold for two-sample t-tests as well [15].

We therefore calculate the estimated variance for both sets of observations, S2
0 and

S2
1 , and if the true variances are equal we can estimate the variance of Xd as S2

d =
( 1

n0
+ 1

n1
) (n0−1)S2

0+(n1−1)S2
1

n0+n1−2 , where n0 and n1 are the number of observations in X0
and X1. Thus, we calculate the test statistic

t = Xd

Sd

, (5.5)

and the p-value is calculated via the student’s t-distribution with n0 +n1−2 degrees
of freedom [15]. The exact calculation depends on the formulation of the alternative
hypothesis, and the variants we use are presented in the next section.

5.6.3 Use of t-Tests
We use t-tests to compare the distribution of the total loss for optimized beam
weights for two methods of optimization. The observations in X0 are the total loss
of beam weights optimized using the standard or current method, and they are
compared against the observations in x1 from the alternative method. Using an
alternative hypothesis H1 we determine whether the alternative method results in
a statistically significant change in the total loss. All p-values are also presented in
Table A.1 in Appendix A or Table B.4 in Appendix B depending on which chapter
they were calculated in.

We use two different alternative hypotheses H1. H1 : µ0 > µ1 is mostly used in
Chapter 5 when we investigate alternatives to a standard method. This is because
in those cases we are only interested in if the alternative method results in a de-
creased loss. This is a lower-tailed test and the p-value is calculated as P (T < t),
where T is the student’s t-distribution with degree’s of freedom as specified above,
and t is the test statistic. H1 : µ0 ̸= µ1 is used mainly in Chapter 6 when we
investigate differences between different cases of using our optimization algorithms.
In those cases, it is also interesting if a case causes the loss to significantly increase.
This case is two-tailed and therefore the p-value is calculated p = 2 · P (T > |t|).
We always discard the null hypothesis in favour of the alternative hypothesis if the
p-value is less than 0.05.

37


5. Method

The results of paired t-tests are represented as boxplots of Xdj and above the boxes,
we use asterisks to represent the p-values where the number of asterisks n signify a
p-value less than 10−n, with a maximum of three asterisks.

In some cases there are separate t-tests for each loss component as well as the total
loss, i.e., side lobe loss plus main lobe loss, but it is the p-values for the total loss
that are the deciding factor for which method we choose to use.

The default settings for the optimization used in the following tests were rank 1
single-polarized beamforming for an antenna array with 288 subarrays.

5.6.4 Side Lobe Threshold
One alternative version of calculating the side lobe loss was tested. Instead of defin-
ing the side lobe area using the side lobe threshold Umax − 12 [dBm], it was defined
using the threshold U ′

max − 12 [dBm], where U ′
max is the maximum value of the ra-

diation pattern corresponding to the optimized beam weights and not the reference
field. The reason behind this alternative method was that even if the optimized
beam did not reach the same amplitudes as the reference field, we could ensure that
the side lobes were still suppressed below −12 [dBm] relative to the optimized curve.
However, if the optimization managed to reach the same amplitudes levels as the
reference field, this change should be no different from the current method.

We compared the two methods for defining the side lobe threshold using a paired
t-test. The current method uses a side lobe threshold relative to the reference field,
and the alternative method is relative to the optimized field. We correlated the pairs
{X0j, X1j} j = 1, 2, ..., 20 by using the same initial values for the optimization. The
results of the paired t-test are presented in Appendix A and neither GA nor PSO
showed a significant decrease in total loss when the side lobe areas were calculated
relative to the optimized curve, as opposed to relative to the reference field. An
important detail is that the total loss for both tests was evaluated using the same
objective function, where the side lobe area is relative to the reference field. This
is because a comparison using two different objective functions would not give any
meaningful results. An interesting detail is that taper loss showed a decrease for both
PSO and GA, with p < 0.05. Since the side lobe defined relative to the optimized
beam did not result in a better performance, we decided that it would not be used.

5.6.5 Constrained vs Unconstrained Optimization
As previously mentioned, the amplitudes of the beam weights are constrained. The
beam weights are therefore always normalized before being returned from the opti-
mization to ensure this condition holds. However, we can choose to either constrain
the amplitudes during optimization as well, never letting them exceed the maximal
value, or let them be unconstrained and normalize at the end. We refer to these
variants as constrained and unconstrained optimization, respectively.

38


5. Method

For this paired t-test we used unconstrained optimization as the current method
and constrained method as the alternative method for PSO as it is most commonly
unconstrained. However, for GA the opposite is true, it is usually constrained,
and thus we set the alternative method as unconstrained optimization for GA. We
correlated the pairs {X0j, X1j} j = 1, 2, ..., 20 by using the same initial values for the
optimization. The result of the test is shown in Appendix A and the difference was
not significant for either PSO or GA, and so we chose to continue using unconstrained
PSO and constrained GA.

5.6.6 Warm Starts

When solving a problem, usually an optimization problem, solutions to nearby prob-
lems can sometimes be used as a warm start to facilitate and speed up the opti-
mization process [16]. In our case, we can use traffic beam BWs as warm starts to
find BWs for wide beams. More specifically, the approach we tested was to use the
traffic beam weights, which were used to create the reference fields. We compared
the distance in the rank 1 search space between a given wide beam BW and the
BWs of the traffic beams used to create the reference field. The Euclidean distance
between these was shorter than the distance between the wide beam BWs and ran-
domly generated starting points.

We performed a paired t-test with random initialization as described in Section 5.5.2
as the current method and initialization where half the initial positions were random
and half were traffic beams randomly selected from the traffic beams used to create
the reference field as the alternative method. The pairs {X0j, X1j} j = 1, 2, ..., 20
were correlated by using the same values for half of the initial positions for the op-
timization.

The results of the test are shown in Appendix A. For GA, we saw that p < 0.05 and
the null hypothesis was discarded in favour of the alternative hypothesis. However,
for PSO, the null hypothesis could not be discarded. So, warm starts give a statis-
tically significant decrease of the total loss for GA, but no significant change for PSO.

Since the warm start method showed promise for GA, we decided to try adding noise
onto the encoded traffic beam BWs. This was mostly to avoid GA getting stuck in
local optima because too many of the initial values were too close to each other.
For this purpose, we added Gaussian noise. For the amplitudes we subtracted the
absolute value of the noise which had mean value zero and standard deviation α
and for the phases we added noise with mean zero and standard deviation π · α for
varying values of α. The noise levels we tried were α = {0.0, 0.1, 0.25, 0.5} where
α = 0.0 means there is no noise, just the traffic beam weights. The results of the
test are shown in Figure 5.3 and Appendix A and no tried value of α resulted in
p < 0.05. We decided to let α be optimized by the hyperparameter tuning in Section
5.7 as the results of this test were inconclusive. If no noise is the best solution, then
the Bayesian optimization should hopefully set α to zero.

39


5. Method

α = 0.1 α = 0.25 α = 0.5
−1.0

−0.5

0.0

0.5

1.0

1.5

To
ta

l L
os

s

1e−3
Pairwise Difference of Total Loss for GA Relative α = 0.0

Figure 5.3: Result of paired t-test for GA for initial positions that are half random
and half traffic beam weights. α is the level of noise added unto the traffic beams.

5.7 Hyperparameter Tuning
PSO has 6 hyperparameters that need to be chosen by the user. To perform a grid
search with 3 different values for each hyperparameter would require 36 runs of the
algorithm. PSO takes approximately 10 minutes per run, which would mean that
the total run time would be about 10 · 36 minutes, which is equal to approximately
122 hours. However, testing only three values per hyperparameter is not a very
thorough search, but if we want to test 10 values per hyperparameter, the runtime
becomes 19 years. This is clearly an infeasible solution. Instead, we used Bayesian
optimization to tune the hyperparameters for GA and PSO, which gave us preciser
results and more manageable runtimes.

We used Bayesian Optimization to select all hyperparameters for each of the meth-
ods, choosing the set of hyperparameter values that minimized the objective function
for each method. For GA, this included the standard deviation α of the noise added
to the traffic beams in the warm starts, as described in section 5.6.6. We used µ0 = 0
and σ0 constant for the prior.

For each hyperparameter tuned, we allowed the algorithm to test values within the
feasible interval for that hyperparameter. For hyperparameters where the feasible
interval is infinite, we limited the interval to a subset of the feasible interval which
we deemed reasonable. In one instance, we found that one hyperparameter was
chosen that was at the very edge of the interval we had chosen. We then extended
that interval and ran the optimization again to prevent confining hyperparameter
values to smaller intervals than necessary.

We tuned the hyperparameters specifically for rank 1 single-polarized beamform-
ing for an antenna array with 288 subarrays. Despite using Bayesian optimization,
hyperparameter tuning was a lengthy process, and we performed it only for these
default settings.

40


5. Method

The chosen hyperparameter values are omitted from this report for confidentiality
reasons.

5.8 Extension to Dual-Polarized Beamforming
In all previous tests, we have handled only the single-polarized case. However,
we extended our models to work for dual-polarized beamforming as well through
minor changes to certain key functions. We chose to make no other changes to the
algorithms, and used the same optimization algorithms and objective function as
for single-polarized beamforming. The cases are similar and running all tests again
for dual-polarized beamforming would have been too time-consuming.

41


5. Method

42


6
Results

In this chapter, we present the results from a number of tests that were run to
investigate the behaviour of our developed optimization methods. We begin with
results that show the general performance of the algorithms, which we then compare
to the method currently in use at Ericsson. We subsequently study the behaviour
of the methods when the dimension of the problem varies, before investigating the
stability of the results when exposed to varying levels of noise. Finally, we in-
vestigate the use of warm starts to expedite the optimization process, and present
the time taken to optimize all necessary wide beams for an exemplary antenna array.

The results presented in this section were all produced using the same reference field
unless explicitly stated otherwise. This is to reduce the number of external factors
since we are mainly interested in comparing the algorithms to each other. The data
used was simulated using Ansys HFSS to represent the ESAPs of an antenna array
with 288 subarrays.

When we refer to GA and PSO in this chapter we mean the optimization algorithms
as detailed in Chapter 5. Furthermore, when a run of either GA or PSO is referenced
it will, if nothing else is stated, be single-polarized beamforming with rank 1 search
space and run for 20 or 6000 iterations, respectively. These numbers were chosen
to produce similar run times to enable a fair comparison between the algorithms.
In some cases the results were run until convergence, meaning that we allowed the
algorithm to run until the convergence criterion was fulfilled, but for a maximum of
60 or 8000 iterations for GA and PSO, respectively. These numbers were chosen to
ensure that each run stopped eventually, while still allowing most runs to fulfill the
convergence criterion. The convergence criterion is satisfied if the total loss improves
by less than 10−5 for 2 consecutive iterations of GA, or for 1000 consecutive iterations
for PSO. All tests were performed with an NVIDIA A40-48Q.

43


6. Results

6.1 Performance

We ran GA and PSO 20 times each to investigate their performance and present the
results here.

6.1.1 Example Beams

Figure 6.1 shows example beams from the results of GA and PSO. They were chosen
as the beams with loss values closest to the median. The main lobes of the resulting
radiation patterns are shown along with the main lobe of the used reference field.
In Figure 6.2 we show cuts in elevation and azimuth of the same beam to allow for
a closer comparison of the beams and the reference beam.

The results of the two algorithms are similar enough in performance that it is difficult
to decide which algorithm is better from a purely visual inspection. To facilitate a
comparison, we present the mean of each loss component and the total loss in Table
6.1, as well as the mean runtime.

-70 -40
ϕ (deg)

80

110θ 
(d

eg
)

Optimized Pattern

-70 -40
ϕ (deg)

80

110θ 
(d

eg
)

Reference Pattern

36

38

40

42

44

EI
RP

 (d
Bm

)

(a) The main lobe of the beam opti-
mized by GA and the reference beam.

-70 -40
ϕ (deg)

80

110θ 
(d

eg
)

Optimized Pattern

-70 -40
ϕ (deg)

80

110θ 
(d

eg
)

Reference Pattern

36

38

40

42

44

EI
RP

 (d
Bm

)

(b) The main lobe of the beam op-
timized by PSO and the reference
beam.

Figure 6.1: Comparisons of main lobes of beams optimized by GA and PSO and
the reference beam.

44


6. Results

−90 −60 −30 0 30 60 90
ϕ (deg)

0

10

20

30

40

EI
RP

 (d
Bm

)
Cut at θ=100

Optimi(ed curve
Refere ce curve
Side lobe threshold
Mai  lobe threshold

(a) Cut in azimuth of beam optimized
by GA and reference beam.

0 30 60 90 120 150 180
θ (deg)

−20

−10

0

10

20

30

40

EI
RP

 (d
Bm

)

C)( a( ϕ= -45

Optimized curve
Reference curve
Side lobe threshold
Main lobe threshold

(b) Cut in elevation of beam opti-
mized by GA and reference beam.

−90 −60 −30 0 30 60 90
ϕ (deg)

0

10

20

30

40

EI
RP

 (d
Bm

)

Cut at θ=100
Optimi(ed curve
Refere ce curve
Side lobe threshold
Mai  lobe threshold

(c) Cut in azimuth of beam optimized
by PSO and reference beam.

0 30 60 90 120 150 180
θ (deg)

−20

−10

0

10

20

30

40

EI
RP

 (d
Bm

)

C)( a( ϕ= -45

Optimized curve
Reference curve
Side lobe threshold
Main lobe threshold

(d) Cut in elevation of beam opti-
mized by PSO and reference beam.

Figure 6.2: Comparisons between cuts of beams (blue lines) optimized by GA and
PSO, and the reference beam (pink line). Main and side lobe thresholds are shown
as well (grey and black line respectively).

6.1.2 Loss Distribution and Runtimes
The run times were similar between GA and PSO, and the two algorithms produced
results where the mean total loss is in the same order of magnitude. Interestingly,
GA appears to be better at minimizing side lobes, while PSO appears to be better
at optimizing the main lobe shape. Although similar, GA has a slightly lower total
loss. Taper loss is omitted from this table, but can be found in Appendix B.

Table 6.1: Mean loss and runtime over 20 runs of GA and PSO.

Main lobe loss Side lobe loss Total loss Runtime
GA 1.15 · 10−3 2.08 · 10−4 1.35 · 10−3 10 m 54 s
PSO 2.63 · 10−4 1.58 · 10−3 1.85 · 10−3 11 m 16 s

To further study the performance of GA and PSO we investigated the distribution
of the final loss, i.e. the best loss in the final iteration, using the standard deviation

45


6. Results

of the final loss in Table 6.2 and boxplots of the final loss in Figure 6.3. Similar to
the mean values, the standard deviations of the total loss are of the same order of
magnitude for GA and PSO.

Main