Learning Joint Synchronization, Equalization, and Decoding in Short Packet Communications

Master's thesis in Information and Communication Technology

XI ZHANG

Department of Electrical Engineering
Chalmers University of Technology
Gothenburg, Sweden 2024

© XI ZHANG, 2024.

Supervisors: Giuseppe Durisi, Department of Electrical Engineering
             Khac-Hoang Ngo, Department of Electrical Engineering
Examiner: Giuseppe Durisi, Department of Electrical Engineering

Master's Thesis 2024
Department of Electrical Engineering
Chalmers University of Technology
SE-412 96 Gothenburg

Typeset in LaTeX, template by Kyriaki Antoniadou-Plytaria
Gothenburg, Sweden 2024

Abstract

The rapid evolution of cellular communication technologies necessitates improvements to support emerging applications like autonomous driving and remote medical surgery. Ultra-Reliable Low Latency Communications (URLLC), a key scenario in 5G, demands stringent latency and reliability, with even more rigorous requirements expected in 6G. The traditional approach of using dedicated preambles for detection, synchronization, and channel estimation is suboptimal for short-packet transmissions, highlighting the need for innovative approaches.

This thesis investigates the potential of deep learning (DL) techniques for enhancing short packet communications. By designing an autoencoder-based joint synchronization, equalization, and decoding scheme, the transmitter and receiver are learned jointly, end-to-end, for the tasks of synchronization, equalization, and decoding without relying on a dedicated preamble. The objectives include developing an autoencoder-based communication scheme, extending it to joint equalization and decoding, and proposing a joint synchronization, equalization, and decoding scheme for block-fading waveform channels.

The findings demonstrate that an end-to-end learning approach using a convolutional neural network-autoencoder (CNN-AE) improves spectral efficiency and reduces overhead in short packet communications while maintaining system reliability. The proposed system, without using dedicated preambles, outperforms the nonasymptotic achievability bound for pilot-assisted transmission systems in terms of block error rate (BLER) at high signal-to-noise ratios (SNRs). This highlights the potential of DL techniques in addressing the challenges of short packet communications in future wireless networks.

Keywords: Short packet communications, Deep learning, Autoencoders, Joint synchronization and decoding.

Acknowledgements

I would like to express my deepest gratitude to my supervisors, Giuseppe Durisi and Khac-Hoang Ngo, for their invaluable guidance, insightful feedback, and support throughout my journey in completing this master's thesis.

I would also like to thank Alireza Bordbar for his suggestions and discussions, and Christian Häger for providing access to the computation resources, which were essential for the completion of this thesis.
The computations in this thesis were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.

Xi Zhang, Gothenburg, August 2024

List of Acronyms

Below is the list of acronyms used throughout this thesis, in alphabetical order:

3GPP    3rd generation partnership project
5G      The fifth generation
AE      Autoencoder
ASIC    Application-specific integrated circuit
AWGN    Additive white Gaussian noise
BCE     Binary cross-entropy
BER     Bit error rate
BLER    Block error rate
CCE     Categorical cross-entropy
CNN     Convolutional neural network
CNN-AE  Convolutional neural network-autoencoder
CPU     Central processing unit
CSI     Channel state information
DL      Deep learning
DSP     Digital signal processing
ELU     Exponential linear unit
FEC     Forward error correction
FPGA    Field-programmable gate array
GPU     Graphics processing unit
IoT     Internet of things
ISI     Inter-symbol interference
LDPC    Low-density parity-check
ML      Machine learning
MLE     Maximum likelihood estimate
MMSE    Minimum mean square error
MSE     Mean squared error
NMSE    Normalized mean square error
NN      Neural network
PMF     Probability mass function
PSK     Phase shift keying
QAM     Quadrature amplitude modulation
SGD     Stochastic gradient descent
SISO    Single-input single-output
SNR     Signal-to-noise ratio
TurboAE Turbo autoencoder
URLLC   Ultra-reliable low latency communications
ZF      Zero-forcing

Contents

List of Acronyms
List of Figures
List of Tables

1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Limitations
  1.4 Thesis Outline
  1.5 Notation

2 Theory
  2.1 Classical Communication System
    2.1.1 Classical Transmitter
    2.1.2 Classical Receiver
    2.1.3 Channel Models and Fundamental Limits
  2.2 Deep Learning Basics
    2.2.1 Neural Networks
    2.2.2 Autoencoder
    2.2.3 Loss Functions
    2.2.4 Gradient-based Learning

3 Methods
  3.1 An AE-based Communication System
  3.2 An AE-based Joint Synchronization, Equalization, and Decoding System

4 Results
  4.1 Performance of CNN-AE
    4.1.1 Performance of CNN-AE under AWGN Channel
    4.1.2 Performance of CNN-AE under Block-fading Channel
  4.2 Performance of CNN-AE-based Joint Synchronization, Equalization, and Decoding System
    4.2.1 Synchronization Performance
    4.2.2 Decoding Performance

5 Conclusion
  5.1 Future Work

List of Figures

2.1 Illustration of a simple communication system.
2.2 Block diagram of a conventional transmitter.
2.3 Block diagram of a conventional receiver, considering the tasks of synchronization, equalization, and decoding.
2.4 A simple model of a block-fading channel.
2.5 An autoencoder composed of two NNs.
3.1 The structure of a CNN-AE-based communication system, where the traditional transmitter and receiver are replaced by CNNs.
3.2 Block diagram of the transmitter part of the CNN-AE-based system model; the blue blocks indicate the trainable parts.
3.3 Block diagram of the receiver part of the CNN-AE-based system model.
3.4 The iteration steps of the equalization and decoding at the receiver.
4.1 Simulated BER under the AWGN channel with k = 64, n = 128.
4.2 Simulated BLER under the AWGN channel with k = 64, n = 128.
4.3 Simulated BLER under the block-fading channel with nb = 4, k = 64, n = 128.
4.4 Synchronization error comparison.
4.5 Achievable BLER comparison.

List of Tables

3.1 Parameters of the CNN-AE; each Conv1D layer is followed by a batch normalization layer before activation to help the model converge quickly.
3.2 Hyperparameters for the training of the CNN-AE under the AWGN channel.
3.3 Parameters of the CNN-AE-based joint synchronization, equalization, and decoding system.
3.4 Hyperparameters for the training of the CNN-AE-based system.
4.1 Parameters for simulation.

1 Introduction

1.1 Background

Cellular communication technologies go through a revolutionary improvement roughly every ten years to support the increasing demands of applications, from phone calls to video streaming and on to the Internet of Things (IoT). The fifth generation (5G) has been commercially deployed and supports various emerging applications like autonomous driving [1], remote medical surgery [2], and intelligent transport systems [3].

Ultra-reliable low latency communications (URLLC), identified as one of the key usage scenarios in 5G, plays an essential role in providing stringent latency and reliability guarantees for mission-critical applications. The 3rd generation partnership project (3GPP) specifies that URLLC is expected to provide 99.999% reliability for a single transmission of a short packet with an end-to-end latency of less than 1 ms [4]. In 6G, URLLC will enable even more demanding applications, such as the tactile internet and augmented reality, with requirements of even lower latency (25 µs to 1 ms) and lower block error rate (BLER) (10^-5 to 10^-7) [5].

In URLLC applications, the transmitted messages are rather small. To satisfy such challenging requirements of short packet transmissions, new technical enablers need to be adopted for latency reduction and reliability enhancement [6]. In conventional communication systems, a dedicated preamble is added to the data packet to perform detection, synchronization, and channel estimation.
However, in short-packet communications, such preamble-based transmission might be suboptimal due to the limited size of the data packet [7], [8]. From an information-theoretic perspective, the results in [8] demonstrate that for a binary-input additive white Gaussian noise (AWGN) channel, performing detection using a dedicated preamble is highly suboptimal in the short-packet regime, even when the length of the preamble is optimized. Instead, joint detection and decoding, where the receiver detects the information packet without relying on a dedicated preamble, yields significant gains in terms of the maximum coding rate over the preamble-based detection scheme. Naturally, this leads to the questions: how can we perform joint detection and decoding in practice? How does the performance compare to the theoretical bounds for the preamble-based scheme?

Machine learning (ML) techniques are expected to be essential in designing future communication networks. Traditional communication systems divide the transmitter and receiver into different processing blocks and rely on specific channel and system modeling assumptions that enable mathematically tractable analysis. However, emerging complex communication scenarios are difficult to describe with tractable mathematical models. In this case, machine learning techniques can directly learn and optimize from data and are not constrained by modeling assumptions [9].

In recent years, end-to-end learning of communication systems has become a promising concept to improve the reliability of block-based transmissions [10]. Different from conventional communication systems, this approach represents the transmitter and receiver as one deep neural network (NN) that is optimized for an end-to-end performance metric. This can be achieved by interpreting the whole system as an autoencoder which is trained in a supervised manner using stochastic gradient descent (SGD). Such an autoencoder-based system breaks up the restrictions of conventional block-based signal processing and has shown great benefits in terms of reliability [11], [12]. In [13], the Turbo autoencoder (TurboAE) is introduced; it outperforms state-of-the-art codes under the AWGN channel model in terms of bit error rate. The authors of [14] propose a convolutional neural network autoencoder (CNN-AE) which approaches the theoretical maximum achievable rate over the AWGN channel.

In this thesis, we investigate the potential of applying deep learning (DL) techniques to short packet transmissions. We focus on designing an autoencoder-based joint synchronization, equalization, and decoding scheme. We aim to compare the end-to-end performance with the theoretical bound for the pilot-assisted transmission system.

1.2 Objectives

During the thesis, we aim to:

• Design a DL-based communication scheme for short packets, and investigate its performance compared with achievability bounds and state-of-the-art channel codes under an AWGN channel.

• Extend the DL-based scheme to handle both equalization and decoding, then compare the scheme's performance with an achievability bound under a block-fading channel.

• Propose a DL-based joint synchronization, equalization, and decoding scheme, and compare both the synchronization and decoding performance with an achievability bound for a pilot-assisted transmission scheme under a block-fading waveform channel with imperfect synchronization.
1.3 Limitations

This work primarily focuses on simulations of deep learning applications in the physical layer of communication systems. Consequently, hardware implementations have not been considered. Modern communication systems mainly use application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs), while neural networks need to be processed on central processing units (CPUs) and graphics processing units (GPUs). The performance of NN-based systems is limited by the computation and memory capabilities of the hardware.

Another limitation is that practical channels have not been considered in this thesis. End-to-end learning of communication systems requires a differentiable channel model so that gradients can be backpropagated through the channel. However, in practice, the channel is generally a black box, where only inputs and outputs can be observed. Several works address this problem [11], [15], but they are not covered in this thesis.

1.4 Thesis Outline

The thesis is divided into five chapters. Chapter 2 briefly introduces the classical communication system, as well as the basics of deep learning. Chapter 3 presents the considered transmission scenario and the proposed autoencoder-based scheme, along with the training procedure and parameters. Chapter 4 presents the performance comparison between the proposed joint synchronization, equalization, and decoding scheme and the achievability bound for pilot-assisted transmission in the finite-blocklength regime. Finally, a brief summary and discussion of the contribution is presented in Chapter 5.

1.5 Notation

We denote scalar random variables by upper case letters, such as X, and their realizations by lower case letters, e.g., x. Bold-faced upper case letters denote random vectors, e.g., X, and their realizations are denoted by bold-faced lower case letters, e.g., x. The superscripts (·)^T, (·)^H, and (·)^* denote transposition, Hermitian transposition, and complex conjugation, respectively. We denote the set of real numbers by R and the set of complex numbers by C. The distribution of a complex Gaussian random variable with mean µ and variance σ^2 is denoted by CN(µ, σ^2). We write log(·) to denote the natural logarithm. Probabilities are written as P[·] and the expectation operation is denoted by E[·]. The notation ∥·∥ stands for the ℓ2-norm. For two functions f(n) and g(n), the notation f(n) = O(g(n)) means that lim sup_{n→∞} |f(n)/g(n)| < ∞.

2 Theory

This chapter provides a brief introduction to the general theory of communication systems and DL.

2.1 Classical Communication System

The objective of a communication system is to transmit information reliably from one point to another. A classical single-input single-output (SISO) communication system includes a transmitter, a channel, and a receiver, as illustrated in Figure 2.1. These blocks are described as follows:

• Transmitter: Encodes a message W into a codeword x ∈ C^n and transmits this codeword over n complex-valued channel uses. Here, W is assumed to be drawn uniformly from the alphabet {1, 2, ..., 2^{nR}}, where R is the transmission rate. We consider the power constraint ∥x∥^2 ≤ n.

• Channel: Adds propagation distortion to the transmitted signal, specified by a conditional probability mass function (PMF) P_{Y|X}(y|x).

• Receiver: Constructs an estimate Ŵ of the original message W from the noisy observation y ∈ C^n of the transmitted signal.
The reliability is measured by the block error probability P_e = P[Ŵ ≠ W].

The following subsections provide a detailed explanation of each part of the system.

Figure 2.1: Illustration of a simple communication system.

2.1.1 Classical Transmitter

Figure 2.2 shows the basic components of a transmitter in a general communication system. The transmitter feeds the binary representation of its message to a channel encoder, then the coded bits are mapped to real- or complex-valued symbols specified by a modulation format. After that, a dedicated preamble sequence (also called a pilot) is added in front of the symbols. The preamble is later used for synchronization and channel estimation at the receiver. Finally, through pulse shaping, both the preamble sequence and the payload are transformed into a waveform transmitted through the channel. We next describe each component.

Figure 2.2: Block diagram of a conventional transmitter.

Channel Encoder

The objective of the channel encoder is to introduce redundancy into the transmitted bit sequence so that the receiver can correct errors introduced by the channel. Consider the encoder as a function f : {1, ..., 2^k} → X^n that maps the message m ∈ {1, ..., 2^k} into a codeword x^n(m) = [x_1(m), ..., x_n(m)], where X = {0, 1} and x_l(m) ∈ X, l = 1, ..., n. Here, m is drawn uniformly from the message set {1, ..., 2^k}, and 2^k is the number of codewords in the codebook {x^n(1), ..., x^n(2^k)}. Each message is described using k bits, and the code rate is defined as R_c = k/n ≤ 1, which represents the spectral efficiency.

By mapping the 2^k possible information sequences into the larger space of 2^n binary sequences, the distance between the valid codewords increases. This enhances the likelihood of reconstructing the correct codeword from a noisy observation. Channel codes are often referred to as forward error correction (FEC) or error correction codes. Over the years, different types of codes have been invented, starting with classical codes such as Hamming codes and convolutional codes, and evolving to modern coding schemes like low-density parity-check (LDPC) codes [16] and Turbo codes [17]. These modern codes are widely used in current communication systems due to their near-Shannon-limit performance.

Mapper

After the channel encoder, the mapper maps the coded bits to real- or complex-valued symbols specified by a constellation, denoted as C = {c_1, ..., c_M} ⊂ C. For a constellation with M symbols, log2 M is the number of bits per symbol. Each symbol is defined as c = c_I + j c_Q, where c_I represents the in-phase component and c_Q the quadrature component.

Common choices of constellation are phase shift keying (PSK) and quadrature amplitude modulation (QAM). A higher-order constellation (larger M) allows more bits per symbol, enhancing spectral efficiency. However, the Euclidean distance between symbols decreases, leading to an increased error probability in symbol decoding in the presence of noise.

Pilot Insertion

Typically, in order to synchronize the transmitted signal and estimate the channel gain, a dedicated preamble sequence is added in front of the payload. The preamble sequence usually has good auto-correlation properties; common examples are m-sequences [18] and Zadoff-Chu sequences [19].
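To illustrate the auto-correlation property, the following NumPy sketch generates a Zadoff-Chu sequence and checks its periodic auto-correlation. This is an illustration only; the root u = 25 and length 63 are arbitrary example values, not parameters used elsewhere in this thesis.

```python
import numpy as np

def zadoff_chu(u: int, n_zc: int) -> np.ndarray:
    """Root-u Zadoff-Chu sequence of odd length n_zc (u coprime with n_zc)."""
    n = np.arange(n_zc)
    return np.exp(-1j * np.pi * u * n * (n + 1) / n_zc)

zc = zadoff_chu(u=25, n_zc=63)
# Periodic autocorrelation: maximal at lag 0, ideally zero at all other lags.
autocorr = np.array([np.abs(np.vdot(zc, np.roll(zc, lag))) for lag in range(63)])
print(autocorr[0], autocorr[1:].max())  # -> ~63.0 and ~0.0
```

This ideal periodic auto-correlation is what makes such sequences attractive as preambles: the correlation peak at the receiver is sharp and unambiguous.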
Pulse Shaping

After mapping the coded bits to symbols of the chosen constellation, a pulse shaping filter is usually employed to transform the symbols into a waveform, in order to limit the bandwidth and reduce inter-symbol interference (ISI) [20]. Consider the simple case of applying square pulse shaping to the transmitted symbols. The square pulse with normalized energy can be expressed as

    s(t) = 1/√t_p for t ∈ [0, t_p), and s(t) = 0 otherwise,

where t_p denotes the duration of the pulse. The transmitted signal is then obtained by convolving the symbols with the square pulse, i.e.,

    x(t) = Σ_{k=1}^{n_s} x_k s(t − (k − 1) t_p),

where n_s is the number of transmitted symbols.

Square pulses are not practical in today's communication systems, as they require a large amount of bandwidth. Common choices of pulse shape include sinc shapes, raised-cosine shapes, and root-raised-cosine shapes. Among them, root-raised-cosine shapes result in a higher spectral efficiency.

2.1.2 Classical Receiver

Figure 2.3 shows a block diagram of a conventional receiver. In this scenario, we consider that the transmitted signal experiences fading propagation with a random time delay, and that neither the transmitter nor the receiver has prior knowledge of the channel. To reconstruct the transmitted message at the receiver, the received signal needs to be processed by digital signal processing (DSP) blocks, which include synchronization, channel estimation, and equalization.

• Synchronization: The receiver needs to synchronize the received signal by estimating the time delay of the start of the signal within the received sequence.

• Channel estimation: Channel estimation involves estimating the characteristics of the channel between the transmitter and receiver. After synchronization, the receiver estimates the fading gains from the received signal so that channel-induced distortions can subsequently be compensated.

• Equalization: Equalization is performed to mitigate the effects of ISI and noise introduced by the channel, thereby recovering the transmitted symbols.

These steps can be performed in different orders depending on the receiver design, and the depicted process does not include all possible techniques in the receiver. In the following subsections, we briefly review algorithms from the literature that implement these DSP blocks. For more detailed reviews, we refer to [21].

Figure 2.3: Block diagram of a conventional receiver, considering the tasks of synchronization, equalization, and decoding.

Synchronization

In the first step, the synchronizer needs to estimate the time delay of the received signal. The estimation of the time delay can be achieved using either pilot-assisted or blind methods. Here we focus on pilot-assisted estimation.

Pilot-assisted synchronization involves embedding a known pilot sequence into the transmitted signal. Upon receiving the signal, the receiver calculates the cross-correlation between the received signal and the pilot sequence. The peak of the cross-correlation indicates the presence of the pilot sequence in the received signal, so the position of the peak corresponds to the estimated time delay; by comparing the cross-correlation to a threshold, the beginning of the signal can be identified, thus achieving synchronization. This pilot-assisted method is robust and widely used in practical communication systems.
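As an illustration of this procedure, the following NumPy sketch estimates the delay of a pilot embedded in noise; the pilot values, the true delay, and the noise level are arbitrary example choices.

```python
import numpy as np

rng = np.random.default_rng(0)
pilot = np.exp(1j * np.pi * rng.integers(0, 4, size=31) / 2)   # example QPSK pilot
delay = 17                                                      # true delay (samples)
rx = np.concatenate([np.zeros(delay), pilot, np.zeros(20)])
rx += 0.1 * (rng.standard_normal(rx.size) + 1j * rng.standard_normal(rx.size))

# Cross-correlate with the known pilot; np.correlate conjugates its 2nd argument.
corr = np.abs(np.correlate(rx, pilot, mode="valid"))
tau_hat = int(np.argmax(corr))   # estimated delay: position of the correlation peak
print(tau_hat)                   # -> 17
```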
Channel Estimator

When the channel state information (CSI) is unknown at the receiver, a pilot sequence is typically used for channel estimation. Consider the simple case where the received signal is perfectly synchronized; the signal can then be expressed as

    y^(p) = H x^(p) + z^(p),

where x^(p) is the pilot sequence, H is the random fading gain, and z^(p) is the complex AWGN. To estimate H, we first note that

    H = (x^(p))^H y^(p) / ∥x^(p)∥^2 − (x^(p))^H z^(p) / ∥x^(p)∥^2.

The maximum likelihood estimate (MLE) is therefore

    ĥ = arg min_h̃ ∥y^(p) − h̃ x^(p)∥^2 = (x^(p))^H y^(p) / ∥x^(p)∥^2.

Equalizer

Given the estimated channel gain from channel estimation, the equalizer works to remove the ISI and noise effects of the channel and recover the transmitted symbols. Common digital linear equalizers include the zero-forcing (ZF) equalizer and the minimum mean square error (MMSE) equalizer. Consider the following model:

    y = ĥ x + z,

where y is the received data signal, ĥ is the estimated channel gain, which is assumed to be perfect (ĥ = h); x is the transmitted symbols, and z is the noise vector. The ZF equalizer fully inverts the impact of the channel. It applies the inverse of the channel gain as

    G_ZF = (h^* h)^{−1} h^* = h^* / |h|^2,

and the estimated transmitted symbols can then be expressed as

    x̂ = G_ZF y = x + (h^* / |h|^2) z.

It is important to note that the ZF equalizer ignores the additive noise and may amplify it, especially in scenarios where the channel gain is small. This can be mitigated by using an MMSE equalizer, which minimizes the mean square error between the output of the equalizer and the transmitted signals.
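To make the estimation and equalization steps concrete, the following NumPy sketch applies the ML channel estimate and ZF equalization from above to randomly generated data. The pilot length and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_p = 16                                            # pilot length (example value)
x_p = (1.0 - 2.0 * rng.integers(0, 2, n_p)) + 0j    # known BPSK pilot
h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # H ~ CN(0, 1)
z_p = 0.1 * (rng.standard_normal(n_p) + 1j * rng.standard_normal(n_p))
y_p = h * x_p + z_p

# ML channel estimate: h_hat = (x_p)^H y_p / ||x_p||^2
h_hat = np.vdot(x_p, y_p) / np.linalg.norm(x_p) ** 2

# ZF equalization of received data symbols y = h x + z
x = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)        # example data symbols
y = h * x + 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
x_hat = np.conj(h_hat) * y / np.abs(h_hat) ** 2
print(abs(h - h_hat), np.round(x_hat, 2))
```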
2.1.3 Channel Models and Fundamental Limits

The channel introduces distortion to the transmitted signal. In this section, we describe the two channel models considered in the thesis, along with the fundamental limits on these channels.

AWGN Channel

The AWGN channel is a commonly used channel model in communication systems. Consider a discrete-time memoryless AWGN channel given by

    y = x + z,

where x is the input of the channel (i.e., the transmitted symbols), each with average energy per symbol E_s, and each element in z is independent and identically distributed according to the complex Gaussian distribution with zero mean and variance N_0. The signal-to-noise ratio (SNR) is defined as

    SNR = ρ = E_s / N_0.

The channel capacity, as given by Shannon's theorem [22], is

    C = log(1 + ρ) bits/channel use,

which indicates the ultimate limit on the amount of information that can be transmitted reliably over the channel as the code length goes to infinity. Shannon's channel coding theorem states the largest communication rate at which we can transmit messages over a channel with a vanishing error probability for sufficiently large blocklengths: the maximum achievable rate for which the error probability ε → 0 as the blocklength n → ∞ is C.

However, for short packet transmissions where the blocklength is relatively small, Shannon's channel capacity might be a loose upper bound on the achievable coding rate. In this scenario, finite-blocklength information theory provides a more precise characterization.

Finite Blocklength Information Theory

In the finite-blocklength regime, achievability bounds (e.g., the random coding union bound [23] and the random coding union bound with parameter s) and converse bounds (e.g., the metaconverse bound [23]) are available. An achievability bound is an upper bound on the error probability, indicating the performance that can be achieved by suitable encoding and decoding schemes. In contrast, a converse bound is a lower bound on the error probability, representing the performance that cannot be outperformed by any choice of encoding and decoding schemes. The computation of both the achievability and converse bounds is difficult. Therefore, asymptotic expansions of both bounds, such as the normal approximation [23] and the saddlepoint approximation [24], are often used to yield numerical approximations.

The maximal coding rate R*(n, ϵ) is defined as the maximum rate that can be achieved for a fixed error probability ϵ and finite blocklength n. For various channels with capacity C, R*(n, ϵ) can be characterized as given in [23]:

    R*(n, ϵ) = C − √(V/n) Q^{−1}(ϵ) + O(log n / n),

where Q^{−1} denotes the inverse of the Q-function and V is the channel dispersion, defined as V = ρ(2 + ρ)/(1 + ρ)^2. The term O(log n / n) comprises higher-order terms of order log n / n. For large blocklengths n and small block error rates ϵ, the maximal coding rate R*(n, ϵ) approaches the channel capacity, i.e., R*(n, ϵ) ≈ C. However, for short blocklengths, a more precise approximation can be derived.

Consider the real-valued AWGN channel with noise variance σ^2 = 1. The transmitted symbols x, with blocklength n, satisfy the power constraint ∥x∥^2 ≤ nρ. The normal approximation for R*(n, ϵ), as given by [23], can be expressed as

    R*(n, ϵ) ≈ C(ρ) − √(V(ρ)/n) Q^{−1}(ϵ) + log n / (2n),

where C(ρ) and V(ρ) denote the Gaussian capacity and dispersion, respectively:

    C(ρ) = log(1 + ρ),
    V(ρ) = ρ(2 + ρ)/(1 + ρ)^2 (log e)^2.

The converse bounds, such as the metaconverse bound, and the achievability bounds, including the Shannon cone-packing bound, the κβ bound, and Gallager's bound, are detailed in [23]. The achievability and converse bounds for the binary-input AWGN (bi-AWGN) channel are detailed in [25]. Given the code parameters and the channel, the BLER can be calculated from these bounds. In this thesis, we will use the achievability bound as the benchmark to evaluate the BLER performance of our proposed system.
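For reference, the normal approximation can be evaluated numerically as in the following sketch. It assumes rates expressed in bits (logarithms to base 2); it is an illustration, not the SPECTRE implementation used for the plots later in this thesis.

```python
import numpy as np
from scipy.stats import norm

def normal_approximation(snr_db: float, n: int, eps: float) -> float:
    """Normal approximation of R*(n, eps) for the AWGN channel, in
    bits per channel use (logarithms taken to base 2)."""
    rho = 10.0 ** (snr_db / 10.0)
    c = np.log2(1.0 + rho)                                          # capacity
    v = rho * (2.0 + rho) / (1.0 + rho) ** 2 * np.log2(np.e) ** 2   # dispersion
    q_inv = norm.isf(eps)                                           # Q^{-1}(eps)
    return c - np.sqrt(v / n) * q_inv + np.log2(n) / (2.0 * n)

print(normal_approximation(snr_db=3.0, n=128, eps=1e-3))
```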
Block-Fading Channel

A block-fading channel is illustrated in Figure 2.4. Consider the transmission of a sequence of n complex-valued symbols over a SISO memoryless block-fading channel with nb fading blocks. The received symbols y_i, i = 1, ..., n, can be expressed as

    y_i = H_j x_i + z_i,  j = 1, ..., nb,

where H_j denotes the random fading gain of the jth fading block, and z_i denotes the independent AWGN sample of the ith channel use.

Figure 2.4: A simple model of a block-fading channel.

The coherence time is denoted by n_c. The fading is constant during a block of n_c symbols and independent from block to block. A large number of independent fading blocks increases the channel diversity, while a small number of blocks increases the chance of experiencing deep fading [26].

In the context of transmission over block-fading channels, it is crucial for the transmitter, the receiver, or both to have knowledge of the fading coefficient H. CSI known at the transmitter enables efficient power allocation strategies, such as waterfilling. When CSI is available at the receiver, it facilitates decoding. In practice, CSI at the receiver is typically obtained by transmitting dedicated pilot sequences, which the receiver uses to estimate the channel. CSI at the transmitter can be acquired by feeding the channel estimates from the receiver back to the transmitter. However, transmitting pilot sequences introduces a rate loss, and establishing a feedback link incurs additional costs.

Two common infinite-blocklength performance metrics for communication over fading channels are the ergodic capacity and the outage capacity. The ergodic capacity represents the maximum achievable rate of reliable communication over a fading channel, averaged over all channel states. The outage capacity, on the other hand, characterizes the maximum transmission rate at which the probability of the instantaneous channel capacity falling below this rate does not exceed a specified outage probability ϵ. Consider the scenario where nb = 1. The outage probability for a given rate R can be expressed as

    P_out(R) = P[log(1 + |H|^2 ρ) < R].

The outage capacity C_ϵ is defined as the supremum of all rates R satisfying P_out ≤ ϵ. It is given by

    C_ϵ = sup {R : P_out ≤ ϵ}.

The outage capacity C_ϵ implies that, for every realization of the fading coefficient H = h, the channel behaves like an AWGN channel with channel gain |h|^2. In this context, communication with an arbitrarily small error probability is achievable for sufficiently large blocklength n if and only if the rate satisfies R < log(1 + |h|^2 ρ).

However, the work in [27] highlights that the expression log(1 + |h|^2 ρ) is meaningful only for sufficiently large blocklengths. In the same paper, the maximum coding rate R*(n, ϵ) for a given blocklength n and block error probability ϵ is refined to account for finite blocklengths and can be expressed as

    R*(n, ϵ) = C_ϵ + O(log n / n),

which holds regardless of whether CSI is available at the transmitter, the receiver, or both. The normal approximation to the maximal achievable rate is also provided in [27] for the single-antenna case. Additionally, the authors of [28] derive achievability and converse bounds on the maximum coding rate over the multiple-antenna Rayleigh block-fading channel model. Both results provide accurate performance metrics when the blocklength is relatively small.
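For Rayleigh fading with H ~ CN(0, 1), the channel gain |H|^2 is exponentially distributed with unit mean, so the outage probability admits a closed form. The following sketch (rates in bits) evaluates it and verifies the expression by Monte-Carlo simulation; the particular rate and SNR are arbitrary example values.

```python
import numpy as np

def outage_probability(rate_bits: float, snr_db: float) -> float:
    """P[log2(1 + |H|^2 rho) < R] for H ~ CN(0, 1), i.e. |H|^2 ~ Exp(1)."""
    rho = 10.0 ** (snr_db / 10.0)
    return 1.0 - np.exp(-(2.0 ** rate_bits - 1.0) / rho)

# Monte-Carlo check of the closed form
rng = np.random.default_rng(2)
g = np.abs(rng.standard_normal(10**6) + 1j * rng.standard_normal(10**6)) ** 2 / 2
print(outage_probability(0.5, 10.0), np.mean(np.log2(1 + g * 10.0) < 0.5))
```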
2.2 Deep Learning Basics

This section provides a brief introduction to the general theory behind deep learning and autoencoders.

2.2.1 Neural Networks

Neural networks are adaptive statistical models capable of representing complex functions through the composition of simple operations. Consider a feedforward NN, which is a function f(r_0; θ) : R^{N_0} → R^{N_L} that maps an input vector r_0 ∈ R^{N_0} to an output vector r_L ∈ R^{N_L} through L iterative processing steps:

    r_ℓ = f_ℓ(r_{ℓ−1}; θ_ℓ),  ℓ = 1, ..., L,

where L is the number of layers, and f_ℓ(r_{ℓ−1}; θ_ℓ) : R^{N_{ℓ−1}} → R^{N_ℓ} is the mapping carried out by the ℓth layer. This mapping depends on both the output vector r_{ℓ−1} from the previous layer and the set of parameters θ = {θ_1, ..., θ_L}. A commonly used layer is the dense layer, also known as the fully-connected layer. It has the form

    f_ℓ(r_{ℓ−1}; θ_ℓ) = σ(W_ℓ r_{ℓ−1} + b_ℓ),

where W_ℓ ∈ R^{N_ℓ × N_{ℓ−1}} is the weight matrix, b_ℓ ∈ R^{N_ℓ} is the bias vector, and σ(·) is an activation function which introduces non-linearity into the output [29]. The set of trainable parameters for this layer is θ_ℓ = {W_ℓ, b_ℓ}.

A fully-connected NN is a neural network in which all layers are dense layers. In a fully-connected NN, each neuron in one layer is connected to every neuron in the subsequent layer, allowing for the efficient transmission of information throughout the network. Another popular NN is the convolutional neural network (CNN). Compared to fully-connected NNs, CNNs are more efficient and effective for tasks involving structured data [30]. Consider a 2D convolutional layer consisting of a set of F trainable filters with weights Q^f ∈ R^{a×b}, where f = 1, ..., F and F is the depth of the layer. This layer maps an input matrix X ∈ R^{n×m} to a feature map Y^f ∈ R^{n′×m′} according to

    Y^f_{i,j} = Σ_{k=0}^{a−1} Σ_{ℓ=0}^{b−1} Q^f_{a−k, b−ℓ} X_{1+s(i−1)−k, 1+s(j−1)−ℓ},

where s ≥ 1 is called the stride. It denotes the step size of the convolution, specified by a positive integer. The output size can be calculated as n′ = 1 + ⌊(n + a − 2)/s⌋ and m′ = 1 + ⌊(m + b − 2)/s⌋. In convolutional layers, the filter slides across the input with a certain stride, tying adjacent shifts of the same weights together. Consequently, convolutional layers reduce the model complexity compared to dense layers [9].

2.2.2 Autoencoder

An autoencoder is an unsupervised learning framework designed to learn latent representations by minimizing the reconstruction loss of its input data [31]. An example of an autoencoder is shown in Figure 2.5. The network consists of two parts: an encoder function that transforms the input data into a latent representation h = f(x), and a decoder that produces a reconstruction r = g(h), where h is the latent representation, a code used to represent the input. Most autoencoders are undercomplete autoencoders, meaning the latent space h has a smaller dimension than the input data x. Learning an undercomplete representation forces the autoencoder to capture the most essential features of the data [32]. The learning process can be described as minimizing a loss function L(x, g(f(x))).

Figure 2.5: An autoencoder composed of two NNs.

2.2.3 Loss Functions

The goal of training a NN is to minimize a chosen loss function. Commonly used loss functions include the mean squared error (MSE) loss, the binary cross-entropy (BCE) loss, and the categorical cross-entropy (CCE) loss. The following subsections review the BCE and CCE, which are used in this thesis.

Binary Cross-Entropy Loss

The BCE loss is commonly used for binary classification tasks. It measures the performance of a classification algorithm whose output is a probability between 0 and 1. In a binary classification task, each class label is denoted by a scalar s ∈ {0, 1}. A NN can be designed to output a probability q for a given input r according to q = f(r; θ), where q describes the probability of the input belonging to the positive class. Here, θ represents the trainable parameters of the NN. Given this model and a training dataset D consisting of |D| input-output pairs, the BCE loss is defined as

    L_BCE(θ) = − (1/|D|) Σ_{(r,s)∈D} { s log[f(r; θ)] + (1 − s) log[1 − f(r; θ)] },

where f(r; θ) is the probability of the input r being in class s and 1 − f(r; θ) is the probability of the input r being in the other class 1 − s.

Categorical Cross-Entropy Loss

The CCE is commonly used for multi-class classification problems. In contrast to binary classification tasks, the class label is s ∈ {0, 1, ..., C − 1}, where C is the number of classes. Let p be the one-hot probability vector associated with the true class label s. The CCE loss can be expressed as

    L_CCE(θ) = (1/|D|) Σ_{(r,s)∈D} ℓ_CE(p, q),

where ℓ_CE(p, q) = −Σ_{c=1}^{C} p_c log q_c is the cross-entropy, measuring the difference between the two distributions p and q. Here, q_c is the predicted probability for class c and p_c is the true probability (which is 1 for the correct class and 0 otherwise).
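As a concrete illustration, the two losses can be computed directly as in the following NumPy sketch; the clipping constant is a numerical-stability assumption.

```python
import numpy as np

def bce(s, q, eps=1e-12):
    """Binary cross-entropy between labels s in {0,1} and predictions q in (0,1)."""
    q = np.clip(q, eps, 1 - eps)
    return -np.mean(s * np.log(q) + (1 - s) * np.log(1 - q))

def cce(p, q, eps=1e-12):
    """Categorical cross-entropy; rows of p are one-hot labels, rows of q predictions."""
    return -np.mean(np.sum(p * np.log(np.clip(q, eps, 1.0)), axis=1))

print(bce(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
print(cce(np.eye(3), np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])))
```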
2.2.4 Gradient-based Learning

The goal of the training algorithm is to find a good set of parameters θ that minimizes the chosen loss function L(θ). The problem can be expressed as

    θ* = arg min_θ L(θ),

where θ* is the set of optimal parameters minimizing the loss function. To solve this optimization problem, gradient-based techniques such as stochastic gradient descent (SGD) are often used. In SGD, the parameters θ are updated iteratively using the gradient of the loss function with respect to θ. At each iteration,

    θ_{t+1} = θ_t − η ∇_θ L̃(θ_t),

where θ_t denotes the parameters at iteration t, η is the learning rate, and L̃(θ_t) is an approximation of the loss function computed on a random mini-batch B_t ⊂ D of the training samples at iteration t; using a mini-batch helps to reduce the computational cost. The gradient ∇_θ L̃(θ_t) can be efficiently calculated using the back-propagation algorithm [33].

The choice of the learning rate influences the convergence: a very large learning rate might cause the algorithm to diverge, while a very low learning rate makes convergence slow. Many variants of SGD have been proposed to improve convergence [32], such as the momentum method [34], the RMSProp method [35], and the Adam optimizer [36]. These optimizers adjust the learning rate during training based on the gradients, which helps to avoid local minima and speeds up convergence.
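A minimal sketch of the plain SGD update is given below, applied to a simple quadratic loss whose gradient is known in closed form; it is illustrative only.

```python
import numpy as np

def sgd(grad_fn, theta0, lr=0.1, steps=100):
    """Plain SGD: theta_{t+1} = theta_t - lr * grad(L)(theta_t)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# Minimize L(theta) = ||theta - 3||^2, whose gradient is 2 (theta - 3).
print(sgd(lambda th: 2 * (th - 3.0), theta0=[0.0]))   # -> approx [3.]
```

In practice the gradient is not available in closed form but is computed by back-propagation on a mini-batch, which is exactly the L̃(θ_t) approximation above.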
3 Methods

Autoencoders can be used to assist the design of communication systems. In this chapter, we first introduce a simple AE-based end-to-end learning scheme for a communication system and its training procedure. Then, we introduce the proposed CNN-AE-based joint synchronization, equalization, and decoding system.

3.1 An AE-based Communication System

Consider a simple communication system as shown in Figure 2.1. From a deep learning perspective, this system can be viewed as a particular type of autoencoder. In this scenario, the encoder acts as the transmitter, learning a representation x of the message W in a manner robust to the channel impairments. The decoder functions as the receiver, attempting to recover the message from the channel observation y with a low probability of error. This concept was initially proposed in [9], where both the transmitter and receiver are replaced by multiple dense layers. In this section, we introduce a CNN-AE-based scheme, where the transmitter and receiver are comprised of a set of CNN layers. The structure of the CNN-AE is based on the model proposed in [14]; however, we adjust the parameters and the number of layers for the purposes of this thesis.

The structure of the CNN-AE is shown in Figure 3.1, where we model four blocks (channel encoder, modulator, demodulator, and channel decoder) as CNN blocks and train them jointly in an end-to-end manner. Given an information bit sequence u ∈ {0, 1}^k of length k, the transmitter outputs n complex-valued symbols x ∈ C^n. These symbols are then propagated through the channel, resulting in a noisy observation y ∈ C^n of the transmitted signal. The receiver takes this noisy observation and compensates for transmission impairments to output an estimate û of the transmitted bit sequence. The goal of end-to-end learning is to find suitable parameters for the AE such that the transmitter learns a signal representation that is robust to channel impairments, while the receiver learns to reliably reconstruct the transmitted bits from the channel observation. This end-to-end training ensures that the entire communication system is optimized holistically, leading to improved performance in terms of error rates and robustness to noise.

Figure 3.1: The structure of a CNN-AE-based communication system, where the traditional transmitter and receiver are replaced by CNNs.

The CNN-AE structure is designed to mimic the blocks of a conventional communication system. Each block of the CNN-AE is represented by a set of 1-dimensional convolutional (Conv1D) layers, where the dimensions are chosen based on the specific function of each block. Compared to fully-connected layers, Conv1D layers offer lower complexity and better trainability. The detailed structure is shown in Table 3.1; we introduce each block and the training procedure in the following subsections.

Transmitter

The transmitter maps a message with k information bits to n complex-valued symbols. The Enc CNN first works as a channel encoder, mapping the bit sequence into a coded sequence of length n_c at a code rate R_cod = k/n_c. Then, the Mod CNN functions as a modulator, mapping the n_c coded bits into n complex-valued symbols with modulation order m = n_c/n. The overall communication rate is R = k/n bits per complex channel use. For simplicity, we let l be the greatest common divisor of n_c and k, so that k′ = k/l and n′_c = n_c/l. This allows us to interpret the encoding of k bits into n_c coded bits as the encoding of l sub-blocks of k′ bits into l sub-codewords of n′_c bits. This transformation enables the AE to fit different code rates easily.

Table 3.1: Parameters of the CNN-AE; each Conv1D layer is followed by a batch normalization layer before activation to help the model converge quickly.

    Block     | Layer   | Activation | Output dimensions
    ----------|---------|------------|------------------
    Enc CNN   | Conv1D  | ELU        | (k, 100)
              | Conv1D  | ELU        | (l, 100)
              | Conv1D  | ELU        | (l, 100)
              | Conv1D  | ELU        | (l, 100)
              | Conv1D  | ELU        | (l, n′_c)
              | Reshape |            | (n, m)
    Mod CNN   | Conv1D  | ELU        | (n, 100)
              | Conv1D  | ELU        | (n, 100)
              | Conv1D  | ELU        | (n, 100)
              | Conv1D  | Linear     | (n, 2)
    Demod CNN | Conv1D  | ELU        | (n, 100)
              | Conv1D  | ELU        | (n, 100)
              | Conv1D  | ELU        | (n, 100)
              | Conv1D  | Linear     | (n, m)
              | Reshape |            | (l, n′_c)
    Dec CNN   | Conv1D  | ELU        | (l, 100)
              | Conv1D  | ELU        | (l, 100)
              | Conv1D  | ELU        | (l, 100)
              | Conv1D  | ELU        | (l, 100)
              | Conv1D  | Sigmoid    | (k, 1)

• Enc CNN: We apply five Conv1D layers to function as a channel encoder. The first four layers map the information bits into a higher-dimensional space, allowing the AE to learn an effective placement of the bit sequence. The final layer maps the sub-codewords down to a lower-dimensional space, with the output reshaped into a matrix of size (n, m) for modulation. The convolutional operations enable linear coding, while exponential linear unit (ELU) activation functions allow potentially non-linear operations:

    ELU(z) = z for z > 0, and ELU(z) = e^z − 1 for z ≤ 0.

The use of the ELU typically speeds up learning and reduces errors [37].

• Mod CNN: We use four Conv1D layers to modulate the n symbols, each carrying m coded bits. The first three layers map the symbols into a higher-dimensional space. The final layer maps each of the n modulated 100-dimensional representations to a 2-dimensional real-valued vector, representing the real and imaginary components of the transmitted symbol.

• Normalization: A non-trainable normalization layer is added to satisfy the average power constraint E[∥x∥^2] = nρ, where ρ denotes the SNR.

The normalized signal x is then transmitted over the channel. We assume an AWGN channel, i.e., y = x + z, where the noise z is an n-dimensional vector of independent and identically distributed complex Gaussian noise samples with zero mean and unit variance.
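To make the layer structure concrete, the following Keras sketch builds one Conv1D, batch-normalization, and ELU unit and assembles a Mod-CNN-like stack. The kernel size and the input feature dimension are assumptions, as Table 3.1 does not specify them; this is a sketch, not the thesis implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_elu(x, filters, kernel_size=5, activation="elu"):
    """One unit of Table 3.1: Conv1D -> BatchNorm -> activation.
    The kernel size is an assumption; the thesis does not report it."""
    x = layers.Conv1D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation(activation)(x)

# Sketch of the Mod CNN for n = 128 symbols: three ELU layers of width 100,
# then a linear Conv1D producing the real and imaginary parts of each symbol.
n = 128
inp = layers.Input(shape=(n, 1))   # input feature dimension simplified to 1
h = inp
for _ in range(3):
    h = conv_bn_elu(h, 100)
out = layers.Conv1D(2, 5, padding="same")(h)   # linear activation
mod_cnn = tf.keras.Model(inp, out)
mod_cnn.summary()
```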
Receiver

The receiver takes the channel observation as input and attempts to estimate the transmitted message. The Demod CNN functions as a demodulator, and the Dec CNN functions as a channel decoder. Both the Demod CNN and the Dec CNN are designed in the same manner to reconstruct the transmitted message. Key components include:

• Sigmoid activation function: The sigmoid function takes a real value as input and outputs a value between 0 and 1. It is expressed as

    S(z) = 1 / (1 + e^{−z}).

The sigmoid outputs can be interpreted as the estimated posterior probabilities of the bits being 0 or 1: the closer the output value is to 0, the more likely the bit is 0, and vice versa.

• Decision: Since the output of the last layer is a posterior probability between 0 and 1, a threshold of 0.5 is applied to obtain a binary vector representing the estimated transmitted bits.

Training procedure and parameters

We train the CNN-AE by optimizing the total BCE loss between the originally transmitted bit sequence u and the estimated bit sequence û at the receiver output. The training process adjusts all trainable parameters in an end-to-end manner using SGD. The detailed training procedure is outlined in Algorithm 1.

Algorithm 1: Training Procedure for CNN-AE
Input: number of epochs M, training steps T, training SNR range [σ²_min, σ²_max], training parameters θtx, θrx
Output: θtx, θrx
for i ≤ M do
    for j ≤ T do
        σ² ← generate_SNR([σ²_min, σ²_max])
        u ← generate_bits()
        x ← transmit(u; θtx)
        y ← channel(x; σ²)
        û ← receive(y; θrx)
        L_BCE ← BCE(u, û)
        θtx, θrx ← SGD([θtx, θrx], L_BCE)
    end for
end for

The joint training algorithm only works when the channel model is differentiable; otherwise, the gradients cannot be backpropagated through the channel. In [11], the authors propose an alternating training algorithm. In this approach, at each iteration the transmitter is optimized while keeping the receiver parameters θrx fixed, and then the receiver is optimized while keeping the transmitter parameters θtx fixed. By following this procedure, the system achieves faster convergence. The details of the alternating training method are provided in Algorithm 2.

Algorithm 2: Alternating Training Procedure for CNN-AE
Input: number of epochs M, training steps T_TX, T_RX, training SNR range [σ²_min, σ²_max], training parameters θtx, θrx
Output: θtx, θrx
for i ≤ M do
    for j ≤ T_TX do
        set_trainable(θtx, θrx) = [True, False]
        σ² ← generate_SNR([σ²_min, σ²_max])
        u ← generate_bits()
        x ← transmit(u; θtx)
        y ← channel(x; σ²)
        û ← receive(y; θrx)
        L_BCE ← BCE(u, û)
        θtx ← SGD(θtx, L_BCE)
    end for
    for j ≤ T_RX do
        set_trainable(θtx, θrx) = [False, True]
        σ² ← generate_SNR([σ²_min, σ²_max])
        u ← generate_bits()
        x ← transmit(u; θtx)
        y ← channel(x; σ²)
        û ← receive(y; θrx)
        L_BCE ← BCE(u, û)
        θrx ← SGD(θrx, L_BCE)
    end for
end for

The training parameters are listed in Table 3.2.

Table 3.2: Hyperparameters for the training of the CNN-AE under the AWGN channel.

    Parameter        | Value
    -----------------|-------
    Loss             | BCE
    Epochs           | 100
    Batch size       | 500
    Training vectors | 10^6
    Optimizer        | Adam
    Learning rate    | 0.001

It is important to note that while the BCE loss function optimizes the BER, it does not directly result in an optimal BLER. In this thesis, we have chosen to optimize the BCE loss, which is sufficient for achieving the BLER performance necessary for our comparative analysis. It is worth mentioning that in [38], several alternative loss functions are proposed that aim for BLER-optimal decoding.
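A minimal TensorFlow sketch of one iteration of Algorithm 1 is given below. The transmitter and receiver are assumed to be Keras models operating on a real-valued representation of the complex symbols; the interface is illustrative, not the thesis implementation. Because the AWGN channel is expressed with differentiable operations, gradients flow from the loss back to the transmitter.

```python
import tensorflow as tf

def train_step(u, noise_std, transmitter, receiver, optimizer,
               loss_fn=tf.keras.losses.BinaryCrossentropy()):
    """One iteration of Algorithm 1 (sketch): joint update of both ends.
    `transmitter` and `receiver` are assumed Keras models; u is a bit tensor."""
    with tf.GradientTape() as tape:
        x = transmitter(u, training=True)              # (batch, n, 2) real/imag parts
        y = x + tf.random.normal(tf.shape(x), stddev=noise_std)  # AWGN channel
        u_hat = receiver(y, training=True)             # (batch, k, 1) bit posteriors
        loss = loss_fn(u, u_hat)
    variables = transmitter.trainable_variables + receiver.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```

The alternating variant of Algorithm 2 would simply restrict `variables` to one model at a time.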
3.2 An AE-based Joint Synchronization, Equalization, and Decoding System

We now move on to consider transmitting short packets over a memoryless block-fading waveform channel with an unknown delay. In this scenario, the receiver needs to handle the tasks of synchronization, equalization, and decoding. We propose a CNN-AE-based joint synchronization, equalization, and decoding system. Instead of relying on dedicated pilots, the proposed system uses the entire received message at the channel output to synchronize and decode the transmitted signal. This approach not only results in higher spectral efficiency but also allows for shorter messages compared to systems that use dedicated pilots. The system follows the same setup as in [39], but discards the use of conventional pilots. The following subsections provide a detailed introduction to the CNN-AE-based scheme and its training procedure.

Transmitter

The transmitter part, depicted in Figure 3.2, consists of trainable CNN blocks and non-trainable parts. First, the Enc CNN encodes the information bits b ∈ {0, 1}^k into a real-valued coded sequence c ∈ R^{n_c}. Then the Mod CNN maps the coded sequence c to the complex-valued symbol sequence x ∈ C^n.

Figure 3.2: Block diagram of the transmitter part of the CNN-AE-based system model; the blue blocks indicate the trainable parts.

Figure 3.3: Block diagram of the receiver part of the CNN-AE-based system model.

In order to transmit the short message through the block-fading channel with an unknown delay, we process the symbol sequence with the following steps. First, we split the symbol sequence x ∈ C^n into nb sub-sequences {x_ℓ}, ℓ = 1, ..., nb, of length n_s:

    x_ℓ = [x_{1,ℓ}, ..., x_{n_s,ℓ}] ∈ C^{n_s},

where nb denotes the number of fading blocks and each fading block contains n_s complex-valued channel uses. We then apply power normalization to the ℓth sub-packet according to E[∥x_ℓ∥^2] = n_s ρ, where ρ denotes the SNR.

To form the continuous-time signal, we add a pulse shaping block. Consider a square pulse with normalized energy,

    s_{t_p}(t) = 1/√t_p for t ∈ [0, t_p), and s_{t_p}(t) = 0 otherwise,

where t_p denotes the period of the pulse, determined by the upsampling rate N and the sampling interval t_s as t_p = N t_s. The signal for the ℓth sub-packet can be expressed as

    x_ℓ(t) = Σ_{k=1}^{n_s} x_{k,ℓ} s_{t_p}(t − (k − 1) t_p).

The continuous-time signal is then transmitted over the fading channel with an unknown delay. The received signal for the ℓth fading block is

    Y_ℓ(t) = H_ℓ x_ℓ(t − τ) + Z_ℓ(t),

where H_ℓ denotes the random complex gain of the ℓth fading block, following the CN(0, 1) distribution, and τ denotes the time delay; we consider the simple case where each sub-packet experiences the same delay, and τ is treated as uniformly distributed in [0, τmax]. Z_1(t), ..., Z_{nb}(t) are independent additive white Gaussian noise processes with power spectral density N_0.
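A discrete-time sketch of this waveform channel is given below; the square pulse is implemented by upsampling, and the power and noise normalization conventions are illustrative assumptions.

```python
import numpy as np

def transmit_subblock(x_sub, N, tau, snr_db, rng):
    """Square-pulse shaping by upsampling (rate N), integer delay of tau samples,
    one Rayleigh fading gain per sub-block, and AWGN (sketch)."""
    rho = 10.0 ** (snr_db / 10.0)
    s = np.repeat(x_sub, N) / np.sqrt(N)     # square pulse with normalized energy
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # CN(0,1)
    tx = np.concatenate([np.zeros(tau), s])  # unknown delay
    z = (rng.standard_normal(tx.size) + 1j * rng.standard_normal(tx.size)) / np.sqrt(2)
    return h * np.sqrt(rho) * tx + z

rng = np.random.default_rng(4)
x_sub = np.exp(1j * 2 * np.pi * rng.random(36))   # 36 unit-power example symbols
y = transmit_subblock(x_sub, N=5, tau=7, snr_db=10.0, rng=rng)
print(y.shape)   # -> (187,) = 36 * 5 + 7
```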
Receiver

The receiver structure is depicted in Figure 3.3. Unlike the previous CNN-AE scheme designed for the AWGN channel, the proposed structure includes a Sync CNN for synchronization, as well as an EQ CNN and a Dec CNN for equalization and decoding. We next introduce each block in detail.

First, the Sync CNN estimates the starting position of the data signal from the received signals Y_1(t), ..., Y_{nb}(t). The length of the received signal for each sub-packet is n′ = n_s t_p + τmax. Since the received signal is complex-valued, it is split into two real-valued signals, corresponding to the real and imaginary parts. The Sync CNN maps the received signal of each sub-packet into a higher-dimensional space and outputs the probability distribution over the τmax possible outcomes using the softmax activation function,

    softmax(τ)_i = e^{τ_i} / Σ_{j=1}^{τmax} e^{τ_j}.

Here, p_τ is a τmax-dimensional probability vector with all entries between 0 and 1, and the probabilities sum to one, Σ_{i=1}^{τmax} p_i = 1. The entry p_i = softmax(τ)_i represents the probability that the estimated delay is τ̂ = i. The time delay is then estimated according to τ̂ = arg max p_τ. We remove the estimated delay from each sub-packet and concatenate all sub-packets, resulting in a signal y′(t) of length n t_p, which is then used for equalization and decoding.

The following CNN blocks operate in an iterative fashion, as shown in Figure 3.4. First, the EQ CNN performs equalization with inherent channel gain estimation. The output of the EQ CNN, denoted by I_c, can be interpreted as prior information provided to the Dec CNN. Subsequently, the Dec CNN calculates the posterior of the transmitted bit sequence, I_b, and sends the extrinsic information I_c′ = I_b − I_c back to the EQ CNN, where it is used as a prior for equalization in the next iteration. After a sufficient number of iterations, the estimated bits are calculated from I_b using the sigmoid function, sigmoid(I_b).

Figure 3.4: The iteration steps of the equalization and decoding at the receiver.

The size of both the prior information and the extrinsic information is (n, F), where F represents the information feature size. The number F indicates the amount of information exchanged between the EQ CNN and the Dec CNN per codeword. Compared to a sequential structure, the iterative process results in faster convergence.

Table 3.3 shows the structure of each layer in the system. Compared to the previous configuration, more filters are applied in each Conv1D layer to allow the system to better capture and process the intricate patterns of the signal.

Table 3.3: Parameters of the CNN-AE-based joint synchronization, equalization, and decoding system.

    Block              | Layer   | Activation | Output dimensions
    -------------------|---------|------------|-------------------
    Enc CNN            | Conv1D  | ELU        | (k, 200)
                       | Conv1D  | ELU        | (k, 200)
                       | Conv1D  | ELU        | (k, 200)
                       | Conv1D  | ELU        | (k, 200)
                       | Conv1D  | ELU        | (k, ⌊n/k⌋)
                       | Reshape |            | (n, 1)
    Mod CNN            | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | Linear     | (n, 2)
    Sync CNN           | Conv1D  | ELU        | (nb, n′, 100)
                       | Conv1D  | ELU        | (nb, n′, 100)
                       | Conv1D  | ELU        | (nb, n′, 100)
                       | Conv1D  | ELU        | (nb, n′, 100)
                       | Conv1D  | ELU        | (nb, n′, 1)
                       | Flatten |            | (nb n′,)
                       | Dense   | Softmax    | (τmax,)
    EQ CNN             | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  |            | (n, F)
    Dec CNN            | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  | ELU        | (n, 200)
                       | Conv1D  |            | (n, F)
    Dec CNN (last it.) | Flatten |            | (nF,)
                       | Dense   | Sigmoid    | (k, 1)
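The iterative exchange of Figure 3.4 can be sketched as follows. The exact wiring of the inputs to the EQ CNN and Dec CNN is an assumption, and the toy stand-ins at the end only serve to make the sketch executable.

```python
import numpy as np

def iterative_receiver(y, eq_cnn, dec_cnn, n, F, iterations=6):
    """Sketch of the turbo-style loop in Figure 3.4. `eq_cnn` and `dec_cnn`
    are callables (e.g. Keras models); their interfaces are assumed here."""
    extrinsic = np.zeros((n, F))            # no prior at the first iteration
    for _ in range(iterations):
        i_c = eq_cnn(y, extrinsic)          # prior information from the equalizer
        i_b = dec_cnn(i_c)                  # posterior information from the decoder
        extrinsic = i_b - i_c               # extrinsic information I_c' = I_b - I_c
    return 1.0 / (1.0 + np.exp(-i_b))       # sigmoid(I_b) -> bit posteriors

# Toy stand-ins to make the sketch executable:
rng = np.random.default_rng(3)
W = 0.1 * rng.standard_normal((20, 20))
out = iterative_receiver(rng.standard_normal((144, 20)),
                         eq_cnn=lambda y, e: y + e @ W,
                         dec_cnn=lambda c: np.tanh(c),
                         n=144, F=20)
print(out.shape)   # -> (144, 20)
```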
Training procedure and parameters

We train the CNN-AE by optimizing the BCE loss between the transmitted bit sequence b and the estimated bit sequence b̂ at the receiver's output. In parallel, the Sync CNN outputs a prediction p_τ of the time delay. The CCE is calculated using the prediction p_τ and the one-hot representation of the true delay τ. Thus, we define the total loss as a weighted sum of the synchronization loss and the decoding loss,

    L = L_BCE + α · L_CCE.

The hyperparameter α can be adjusted to balance the decoding and synchronization performance of the system. The main hyperparameters are given in Table 3.4 and the training process is shown in Algorithm 3.

Algorithm 3: Training Procedure for the CNN-AE-based joint system
Input: number of epochs M, training steps T, training SNR range [σ²_min, σ²_max], training parameters θenc, θdec, θsync, loss weight α
Output: θenc, θdec, θsync
for i ≤ M do
    for j ≤ T do
        τ ← generate_delay([0, τmax))
        σ² ← generate_SNR([σ²_min, σ²_max])
        b ← generate_bits()
        x ← transmit(b; θenc)
        for x_i ∈ x do
            x_i ← normalization(x_i)
            x_i(t) ← pulse_shaping(x_i)
            y_i(t) ← channel(x_i(t), τ, σ²)
        end for
        for y_i ∈ y do
            p_τ ← synchronization(y_i; θsync)
        end for
        L_CCE ← CCE(p_τ, τ; θsync)
        τ̂ ← argmax(p_τ)
        for y_i ∈ y do
            y_cutoff ← cutoff(y_i, τ̂)
        end for
        y_DEC ← concat(y_cutoff)
        b̂ ← decode(y_DEC; θdec)
        L_BCE ← BCE(b, b̂; θdec)
        L ← L_BCE + α · L_CCE
        θenc, θsync, θdec ← SGD([θenc, θsync, θdec], L)
    end for
end for

Table 3.4: Hyperparameters for the training of the CNN-AE-based system.

    Parameter          | Value
    -------------------|---------------
    Loss               | BCE, CCE
    α                  | 0.01
    F                  | 20
    Decoder iterations | 6
    Batch size         | 500 - 1000
    Optimizer          | Adam
    Learning rate      | 10^−4 - 10^−5
    Training SNR       | 2.0 - 20.0 dB

4 Results

In this chapter, we discuss the performance of the AE-based communication systems described in the previous chapter. First, we consider the CNN-AE system under an AWGN channel and a block-fading channel and compare its performance with state-of-the-art channel codes in terms of BER and BLER. Next, we consider the proposed CNN-AE-based joint synchronization, equalization, and decoding system, and compare its synchronization and decoding performance with the achievability bound for a pilot-assisted system. The following sections provide a detailed illustration.

4.1 Performance of CNN-AE

First, we evaluate the end-to-end performance of the proposed CNN-AE system described in Section 3.1. We consider transmitting k = 64 information bits within a short packet of n = 128 channel uses over a memoryless AWGN channel and a Rayleigh block-fading channel. The communication rate is R = k/n = 1/2 bits per channel use.

4.1.1 Performance of CNN-AE under AWGN Channel

Figures 4.1 and 4.2 present the simulated BER and BLER performance of the following schemes under the AWGN channel:

• Baseline system: The system employs 5G-compliant LDPC codes combined with BPSK modulation, simulated using the Sionna library [40]. An LDPC code with code rate R_c = 1/2 is utilized. Specifically, the code uses base graph 2 with a lifting factor of 11. At the receiver, a boxplus-phi belief propagation decoder with 20 iterations is applied for decoding.

• Proposed CNN-AE: The system adheres to the structure detailed in Table 3.1 and is trained and tested over the same range of SNRs for 10^6 blocks.

We also plot the normal approximation of the BLER as a function of the SNR for R = 1/2 and n = 128 under the real-valued AWGN channel. This computation is done with the help of the SPECTRE toolbox [41].

Figure 4.1: Simulated BER under the AWGN channel with k = 64, n = 128.

Figure 4.2: Simulated BLER under the AWGN channel with k = 64, n = 128.

The results demonstrate that the proposed CNN-AE scheme exhibits comparable performance to the baseline system under the AWGN channel. At low SNRs, the CNN-AE scheme outperforms the baseline system in terms of both BER and BLER. Notably, the CNN-AE achieves a more significant reduction in BER than in BLER when compared to the baseline system. However, at higher SNRs, the baseline system surpasses the CNN-AE, exhibiting a more rapid decline in both BER and BLER. Both systems show a performance gap relative to the normal approximation for the BLER. For the following performance comparisons, the focus will therefore be on the BLER.
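The simulated BER and BLER values in this chapter are obtained by Monte-Carlo estimation. A generic sketch of such a BLER estimator is given below; the `system` interface is hypothetical and stands in for the full transmit-channel-receive chain.

```python
import numpy as np

def simulate_bler(system, k, snr_db, n_blocks=100_000, seed=0):
    """Monte-Carlo BLER: a block is in error if any of its k bits is wrong.
    `system` maps (bits, snr_db) -> estimated bits; hypothetical interface."""
    rng = np.random.default_rng(seed)
    block_errors = 0
    for _ in range(n_blocks):
        u = rng.integers(0, 2, size=k)
        block_errors += not np.array_equal(system(u, snr_db), u)
    return block_errors / n_blocks
```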
Figure 4.2: Simulated BLER under the AWGN channel with k = 64, n = 128.

In the remainder of this chapter, we therefore focus on BLER comparisons.

4.1.2 Performance of CNN-AE under Block-fading Channel

We consider transmission over a Rayleigh memoryless block-fading channel, where neither the transmitter nor the receiver has any prior knowledge of the CSI. The number of fading blocks is nb = 4, and the channel gain of each fading block, Hℓ, ℓ = 1, ..., nb, is independently distributed according to the complex Gaussian distribution CN(0, 1).

Figure 4.3 illustrates the simulated BLER performance of the following schemes under the block-fading channel:

• Achievability bound: The nonasymptotic achievability bound on the maximum coding rate over Rayleigh block-fading channels, under the assumption that the receiver lacks prior knowledge of the CSI. This bound was proposed in [28] and is simulated using the SPECTRE toolbox [42].

• Proposed CNN-AE: The system adheres to the structure detailed in Table 3.1 and is trained and tested over the same range of SNRs for 10⁶ blocks.

• Baseline system: The system employs 5G-compliant LDPC codes with QPSK modulation at the transmitter and a ZF equalizer at the receiver, where the CSI is assumed to be known.

The achievability bound serves as a relatively tight benchmark on the error probability of any transmission scheme over a memoryless Rayleigh fading channel when no CSI is available at the receiver. Compared to the baseline system, the CNN-AE exhibits better performance at higher SNRs, particularly from 11 dB onwards. It is important to note that the baseline system benefits from prior knowledge of the CSI; hence, no pilot sequence is required and no rate loss is incurred. In contrast, the CNN-AE maintains this performance without any CSI, thereby enhancing spectral efficiency. Nonetheless, a performance gap remains with respect to the achievability bound.

Figure 4.3: Simulated BLER under the block-fading channel with nb = 4, k = 64, n = 128.

In the following section, we evaluate the performance of the enhanced CNN-AE structure detailed in Section 3.2 in a more challenging scenario.

4.2 Performance of CNN-AE-based Joint Synchronization, Equalization, and Decoding System

In this section, we consider short packet transmission over a SISO memoryless block-fading waveform channel with an unknown delay. The benchmark is proposed in [39]. The proposed CNN-AE-based joint synchronization, equalization, and decoding scheme is detailed in Section 3.2. We evaluate the performance of the proposed scheme in terms of the normalized mean square error (NMSE) of the delay estimate and the BLER of the decoder.

The selected simulation parameters are shown in Table 4.1. The transmission rate is R = 80/144 ≈ 0.556 bits per complex channel use. We assume that the channel gains Hℓ, ℓ = 1, ..., nb, are generated independently from CN(0, 1). The time delay τ, common to all sub-blocks, is assumed to be uniformly distributed in [0, τmax).

Table 4.1: Parameters for the simulation.

Parameter                   | Value
Information bits k          | 80
Blocklength n               | 144
Number of fading blocks nb  | 4
Upsampling rate N           | 5
Maximum time delay τmax     | 12
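To make the simulation setup concrete, the following sketch shows one way to generate such a block-fading waveform channel with a common random delay; all names, shapes, and defaults are illustrative assumptions rather than the thesis code.

```python
# Sketch of the block-fading channel with a common unknown delay (assumed setup).
import numpy as np

def block_fading_channel(x, nb=4, tau_max=12, noise_var=0.1, rng=None):
    """Pass waveform x (complex, length divisible by nb) through nb Rayleigh
    fading blocks with i.i.d. CN(0, 1) gains, a common integer delay drawn
    uniformly from [0, tau_max), and additive white Gaussian noise."""
    rng = rng or np.random.default_rng()
    tau = int(rng.integers(0, tau_max))                              # unknown delay
    h = (rng.standard_normal(nb) + 1j * rng.standard_normal(nb)) / np.sqrt(2)
    y = []
    for h_l, x_l in zip(h, np.split(np.asarray(x), nb)):
        x_del = np.concatenate([np.zeros(tau, dtype=complex), x_l])  # delayed block
        w = np.sqrt(noise_var / 2) * (rng.standard_normal(x_del.size)
                                      + 1j * rng.standard_normal(x_del.size))
        y.append(h_l * x_del + w)                                    # fading + noise
    return y, tau
```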
4.2.1 Synchronization Performance

To evaluate the synchronization error, we use the NMSE of the delay estimate, defined as

NMSE = E[(τ − τ̂)² / tp²].

Here, tp is the period of the pulses used for pulse shaping. Under the assumption that the sampling interval is 1 second, tp coincides with the upsampling rate N.

The benchmark is the pilot-assisted system detailed in [39]. Both the benchmark and the proposed system perform synchronization at the receiver. While the benchmark applies maximum-likelihood estimation for joint synchronization and channel estimation based on the pilots, the proposed CNN-AE-based system utilizes the entire received signal for synchronization.

Figure 4.4 presents the simulated results for the benchmark and the CNN-AE-based system. The CNN-AE-based system clearly outperforms the maximum-likelihood estimator of the pilot-assisted scheme. This superior performance is attributed to the CNN-AE's ability to estimate the delay from the entire observation of the received signal, as opposed to the pilot-assisted scheme, which relies solely on the pilot sequence. The CNN-AE-based system can learn latent information from the received signal without requiring prior information, which enhances its synchronization performance.

Figure 4.4: Synchronization error comparison.

4.2.2 Decoding Performance

After synchronizing the received signal, the receiver needs to equalize the signal and estimate the transmitted bit sequence. The benchmark in [39] develops an RCUs achievability bound on the error probability of pilot-assisted transmission systems; its numerical values are computed efficiently using the saddlepoint approximation. We evaluate the BLER of the benchmark and the CNN-AE-based system, as shown in Figure 4.5.

The simulated results indicate that the CNN-AE-based system underperforms the benchmark at SNRs below 10 dB, but surpasses it at higher SNRs, which correspond to more practical operating ranges. Specifically, the CNN-AE-based system outperforms the benchmark by 1.7 dB at a BLER of 10⁻³. Our proposed scheme jointly learns the channel gains and estimates the transmitted bit sequence without relying on any known sequence. At lower SNRs, the synchronization performance significantly influences the decoding performance, so the training process of the CNN-AE must carefully balance both tasks. At higher SNRs, however, synchronization has minimal impact on decoding, making it easier to improve the decoding performance.

Figure 4.5: Achievable BLER comparison.

5 Conclusion

This chapter summarizes the work conducted in this thesis and outlines some potential ideas for future work.

In this thesis, we applied end-to-end learning of physical-layer communications and evaluated the performance of a CNN-AE-based joint synchronization, equalization, and decoding system for short packet communications. Unlike conventional communication systems that use dedicated preambles for synchronization and equalization, our proposed system performs joint synchronization, equalization, and decoding without the use of dedicated preambles or prior information.
This approach is well suited to URLLC scenarios, as it greatly improves the spectral efficiency and reduces the overhead in the short-packet regime. Compared to the nonasymptotic achievability bound for pilot-assisted transmission systems, our proposed system achieves a lower BLER in the high-SNR range under block-fading waveform channels.

5.1 Future Work

While this thesis has explored the use of the CNN-AE in short packet communications, several important aspects and potential improvements merit further research:

• Exploring CNN-AE and Turbo-AE structures: While the proposed system is based on the CNN-AE, another structure, the Turbo-AE, also demonstrates good performance under AWGN and fading channels [13], [43]. The Turbo-AE takes advantage of Turbo codes, utilizing interleavers and deinterleavers at both the transmitter and the receiver. Further exploration of the Turbo-AE structure is warranted; additionally, combining the strengths of the CNN-AE and Turbo-AE could yield a hybrid model with superior performance.

• Optimizing the training procedure: The computational complexity of the training procedure for the proposed system is high, and the hyperparameters are currently selected by empirical fine-tuning, which is cumbersome and may not generalize to other scenarios. Future research should focus on developing more efficient training algorithms and hyperparameter optimization techniques; methods such as reinforcement learning could be explored to streamline the training process and improve the system's adaptability to different conditions.

• Extending to a more practical system setup: Our proposed system considers the case where all fading blocks are synchronous, experiencing the same time delay. A more complex scenario involves different fading blocks experiencing different random time delays, which has also been evaluated in our benchmark [39]. In such cases, the synchronization component of our proposed system would need to be updated.

By addressing these aspects, the performance and applicability of CNN-AE-based joint synchronization, equalization, and decoding systems in short packet communications can be further improved.

Bibliography

[1] Hamidreza Bagheri, Md Noor-A-Rahim, Zilong Liu, Haeyoung Lee, Dirk Pesch, Klaus Moessner, and Pei Xiao. 5G NR-V2X: Toward Connected and Cooperative Autonomous Driving. IEEE Communications Standards Magazine, 5(1):48–54, 2021.

[2] Georgia Kolovou, Sharief Oteafy, and Periklis Chatzimisios. A Remote Surgery Use Case for the IEEE P1918.1 Tactile Internet Standard. IEEE International Conference on Communications, 2021.

[3] Ali Gohar, Gianfranco Nencioni, Omar Khyam, and Xuejun Li. The Role of 5G Technologies in a Smart City: The Case for Intelligent Transportation System. Sustainability, 13(9):5188, 2021.

[4] Rashid Ali, Yousaf Bin Zikria, Ali Kashif Bashir, Sahil Garg, and Hyung Seok Kim. URLLC for 5G and Beyond: Requirements, Enabling Incumbent Technologies and Network Intelligence. IEEE Access, 9:67064–67095, 2021.

[5] Harsh Tataria, Mansoor Shafi, Andreas F. Molisch, Mischa Dohler, Henrik Sjoland, and Fredrik Tufvesson. 6G Wireless Systems: Vision, Requirements, Challenges, Insights, and Opportunities. Proceedings of the IEEE, 109(7):1166–1199, 2021.

[6] Zexian Li, Hamidreza Shariatmadari, Bikramjit Singh, and Mikko A. Uusitalo. 5G URLLC: Design challenges and system concepts.
Proceedings of the International Symposium on Wireless Communication Systems, 2018.

[7] Alexandru Sabin Bana, Kasper Floe Trillingsgaard, Petar Popovski, and Elisabeth De Carvalho. Short Packet Structure for Ultra-Reliable Machine-Type Communication: Tradeoff between Detection and Decoding. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6608–6612, 2018.

[8] Alejandro Lancho, Johan Ostman, and Giuseppe Durisi. On Joint Detection and Decoding in Short-Packet Communications. Proceedings of the IEEE Global Communications Conference (GLOBECOM), 2021.

[9] Timothy O'Shea and Jakob Hoydis. An Introduction to Deep Learning for the Physical Layer. IEEE Transactions on Cognitive Communications and Networking, 3(4):563–575, 2017.

[10] Sebastian Dorner, Sebastian Cammerer, Jakob Hoydis, and Stephan Ten Brink. Deep Learning Based Communication over the Air. IEEE Journal on Selected Topics in Signal Processing, 12(1):132–143, 2018.

[11] Faycal Ait Aoudia and Jakob Hoydis. End-to-End Learning of Communications Systems Without a Channel Model. Conference Record of the Asilomar Conference on Signals, Systems and Computers, pages 298–303, 2018.

[12] Alexander Felix, Sebastian Cammerer, Sebastian Dorner, Jakob Hoydis, and Stephan Ten Brink. OFDM-Autoencoder for End-to-End Learning of Communications Systems. IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2018.

[13] Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, and Pramod Viswanath. Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels. Advances in Neural Information Processing Systems, 32, 2019.

[14] Nourhan Hesham, Mohamed Bouzid, Ahmad Abdel-Qader, and Anas Chaaban. Coding for the Gaussian Channel in the Finite Blocklength Regime Using a CNN-Autoencoder. IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom), pages 15–20, 2023.

[15] Ognjen Jovanovic, Metodi P. Yankov, Francesco Da Ros, and Darko Zibar. Gradient-Free Training of Autoencoders for Non-Differentiable Communication Channels. Journal of Lightwave Technology, 39(20):6381–6391, 2021.

[16] R. G. Gallager. Low-Density Parity-Check Codes. IRE Transactions on Information Theory, 8(1):21–28, 1962.

[17] Claude Berrou, Alain Glavieux, and Punya Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes (1). IEEE International Conference on Communications, pages 1064–1070, 1993.

[18] Solomon W. Golomb and Guang Gong. Signal Design for Good Correlation: For Wireless Communication, Cryptography, and Radar. Cambridge University Press, 2005.

[19] David C. Chu. Polyphase Codes with Good Periodic Correlation Properties. IEEE Transactions on Information Theory, 18(4):531–532, 1972.

[20] Huseyin Arslan. Wireless Communication Signals: A Laboratory-based Approach.

[21] John G. Proakis. Digital Communications. 2001.

[22] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27(3):379–423, 1948.

[23] Yury Polyanskiy, H. Vincent Poor, and Sergio Verdú. Channel coding rate in the finite blocklength regime. IEEE Transactions on Information Theory, 56(5):2307–2359, 2010.

[24] Alfonso Martinez and Albert Guillén i Fàbregas. Saddlepoint approximation of random-coding bounds.
Information Theory and Applications Workshop (ITA), pages 257–262, 2011.

[25] Mustafa Cemil Coşkun, Giuseppe Durisi, Thomas Jerkovits, Gianluigi Liva, William Ryan, Brian Stein, and Fabian Steiner. Efficient error-correcting codes in the short blocklength regime. Physical Communication, 34:66–79, 2019.

[26] Raymond Knopp and Pierre A. Humblet. On coding for block fading channels. IEEE Transactions on Information Theory, 46(1):189–205, 2000.

[27] Wei Yang, Giuseppe Durisi, Tobias Koch, and Yury Polyanskiy. Quasi-static multiple-antenna fading channels at finite blocklength. IEEE Transactions on Information Theory, 60(7):4232–4265, 2014.

[28] Giuseppe Durisi, Tobias Koch, Johan Östman, Yury Polyanskiy, and Wei Yang. Short-Packet Communications over Multiple-Antenna Rayleigh-Fading Channels. IEEE Transactions on Communications, 64(2):618–629, 2016.

[29] Siddharth Sharma, Simone Sharma, and Anidhya Athaiya. Activation Functions in Neural Networks. International Journal of Engineering Applied Sciences and Technology, 4:310–316, 2020.

[30] Yann Le Cun. Generalization and Network Design Strategies. 1989.

[31] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103, 2008.

[32] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[33] Paul J. Werbos. Backpropagation Through Time: What It Does and How to Do It. Proceedings of the IEEE, 78(10):1550–1560, 1990.

[34] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.

[35] Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. 2013.

[36] Diederik P. Kingma and Jimmy Lei Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations (ICLR), 2015.

[37] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 4th International Conference on Learning Representations (ICLR), 2016.

[38] Reinhard Wiesmayr, Gian Marti, Chris Dick, Haochuan Song, and Christoph Studer. Bit Error and Block Error Rate Training for ML-Assisted Communication. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

[39] A. Oguz Kislal, Madhavi Rajiv, Giuseppe Durisi, Erik G. Ström, and Urbashi Mitra. Is Synchronization a Bottleneck for Pilot-Assisted URLLC Links? 2024.

[40] Jakob Hoydis, Sebastian Cammerer, Fayçal Aït Aoudia, Avinash Vem, Nikolaus Binder, Guillermo Marcus, and Alexander Keller. Sionna: An Open-Source Library for Next-Generation Physical Layer Research. 2022.

[41] gdurisi/fbl-notes: Transmitting short-packet over wireless channels—an information-theoretic perspective.

[42] yp-mit/spectre: SPECTRE: Short packet communication toolbox.

[43] Jannis Clausius, Sebastian Dorner, Sebastian Cammerer, and Stephan Ten Brink. Serial vs. Parallel Turbo-Autoencoders and Accelerated Training for Learned Channel Codes. 11th International Symposium on Topics in Coding (ISTC), 2021.