Expectation Propagation-based Receiver for Coherent Optical Communication Master of Science Thesis in the Master Degree Programme, Communication Engineering SHENZHOU HOU Department of Signals and Systems Division of Communication Systems and Information Theory CHALMERS UNIVERSITY OF TECHNOLOGY Göteborg, Sweden, 2011 Master’s Thesis 2011:1 T SPM Loss a SPM Loss N2 R X1 R1 X2 R2 N1 Abstract This thesis presents a novel maximum a posteriori receiver based on factor graphs and expectation propagation (EP) technologies for the coherent optical communication system. The receiver deals with the linear (AWGN) and nonlinear distortions in the desired system. Comparison has also been made on SER between the EP-based receiver and other methods. The discussions and analysis are given for the result in the end. Keywords: optical, factor graphs, expectation propagation, linear, nonlinear Acknowledgements I would like to thank all those people who always offer me kind help and make my two years’ master study an exciting and inspiring journey. First of all, I want to thank my supervisor and examiner Henk Wymeersch. You always give me the confidence to conquer the unexpected problems during my master thesis. Without your qualified suggestion and powerful support, I cannot imagine that I could on day finish the thesis. Nan and Yan, thanks a lot for your kind assistance on my thesis work. I have learned a lot from the discussion with you. There is no doubt that we have experienced a wonderful cooperation in the past few months. Meanwhile, I should also like to thank my classmates who gave me good advices during these two years in Chalmers. Thanks should also be given to Lu. Thanks for your unconditional support. Without you, lots of fun will be undoubtedly missing. I will also thank all the members of Jiangyou party. I will never forget our precious experience for the rest of my life and hope to see your guys again. Finally, I will give my deeply appreciation to my parents. It is my big fortune to be your child. This moment, words are no longer enough to express how I love you. Thank you very much for everything in these passed years. No matter where I am, my heart will be always with you. Shenzhou Hou, Göteborg, Sweden, Sept 19, 2011 CONTENTS i Contents 1 Introduction ....................................................................................................................... 1 1.1 Background .................................................................................................................... 1 1.1.1 Coherent optical communication system .............................................................. 1 1.1.2 Factor graphs and Expectation propagation ......................................................... 1 1.2 Purpose .......................................................................................................................... 1 1.3 Structure of the thesis .................................................................................................... 2 2 Coherent optical communication system ......................................................................... 3 2.1 Introduction to the coherent optical communication model .......................................... 3 2.2 SPM, AWGN and Polarization ...................................................................................... 3 2.3 The MAP Receiver ........................................................................................................ 4 3 Factor graphs ..................................................................................................................... 6 3.1 Introduction to Factor graphs ........................................................................................ 6 3.1.1 Marginals, Function factorizations, Factor graphs ............................................... 6 3.1.2 Sum-product algorithm in Factor graphs .............................................................. 8 3.2 Factor graphs for the optical communication system .................................................. 10 3.2.1 Factorization and Graphs .................................................................................... 10 3.2.2 Sum-product algorithm ....................................................................................... 11 3.2.3 Challenges .......................................................................................................... 11 4 Expectation Propagation................................................................................................. 13 4.1 Introduction to Expectation Propagation ..................................................................... 13 4.1.1 Why Expectation Propagation ............................................................................ 13 4.1.2 Exponential family and Kullback-Leibler divergence ........................................ 13 4.1.3 Expectation Propagation ..................................................................................... 15 4.1.4 Potential problem ................................................................................................ 17 4.2 An example of Expectation Propagation: clutter problem .......................................... 17 CONTENTS ii 4.2.1 The Problem ....................................................................................................... 17 4.2.2 Factor Graph ....................................................................................................... 18 4.2.3 Processing Expectation Propagation .................................................................. 18 4.3 Expectation Propagation for the optical communication system ................................. 20 4.3.1 Expectation Propagation for message computing .............................................. 20 4.3.2 Message computing for chain-like model with Expectation Propagation .......... 22 5 Implementation of Expectation Propagation ................................................................ 25 5.1 Optical communication model with linear block ........................................................ 25 5.1.1 System setup ....................................................................................................... 25 5.1.2 Message approximation for linear block ............................................................ 26 5.2 Optical communication model with linear and nonlinear block .................................. 27 5.2.1 System setup ....................................................................................................... 28 5.2.2 Message approximation for nonlinear system with single polarization ............. 28 5.2.3 Message approximation for nonlinear system with dual polarization ................ 29 5.3 Simplification on EP algorithm ................................................................................... 30 6 Data analysis and comparison ........................................................................................ 32 6.1 Introduction to ML, SBP, BP method ......................................................................... 32 6.1.1 ML, SBP and BP ................................................................................................ 32 6.1.2 Simulation parameters ........................................................................................ 32 6.2 System with linear block ............................................................................................. 33 6.3 System with linear and nonlinear block ...................................................................... 34 6.3.1 Message changing through the factor graph ....................................................... 34 6.3.2 Effect of the number of samples in Monte Carlo ............................................... 35 6.3.3 Comparison with other methods ......................................................................... 36 7 Conclusion and future work ........................................................................................... 40 7.1 Conclusion ................................................................................................................... 40 7.2 Future work ................................................................................................................. 40 7.2.1 Expectation propagation ..................................................................................... 40 7.2.2 Other problems ................................................................................................... 41 References .............................................................................................................................. 42 CHAPTER 1. INTRODUCTION 1 Chapter 1 Introduction 1.1 Background This section will give a brief introduction to coherent optical communication system and also the corresponding techniques used in the thesis. 1.1.1 Coherent optical communication system In coherent optical communication system, the multi-level quadrature amplitude modulation (M-QAM) is employed because of its capability to provide a high data rate and spectral efficiency compared to traditional binary optical communication system. M-QAM often needs a high signal power to hold an acceptable signal-error-rate (SER) due to its more densely packed constellation. Nevertheless, the optical system gives a limitation on the performance while the input power increase, which is mostly due to the Kerr nonlinearity effect. The impairments from the Kerr effect contain deterministic and stochastic aspect, such as intrachannel four-wave mixing and the self-phase modulation (SPM) respectively [1, 2]. Additive white Gaussian noise (AWGN) which is often used to simulate the background noise of the channel can be also applied to the coherent optical communication system in the thesis. 1.1.2 Factor graphs and Expectation propagation Factor graphs and expectation propagation are two fundamental techniques employed in this thesis. Factor graphs are a kind of graph models which can be used to transform the factorized functions into the form of graph models. The advantage of factor graphs is that it offers an efficient way to calculate the marginal distributions of the function [3]. Expectation Propagation (EP) is an algorithm which can be used together with factor graphs. EP can iteratively make approximations for the desired distributions. 1.2 Purpose The purpose of this thesis is to design and develop a novel receiver which is based on factor graphs and expectation propagation for the coherent optical communication system with CHAPTER 1. INTRODUCTION 2 16-QAM constellation. The receiver aims to solve the linear (AWGN) and nonlinear (SPM) distortions happened in the communication system. The simulations will be made in MATLAB and the results will be compared with other existing methods. Comments will be also provided in the end. 1.3 Structure of the thesis This thesis is organized in a problem-solving structure. In chapter 2, details of the system and the corresponding problem will be introduced. A MAP receiver is also suggested in the end of chapter 2. Chapter 3 will explain the definition and details of factor graphs and show how to use factor graphs to build the MAP receiver mentioned in chapter 2. Moreover, the problem happened while using factor graphs will be also discussed in the end of chapter 3. In chapter 4, expectation propagation (EP) will be introduced to solve the problem in chapter 3. In the end of chapter 4, a general EP algorithm will be given for the optical communication system. Chapter 5 explains the details on how to implement the EP algorithm for linear and nonlinear system and a simplified EP algorithm will be given in the end of chapter 5. In chapter 6, comparison is made between the EP and other existing methods. The analysis on the results will be also provided. The last Chapter 7 talks about the conclusion of the thesis and the suggested work which could be carried out in the future. CHAPTER 2. COHERENT OPTICAL COMMUNICATION SYSTEM 3 Chapter 2 Coherent optical communication system 2.1 Introduction to the coherent optical communication model As it’s mentioned in the last chapter, because of the special property of the coherent optical communication system, the signal passing through the system will be distorted by a kind of nonlinear interference called self-phase modulation (SPM) [1] which is due to Kerr effect. And AWGN is also existed as the background noise of the system. When considering long haul transmission in the optical fiber system, the system can be divided into a number of blocks called Spans. Each span contains a single mode fiber (SMF), a dispersion-compensating fiber, and an amplifier. The thesis will focus on the interferences coming from the SPM and AWGN, which are also considered to represent the nonlinear and linear noise respectively in the system model. The figure below gives an example for a system with two spans. Figure 2.1 System model for optical communication system with two spans 2.2 SPM, AWGN and Polarization SPM: In figure 2.1, the function for SPM block is: 𝑆𝑆𝑆(𝑅𝑖−1) = 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑅𝑖−1||2) (2.1) Here 𝛾 is the nonlinearity parameter and 𝐿𝑒𝑒𝑒 is the effective lengthen factor of the single mode fiber (SMF) for each fiber. From the system model the signal after the SPM can be T SPM Loss a Fiber Amplifier SPM Loss Fiber Amplifier N2 R X1 R1 X2 R2 N1 CHAPTER 2. COHERENT OPTICAL COMMUNICATION SYSTEM 4 obtained: 𝑋𝑖 = 𝑅𝑖−1 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑅𝑖−1||2) (2.2) AWGN: In the system, AWGN is considered to be a two dimensional Gaussian noise [4]: 𝑁 ~ 𝑁(𝑋;𝑀,𝛴) (2.3) Therefore, the relationship between the input and the output of a span is: 𝑅𝑖 = 𝑅𝑖−1 ∗ 𝑒𝑒 𝑝 �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑅𝑖−1|�2� + N𝑖 (2.4) Polarization: The optical fiber communication discussed in the thesis is polarization multiplexed, which means that more than one set of signals (more than one ray of light) can be transmitted simultaneously in the system. Under this condition, the signal will be affected by all the signals transmitted at the same time in SPM block. Take dual polarization for an instance, the formula is as follows: 𝑋1𝑖 = 𝑅1𝑖−1 ∗ 𝑒𝑒𝑒 ( 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑅1𝑖−1|�2 + �|𝑅2𝑖−1|�2 � ) (2.5) 𝑋2𝑖 = 𝑅2𝑖−1 ∗ 𝑒𝑒𝑒 ( 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑅1𝑖−1|�2 + �|𝑅2𝑖−1|�2 � ) (2.6) Here, 𝑋1𝑖 and 𝑋2𝑖 denote two simultaneously transmitted signals. 2.3 The MAP Receiver According to the system model, it can be expected that, when passing through the optical system, the signals will be iteratively rotated and added Gaussian noise. The picture below gives an example of a distorted 16-QAM constellation after passing through the system with 17 spans and 5dbm average input power. CHAPTER 2. COHERENT OPTICAL COMMUNICATION SYSTEM 5 Figure 2.2 Distorted constellation of 16-QAM with 17 spans and average input power 5 dbm To solve this problem and recover the distorted signals in the end, the thesis tries to design a MAP receiver: 𝑎�(𝑅𝑁) = arg𝑀𝑀𝑀�𝑃(𝑎|𝑅𝑁)� ; 𝑎 ∈ Θ (2.7) Here Θ is the constellation of the input signal, 𝑅𝑁 is the observation and 𝑎� is the decision made based on 𝑅𝑁. Hence, the computation of the posterior 𝑃(𝑎|𝑅𝑁) turns out to be the main task. The thesis introduces two technologies --- factor graphs and expectation propagation to deal with this problem in next chapters. -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1 -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1 CHAPTER 3. FACTOR GRAPHS 6 Chapter 3 Factor graphs 3.1 Introduction to Factor graphs Factor graphs are a particular kind of graph model which can be used to represent the factorized functions in the form of graphs. Moreover, factor graphs enable an efficient way to compute the marginal distributions of the function with the algorithm called --- sum-product algorithm [3]. This section will introduce the basic theory of the factor graphs and sum-product algorithm with examples. 3.1.1 Marginals, Function factorizations, Factor graphs Marginal: For function: 𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛) , the marginal of variable 𝑥𝑛 can be calculated by summing out all other variables: 𝐵𝑥𝑛(𝑋𝑛) = � 𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛) {\𝑋𝑛} ; (3.1) Here 𝐵𝑥𝑛(𝑋𝑛) is the marginal of 𝑋𝑛 and {\𝑋𝑛} means the set of all possible values of all variables expect 𝑋𝑛. Function factorization: Assume the function 𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛) can be factorized as a product of 𝐾 sub-functions (factors), therefore: 𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛) = �𝑓𝑘(𝑉𝑘) 𝐾 𝐾=1 ; (3.2) Here 𝑉𝑘 is the set of variables and 𝑉𝑘 ⊆ {𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛} . One thing should be mentioned is each function may have more than one factorization form. Factor graphs: Factor graphs provide a one-to-one mapping from a factorized function. In factor graphs, neighborhood is an important concept which indicates that with a factorized function:  The neighborhood of the factor 𝑓𝑘: 𝑁(𝑓𝑘), means a set of variables which appear in 𝑓𝑘. CHAPTER 3. FACTOR GRAPHS 7  The neighborhood of the variable 𝑋𝑛: 𝑁(𝑋𝑛), means a set of factors which include 𝑋𝑛. For an instance, a function can be factorized as: 𝑓(𝑋1,𝑋2,𝑋3,𝑋4,𝑋5) = 𝑓1(𝑋1,𝑋2 ) ∗ 𝑓2(𝑋2,𝑋3) ∗ 𝑓3�𝑋1,𝑋4, 𝑋5� (3.3) In this case, the neighborhood of the variable 𝑋1: 𝑁(𝑋1) = {𝑓1, 𝑓3}, while the neighborhood of the factor 𝑓2:𝑁(𝑓2) = {𝑋2,𝑋3}. When mapping a given factorized function such as formula (3.2) into the corresponding factor graph, the process can be carried out as follows:  Draw a vertex for each variable 𝑋𝑛, which is called variable node.  Draw a vertex for each function 𝑓𝑘, which is called function node.  Draw an edge between the function node 𝑓𝑘 and every variable node 𝑋𝑛 ⊆ 𝑁(𝑓𝑘). The following example will show the process of mapping the factorized function into factor graph: Assume the function can be factorized into 4 factors: 𝑓(𝑋1,𝑋2,𝑋3,𝑋4,𝑋5,𝑋6) = 𝑓1(𝑋1,𝑋2,𝑋3) ∗ 𝑓2(𝑋1,𝑋4) ∗ 𝑓3(𝑋1,𝑋5,𝑋6) ∗ 𝑓4(𝑋2); (3.4) Then the corresponding factor graph is: Figure 3.1 the factor graph of the function 3.4 𝑋2 𝑋4 𝑓2 𝑋1 𝑋5 𝑋6 𝑋3 𝑓3 𝑓1 𝑓4 CHAPTER 3. FACTOR GRAPHS 8 The factor graph obviously shows that the neighborhood of 𝑋1: 𝑁(𝑋1) = {𝑓1, 𝑓2, 𝑓3} and the neighborhood of 𝑓3:𝑁(𝑓3) = {𝑋1,𝑋5,𝑋6}. 3.1.2 Sum-product algorithm in Factor graphs The sum-product algorithm introduces a concept named ‘Messages’ which are passing over the edges between the variable nodes and function nodes in factor graphs and can be used to compute the marginals of the variables. In factor graphs the messages are functions of the related variable.  The message from the function node 𝑓𝑘 to the variable node 𝑋𝑛 ⊆ 𝑁(𝑓𝑘) is denoted as 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛);  The message from the variable node 𝑋𝑛 to the function node 𝑓𝑘 ⊆ 𝑁(𝑋𝑛) is denoted as 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛); The messages can be calculated as follows: 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) = � 𝑓𝑘 \{𝑋𝑛} �{𝑋𝑚 = 𝑥𝑚}𝑋𝑚⊆𝑁(𝑓𝑘)� ∗ � 𝑀𝑋𝑚→𝑓𝑘(𝑋𝑚) 𝑋𝑚⊆𝑁(𝑓𝑘)\{𝑋𝑛} (3.5) 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) = � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛) 𝑓𝑟⊆𝑁(𝑋𝑛)\{𝑓𝑘} (3.6) 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) 𝑀𝑓1→𝑋𝑛(𝑋𝑛) 𝑀𝑓2→𝑋𝑛(𝑋𝑛) 𝑀𝑋1→𝑓𝑘(𝑋𝑛) 𝑀𝑋2→𝑓𝑘(𝑋𝑛) Figure 3.2 Messages passing from variable to function node and function to variable node In factor graphs the marginal of a particular variable 𝐵𝑥𝑛(𝑋𝑛) can be obtained by the multiplication of the in-coming and out-going messages between the variable node 𝑋𝑛 and the function node 𝑓𝑘 ⊆ 𝑁(𝑋𝑛): 𝐵𝑥𝑛(𝑋𝑛) = 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) ∗ 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) (3.7) 𝑓2 𝑋𝑛 𝑓𝑘 𝑓1 𝑋2 𝑋1 𝑓𝑘 𝑋𝑛 CHAPTER 3. FACTOR GRAPHS 9 Here 𝐵𝑥𝑛(𝑋𝑛) can also be called the belief of 𝑋𝑛. Finally, a general sum-product algorithm can be:  Initialization: Each leaf function node 𝑓𝑘 send the message to its neighborhood variable node 𝑋𝑛 with 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) = 𝑓𝑘(𝑋𝑛). While each leaf variable node 𝑋𝑛 send the message to its neighborhood function node 𝑓𝑘 with 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) = 1. Notice that leaf node only has one neighborhood node.  Compute the message: Compute all the messages passing through the graph by using the formula (3.5) and (3.6).  Compute the Marginal: Finally, the Marginal of the variable 𝑋𝑛 can be obtained by formula (3.7). Take figure 3.1 for an example. The algorithm can be processed as follows: Initialization: Assume the message coming from the leaf function nodes: 𝑀𝑓4→𝑋2(𝑋2) = 𝑓4(𝑋2) Assume the message coming from the leaf variable nodes: 𝑀𝑋3→𝑓1(𝑋3) = 1 𝑀𝑋4→𝑓2(𝑋4) = 1 𝑀𝑋5→𝑓3(𝑋5) = 1 𝑀𝑋6→𝑓3(𝑋6) = 1 Compute all the messages: From the figure 3.1, following messages can be obtained: 𝑀𝑋2→𝑓1(𝑋2) = 𝑀𝑓4→𝑋2(𝑋2) 𝑴𝒇𝟏→𝑿𝟏(𝑿𝟏) = � 𝑓1(𝑋1,𝑋2,𝑋3) ∗ \{𝑋1} 𝑀𝑋2→𝑓1(𝑋2) ∗ 𝑀𝑋3→𝑓1(𝑋3) 𝑀𝑓2→𝑋1(𝑋1) = � 𝑓2(𝑋1,𝑋4) ∗ \{𝑋1} 𝑀𝑋4→𝑓2(𝑋4) 𝑀𝑓3→𝑋1(𝑋1) = � 𝑓3(𝑋1,𝑋5,𝑋6) ∗ \{𝑋1} 𝑀𝑋5→𝑓3(𝑋5) ∗ 𝑀𝑋6→𝑓3(𝑋6) CHAPTER 3. FACTOR GRAPHS 10 𝑴𝑿𝑿→𝒇𝒇(𝑿𝟏) = 𝑀𝑓3→𝑋1(𝑋1) ∗ 𝑀𝑓2→𝑋1(𝑋1) Compute the marginal: Hence, the marginal of variable 𝑋1: 𝐵𝑋1(𝑋1) = 𝑴𝒇𝒇→𝑿𝑿(𝑿𝟏) ∗ 𝑴𝑿𝑿→𝒇𝒇(𝑿𝟏) 3.2 Factor graphs for the optical communication system This section will apply the factor graphs and sum-product algorithm to the optical communication model discussed in the last chapter. And some problems happened in the message computation will be mentioned in the end. 3.2.1 Factorization and Graphs According to the last section, the factor graphs can be applied to the optical communication system described in figure 2.1. The distribution 𝑆(𝑎,𝑋1,𝑅1,𝑋2|𝑅2) can be factorized: 𝑆(𝑎,𝑋1,𝑅1,𝑋2|𝑅2) = Z𝑝 ∗ 𝑃(𝑎) ∗ 𝑃(𝑋1|𝑎) ∗ 𝑃(𝑅1|𝑋1) ∗ 𝑃(𝑋2|𝑅1) ∗ 𝑃(𝑅2|𝑋2) (3.8) Here Z𝑝 is a constant. The factor graphs can be draw as follows: Figure 3.3 Factor graphs for optical communication system Here 𝑅2 is the observation and 𝑎 denotes the input signal. The task is to get the marginal of variable 𝑎 based on the observation 𝑅2. One thing should be mentioned is formula (3.8) and the factor graphs can be easily extended according to the number of spans in the system. 𝑎 𝑃(𝑋1|𝑎) 𝑋1 𝑅1 𝑃(𝑅1|𝑋1) 𝑃(𝑎) 𝑃(𝑋2|𝑅1) 𝑋2 𝑃(𝑅2|𝑋2) 𝑅2 CHAPTER 3. FACTOR GRAPHS 11 3.2.2 Sum-product algorithm As discussed in section 3.2, the messages passing over the graph in figure 3.3 can be computed. For simplification, only one span is considered this time. Hence the factor graph is: 𝑓1 𝑓2 𝑓3 Figure 3.4 Factor graphs for optical communication system with one span According to formula (3.5) and (3.6), the message computation can be simplified to:  For variable nodes, the outgoing message is equal to the incoming message.  For function nodes, the outgoing message is an integral of the product of the function and the incoming message. For instance, in figure 3.3: 𝑀𝑓2→𝑋1(𝑋1) = 𝑀𝑋1→𝑓3(𝑋1) (3.9) 𝑀𝑓2→𝑎(𝑎) = �𝑓2 (𝑎,𝑋1) ∗ 𝑀𝑋1→𝑓2(𝑋1)𝑑𝑥1 (3.10) After all the messages have been obtained, the marginal of 𝑎 can be calculated as 𝑃(𝑎|𝑅1) = 𝑀𝑓1→𝑎(𝑎) ∗ 𝑀𝑎→𝑓1(𝑎) (3.11) 3.2.3 Challenges As is known in chapter 2, the optical fiber will introduce SPM and AWGN when signals are passing through. For factor graph in figure 3.3, the function node f2 and f1 represents the nonlinear effect and Gaussian noise separately: 𝑓2(𝑎,𝑋1) = 𝛿 �𝑋1 − 𝑎 ∗ 𝑒𝑒 𝑝 �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑎|�2�� (3.12) 𝑓1(𝑋1,𝑅1) = 𝐶 ∗ 𝑒𝑒 𝑝 �− 1 𝑁0 �|𝑅1 − 𝑋1|�2� (3.13) Hence, with formula (3.10) the message 𝑀𝑓2→𝑎(𝑎) can be computed as: 𝑎 𝑃(𝑋1|𝑎) 𝑋1 𝑅1 𝑃(𝑅1|𝑋1) 𝑃(𝑎) CHAPTER 3. FACTOR GRAPHS 12 𝑀𝑓2→𝑎(𝑎) = �𝑓2(𝑎,𝑋1) ∗𝑀𝑋1→𝑓2(𝑋1)𝑑𝑥1 = �𝛿(𝑋1 − 𝑎 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑎||2)) ∗𝑀𝑋1→𝑓2(𝑋1)𝑑𝑥1 = 𝑀𝑋1→𝑓2�𝑎 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑎||2)� What’s more, because of the Gaussian function node f1 the message 𝑀𝑋1→𝑓2(𝑋1) also belongs to Gaussian family: 𝑀𝑋1→𝑓2(𝑋1) = 𝐶 ∗ 𝑒𝑒 𝑝 �− 1 𝑁0 �|𝑋1 − 𝑅1|�2� Here the observation R1 can be seen as a constant. Finally the message 𝑀𝑓2→𝑎(𝑎) is: 𝑀𝑓2→𝑎(𝑎) = 𝐶 ∗ 𝑒𝑒 𝑝 �− 1 𝑁0 ��𝑎 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑎||2) − 𝑅1�� 2 � (3.14) It is obviously that the function of 𝑀𝑓2→𝑎(𝑎) is complicated and one can imagine that after a number of spans the messages will be too hard to compute and the final result of the marginal will be too complicated to use. That is why Expectation Propagation is introduced in the next chapter. CHAPTER 4. EXPECTATION PROPAGATION 13 Chapter 4 Expectation Propagation 4.1 Introduction to Expectation Propagation Expectation Propagation (EP) is an algorithm which can iteratively make approximations for the desired distributions. This section will discuss the basic theory of EP [6, 12] and give an instance to show the general process of EP [7]. 4.1.1 Why Expectation Propagation In the end of the last chapter a problem of computing messages has been thrown out. In a more general model, assume X is continuous unknown sample which need to be decided, 𝑨 = {𝑌1, 𝑌2 , , , , , 𝑌𝑛−1, 𝑌𝑛} denote observations of the system and all belong to i.i.d.. According to Bayes’ rule: 𝑝(𝑋|𝑨) ∝ 𝑝(𝑋)�𝑝(𝑌𝑖|𝑋) 𝑖 (4.1) Here 𝑝(𝑋|𝑨) is already normalized. Assume 𝑝(𝑋) and 𝑝(𝑌𝑖|𝑋) are prior and likelihood function respectively. It can be seen that it will be unpractical to directly multiply a number of likelihoods. For instance, assume p(Yi|X) is the mixture of two different distributions: p(Yi|X) = G1 + G2, hence the complexity of the calculation will dramatically increase along with the growing of the number of observations. Actually, 500 observations need 2500 times of multiplication of two distributions. Therefore, the result formula for p(X|A) (normalized) is hard to access by directly computation. One reasonable method to deal with this problem is to make an approximation ℎ(𝑋) which could be close to the true 𝑝(𝑋|𝑨): 𝑝(𝑋|𝑨) ≈ ℎ(𝑋) (4.2) This is the main strategy of EP for solving this kind of problem as in the last chapter. 4.1.2 Exponential family and Kullback-Leibler divergence When making approximations, one may consider which kinds of the distributions can be selected to make the approximation and how to do it. These concern two basic concepts of EP: Exponential family and Kullback-Leibler divergence. CHAPTER 4. EXPECTATION PROPAGATION 14 Exponential family: Exponential family is often used to make approximations in EP. The exponential family has the form: 𝑟(𝒙|𝜃) = ℎ(𝒙)exp ��𝜂𝑖(𝜃)𝛵𝑖(𝒙) 𝑠 𝑖=1 − 𝛢(𝜃)� (4.3) Here 𝜃 is called the parameter of the family. Function 𝜂𝑖(𝜃) presents the weight of the potential function 𝛵𝑖(𝒙). Function 𝛢(𝜃) is a normalizing function which ensures that the value of the integral or summation of 𝑟(𝒙|𝜃) results to be one. Exponential family is often used because it contains many famous members such as Gaussian, Poisson, Bernoulli, etc., which is suitable to be applied to many different kind of models. What is more, the use of exponential family can simplify the computation when making the approximation. Because of the property of the exponential family, the multiplication or division of two distributions from the same exponential family results a distribution belonging to the same family with a normalizing factor. An example of the multiplication and division of two Gaussian distribution is given below [5]: 𝑁(𝑥; 𝑚1,𝜎1) ∗ 𝑁(𝑥; 𝑚2,𝜎2) = 𝑁(𝑚1; 𝑚2,𝜎1 + 𝜎2) ∗ 𝑁(𝑥; 𝑚3,𝜎3) (4.4) 𝜎3 = (𝜎1−1 + 𝜎2−1)−1 𝑚3 = 𝜎3𝜎1−1𝑚1 + 𝜎3𝜎2−1𝑚2 𝑁(𝑥; 𝑚1,𝜎1) 𝑁(𝑥; 𝑚2,𝜎2) = 𝜎2 (𝜎2 − 𝜎1)𝑁(𝑚1; 𝑚2,𝜎2 − 𝜎1) ∗ 𝑁 (𝑥; 𝑚3,𝜎3) (4.5) 𝜎3 = (𝜎1−1 − 𝜎2−1)−1 𝑚3 = 𝜎3𝜎1−1𝑚1 − 𝜎3𝜎2−1𝑚2 The formula above also can be extended to two-dimensional Gaussian distribution. Kullback-Leibler (KL) divergence: When approximating the distribution to a certain exponential family, the Kullback-Leibler (KL) divergence is often used. The KL divergence can measure the difference between the existing distribution and the desired exponential family. A general form of KL divergence is: 𝐾𝐾�𝑝(𝑋)||𝑝′(𝑋)� = �𝑝(𝑋) ∗ log� 𝑝(𝑋) 𝑝′(𝑋)�𝑑𝑑 (4.6) Here 𝑝′(𝑋) is the approximation of 𝑝(𝑋). CHAPTER 4. EXPECTATION PROPAGATION 15 In order to make the approximation, one should try to minimize the KL divergence: 𝜃′ = 𝑎𝑎𝑎𝜃 𝑚𝑚𝑚 𝐾𝐾�𝑝(𝑋)||𝑝′(𝑋)� (4.7) Moment matching is considered to be an optimal way to minimize the KL divergence: �𝑝(𝑋) ∗ 𝑇(𝒙)𝑑𝑑 = �𝑝′(𝑋) ∗ 𝑇(𝒙)𝑑𝑑 (4.8) The formula can also be rewritten as: 𝔼𝑷[𝑇(𝒙)] = 𝔼𝒑′[𝑇(𝒙)] (4.9) Hence, it can be seen that minimizing the KL divergence is equal to mapping the desired moments of the existing distribution to a certain exponential family. Take Gaussian distribution for an example. The Minimization of the KL divergence is just computing the mean and covariance of the existing distribution and using the results to be the mean and covariance for the Gaussian distribution. When talking about KL-divergence, another related concept --- Projection is often used, which can denote the process of moment matching: 𝑝′(𝑥) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[𝑝(𝑥)] (4.10) Here 𝑝′(𝑥) is the approximated distribution with the same moments mapped from 𝑝(𝑥). 4.1.3 Expectation Propagation It has been said that EP makes approximations ‘iteratively’. Why ‘iteratively’? Rewrite formula (4.1): 𝑝(𝑋|𝑨) ∝ 𝑝(𝑋)�𝑝(𝑌𝑖|𝑋) 𝑖 = �𝑟𝑖 𝑖 (𝑋) (4.11) Here 𝑟0(𝑋) = 𝑝(𝑋), 𝑟1(𝑋) = 𝑝(𝑌1|𝑋),,,, 𝑟𝑖(𝑋) = 𝑝(𝑌𝑖|𝑋). An approximation ℎ(𝑋) needed to be made for 𝑝(𝑋|𝑨). As it already known in the last section, a projection can be made to get the approximation: ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[𝑝(𝑋|𝑨)] (4.12) Here ℎ(𝑋) is the normalized version of 𝐻(𝑋). Because 𝑝(𝑋|𝑨) consists of a bunch of CHAPTER 4. EXPECTATION PROPAGATION 16 distributions 𝑟𝑖(𝑋), the moment matching is not easy to carry out directly. EP introduces a method to get the final approximation ℎ(𝑋) iteratively. Assume approximation 𝐻(𝑋) also consists of a bunch of sub-approximations 𝑟̃𝑖(𝑋): 𝐻(𝑋) = �𝑟̃𝑖 𝑖 (𝑋) (4.13) ℎ(𝑋) = ∏ 𝑟̃𝑖𝑖 (𝑋) ∫∏ 𝑟̃𝑖𝑖 (𝑋)𝑑𝑑 (4.14) Each 𝑟̃𝑖(𝑋) is an approximated distribution for 𝑟𝑖(𝑋). Hence, after all 𝑟̃𝑖(𝑋) have been obtained, the final approximation can be calculated by formula (4.13). EP algorithm firstly initializes all 𝑟̃𝑖(𝑋) to an exponential family. In each iteration, one can choose any of 𝑟̃𝑖(𝑋) to make a new approximation and then refine that 𝑟̃𝑖(𝑋). When approximating 𝑟̃𝑖(𝑋), EP makes a projection on the multiplication of the true 𝑟𝑖(𝑋) and the term ℎ\𝑖(𝑋) which denotes all sub-approximations except 𝑟̃𝑖(𝑋): ℎ\𝑖(𝑋) ∝ ℎ(𝑋) 𝑟̃𝑖(𝑋) (4.15) EP will continuously process the iteration until all 𝑟̃𝑖(𝑋) converge. Hence, a typical EP process can be: 1. Initialization of the sub-approximation term 𝑟̃𝑖(𝑋): 2. Calculate the posterior of 𝑥: ℎ(𝑋) = ∏ 𝑟̃𝑖(𝑋)𝑖 ∫∏ 𝑟̃𝑖(𝑋)𝑖 𝑑𝑑 3. Do iterations until all 𝑟̃𝑖(𝑋) converge:  Pick up any 𝑟̃𝑖(𝑋) for updating  Divide ℎ(𝑋) by 𝑟̃𝑖(𝑋) and normalize to get the ‘lack’ posterior ℎ\𝑖(𝑋): ℎ\𝑖(𝑋) ∝ ℎ(𝑋) 𝑟̃𝑖(𝑋)  Make a projection on the Multiplication of ℎ\𝑖(𝑋) and 𝑟𝑖(𝑋), which will produce the new mean, covariance and a normalizing factor 𝐾𝑖 for the new ℎ(𝑋): CHAPTER 4. EXPECTATION PROPAGATION 17 ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[ℎ\𝑖(𝑋) ∗ 𝑟𝑖(𝑋)] (4.16)  Update 𝑟̃𝑖(𝑋) with the new ℎ(𝑋): 𝑟̃𝑖(𝑋) = 𝐾𝑖 ∗ ℎ(𝑋) ℎ\𝑖(𝑋) (4.17) 4. Compute the final approximation for 𝑝(𝑋): 𝑝(𝑋) ≈ ℎ(𝑋) ∝ 𝐻(𝑋) = �𝑟̃𝑖 𝑖 (𝑋) (4.18) 4.1.4 Potential problem One problem may happen when making the projection: The moments of the desired distribution may not be easy to calculate mathematically. Then other technique such as Monte Carlo sampling needed be introduced to assist the approximation. This will be discussed later in the thesis. 4.2 An example of Expectation Propagation: clutter problem This section presents the process of Expectation Propagation with factor graphs by solving a classic problem – clutter problem [7]. 4.2.1 The Problem A bunch of signals are transmitted through an added Gaussian noise channel which can cause the clutter problem. Assume that the distribution of the observed signals on the condition of the sent signals is a Gaussian distribution mixed with another Gaussian distribution caused by the clutter: 𝑝(𝑌|𝑋) = 𝛼 ∗ 𝑁(Y; X,σ1 ∗ 𝐈𝒅 ) + (1 − 𝛼) ∗ 𝑁(𝑌; 0,σ2 ∗ 𝐈𝒅) (4.19) 𝑁(𝑌;𝑀,𝛴 ) = 2𝜋|𝛴|− 1 2 ∗ 𝑒𝑒𝑒 �− 1 2 (𝑌 −𝑀) 𝛴−1 (𝑌 −𝑀)𝑇� (4.20) In formula (4.19) Y and X are observed and sent signal respectively. The first part of the formula is useful for later research while the second part presents the influence of the clutter. Finally, parameter α shows the ratio between the first and second part in the finally distribution. Moreover, 𝑝(𝑌|𝑋) also can be considered into Gaussian family because it is a CHAPTER 4. EXPECTATION PROPAGATION 18 mixture of two Gaussian distributions. Suppose a set of independent signals 𝐀 = {Y1, Y2 , , , , , Y𝑛−1, Y𝑛} has been observed in the end. Hence, the joint distribution of observation A and X is: 𝑝(𝑨,𝑋) ∝ 𝑝(𝑋)�𝑝(𝑌𝑖|𝑋) 𝑖 = �𝑟𝑖 𝑖 (𝑋) (4.21) Here 𝑟0(𝑋) = 𝑝(𝑋) , 𝑟1(𝑋) = 𝑝(𝑌1|𝑋) ,,,, 𝑟𝑖(𝑋) = 𝑝(𝑌𝑖|𝑋) . Finally, the task becomes to compute the probability density function 𝑝(𝑨,𝑋) while 𝑟𝑖(𝑋) is known. 4.2.2 Factor Graph According to Chapter 3, formula (4.21) can be represented into factor graph below: Figure 4.1 Factor Graph of clutter problem 4.2.3 Processing Expectation Propagation From the last section, the following formula can be obtained: �𝑟𝑖 𝑖 (𝑋) = 𝑝(𝑋)�𝑝(𝑌𝑖|𝑋) 𝑖 ∝ 𝑝(𝑨,𝑋) ≈ ℎ(𝑋) ∝ 𝐻(𝑋) = �𝑟̃𝑖 𝑖 (𝑋) (4.22) It can be seen that 𝑟0 = 𝑝(𝑋) = r�0 and 𝑟i(𝑋), 𝑖 = {1,2,3, , ,𝑛 − 1,𝑛} consists of Gaussian distributions. Therefore the approximation component 𝑟̃𝑖(𝑋) can also be approximated to be Gaussian-like: 𝑟̃𝑖(𝑋) = 𝐶𝑖 ∗ 𝑒𝑒𝑒 �− 1 2 (𝑋 −𝑀𝑖) 𝛴𝑖−1 (𝑋 −𝑀𝑖)𝑇� (4.23) 𝑋𝑛 𝑝(𝑋) 𝑝(𝑦2|𝑋) 𝑝(𝑦1|𝑋) 𝑝(𝑦𝑛−1|𝑋) 𝑝(𝑦𝑛|𝑋) … … CHAPTER 4. EXPECTATION PROPAGATION 19 What is more, because the result of the multiplication of Gaussian distributions is still an Gaussian distribution (not normalized). Hence, 𝑝(𝑨,𝑋)(normalized) can be approximated to be a Gaussian distribution. The formulas for Gaussian multiplication and division can be found in formula (4.4) and (4.5). According to section 4.2.3, the EP can process as follows: 1. Initialization of the mean and covariance matrix of the approximation term 𝑟̃𝑖(𝑋).  Assume 𝑀0 = 𝟎,𝛴0 = 10𝐈𝒅,𝐶0 = 2𝜋|𝛴0|− 1 2 for the 𝑟̃0(𝑋) term.  Because the rest of the approximation term are unknown at the beginning, the mean and covariance matrix are initialized to be 𝑀𝑖 = 𝟎,𝛴𝑖 = 1000𝐈𝒅 → ∞,𝐶𝑖 = 1 to make the probability of 𝑟̃𝑖(𝑋) be approximately identical in the data space at first. 2. According to formula (4.14) and (4.23): ℎ(𝑋) = 𝑁(𝑋; 𝒎𝒙,𝜎𝑥 ∗ 𝑰𝒅),𝒎𝒙 = 𝒎𝟎,𝜎𝑥 = 𝜎0. 3. Doing loop from 𝑟̃1, … , 𝑟̃𝑛 until all 𝑟̃𝑖 converge:  According to formula (4.15) and (4.5), Dividing ℎ(𝑋) by 𝑟̃𝑖(𝑋) and normalizing to get the mean and covariance of the ‘lack’ posterior ℎ\𝑖(𝑋): 𝜎𝑥 \𝑖 = (𝜎𝑥−1 − 𝜎𝑖−1)−1 𝑚𝑥 \𝑖 = 𝜎𝑥 \𝑖𝜎𝑥−1𝑚𝑥 − 𝜎𝑥 \𝑖𝜎𝑖−1𝑚𝑖  Multiply ℎ\𝑖(𝑋) and 𝑟𝑖(𝑋) which equals to formula (4.23). Make a projection and get the new mean, covariance and a normalizing factor 𝐾𝑖 for the new ℎ(𝑋): ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[ℎ\𝑖(𝑋) ∗ 𝑟𝑖(𝑋)] 𝐾𝑖 = (1 − 𝛼) ∗ 𝑁(𝑌𝑖; 0,𝜎2 ∗ 𝑰𝒅 ) + 𝛼 ∗ 𝑁(𝑌𝑖;𝑚𝑥 \𝑖 , (𝜎𝑥 \𝑖 + 𝜎1)𝑰𝒅 ) 𝑚𝑥 = 𝑚𝑥 \𝑖 + 𝛼 ∗ 𝜎𝑥 \𝑖 ∗ �𝑌𝑖 − 𝑚𝑥 \𝑖� ∗ 𝑁(𝑌𝑖;𝑚𝑥 \𝑖 , (𝜎𝑥 \𝑖 + 𝜎1)𝑰𝒅 ) 𝐾𝑖 ∗ (𝜎𝑥 \𝑖 + 1) CHAPTER 4. EXPECTATION PROPAGATION 20 𝜎𝑥 = 𝜎𝑥 \𝑖 − 𝛼 ∗ (𝜎𝑥 \𝑖)2 ∗ 𝑁(𝑌𝑖;𝑚𝑥 \𝑖 , (𝜎𝑥 \𝑖 + 𝜎1)𝑰𝒅 ) 𝐾𝑖 ∗ �𝜎𝑥 \𝑖 + 1� ∗ [1 − (1 − 𝛼) ∗ 𝑁(𝑌𝑖; 0,𝜎2 ∗ 𝑰𝒅 ) ∗ 𝜎𝑥 \𝑖 ∗ |𝑌𝑖 − 𝑚𝑥 \𝑖|2 𝐾𝑖 ∗ �𝜎𝑥 \𝑖 + 1� ]  According to formula (4.5) and (4.17), update 𝑟̃𝑖(𝑋) = 𝐾𝑖 ∗ ℎ(𝑋)/ ℎ\𝑖(𝑋): 𝜎𝑖 = �𝜎𝑥−1 − (𝜎𝑥\𝑖)−1�−1 𝑚𝑖 = 𝜎𝑖𝜎𝑥−1𝑚𝑥 − 𝜎𝑖(𝜎𝑥\𝑖)−1𝑚𝑥 \𝑖 𝐶𝑖 = (2𝜋)−1|𝜎𝑖 ∗ 𝑰𝒅|− 12 ∗ 𝜎𝑥\𝑖 (𝜎𝑥\𝑖 − 𝜎𝑥)𝑁 �𝑚𝑥; 𝑚𝑥 \𝑖 ,𝜎𝑥\𝑖 − 𝜎𝑥� ∗ 𝐾𝑖  Compute the approximated distribution --- 𝑝(𝑨,𝑋): 𝑝(𝑨,𝑋) ∝�𝑟̃𝑖(𝑋) 𝑖 4.3 Expectation Propagation for the optical communication system In the last chapter, the factor graphic model and message computation method has been introduced to deal with the optical communication model. This section will firstly extend EP algorithm to the message computing in factor graphs, then develop an method with EP for the message computing in chain-like model such as optical communication system. 4.3.1 Expectation Propagation for message computing An example will be given below to show how to deal with the message computing with EP. CHAPTER 4. EXPECTATION PROPAGATION 21 Figure 4.2 Message computing with EP According to the sum-product algorithm, the belief of 𝑋𝑛 is: 𝐵𝑥𝑛(𝑋𝑛) = 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) ∗ 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) = � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛) 𝑓𝑟⊆𝑁(𝑋𝑛) (4.24) 𝑏𝑥𝑛(𝑋𝑛) ∝ � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛) (4.25) 𝑓𝑟⊆𝑁(𝑋𝑛) Here 𝑏𝑥𝑛(𝑋𝑛) is normalized from 𝐵𝑥𝑛(𝑋𝑛). Now the task is to compute 𝑏𝑥𝑛(𝑋𝑛). It can be seen that 𝑏𝑥𝑛(𝑋𝑛) is the multiplication of a number of distributions, as it’s discussed before, make computation or approximation on 𝑏𝑥𝑛(𝑋𝑛) directly is not reasonable. EP algorithm can do approximation iteratively. Assume: � 𝑀�𝑓𝑟→𝑋𝑛(𝑋𝑛) 𝑓𝑟⊆𝑁(𝑋𝑛) ∝ ℎ(𝑋) ≈ 𝑏𝑥𝑛(𝑋𝑛) ∝ � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛) 𝑓𝑟⊆𝑁(𝑋𝑛) (4.26) Here, 𝑀�𝑓𝑟→𝑋𝑛(𝑋𝑛) is the sub-approximation for each message. As in EP algorithm, do iteration until all messages converged. In each iteration, choose one message to refine by EP algorithm as discussed in section 4.1.3: For example, move 𝑀�𝑓𝑘→𝑋𝑛(𝑋𝑛) in figure 4.2: 𝑋𝑛 𝑓𝑘 𝑓2 𝑓3 𝑓1 𝑋2 𝑋3 𝑋1 CHAPTER 4. EXPECTATION PROPAGATION 22 ℎ\𝑓𝑘(𝑋) ∝ ℎ(𝑋) 𝑀�𝑓𝑘→𝑋𝑛(𝑋𝑛) (4.27) ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷�ℎ\𝑓𝑘(𝑋) ∗ 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛)� (4.28) 𝑀�𝑓𝑘→𝑋𝑛(𝑋𝑛) = 𝐾𝑖 ∗ ℎ(𝑋) ℎ\𝑓𝑘(𝑋) (4.29) After all messages converged, then 𝑏𝑥𝑛(𝑋𝑛) can be obtained: 𝑏𝑥𝑛(𝑋𝑛) ≈ ℎ(𝑋) ∝ � 𝑀�𝑓𝑟→𝑋𝑛(𝑋𝑛) 𝑓𝑟⊆𝑁(𝑋𝑛) (4.30) Obviously, the example in figure 4.1 in section 4.2 can also be translated into message computing where approximation and computation are made for 𝑏𝑥𝑛(𝑋𝑛). 4.3.2 Message computing for chain-like model with Expectation Propagation In last section, one thing should be noticed that, according to the sum-product algorithm, the message: 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) in formula (4.28) is equal to: 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) = � 𝑓𝑘 \{𝑋𝑛} �{𝑋𝑚 = 𝑥𝑚}𝑋𝑚⊆𝑁(𝑓𝑘)� ∗ � 𝑀𝑋𝑚→𝑓𝑘(𝑋𝑚) 𝑋𝑚⊆𝑁(𝑓𝑘)\{𝑋𝑛} = � 𝑓𝑘 \{𝑋𝑛} (𝑋1,𝑋2,𝑋3,𝑋𝑛) ∗ 𝑀𝑋1→𝑓𝑘(𝑋1) ∗ 𝑀𝑋2→𝑓𝑘(𝑋2) ∗ 𝑀𝑋3→𝑓𝑘(𝑋3) (4.31) From formula (4.31), it can be seen that in a chain-like model, more messages (𝑀𝑋1→𝑓𝑘(𝑋1),𝑀𝑋2→𝑓𝑘(𝑋2),𝑀𝑋3→𝑓𝑘(𝑋3) in this case) need to be computed or approximated before the message: 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) can be used to make the approximation. Hence, in a chain-like model [8] (optical communication) the EP algorithm will be modified from the one discussed in beginning of this chapter. Take the optical communication model with one span for an instance. Figure 4.3 Factor graph for optical communication model with one span 𝑃(𝑋1) 𝑋1 𝑃(𝑋3|𝑋2) 𝑃(𝑋2|𝑋1) 𝑋2 𝑋3 𝑓1 𝑓2 𝑓3 CHAPTER 4. EXPECTATION PROPAGATION 23 Figure 4.3 can be simplified as: Figure 4.4 Simplified factor graph for optical communication model with one span In figure 4.4, the direction from 𝑓1 to 𝑓3 is define as ‘forward’ and message passing in this direction is written as: 𝑚𝑚𝑚_𝑓(𝑋𝑛). While the opposite direction is ‘backward’ and the message is: 𝑚𝑚𝑚_𝑏(𝑋𝑛). In figure 4.4, the target is to get the belief of 𝑋1: 𝐵𝑋1(𝑋1) = 𝑚𝑚𝑚_𝑓(𝑋1) ∗ 𝑚𝑚𝑚_𝑏(𝑋1) (4.32) Take figure 4.4 for an instance, the EP process can be carried out as follows: 1. Initialize the forward and backward messages passing through the factor graph. 2. Do a loop until all the messages converged:  Make approximations for the messages of the last function block in the end of the model.  Move to the backward function block and obtain the corresponding messages block by block until finish the message approximation for the second function block of the model (block 𝑓2).  From block 𝑓2, move to the forward function block and approximate the messages block by block until reach the last function block in the end of the model. 3. Calculate the belief of 𝑋1: 𝐵𝑋1(𝑋1) = 𝑚𝑚𝑚_𝑓(𝑋1) ∗ 𝑚𝑚𝑚_𝑏(𝑋1) (4.33) This process can also be seen in Figure 4.4. What’s more, according to the algorithm discussed above, it can be seen that the two outgoing messages computed in each function block can update the backward and forward incoming message of its left and right function 𝑃(𝑋1) 𝑃(𝑋3|𝑋2) 𝑃(𝑋2|𝑋1) 𝑓1 𝑓2 𝑓3 𝑿𝟏 𝑿𝟐 𝑿𝟑 𝑚𝑚𝑚_𝑓(𝑋1) 𝑚𝑚𝑚_𝑓(𝑋2) 𝑚𝑚𝑚_𝑓(𝑋3) 𝑚𝑚𝑚_𝑏(𝑋1) 𝑚𝑚𝑚_𝑏(𝑋2) 𝑚𝑚𝑚_𝑏(𝑋3) CHAPTER 4. EXPECTATION PROPAGATION 24 block respectively in the model. Hence, the messages passing through the model can be refined block by block in iterations. The method for message computation and approximation for each function block is described below: Figure 4.5 Message computation and approximation for one function block According to figure 4.5, the task is to get the outgoing messages 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) and 𝑚𝑚𝑚_𝑏(𝑋𝑛−1), while the incoming messages 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) and 𝑚𝑚𝑚_𝑏(𝑋𝑛) are given. Take the outgoing message 𝑚𝑚𝑚_𝑓(𝑋𝑛) for an example: According to the sum-production algorithm discussed in section 3.2.2, the belief of Xn can be calculated as: 𝐵𝑋𝑛(𝑋𝑛) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛−1 (4.34) A projection can be made on BXn(Xn), which results an approximated distribution b�(Xn) and a normalizing factor Kn. 𝑏�(𝑋𝑛) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷�𝐵𝑋𝑛(𝑋𝑛)� (4.35) The outgoing message of Xn can be computed as: 𝑚𝑠𝑠�_𝑓(𝑋𝑛) = 𝐾𝑛 ∗ 𝑏�(𝑋𝑛) 𝑚𝑚𝑚_𝑏(𝑋𝑛) (4.36) The outgoing message 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) can be obtained in the same way. One thing should be mentioned that the algorithm above is possible to be simplified in some particular case, which will be discussed in the next chapter. 𝑃(𝑋𝑛|𝑋𝑛−1) 𝑓𝑛 𝑿𝒏−𝟏 𝑿𝒏 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) 𝑚𝑚𝑚_𝑓(𝑋𝑛) 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) 𝑚𝑚𝑚_𝑏(𝑋𝑛) CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 25 Chapter 5 Implementation of Expectation Propagation 5.1 Optical communication model with linear block In this section, the optical communication system model involving linear (AWGN) function block will be discussed. The details of the message approximation for linear block will be talked about. The linear system model and the corresponding factor graph for two spans are as follows: Figure 5.1 System model with only linear block Figure 5.2 Factor graph for model with only linear block 5.1.1 System setup As discussed in chapter 4, the initialization should be done before the main process will be carried out. For this system, choose two dimensional Gaussian distribution from the exponential family as the approximation term for the messages: 𝑟̃𝑖(𝑋) = 𝐶𝑖 ∗ 𝑒𝑒𝑒 �− 1 2 (𝑋 −𝑀𝑖) 𝛴𝑖−1 (𝑋 −𝑀𝑖)𝑇� (5.1) Obviously, the message passing through factor graph is the mean and covariance matrix of the Gaussian distribution. T R N N N N 𝑃(𝑎) 𝑎 𝑅1 𝑋1 𝑃(𝑅1|𝑋1) 𝑃(𝑋1|𝑎) 𝑓1 𝑓2 𝑓3 CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 26 Initial all the forward incoming messages of each function block to have: 𝑀𝑀𝑀𝑀:𝑀 = [0,0],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �∞ 0 0 ∞� Considering the implementation, make ∞ to be 100. What’s more, the back incoming message from the observation R to the last function block in the end of the model can be seen as a Gaussian distribution with: 𝑀𝑀𝑀𝑀:𝑀 = [𝑋𝑅 ,𝑌𝑅],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �0 0 0 0� 5.1.2 Message approximation for linear block Figure 5.3 Message computation and approximation for linear function block According to figure 5.3, the task is to get the outgoing message: 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) and 𝑚𝑚𝑚_𝑓(𝑋𝑛) by using the incoming message: 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) and 𝑚𝑚𝑚_𝑏(𝑋𝑛). And the function 𝑓𝑛 is: 𝑓𝑛(𝑋𝑛,𝑋𝑛−1) = 𝐾 ∗ 𝑒𝑒 𝑝 �− 1 𝑁0 �|𝑋𝑛 − 𝑋𝑛−1|�2� (5.2) 𝒎𝒎𝒎_𝒇(𝑿𝒏): For the reason that 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) has: 𝑀𝑀𝑀𝑀:𝑀 = [0,0],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �∞ 0 0 ∞� This makes the Gaussian distribution actually to be a uniform distribution. Consider that the function 𝑓𝑛 is Gaussian-like, therefore the message should be still a uniform distribution after passed the function. Hence, in this case: 𝑚𝑚𝑚_𝑓(𝑋𝑛) = 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) (5.3) 𝒎𝒎𝒎_𝒃(𝑿𝒏−𝟏): Refer to the formula (4.34): 𝑃(𝑋𝑛|𝑋𝑛−1) 𝑓𝑛 𝑿𝒏−𝟏 𝑿𝒏 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) 𝑚𝑚𝑚_𝑓(𝑋𝑛) 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) 𝑚𝑚𝑚_𝑏(𝑋𝑛) CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 27 𝐵𝑋𝑛−1(𝑋𝑛−1) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛 Because the components involved in the formula above are all Gaussian-like, hence the message can be computed directly without any approximation. 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛 (5.4) According to formula (5.1) and (5.2) 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) = �𝑁(𝑋𝑛;𝑋𝑛−1,𝑁0) ∗ 𝑁(𝑋𝑛;𝑀𝑛,𝛴𝑛) 𝑑𝑥𝑛 (5.5) With formula (4.4), one can obtain: 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) = �𝑁(𝑋𝑛−1;𝑀𝑛,𝑁0 + 𝛴𝑛) ∗ 𝑁(𝑋𝑛;𝑀, Σ) 𝑑𝑥𝑛 (5.6) = 𝑁(𝑋𝑛−1;𝑀𝑛,𝑁0 + 𝛴𝑛) 5.2 Optical communication model with linear and nonlinear block This section will introduce the nonlinear block (SPM) into the system discussed above. Monte Carlo method will be discussed to solve the message approximation for the nonlinear block. Finally, solutions will be given respectively for the optical communication system with single and dual polarization. The system model and factor graph for one span are as given below: Figure 5.4 System model with nonlinear and linear block Figure 5.5 Factor graph for model with linear and nonlinear block T R N N N N NL NL 𝑃(𝑎) 𝑎 𝑅1 𝑋1 𝑃(𝑅1|𝑋1) 𝑃(𝑋1|𝑎) 𝑓1 𝑓2 𝑓3 CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 28 In figure 5.5, 𝑓2 and 𝑓3 represent the nonlinear and linear block respectively in one span. 5.2.1 System setup The initialization for the system with nonlinear block is the same as in section 5.1.1. 5.2.2 Message approximation for nonlinear system with single polarization Consider the picture in figure 5.3 again, the function block 𝑓𝑛 now is: 𝑓𝑛(𝑋𝑛,𝑋𝑛−1) = 𝛿�𝑋𝑛 − 𝑋𝑛−1 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑋𝑛−1||2)� (5.7) 𝒎𝒎𝒎_𝒇(𝑿𝒏): As discussed in section 5.1.2, the incoming message: 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) is actually a uniform distribution. Meanwhile the effect of function 𝑓𝑛 is a rotation for the incoming message according to message computation method discussed in chapter 3. Hence, formula (5.3) can be still applied in this case. 𝒎𝒎𝒎_𝒃(𝑿𝒏−𝟏): Refer to the formula (4.34): ⎩ ⎨ ⎧ 𝐵𝑋𝑛−1(𝑋𝑛−1) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛 𝑃(𝑋𝑛|𝑋𝑛−1) = 𝛿�𝑋𝑛 − 𝑋𝑛−1 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑋𝑛−1||2)� 𝐵𝑋𝑛−1(𝑋𝑛−1) = 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑏 �𝑋𝑛−1 ∗ exp �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑋𝑛−1|�2�� (5.8) According to the EP algorithm discussed in section 4.2.2, 𝐵𝑋𝑛−1(𝑋𝑛−1) should be projected into a Gaussian distribution: 𝑁(𝑋𝑛−1;𝑀𝑛−1, Σ𝑛−1) ~ 𝑏�(𝑋𝑛−1) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷�𝐵𝑋𝑛−1(𝑋𝑛−1)� (5.9) In formula (5.10), the mean: 𝑀𝑛−1 and covariance: Σ𝑛−1 matrix of 𝐵𝑋𝑛−1(𝑋𝑛−1) should be calculated for the approximation term: 𝑏�(𝑋𝑛−1). In this case, the mean and covariance matrix is not easy to figure out mathematically, hence, the importance sampling and Monte Carlo integration [9, 10] are introduced to deal with the problem. Assume: 𝐵𝑋(𝑋) = 𝑚𝑚𝑚_𝑓(𝑋) ∗ 𝑚𝑚𝑚_𝑏 �𝑋 ∗ exp �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑋|�2�� CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 29 = 𝑁1(𝑋) ∗ 𝑁2 �𝑋 ∗ 𝑒𝑒𝑒 �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑋|�2�� = 𝑁1(𝑋) ∗ 𝑄(𝑋) (5.10) 1. Draw 𝐿 samples from 𝑄(𝑋): �𝒙(𝟏), , , , , ,𝒙(𝒍)�  Draw L samples from 𝑁2(𝑋): �𝒚(𝟏), , , , , ,𝒚(𝒍)�  Rotate the samples: 𝒙(𝒍) = 𝒚(𝒍) ∗ 𝑒𝑒𝑒 �−𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ��𝒚(𝒍)�� 2 � (5.11) 2. Compute weights for samples: �𝒙(𝟏), , , , , ,𝒙(𝒍)�: 𝑤(𝑙) = 𝑁1�𝒙(𝒍)� (5.12) 3. Normalize the weights: 𝑣(𝑙) = 𝑤(𝑙) ∑ 𝑤(𝑙) 𝐿 (5.13) The set: �𝒙(𝒍), 𝑣(𝑙)�𝒍=𝟏 𝑳 is called the representation of 𝐵𝑋(𝑋). 4. Calculate the mean and covariance by Monte Carlo integration: ⎩ ⎪ ⎨ ⎪ ⎧ 𝒎� = �𝑣(𝑙) 𝐿 𝑙=1 ∗ 𝑥(𝑙) 𝒗� = �𝑣(𝑙) 𝐿 𝑙=1 ∗ �𝑥(𝑙) −𝒎� � ∗ �𝑥(𝑙) −𝒎� � 𝑇 (5.14) After the mean and covariance matrix have been approximated the message: 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) can be made as in formula (4.36). 5.2.3 Message approximation for nonlinear system with dual polarization In the system with dual polarization, two signals are transmitted simultaneously. The rotation phenomena caused by the nonlinear block for each signal is not only dependent on the signal itself but also on the other one. The formulas are given below: 𝑋1𝑛 = 𝑋1𝑛−1 ∗ 𝑒𝑒 𝑝 � 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑋1𝑛−1|�2 + �|𝑋2𝑛−1|�2 � � (5.15) 𝑋2𝑛 = 𝑋2𝑛−1 ∗ 𝑒𝑒 𝑝 � 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑋1𝑛−1|�2 + �|𝑋2𝑛−1|�2 � � (5.16) CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 30 Figure 5.6 Message computation and approximation for nonlinear function block, dual polarization According to figure 5.6, it is obvious that the analysis method and the result of the message: 𝒎𝒎𝒎_𝒇(𝑿𝟏𝒏) is the same as in the last section. The only difference is in the message: 𝒎𝒎𝒎_𝒃(𝑿𝟏𝒏−𝟏) when processing the importance sampling and Monte Carlo integration in step 1: 1. Draw 𝐿 samples from 𝑄(𝑋): �𝒙(𝟏), , , , , ,𝒙(𝒍)�  Draw L samples from 𝑁2(𝑋): �𝒚𝟏(𝟏), , , , , ,𝒚𝟏(𝒍)�  Draw L samples from 𝑁2 2(𝑋2): �𝒚𝟐(𝟏), , , , , ,𝒚𝟐(𝒍)�  Rotate the samples: 𝒙𝟏(𝒍) = 𝒚𝟏(𝒍) ∗ 𝑒𝑒𝑒 �−𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ���𝒚𝟏 (𝒍)�� 2 + ��𝒚𝟐(𝒍)�� 2 �� (5.17) The following steps are the same as discussed in the last section. And the message: 𝒎𝒎𝒎_𝒇(𝑿𝟐𝒏), 𝒎𝒎𝒎_𝒃(𝑿𝟐𝒏−𝟏) can be obtained in the same way above. 5.3 Simplification on EP algorithm From the discussion on initialization and message computation and approximation method above, it can be easily computed that all the forward messages passing through the system have: 𝑀𝑀𝑀𝑀:𝑀 = [0,0],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �∞ 0 0 ∞� Therefore, the messages (mean and covariance matrix) will be converged in only one way from the end to the beginning of the model without iteration. Hence, the EP algorithm discussed in section 4.3 is simplified: 𝑃(𝑋1𝑛|𝑋1𝑛−1) 𝑓𝑛 𝑿𝟏𝒏−𝟏 𝑿𝟏𝒏 𝑚𝑚𝑚_𝑓(𝑋1 𝑛−1) 𝑚𝑚𝑚_𝑓(𝑋1 𝑛) 𝑚𝑚𝑚_𝑏(𝑋1 𝑛−1) 𝑚𝑚𝑚_𝑏(𝑋1 𝑛) 𝑚𝑚𝑚_𝑏(𝑋2 𝑛) = 𝑁2 2(𝑋2) CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 31 4. Initialize the forward and backward messages. 5. Make approximations for the messages of the last function block in the end of the model. Move to the backward function block and obtain the corresponding messages block by block until finish the message approximation for the second function block of the model (block 𝑓2). 6. Calculate the belief of 𝑋𝑛: 𝐵𝑋𝑛(𝑋𝑛) = 𝑚𝑚𝑚_𝑓(𝑋𝑛) ∗ 𝑚𝑚𝑚_𝑏(𝑋𝑛) (5.18) Note that the algorithm can be easily extended to dual polarization system. CHAPTER 6. DATA ANALYSIS AND COMPARISON 32 Chapter 6 Data analysis and comparison 6.1 Introduction to ML, SBP, BP method In the following sections of this chapter, the results from the EP implementation will be analyzed together with some other methods such as ML (Maximum Likelihood), SBP (Stochastic Back Propagation) [11] and BP (Back Propagation) [11]. This section will give a short introduction of these technologies. Moreover, some simulation parameters will be also mentioned in the end. 6.1.1 ML, SBP and BP ML: ML (Maximum Likelihood) is a common used method for the receiver design. It simply compares the ‘distance’ between the observation 𝒓𝑵 and every possible original signal 𝐬, finds the smallest ‘distance’ and makes decision: 𝒔� = 𝑎𝑎𝑎𝑚𝑚𝑚 ‖𝒓𝑵 − 𝒔‖2, 𝒔 ∈ 𝛺2 (6.1) SBP: SBP (Stochastic Back Propagation) [11] is a new method developed for solving the problem in optical communication system. It is based on the factor graphs and particle technology which draws a number of samples (particles) for each observation in the end. Then all the particles are propagated backward to the beginning of the model and try to compensate for the distortions when passing through the system model. Finally, Monte Carlo integration is applied. And the result is compared with all possible original signals in order to make a decision. BP: BP (Back Propagation) is similar to SBP, the main difference is that BP does not consider the AWGN in the model and does not use particles while SBP dose. 𝒔� = 𝑎𝑎𝑎𝑚𝑚𝑚 �𝒓𝑵 − 𝒔 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ 𝑁 ∗ ||𝒔||2)�2, 𝒔 ∈ 𝛺2 (6.2) 6.1.2 Simulation parameters During the simulation, the system has: 16-QAM constellation, 22 spans, 𝑁0 = 4.89 ∗ 10−7 𝑊/𝐻𝐻 for AWGN block, 𝛾 = 1.25𝑊−1𝑘𝑘−1 and 𝐿𝑒𝑒𝑒 = 17.36 𝑘𝑘 for the nonlinear block, 𝑁𝑁𝑁 = 100 for importance sampling. CHAPTER 6. DATA ANALYSIS AND COMPARISON 33 6.2 System with linear block The figure below presents the result SER along with the input power 𝑃𝑖𝑖 in dBm for the linear system equipped with EP and ML receiver. Figure 6.1 SER as a function of 𝑃𝑖𝑖 with 16-QAM The figure shows that the two curves coincide with each other. The EP and ML receiver has the same performance with each other. This is reasonable, because in EP process, the observation 𝒓𝑵 is just the mean matrix of the first message. According to the message approximation method for linear block in section 5.1, the mean of the message won’t change while passing through the factor graph while the covariance of the new message is the covariance of the old message plus 𝑵𝟎 (covariance of AWGN). Hence, the final message used to make the decision has the mean matrix which is equal to 𝒓𝑵 and covariance matrix which is equal to the sum of 𝑵𝟎 (covariance of AWGN). Therefore, the decision from EP is expected to be the same with the result from ML receiver which measures the ‘distance’ between the observation 𝒓𝑵 and all possible original signals. One more comment in this case can be given is that the process of EP is more efficient than SBP. The reason is that in SBP a number of samples should be generated and transmitted back through the model. For each linear block, every sample should be added an AWGN in order to compensate for the distortion caused by the linear block. And finally Monte Carlo integration -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 10 -4 10 -3 10 -2 10 -1 10 0 S E R Pin (dBm) EP ML CHAPTER 6. DATA ANALYSIS AND COMPARISON 34 has to be used. While for EP, only the mean and covariance are transmitted and changed in the factor graph. 6.3 System with linear and nonlinear block This section will firstly show how the message is changed when passing through the factor graph. And the influence of the number of samples in Monte Carlo in nonlinear block will be also discussed. The comparison between EP and other methods in SER is given at last. 6.3.1 Message changing through the factor graph It has already been mentioned that the message passing though the factor graph is approximated into Gaussian distributions. The figures below displays how the message (Gaussian distribution) is changed from the end to the beginning of the factor graph in one polarized system. (1) (2) (4) (3) -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 x y -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 x y -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 x y -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 x y CHAPTER 6. DATA ANALYSIS AND COMPARISON 35 (5) (6) Figure 6.2 Message changing from the end to the beginning of the model The Observation and original signal in the case of figure 6.2 are: 𝒓𝑵 = [0.0906,−0.0093], 𝒔 = [−0.0634, 0.0634] From the figure, one can see that the approximated Gaussian distribution finally reach the original signal step by step. Actually, a correct decision can be just made from the fifth subplot of figure 6.2 for the reason that the value of the Gaussian distribution for the original signal is already the maximum one among the 16-QAM constellation. Note that the whole process has 44 steps for 22 spans. Figure 6.2 only shows 6 steps to roughly present the way EP works. 6.3.2 Effect of the number of samples in Monte Carlo It has been already discussed that the Monte Carlo method has to be used to make the approximation for the messages for each nonlinear block. Hence, the number of samples drew by using Monte Carlo is expected to affect the final result. Figure 6.3 shows the relationship between the SER and the number of samples for the two polarized system with a fixed input power in 4 dBm. The figure shows that the SER goes down dramatically at first and doesn’t decrease much after the number of sample reaches certain value. Because the number of sample in Monte Carlo will obviously influence the speed of the process, a ‘cost-effective’ number of samples should be chosen when considering the efficiency of the process. Figure 6.3 displays that a number around 300 could be a reasonable one for the system. In this thesis, consider the condition of the simulation equipment and comparison, the number of 100 is chosen -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1 x y -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1 x y CHAPTER 6. DATA ANALYSIS AND COMPARISON 36 Figure 6.3 SER along with the number of samples for dual polarization 6.3.3 Comparison with other methods In this section, a comparison has been made between EP and other methods aiming to analyze the different performance of these methods. Figure 6.4 and 6.5 compare the SER along with the input power between ML, BP, SBP and EP method for single polarization and dual polarization respectively. 100 particles (samples) are used in SBP and EP in both single and dual polarization. From these two figures, following comment could be obtained: 1. All the methods except ML have the lowest SER in the middle area along the X axis and the performance roughly get worse while the input power decrease and increase from the lowest SER point. 2. All the methods have better performance in single polarization system. 3. EP has a greater degradation in performance than SBP and BP dose when the system extends from single polarization to dual polarization. 100 200 300 400 500 600 700 800 900 1000 10 -3 10 -2 S E R Num of samples Two polarization CHAPTER 6. DATA ANALYSIS AND COMPARISON 37 Figure 6.4 SER for single polarization with 16-QAM Figure 6.5 SER for dual polarization with 16-QAM -10 -5 0 5 10 15 10 -6 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 S E R Pin (dBm) ML BP SBP EP -10 -5 0 5 10 15 10 -4 10 -3 10 -2 10 -1 10 0 S E R Pin (dBm) ML BP SBP EP CHAPTER 6. DATA ANALYSIS AND COMPARISON 38 4. In both single and dual polarization system, SBP and ML have the best and worst results respectively. And BP and ML are comparable with each other. 5. The curve of BP method has oscillations in both single and dual polarization. Firstly, because the ML method just neglects the nonlinear effect, so it is not strange that it has the worst performance while the input power grows. All of the rest three methods have taken nonlinear into consideration. AWGN and nonlinear effect are the main distortions for which BP, SBP and EP try to compensate. When the input power is pretty small the AWGN contribute the most to the distortion in the system. When the input power is getting bigger, the effect of the AWGN becomes smaller and the SER goes down until it reaches the lowest point. As the input power continues increasing, the nonlinear effect dominates the distortion and will increase the value of SER. That is why the SER curve has ‘V’ shape in the figure. In dual polarization the rotation caused by the nonlinear block is based on both two signals transmitted simultaneously in the system (formula 2.5). Hence, the nonlinear effect in dual polarization is stronger than that in single polarization and the obtained worse results. Moreover, it can be found that the performance of EP obviously has a bigger degradation than that of SBP and BP has from the single to dual polarization. The possible reason for this phenomenon is that: In the dual polarization system, the nonlinear rotation effect is greater than that in the single polarization case. Hence, the Gaussian distribution chosen to approximated the message in the system may be no longer a good approximation term. And it can be also expected that the performance of EP with Gaussian distribution as the approximation term will have a great drop when the system extends to triple or quadruple polarization system. It can be seen from the figure that SBP give out the best performance in SER. The possible reason is that: Compared with BP, SBP has taken care of AWGN while BP doesn’t. In the nonlinear system model each AWGN block is followed by a nonlinear block. When the input power is high, the interaction between the AWGN and the nonlinear rotation also get bigger, which is supposed to be the reason for the poorer performance for BP in high input power compared with SBP and EP. Moreover, because of the natural uncertainty of AWGN, the uncertainty of the result SER for BP is also expected. That can explain the oscillation happened in the BP curve in the figure above. Compared with BP, both SBP and EP have taken AWGN and nonlinear into consideration. Nevertheless, SBP uses particles and makes a stochastic process in the model while EP makes approximations block by block, in other words, EP process continues losing the accuracy while it makes approximations for the blocks in the model. That is why EP obtains the SER curve which has the same shape with SBP but a worse SER. CHAPTER 6. DATA ANALYSIS AND COMPARISON 39 One more thing should be mentioned is that the strategy of EP is making a balance between the accuracy of the result and the efficiency of the process. In the linear system, EP successfully makes a more efficient algorithm than SBP does. However, because of the use of Monte Carlo method for the nonlinear block, the performance of EP in the nonlinear system is more or less the same with SBP when it comes to the efficiency. CHAPTER 7. CONCLUSION AND FUTURE WORK 40 Chapter 7 Conclusion and future work 7.1 Conclusion The thesis works on factor graphs and expectation propagation (EP) and try to implement these technologies into the coherent optical communication system aiming to deal with the linear and nonlinear distortion existed in the system. The results indicate that factor graphs and EP works well on both linear (AWGN) and nonlinear system. Moreover, Gaussian distribution has also been proved a reasonable approximation term for EP to approximate the messages in the system. Comparison has also been made between EP and other existing methods. Generally, the performance of EP is as expected. 7.2 Future work According to the discussions in the thesis, the EP algorithm and the message computation method could be improved to make a more efficient algorithm. Moreover, besides the linear and nonlinear effect, other kinds of distortion and components in the system could also be take into consider. 7.2.1 Expectation propagation As discussed in chapter 6, the Monte Carlo method used in the message approximation for nonlinear block makes EP algorithm less efficient. Hence, more work could be taken on how to make message approximation process mathematically or use other methods which can be more efficient than Monte Carlo. What’s more, discussion shows that, as an approximation term, the performance of Gaussian distribution degrades a lot in the multiplexed polarization system. Therefore, a better distribution could be chosen from the exponential family for the multiplexed polarization system. CHAPTER 7. CONCLUSION AND FUTURE WORK 41 7.2.2 Other problems Besides the linear and nonlinear distortion discussed in the thesis, other problems could also be taken into consider. Polarization dependent loss (PDL): PDL is also a significant distortion which influences the performance of the system. PDL can be calculated as the ratio of the maximum and minimum power amplitude to all the states of the polarization. In the optical system model, PDL exists in each span of fiber and the PDL of the system cannot be obtained by just adding all the PDL components together but should be considered separately [11]. More noise: This thesis only takes care of AWGN and nonlinear phase noise. In the future work, more noise could be taken into consider for the system, such as photoelectron noise, thermal noise and photon noise. Moreover, the unitary channel [11] and the corresponding equalizer are also suggested to be added into the system model. REFERENCES 42 References [1] S. Tan, H. Wymeersch, P. Johannisson, E. Agrell, P. Andrekson, and M. Karlsson, “An ML-based detector for optical communication in the presence of nonlinear phase noise,” International Conference on Communications, 2010. [2] Nan Jiang, Yan Gong, Johnny Karout, Henk Wymeersch, Pontus Johannisson, Magnus Karlsson, Erik Agrell, Peter A. Andrekson, “Stochastic Backpropagation for Coherent Optical Communications,” 37th European Conference and Exhibition on Optical Communication (ECOC), 2011. [3] H. Wymeersch, Iterative Receiver Design. Cambridge University Press, 2007, pp. 36-75. [4] Scott L. Miller, Donald G. Childers, Probability and Random Processes with Applications to Signal Processing and Communications. Elsevier Academic Press, 2004, pp. 148-232 [5] Peter Ahrendt, The Multivariate Gaussian Probability Distribution. IMM, Technical University of Denmark, 2005. [6] Thomas P Minka, “A family of algorithms for approximate Bayesian inference,” Doctor’s thesis, 2001. [7] Thomas P Minka, “Expectation Propagation for Approximate Bayesian Inference,” Statistics Dept. Carnegie Mellon University, 2002. [8] Tom Heskes, Onno Zoeter, “Extended Version of “Expectation propagation for approximate inference in dynamic Bayesian networks”,” SNN, University of Nijmegen, The Netherlands, 2003. [9] David J.C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2005, pp. 357–382. [10] Eric C. Anderson, “Monte Carlo Methods and Importance Sampling,” Lecture Notes for Stat 578C, Statistical Genetics, 1999 [11] Yan gong, Nan jiang, “Stochastic Backpropagation for Coherent Optical Communication,” Master’s thesis in optical communication systems, 2011 [12] Shuang Wang, “Expectation Propagation Algorithm,” School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, 2011. Master of Science Thesis in the Master Degree Programme, Communication Engineering SHENZHOU HOU Abstract Acknowledgements Contents Chapter 1 Introduction 1.1 Background 1.1.1 Coherent optical communication system 1.1.2 Factor graphs and Expectation propagation 1.2 Purpose 1.3 Structure of the thesis Chapter 2 Coherent optical communication system 2.1 Introduction to the coherent optical communication model 2.2 SPM, AWGN and Polarization 2.3 The MAP Receiver Chapter 3 Factor graphs 3.1 Introduction to Factor graphs 3.1.1 Marginals, Function factorizations, Factor graphs 3.1.2 Sum-product algorithm in Factor graphs 3.2 Factor graphs for the optical communication system 3.2.1 Factorization and Graphs 3.2.2 Sum-product algorithm 3.2.3 Challenges Chapter 4 Expectation Propagation 4.1 Introduction to Expectation Propagation 4.1.1 Why Expectation Propagation 4.1.2 Exponential family and Kullback-Leibler divergence 4.1.3 Expectation Propagation 4.1.4 Potential problem 4.2 An example of Expectation Propagation: clutter problem 4.2.1 The Problem 4.2.2 Factor Graph 4.2.3 Processing Expectation Propagation 4.3 Expectation Propagation for the optical communication system 4.3.1 Expectation Propagation for message computing 4.3.2 Message computing for chain-like model with Expectation Propagation Chapter 5 Implementation of Expectation Propagation 5.1 Optical communication model with linear block 5.1.1 System setup 5.1.2 Message approximation for linear block 5.2 Optical communication model with linear and nonlinear block 5.2.1 System setup 5.2.2 Message approximation for nonlinear system with single polarization 5.2.3 Message approximation for nonlinear system with dual polarization 5.3 Simplification on EP algorithm Chapter 6 Data analysis and comparison 6.1 Introduction to ML, SBP, BP method 6.1.1 ML, SBP and BP 6.1.2 Simulation parameters 6.2 System with linear block 6.3 System with linear and nonlinear block 6.3.1 Message changing through the factor graph 6.3.2 Effect of the number of samples in Monte Carlo 6.3.3 Comparison with other methods Chapter 7 Conclusion and future work 7.1 Conclusion 7.2 Future work 7.2.1 Expectation propagation 7.2.2 Other problems References