Expectation Propagation-based Receiver for 
Coherent Optical Communication 

Master of Science Thesis in the Master Degree Programme,      

Communication Engineering 

SHENZHOU HOU 

Department of Signals and Systems 
Division of Communication Systems and Information Theory 
CHALMERS UNIVERSITY OF TECHNOLOGY 
Göteborg, Sweden, 2011 
Master’s Thesis 2011:1  

T 

SPM 

Loss 

a 

 
SPM 

Loss 
 

N2 

R 

X1 R1 X2 R2

N1 

  
Abstract 

This thesis presents a novel maximum a posteriori receiver based on factor graphs and 
expectation propagation (EP) technologies for the coherent optical communication system. 
The receiver deals with the linear (AWGN) and nonlinear distortions in the desired system. 
Comparison has also been made on SER between the EP-based receiver and other methods. 
The discussions and analysis are given for the result in the end. 
 
 
Keywords: optical, factor graphs, expectation propagation, linear, nonlinear 
  

Acknowledgements 

I would like to thank all those people who always offer me kind help and make my two years’ 
master study an exciting and inspiring journey. 
 
First of all, I want to thank my supervisor and examiner Henk Wymeersch. You always give 
me the confidence to conquer the unexpected problems during my master thesis. Without your 
qualified suggestion and powerful support, I cannot imagine that I could on day finish the 
thesis.  
 
Nan and Yan, thanks a lot for your kind assistance on my thesis work. I have learned a lot 
from the discussion with you. There is no doubt that we have experienced a wonderful 
cooperation in the past few months. Meanwhile, I should also like to thank my classmates 
who gave me good advices during these two years in Chalmers. 
 
Thanks should also be given to Lu. Thanks for your unconditional support. Without you, lots 
of fun will be undoubtedly missing. I will also thank all the members of Jiangyou party. I will 
never forget our precious experience for the rest of my life and hope to see your guys again.   
 
Finally, I will give my deeply appreciation to my parents. It is my big fortune to be your child. 
This moment, words are no longer enough to express how I love you. Thank you very much 
for everything in these passed years. No matter where I am, my heart will be always with you.  
 
 
Shenzhou Hou, Göteborg, Sweden, Sept 19, 2011 

  
   CONTENTS 

i 

 
Contents 
 

1  Introduction ....................................................................................................................... 1 

1.1 Background .................................................................................................................... 1 

1.1.1 Coherent optical communication system .............................................................. 1 

1.1.2 Factor graphs and Expectation propagation ......................................................... 1 

1.2 Purpose .......................................................................................................................... 1 

1.3 Structure of the thesis .................................................................................................... 2 

 
2  Coherent optical communication system ......................................................................... 3 

2.1 Introduction to the coherent optical communication model .......................................... 3 

2.2 SPM, AWGN and Polarization ...................................................................................... 3 

2.3 The MAP Receiver ........................................................................................................ 4 

 
3  Factor graphs ..................................................................................................................... 6 

3.1 Introduction to Factor graphs ........................................................................................ 6 

3.1.1 Marginals, Function factorizations, Factor graphs ............................................... 6 

3.1.2 Sum-product algorithm in Factor graphs .............................................................. 8 

3.2 Factor graphs for the optical communication system .................................................. 10 

3.2.1 Factorization and Graphs .................................................................................... 10 

3.2.2 Sum-product algorithm ....................................................................................... 11 

3.2.3 Challenges .......................................................................................................... 11 

 
4  Expectation Propagation................................................................................................. 13 

4.1 Introduction to Expectation Propagation ..................................................................... 13 

4.1.1 Why Expectation Propagation ............................................................................ 13 

4.1.2 Exponential family and Kullback-Leibler divergence ........................................ 13 

4.1.3 Expectation Propagation ..................................................................................... 15 

4.1.4 Potential problem ................................................................................................ 17 

4.2 An example of Expectation Propagation: clutter problem .......................................... 17 


   CONTENTS 

ii 

4.2.1 The Problem ....................................................................................................... 17 

4.2.2 Factor Graph ....................................................................................................... 18 

4.2.3 Processing Expectation Propagation .................................................................. 18 

4.3 Expectation Propagation for the optical communication system ................................. 20 

4.3.1 Expectation Propagation for message computing .............................................. 20 

4.3.2 Message computing for chain-like model with Expectation Propagation .......... 22 

 
5  Implementation of Expectation Propagation ................................................................ 25 

5.1 Optical communication model with linear block ........................................................ 25 

5.1.1 System setup ....................................................................................................... 25 

5.1.2 Message approximation for linear block ............................................................ 26 

5.2 Optical communication model with linear and nonlinear block .................................. 27 

5.2.1 System setup ....................................................................................................... 28 

5.2.2 Message approximation for nonlinear system with single polarization ............. 28 

5.2.3 Message approximation for nonlinear system with dual polarization ................ 29 

5.3 Simplification on EP algorithm ................................................................................... 30 

 
6  Data analysis and comparison ........................................................................................ 32 

6.1 Introduction to ML, SBP, BP method ......................................................................... 32 

6.1.1 ML, SBP and BP ................................................................................................ 32 

6.1.2 Simulation parameters ........................................................................................ 32 

6.2 System with linear block ............................................................................................. 33 

6.3 System with linear and nonlinear block ...................................................................... 34 

6.3.1 Message changing through the factor graph ....................................................... 34 

6.3.2 Effect of the number of samples in Monte Carlo ............................................... 35 

6.3.3 Comparison with other methods ......................................................................... 36 

 
7  Conclusion and future work ........................................................................................... 40 

7.1 Conclusion ................................................................................................................... 40 

7.2 Future work ................................................................................................................. 40 

7.2.1 Expectation propagation ..................................................................................... 40 

7.2.2 Other problems ................................................................................................... 41 

 
References .............................................................................................................................. 42 

 
 CHAPTER 1. INTRODUCTION  

1 

Chapter 1                       
Introduction 

1.1 Background 

This section will give a brief introduction to coherent optical communication system and also 
the corresponding techniques used in the thesis. 
 

1.1.1 Coherent optical communication system 

In coherent optical communication system, the multi-level quadrature amplitude modulation 
(M-QAM) is employed because of its capability to provide a high data rate and spectral 
efficiency compared to traditional binary optical communication system. M-QAM often needs 
a high signal power to hold an acceptable signal-error-rate (SER) due to its more densely 
packed constellation. Nevertheless, the optical system gives a limitation on the performance 
while the input power increase, which is mostly due to the Kerr nonlinearity effect. The 
impairments from the Kerr effect contain deterministic and stochastic aspect, such as 
intrachannel four-wave mixing and the self-phase modulation (SPM) respectively [1, 2]. 
 
Additive white Gaussian noise (AWGN) which is often used to simulate the background noise 
of the channel can be also applied to the coherent optical communication system in the thesis. 
 

1.1.2 Factor graphs and Expectation propagation 

Factor graphs and expectation propagation are two fundamental techniques employed in this 
thesis. Factor graphs are a kind of graph models which can be used to transform the factorized 
functions into the form of graph models. The advantage of factor graphs is that it offers an 
efficient way to calculate the marginal distributions of the function [3]. 
 
Expectation Propagation (EP) is an algorithm which can be used together with factor graphs.  
EP can iteratively make approximations for the desired distributions.  
 

1.2 Purpose 

The purpose of this thesis is to design and develop a novel receiver which is based on factor 
graphs and expectation propagation for the coherent optical communication system with 


 CHAPTER 1. INTRODUCTION  

2 

16-QAM constellation. The receiver aims to solve the linear (AWGN) and nonlinear (SPM) 
distortions happened in the communication system. The simulations will be made in 
MATLAB and the results will be compared with other existing methods. Comments will be 
also provided in the end. 
 

1.3 Structure of the thesis 

This thesis is organized in a problem-solving structure. In chapter 2, details of the system and 
the corresponding problem will be introduced. A MAP receiver is also suggested in the end of 
chapter 2. Chapter 3 will explain the definition and details of factor graphs and show how to 
use factor graphs to build the MAP receiver mentioned in chapter 2. Moreover, the problem 
happened while using factor graphs will be also discussed in the end of chapter 3. In chapter 4, 
expectation propagation (EP) will be introduced to solve the problem in chapter 3. In the end 
of chapter 4, a general EP algorithm will be given for the optical communication system. 
Chapter 5 explains the details on how to implement the EP algorithm for linear and nonlinear 
system and a simplified EP algorithm will be given in the end of chapter 5. In chapter 6, 
comparison is made between the EP and other existing methods. The analysis on the results 
will be also provided. The last Chapter 7 talks about the conclusion of the thesis and the 
suggested work which could be carried out in the future. 
 
 
 CHAPTER 2. COHERENT OPTICAL COMMUNICATION SYSTEM 

3 

Chapter 2                       
Coherent optical communication system 

2.1 Introduction to the coherent optical communication model 

As it’s mentioned in the last chapter, because of the special property of the coherent optical 
communication system, the signal passing through the system will be distorted by a kind of 
nonlinear interference called self-phase modulation (SPM) [1] which is due to Kerr effect. 
And AWGN is also existed as the background noise of the system. 
 
When considering long haul transmission in the optical fiber system, the system can be 
divided into a number of blocks called Spans. Each span contains a single mode fiber (SMF), 
a dispersion-compensating fiber, and an amplifier.  
 
The thesis will focus on the interferences coming from the SPM and AWGN, which are also 
considered to represent the nonlinear and linear noise respectively in the system model. The 
figure below gives an example for a system with two spans. 
 

Figure 2.1 System model for optical communication system with two spans 

 
2.2 SPM, AWGN and Polarization 

SPM: In figure 2.1, the function for SPM block is: 
 

𝑆𝑆𝑆(𝑅𝑖−1) = 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑅𝑖−1||2)                (2.1) 
 

Here 𝛾 is the nonlinearity parameter and 𝐿𝑒𝑒𝑒 is the effective lengthen factor of the single 
mode fiber (SMF) for each fiber. From the system model the signal after the SPM can be 

T 

SPM 

Loss 

a 

Fiber Amplifier 

 
SPM 

Loss 
 

Fiber 

 
Amplifier 

N2 

R 

X1 R1 X2 R2

N1 

  
 CHAPTER 2. COHERENT OPTICAL COMMUNICATION SYSTEM 

4 

obtained: 
 𝑋𝑖 = 𝑅𝑖−1 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑅𝑖−1||2)                 (2.2) 

 
AWGN: In the system, AWGN is considered to be a two dimensional Gaussian noise [4]: 
 

𝑁 ~ 𝑁(𝑋;𝑀,𝛴)                           (2.3) 
 
Therefore, the relationship between the input and the output of a span is: 
 

𝑅𝑖 = 𝑅𝑖−1 ∗ 𝑒𝑒 𝑝 �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑅𝑖−1|�2� + N𝑖              (2.4) 

 
Polarization: The optical fiber communication discussed in the thesis is polarization 
multiplexed, which means that more than one set of signals (more than one ray of light) can 
be transmitted simultaneously in the system. Under this condition, the signal will be affected 
by all the signals transmitted at the same time in SPM block. Take dual polarization for an 
instance, the formula is as follows: 
 

  𝑋1𝑖 = 𝑅1𝑖−1 ∗ 𝑒𝑒𝑒 ( 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑅1𝑖−1|�2 + �|𝑅2𝑖−1|�2 � )        (2.5) 

  𝑋2𝑖 = 𝑅2𝑖−1 ∗ 𝑒𝑒𝑒 ( 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑅1𝑖−1|�2 + �|𝑅2𝑖−1|�2 � )        (2.6) 

 
Here, 𝑋1𝑖 and 𝑋2𝑖 denote two simultaneously transmitted signals. 
 

2.3 The MAP Receiver 

According to the system model, it can be expected that, when passing through the optical 
system, the signals will be iteratively rotated and added Gaussian noise. The picture below 
gives an example of a distorted 16-QAM constellation after passing through the system with 
17 spans and 5dbm average input power. 


 CHAPTER 2. COHERENT OPTICAL COMMUNICATION SYSTEM 

5 

 
Figure 2.2 Distorted constellation of 16-QAM with 17 spans and average input power 5 dbm 

 
To solve this problem and recover the distorted signals in the end, the thesis tries to design a 
MAP receiver: 
 

𝑎�(𝑅𝑁) = arg𝑀𝑀𝑀�𝑃(𝑎|𝑅𝑁)� ;  𝑎 ∈ Θ                  (2.7) 
 
Here Θ is the constellation of the input signal, 𝑅𝑁 is the observation and 𝑎� is the decision 
made based on 𝑅𝑁. 
 
Hence, the computation of the posterior 𝑃(𝑎|𝑅𝑁) turns out to be the main task. The thesis 
introduces two technologies --- factor graphs and expectation propagation to deal with this 
problem in next chapters.
 
  
-0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1
-0.1

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1


  CHAPTER 3. FACTOR GRAPHS 

6 

Chapter 3                        
Factor graphs 

3.1 Introduction to Factor graphs 

Factor graphs are a particular kind of graph model which can be used to represent the 
factorized functions in the form of graphs. Moreover, factor graphs enable an efficient way to 
compute the marginal distributions of the function with the algorithm called --- sum-product 
algorithm [3].  
 
This section will introduce the basic theory of the factor graphs and sum-product algorithm 
with examples. 
 

3.1.1 Marginals, Function factorizations, Factor graphs 

Marginal: For function: 𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛) , the marginal of variable 𝑥𝑛  can be 
calculated by summing out all other variables: 

                      
                                                        𝐵𝑥𝑛(𝑋𝑛) = � 𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛)
{\𝑋𝑛}

;                                 (3.1) 

 
Here 𝐵𝑥𝑛(𝑋𝑛) is the marginal of 𝑋𝑛 and {\𝑋𝑛} means the set of all possible values of all 
variables expect 𝑋𝑛.  
 
Function factorization: Assume the function 𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛) can be factorized as a 
product of 𝐾 sub-functions (factors), therefore: 
 

                                                       𝑓(𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛) = �𝑓𝑘(𝑉𝑘)
𝐾

𝐾=1

;                                   (3.2) 

 
Here 𝑉𝑘  is the set of variables and 𝑉𝑘 ⊆ {𝑋1,𝑋2 , , ,𝑋𝑛−1,𝑋𝑛} . One thing should be 
mentioned is each function may have more than one factorization form. 
 
Factor graphs: Factor graphs provide a one-to-one mapping from a factorized function. In 
factor graphs, neighborhood is an important concept which indicates that with a factorized 
function: 
 
 The neighborhood of the factor 𝑓𝑘: 𝑁(𝑓𝑘), means a set of variables which appear in 𝑓𝑘. 


  CHAPTER 3. FACTOR GRAPHS 

7 

 The neighborhood of the variable 𝑋𝑛: 𝑁(𝑋𝑛), means a set of factors which include 𝑋𝑛. 
 
For an instance, a function can be factorized as: 
 

          𝑓(𝑋1,𝑋2,𝑋3,𝑋4,𝑋5) =  𝑓1(𝑋1,𝑋2 ) ∗ 𝑓2(𝑋2,𝑋3) ∗  𝑓3�𝑋1,𝑋4, 𝑋5�          (3.3) 
 
In this case, the neighborhood of the variable 𝑋1: 𝑁(𝑋1) = {𝑓1, 𝑓3}, while the neighborhood 
of the factor 𝑓2:𝑁(𝑓2) = {𝑋2,𝑋3}. 
 
When mapping a given factorized function such as formula (3.2) into the corresponding factor 
graph, the process can be carried out as follows: 
 
 Draw a vertex for each variable 𝑋𝑛, which is called variable node. 
 Draw a vertex for each function 𝑓𝑘, which is called function node. 
 Draw an edge between the function node 𝑓𝑘 and every variable node 𝑋𝑛 ⊆ 𝑁(𝑓𝑘). 
 
The following example will show the process of mapping the factorized function into factor 
graph: 
 
Assume the function can be factorized into 4 factors: 
 

𝑓(𝑋1,𝑋2,𝑋3,𝑋4,𝑋5,𝑋6) =  𝑓1(𝑋1,𝑋2,𝑋3) ∗ 𝑓2(𝑋1,𝑋4) ∗ 𝑓3(𝑋1,𝑋5,𝑋6) ∗ 𝑓4(𝑋2); 
 (3.4) 

 
Then the corresponding factor graph is: 
 
 
Figure 3.1 the factor graph of the function 3.4 

𝑋2 

𝑋4 𝑓2 

𝑋1 

𝑋5 𝑋6 

𝑋3 

𝑓3 

𝑓1 

𝑓4 


  CHAPTER 3. FACTOR GRAPHS 

8 

 
The factor graph obviously shows that the neighborhood of 𝑋1: 𝑁(𝑋1) = {𝑓1,  𝑓2, 𝑓3} and the 
neighborhood of 𝑓3:𝑁(𝑓3) = {𝑋1,𝑋5,𝑋6}. 
 

3.1.2 Sum-product algorithm in Factor graphs 

The sum-product algorithm introduces a concept named ‘Messages’ which are passing over 
the edges between the variable nodes and function nodes in factor graphs and can be used to 
compute the marginals of the variables. In factor graphs the messages are functions of the 
related variable. 
 
 The message from the function node 𝑓𝑘 to the variable node 𝑋𝑛 ⊆ 𝑁(𝑓𝑘) is denoted as 

𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛); 
 The message from the variable node 𝑋𝑛 to the function node 𝑓𝑘 ⊆ 𝑁(𝑋𝑛) is denoted as 

𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛); 
 
The messages can be calculated as follows: 
 

𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) = � 𝑓𝑘
\{𝑋𝑛}

�{𝑋𝑚 = 𝑥𝑚}𝑋𝑚⊆𝑁(𝑓𝑘)� ∗ � 𝑀𝑋𝑚→𝑓𝑘(𝑋𝑚)
𝑋𝑚⊆𝑁(𝑓𝑘)\{𝑋𝑛}

 
 (3.5) 
 

𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) = � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛)
𝑓𝑟⊆𝑁(𝑋𝑛)\{𝑓𝑘}

 
 (3.6) 
 
 
                    𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛)                               𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) 
 
      𝑀𝑓1→𝑋𝑛(𝑋𝑛)       𝑀𝑓2→𝑋𝑛(𝑋𝑛)               𝑀𝑋1→𝑓𝑘(𝑋𝑛)       𝑀𝑋2→𝑓𝑘(𝑋𝑛) 
 
 
Figure 3.2 Messages passing from variable to function node and function to variable node 
 
In factor graphs the marginal of a particular variable 𝐵𝑥𝑛(𝑋𝑛) can be obtained by the 
multiplication of the in-coming and out-going messages between the variable node 𝑋𝑛 and 
the function node 𝑓𝑘 ⊆ 𝑁(𝑋𝑛): 
 

𝐵𝑥𝑛(𝑋𝑛) = 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) ∗ 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛)                (3.7) 
 

𝑓2 𝑋𝑛 

𝑓𝑘 

𝑓1 𝑋2 𝑋1 𝑓𝑘 

𝑋𝑛 


  CHAPTER 3. FACTOR GRAPHS 

9 

Here 𝐵𝑥𝑛(𝑋𝑛) can also be called the belief of 𝑋𝑛. 
 
Finally, a general sum-product algorithm can be: 
 
 Initialization: 

Each leaf function node 𝑓𝑘 send the message to its neighborhood variable node 𝑋𝑛 
with 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) = 𝑓𝑘(𝑋𝑛). While each leaf variable node 𝑋𝑛 send the message to its 
neighborhood function node 𝑓𝑘 with 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) = 1. Notice that leaf node only has 
one neighborhood node. 

 
 Compute the message: 

Compute all the messages passing through the graph by using the formula (3.5) and 
(3.6). 

 
 Compute the Marginal: 

Finally, the Marginal of the variable 𝑋𝑛 can be obtained by formula (3.7).  
 
Take figure 3.1 for an example. The algorithm can be processed as follows: 
 
Initialization: 
Assume the message coming from the leaf function nodes: 
 

𝑀𝑓4→𝑋2(𝑋2) = 𝑓4(𝑋2) 
 
Assume the message coming from the leaf variable nodes: 
 

𝑀𝑋3→𝑓1(𝑋3) = 1 
𝑀𝑋4→𝑓2(𝑋4) = 1 
𝑀𝑋5→𝑓3(𝑋5) = 1 
𝑀𝑋6→𝑓3(𝑋6) = 1 

 
Compute all the messages: 
From the figure 3.1, following messages can be obtained: 
 

𝑀𝑋2→𝑓1(𝑋2) = 𝑀𝑓4→𝑋2(𝑋2) 
 

𝑴𝒇𝟏→𝑿𝟏(𝑿𝟏) = � 𝑓1(𝑋1,𝑋2,𝑋3) ∗
\{𝑋1}

𝑀𝑋2→𝑓1(𝑋2) ∗ 𝑀𝑋3→𝑓1(𝑋3) 

 
𝑀𝑓2→𝑋1(𝑋1) = � 𝑓2(𝑋1,𝑋4) ∗
\{𝑋1}

𝑀𝑋4→𝑓2(𝑋4) 

 
𝑀𝑓3→𝑋1(𝑋1) = � 𝑓3(𝑋1,𝑋5,𝑋6) ∗
\{𝑋1}

𝑀𝑋5→𝑓3(𝑋5) ∗ 𝑀𝑋6→𝑓3(𝑋6) 


  CHAPTER 3. FACTOR GRAPHS 

10 

 
𝑴𝑿𝑿→𝒇𝒇(𝑿𝟏) = 𝑀𝑓3→𝑋1(𝑋1) ∗ 𝑀𝑓2→𝑋1(𝑋1) 

 
Compute the marginal: 
Hence, the marginal of variable 𝑋1: 
 

𝐵𝑋1(𝑋1) = 𝑴𝒇𝒇→𝑿𝑿(𝑿𝟏) ∗ 𝑴𝑿𝑿→𝒇𝒇(𝑿𝟏) 
 
 
3.2 Factor graphs for the optical communication system 

This section will apply the factor graphs and sum-product algorithm to the optical 
communication model discussed in the last chapter. And some problems happened in the 
message computation will be mentioned in the end. 
  

3.2.1 Factorization and Graphs 

According to the last section, the factor graphs can be applied to the optical communication 
system described in figure 2.1. The distribution 𝑆(𝑎,𝑋1,𝑅1,𝑋2|𝑅2) can be factorized: 
 

𝑆(𝑎,𝑋1,𝑅1,𝑋2|𝑅2) = Z𝑝 ∗ 𝑃(𝑎) ∗ 𝑃(𝑋1|𝑎) ∗ 𝑃(𝑅1|𝑋1) ∗ 𝑃(𝑋2|𝑅1) ∗ 𝑃(𝑅2|𝑋2) (3.8) 
 
Here Z𝑝 is a constant. The factor graphs can be draw as follows: 
 
 
Figure 3.3 Factor graphs for optical communication system 
 

Here 𝑅2 is the observation and 𝑎 denotes the input signal. The task is to get the marginal of 
variable 𝑎 based on the observation 𝑅2. One thing should be mentioned is formula (3.8) and 
the factor graphs can be easily extended according to the number of spans in the system. 
 

𝑎 𝑃(𝑋1|𝑎) 𝑋1 

𝑅1 

𝑃(𝑅1|𝑋1) 𝑃(𝑎) 
 

𝑃(𝑋2|𝑅1) 𝑋2 𝑃(𝑅2|𝑋2) 𝑅2 


  CHAPTER 3. FACTOR GRAPHS 

11 

3.2.2 Sum-product algorithm 

As discussed in section 3.2, the messages passing over the graph in figure 3.3 can be 
computed. For simplification, only one span is considered this time. Hence the factor graph is: 
 
    𝑓1                         𝑓2                          𝑓3 
           
 
Figure 3.4 Factor graphs for optical communication system with one span 
 

According to formula (3.5) and (3.6), the message computation can be simplified to: 
 
 For variable nodes, the outgoing message is equal to the incoming message. 
 For function nodes, the outgoing message is an integral of the product of the function and 

the incoming message.  
 
For instance, in figure 3.3: 

𝑀𝑓2→𝑋1(𝑋1) = 𝑀𝑋1→𝑓3(𝑋1)                                            (3.9)                     
 

                                                 𝑀𝑓2→𝑎(𝑎) = �𝑓2 (𝑎,𝑋1) ∗ 𝑀𝑋1→𝑓2(𝑋1)𝑑𝑥1                             (3.10) 

 
After all the messages have been obtained, the marginal of 𝑎 can be calculated as 
 

𝑃(𝑎|𝑅1) = 𝑀𝑓1→𝑎(𝑎) ∗ 𝑀𝑎→𝑓1(𝑎)                  (3.11) 
 

3.2.3 Challenges 

As is known in chapter 2, the optical fiber will introduce SPM and AWGN when signals are 
passing through. For factor graph in figure 3.3, the function node f2 and f1 represents the 
nonlinear effect and Gaussian noise separately: 
 

              𝑓2(𝑎,𝑋1) = 𝛿 �𝑋1 − 𝑎 ∗ 𝑒𝑒 𝑝 �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑎|�2��                         (3.12) 

 
                                                   𝑓1(𝑋1,𝑅1) = 𝐶 ∗ 𝑒𝑒 𝑝 �−
1
𝑁0

�|𝑅1 − 𝑋1|�2�                              (3.13) 

 
Hence, with formula (3.10) the message 𝑀𝑓2→𝑎(𝑎) can be computed as: 
 

𝑎 𝑃(𝑋1|𝑎) 𝑋1 𝑅1 𝑃(𝑅1|𝑋1) 𝑃(𝑎) 
 

  CHAPTER 3. FACTOR GRAPHS 

12 

𝑀𝑓2→𝑎(𝑎) = �𝑓2(𝑎,𝑋1) ∗𝑀𝑋1→𝑓2(𝑋1)𝑑𝑥1       

 
                                                              = �𝛿(𝑋1 − 𝑎 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑎||2)) ∗𝑀𝑋1→𝑓2(𝑋1)𝑑𝑥1 

 
                          = 𝑀𝑋1→𝑓2�𝑎 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑎||2)� 

 
What’s more, because of the Gaussian function node f1  the message 𝑀𝑋1→𝑓2(𝑋1) also 
belongs to Gaussian family: 
 

𝑀𝑋1→𝑓2(𝑋1) = 𝐶 ∗ 𝑒𝑒 𝑝 �−
1
𝑁0

�|𝑋1 − 𝑅1|�2� 

 
Here the observation R1 can be seen as a constant. Finally the message 𝑀𝑓2→𝑎(𝑎) is: 
 

𝑀𝑓2→𝑎(𝑎) =  𝐶 ∗ 𝑒𝑒 𝑝 �− 1
𝑁0
��𝑎 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑎||2) − 𝑅1��

2
�    (3.14) 

 
It is obviously that the function of 𝑀𝑓2→𝑎(𝑎) is complicated and one can imagine that after a 
number of spans the messages will be too hard to compute and the final result of the marginal 
will be too complicated to use. That is why Expectation Propagation is introduced in the next 
chapter. 
 
  
  CHAPTER 4. EXPECTATION PROPAGATION 

13 

Chapter 4                   
Expectation Propagation 

4.1 Introduction to Expectation Propagation 

Expectation Propagation (EP) is an algorithm which can iteratively make approximations for 
the desired distributions. This section will discuss the basic theory of EP [6, 12] and give an 
instance to show the general process of EP [7]. 
 

4.1.1 Why Expectation Propagation 

In the end of the last chapter a problem of computing messages has been thrown out. In a 
more general model, assume X is continuous unknown sample which need to be decided, 
𝑨 =  {𝑌1, 𝑌2  , , , , ,  𝑌𝑛−1, 𝑌𝑛} denote observations of the system and all belong to i.i.d..  
 
According to Bayes’ rule: 
 

                                                              𝑝(𝑋|𝑨) ∝ 𝑝(𝑋)�𝑝(𝑌𝑖|𝑋)
𝑖

                                               (4.1) 

                    
Here 𝑝(𝑋|𝑨) is already normalized. Assume 𝑝(𝑋) and 𝑝(𝑌𝑖|𝑋) are prior and likelihood 
function respectively. It can be seen that it will be unpractical to directly multiply a number of 
likelihoods. For instance, assume p(Yi|X) is the mixture of two different distributions: 
p(Yi|X) = G1 + G2, hence the complexity of the calculation will dramatically increase along 
with the growing of the number of observations. Actually, 500 observations need 2500 times 
of multiplication of two distributions. Therefore, the result formula for p(X|A) (normalized) 
is hard to access by directly computation. One reasonable method to deal with this problem is 
to make an approximation ℎ(𝑋) which could be close to the true 𝑝(𝑋|𝑨): 
 

𝑝(𝑋|𝑨) ≈ ℎ(𝑋)                           (4.2) 
 

This is the main strategy of EP for solving this kind of problem as in the last chapter. 
 

4.1.2 Exponential family and Kullback-Leibler divergence 

When making approximations, one may consider which kinds of the distributions can be 
selected to make the approximation and how to do it. These concern two basic concepts of EP: 
Exponential family and Kullback-Leibler divergence. 


  CHAPTER 4. EXPECTATION PROPAGATION 

14 

 
Exponential family: Exponential family is often used to make approximations in EP. The 
exponential family has the form: 
 

                                          𝑟(𝒙|𝜃) = ℎ(𝒙)exp ��𝜂𝑖(𝜃)𝛵𝑖(𝒙)
𝑠

𝑖=1

− 𝛢(𝜃)�                                   (4.3) 

 
Here 𝜃 is called the parameter of the family. Function 𝜂𝑖(𝜃) presents the weight of the 
potential function 𝛵𝑖(𝒙). Function 𝛢(𝜃) is a normalizing function which ensures that the 
value of the integral or summation of 𝑟(𝒙|𝜃) results to be one. 
 
Exponential family is often used because it contains many famous members such as Gaussian, 
Poisson, Bernoulli, etc., which is suitable to be applied to many different kind of models. 
What is more, the use of exponential family can simplify the computation when making the 
approximation. Because of the property of the exponential family, the multiplication or 
division of two distributions from the same exponential family results a distribution belonging 
to the same family with a normalizing factor. An example of the multiplication and division of 
two Gaussian distribution is given below [5]: 
 

𝑁(𝑥;  𝑚1,𝜎1) ∗ 𝑁(𝑥;  𝑚2,𝜎2) = 𝑁(𝑚1;  𝑚2,𝜎1 + 𝜎2) ∗ 𝑁(𝑥;  𝑚3,𝜎3)    (4.4) 
 

𝜎3 = (𝜎1−1 + 𝜎2−1)−1 
             𝑚3 = 𝜎3𝜎1−1𝑚1 + 𝜎3𝜎2−1𝑚2 

 
𝑁(𝑥;  𝑚1,𝜎1)
 𝑁(𝑥;  𝑚2,𝜎2) =

𝜎2
(𝜎2 − 𝜎1)𝑁(𝑚1;  𝑚2,𝜎2 − 𝜎1) ∗ 𝑁

(𝑥;  𝑚3,𝜎3)             (4.5) 

 
𝜎3 = (𝜎1−1 − 𝜎2−1)−1 

             𝑚3 = 𝜎3𝜎1−1𝑚1 − 𝜎3𝜎2−1𝑚2 
 
The formula above also can be extended to two-dimensional Gaussian distribution. 
 
Kullback-Leibler (KL) divergence: When approximating the distribution to a certain 
exponential family, the Kullback-Leibler (KL) divergence is often used. The KL divergence 
can measure the difference between the existing distribution and the desired exponential 
family.  
 
A general form of KL divergence is: 
 

                                         𝐾𝐾�𝑝(𝑋)||𝑝′(𝑋)� = �𝑝(𝑋) ∗ log�
𝑝(𝑋)
𝑝′(𝑋)�𝑑𝑑                           (4.6) 

 
Here 𝑝′(𝑋) is the approximation of 𝑝(𝑋).  


  CHAPTER 4. EXPECTATION PROPAGATION 

15 

 
In order to make the approximation, one should try to minimize the KL divergence:  
  

𝜃′ = 𝑎𝑎𝑎𝜃 𝑚𝑚𝑚 𝐾𝐾�𝑝(𝑋)||𝑝′(𝑋)�                   (4.7) 
 

Moment matching is considered to be an optimal way to minimize the KL divergence: 
 

                                                  �𝑝(𝑋) ∗ 𝑇(𝒙)𝑑𝑑 = �𝑝′(𝑋) ∗ 𝑇(𝒙)𝑑𝑑                                      (4.8) 

 
The formula can also be rewritten as: 
 

𝔼𝑷[𝑇(𝒙)] =  𝔼𝒑′[𝑇(𝒙)]                         (4.9) 
 
Hence, it can be seen that minimizing the KL divergence is equal to mapping the desired 
moments of the existing distribution to a certain exponential family. Take Gaussian 
distribution for an example. The Minimization of the KL divergence is just computing the 
mean and covariance of the existing distribution and using the results to be the mean and 
covariance for the Gaussian distribution. 
 
When talking about KL-divergence, another related concept --- Projection is often used, 
which can denote the process of moment matching: 
  

𝑝′(𝑥) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[𝑝(𝑥)]                     (4.10) 
 

Here 𝑝′(𝑥) is the approximated distribution with the same moments mapped from 𝑝(𝑥). 
 

4.1.3 Expectation Propagation  

It has been said that EP makes approximations ‘iteratively’. Why ‘iteratively’? 
 
Rewrite formula (4.1): 
 

                                                   𝑝(𝑋|𝑨) ∝ 𝑝(𝑋)�𝑝(𝑌𝑖|𝑋)
𝑖

= �𝑟𝑖
𝑖

(𝑋)                                (4.11) 

 
Here 𝑟0(𝑋) =  𝑝(𝑋), 𝑟1(𝑋) = 𝑝(𝑌1|𝑋),,,, 𝑟𝑖(𝑋) = 𝑝(𝑌𝑖|𝑋). An approximation ℎ(𝑋) needed 
to be made for 𝑝(𝑋|𝑨). As it already known in the last section, a projection can be made to 
get the approximation: 
 

ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[𝑝(𝑋|𝑨)]                    (4.12) 
 
Here ℎ(𝑋) is the normalized version of 𝐻(𝑋). Because 𝑝(𝑋|𝑨) consists of a bunch of 


  CHAPTER 4. EXPECTATION PROPAGATION 

16 

distributions 𝑟𝑖(𝑋), the moment matching is not easy to carry out directly. EP introduces a 
method to get the final approximation ℎ(𝑋) iteratively. 
 
Assume approximation 𝐻(𝑋) also consists of a bunch of sub-approximations 𝑟̃𝑖(𝑋): 
 

                                                                     𝐻(𝑋) = �𝑟̃𝑖
𝑖

(𝑋)                                                       (4.13) 

 
                                                                 ℎ(𝑋) =
∏ 𝑟̃𝑖𝑖 (𝑋)

∫∏ 𝑟̃𝑖𝑖 (𝑋)𝑑𝑑
                                                    (4.14) 

 
Each 𝑟̃𝑖(𝑋) is an approximated distribution for 𝑟𝑖(𝑋). Hence, after all 𝑟̃𝑖(𝑋) have been 
obtained, the final approximation can be calculated by formula (4.13). 
 
EP algorithm firstly initializes all 𝑟̃𝑖(𝑋) to an exponential family. In each iteration, one can 
choose any of 𝑟̃𝑖(𝑋) to make a new approximation and then refine that 𝑟̃𝑖(𝑋). When 
approximating 𝑟̃𝑖(𝑋), EP makes a projection on the multiplication of the true 𝑟𝑖(𝑋) and the 
term ℎ\𝑖(𝑋) which denotes all sub-approximations except 𝑟̃𝑖(𝑋): 
 

                                                                         ℎ\𝑖(𝑋) ∝
ℎ(𝑋)
 𝑟̃𝑖(𝑋)                                                        (4.15) 

 
EP will continuously process the iteration until all 𝑟̃𝑖(𝑋) converge. 
 
Hence, a typical EP process can be: 
 
1. Initialization of the sub-approximation term 𝑟̃𝑖(𝑋):   

 
2. Calculate the posterior of 𝑥: 

 
ℎ(𝑋) =
∏ 𝑟̃𝑖(𝑋)𝑖

∫∏ 𝑟̃𝑖(𝑋)𝑖 𝑑𝑑
 

3. Do iterations until all  𝑟̃𝑖(𝑋) converge: 

 
 Pick up any 𝑟̃𝑖(𝑋) for updating 
 Divide ℎ(𝑋) by 𝑟̃𝑖(𝑋) and normalize to get the ‘lack’ posterior ℎ\𝑖(𝑋): 
 

 ℎ\𝑖(𝑋) ∝
ℎ(𝑋)
 𝑟̃𝑖(𝑋)  

 
 Make a projection on the Multiplication of ℎ\𝑖(𝑋) and 𝑟𝑖(𝑋), which will produce 

the new mean, covariance and a normalizing factor 𝐾𝑖 for the new ℎ(𝑋): 
 

  CHAPTER 4. EXPECTATION PROPAGATION 

17 

ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[ℎ\𝑖(𝑋) ∗ 𝑟𝑖(𝑋)]               (4.16) 
 
 Update 𝑟̃𝑖(𝑋) with the new ℎ(𝑋): 
 

                                                                𝑟̃𝑖(𝑋) =
𝐾𝑖 ∗ ℎ(𝑋)
ℎ\𝑖(𝑋)                                                      (4.17) 

 
4. Compute the final approximation for 𝑝(𝑋): 

 
                                                   𝑝(𝑋) ≈ ℎ(𝑋) ∝ 𝐻(𝑋) = �𝑟̃𝑖
𝑖

(𝑋)                                    (4.18) 

 
4.1.4 Potential problem 

One problem may happen when making the projection: The moments of the desired 
distribution may not be easy to calculate mathematically. Then other technique such as Monte 
Carlo sampling needed be introduced to assist the approximation. This will be discussed later 
in the thesis. 
 

4.2 An example of Expectation Propagation: clutter problem 

This section presents the process of Expectation Propagation with factor graphs by solving a 
classic problem – clutter problem [7]. 
 

4.2.1 The Problem 

A bunch of signals are transmitted through an added Gaussian noise channel which can cause 
the clutter problem. Assume that the distribution of the observed signals on the condition of 
the sent signals is a Gaussian distribution mixed with another Gaussian distribution caused by 
the clutter: 
 

𝑝(𝑌|𝑋) = 𝛼 ∗ 𝑁(Y; X,σ1 ∗ 𝐈𝒅 ) + (1 − 𝛼) ∗ 𝑁(𝑌; 0,σ2 ∗ 𝐈𝒅)      (4.19) 
 

𝑁(𝑌;𝑀,𝛴 ) = 2𝜋|𝛴|−
1
2 ∗ 𝑒𝑒𝑒 �− 1

2
(𝑌 −𝑀) 𝛴−1 (𝑌 −𝑀)𝑇�      (4.20) 

 
In formula (4.19) Y and X are observed and sent signal respectively. The first part of the 
formula is useful for later research while the second part presents the influence of the clutter. 
Finally, parameter α  shows the ratio between the first and second part in the finally 
distribution. Moreover, 𝑝(𝑌|𝑋) also can be considered into Gaussian family because it is a 


  CHAPTER 4. EXPECTATION PROPAGATION 

18 

mixture of two Gaussian distributions. 
 
Suppose a set of independent signals 𝐀 =  {Y1, Y2  , , , , ,  Y𝑛−1, Y𝑛} has been observed in the 
end. Hence, the joint distribution of observation A and X is: 
 

                                                    𝑝(𝑨,𝑋) ∝ 𝑝(𝑋)�𝑝(𝑌𝑖|𝑋)
𝑖

= �𝑟𝑖
𝑖

(𝑋)                               (4.21) 

 
Here 𝑟0(𝑋) =  𝑝(𝑋) , 𝑟1(𝑋) = 𝑝(𝑌1|𝑋) ,,,,  𝑟𝑖(𝑋) = 𝑝(𝑌𝑖|𝑋) . Finally, the task becomes to 
compute the probability density function 𝑝(𝑨,𝑋) while 𝑟𝑖(𝑋) is known. 
 

4.2.2 Factor Graph 

According to Chapter 3, formula (4.21) can be represented into factor graph below: 

 
Figure 4.1 Factor Graph of clutter problem 

 
4.2.3 Processing Expectation Propagation 

From the last section, the following formula can be obtained: 
 

                    �𝑟𝑖
𝑖

(𝑋) =  𝑝(𝑋)�𝑝(𝑌𝑖|𝑋)
𝑖

∝ 𝑝(𝑨,𝑋) ≈ ℎ(𝑋) ∝ 𝐻(𝑋) = �𝑟̃𝑖
𝑖

(𝑋)      (4.22) 

 
It can be seen that 𝑟0 = 𝑝(𝑋) = r�0 and 𝑟i(𝑋), 𝑖 = {1,2,3, , ,𝑛 − 1,𝑛} consists of Gaussian 
distributions. Therefore the approximation component 𝑟̃𝑖(𝑋) can also be approximated to be  
Gaussian-like: 

𝑟̃𝑖(𝑋) = 𝐶𝑖 ∗ 𝑒𝑒𝑒 �−
1
2

(𝑋 −𝑀𝑖) 𝛴𝑖−1 (𝑋 −𝑀𝑖)𝑇�         (4.23) 

𝑋𝑛 

𝑝(𝑋) 

𝑝(𝑦2|𝑋) 𝑝(𝑦1|𝑋) 𝑝(𝑦𝑛−1|𝑋) 𝑝(𝑦𝑛|𝑋) … … 


  CHAPTER 4. EXPECTATION PROPAGATION 

19 

 
What is more, because the result of the multiplication of Gaussian distributions is still an 
Gaussian distribution (not normalized). Hence, 𝑝(𝑨,𝑋)(normalized) can be approximated to 
be a Gaussian distribution. The formulas for Gaussian multiplication and division can be 
found in formula (4.4) and (4.5). 
 
According to section 4.2.3, the EP can process as follows: 
 
1. Initialization of the mean and covariance matrix of the approximation term 𝑟̃𝑖(𝑋).  
 

 Assume 𝑀0 = 𝟎,𝛴0 = 10𝐈𝒅,𝐶0 = 2𝜋|𝛴0|−
1
2 for the 𝑟̃0(𝑋) term.  

 Because the rest of the approximation term are unknown at the beginning, the mean 
and covariance matrix are initialized to be 𝑀𝑖 = 𝟎,𝛴𝑖 = 1000𝐈𝒅 → ∞,𝐶𝑖 = 1 to 
make the probability of 𝑟̃𝑖(𝑋) be approximately identical in the data space at first. 

 
2. According to formula (4.14) and (4.23): 
 

ℎ(𝑋) = 𝑁(𝑋;  𝒎𝒙,𝜎𝑥 ∗ 𝑰𝒅),𝒎𝒙 = 𝒎𝟎,𝜎𝑥 = 𝜎0. 
 

3. Doing loop from 𝑟̃1, … , 𝑟̃𝑛 until all 𝑟̃𝑖 converge:  
 

 According to formula (4.15) and (4.5), Dividing ℎ(𝑋)  by 𝑟̃𝑖(𝑋)  and 
normalizing to get the mean and covariance of the ‘lack’ posterior ℎ\𝑖(𝑋):  

 
𝜎𝑥
\𝑖 = (𝜎𝑥−1 − 𝜎𝑖−1)−1 

    𝑚𝑥
\𝑖 = 𝜎𝑥

\𝑖𝜎𝑥−1𝑚𝑥 − 𝜎𝑥
\𝑖𝜎𝑖−1𝑚𝑖  

     
 Multiply ℎ\𝑖(𝑋) and 𝑟𝑖(𝑋) which equals to formula (4.23). Make a projection 

and get the new mean, covariance and a normalizing factor 𝐾𝑖 for the new 
ℎ(𝑋): 
 

ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷[ℎ\𝑖(𝑋) ∗ 𝑟𝑖(𝑋)] 
 

𝐾𝑖 = (1 − 𝛼) ∗ 𝑁(𝑌𝑖; 0,𝜎2 ∗ 𝑰𝒅 ) + 𝛼 ∗ 𝑁(𝑌𝑖;𝑚𝑥
\𝑖 , (𝜎𝑥

\𝑖 + 𝜎1)𝑰𝒅 ) 

 
𝑚𝑥 =  𝑚𝑥
\𝑖 +

𝛼 ∗ 𝜎𝑥
\𝑖 ∗ �𝑌𝑖 − 𝑚𝑥

\𝑖� ∗ 𝑁(𝑌𝑖;𝑚𝑥
\𝑖 , (𝜎𝑥

\𝑖 + 𝜎1)𝑰𝒅 )

𝐾𝑖 ∗ (𝜎𝑥
\𝑖 + 1)

 
  CHAPTER 4. EXPECTATION PROPAGATION 

20 

𝜎𝑥 = 𝜎𝑥
\𝑖 −

𝛼 ∗ (𝜎𝑥
\𝑖)2 ∗ 𝑁(𝑌𝑖;𝑚𝑥

\𝑖 , (𝜎𝑥
\𝑖 + 𝜎1)𝑰𝒅 )

𝐾𝑖 ∗ �𝜎𝑥
\𝑖 + 1�

∗ [1

−
(1 − 𝛼) ∗ 𝑁(𝑌𝑖; 0,𝜎2 ∗ 𝑰𝒅 ) ∗ 𝜎𝑥

\𝑖 ∗ |𝑌𝑖 − 𝑚𝑥
\𝑖|2

𝐾𝑖 ∗ �𝜎𝑥
\𝑖 + 1�

] 

 
 According to formula (4.5) and (4.17), update 𝑟̃𝑖(𝑋) = 𝐾𝑖 ∗  ℎ(𝑋)/ ℎ\𝑖(𝑋): 

 
𝜎𝑖 = �𝜎𝑥−1 − (𝜎𝑥\𝑖)−1�−1 

 𝑚𝑖 = 𝜎𝑖𝜎𝑥−1𝑚𝑥 −  𝜎𝑖(𝜎𝑥\𝑖)−1𝑚𝑥
\𝑖     

𝐶𝑖 =
(2𝜋)−1|𝜎𝑖 ∗ 𝑰𝒅|− 12 ∗ 𝜎𝑥\𝑖

(𝜎𝑥\𝑖 − 𝜎𝑥)𝑁 �𝑚𝑥;  𝑚𝑥
\𝑖 ,𝜎𝑥\𝑖 − 𝜎𝑥�

∗ 𝐾𝑖 

 
 Compute the approximated distribution --- 𝑝(𝑨,𝑋): 

 
𝑝(𝑨,𝑋) ∝�𝑟̃𝑖(𝑋)
𝑖

 
4.3 Expectation Propagation for the optical communication system 

In the last chapter, the factor graphic model and message computation method has been 
introduced to deal with the optical communication model. This section will firstly extend EP 
algorithm to the message computing in factor graphs, then develop an method with EP for the 
message computing in chain-like model such as optical communication system. 
 

4.3.1 Expectation Propagation for message computing 

An example will be given below to show how to deal with the message computing with EP. 


  CHAPTER 4. EXPECTATION PROPAGATION 

21 

 
Figure 4.2 Message computing with EP 
 

According to the sum-product algorithm, the belief of 𝑋𝑛 is: 
 

𝐵𝑥𝑛(𝑋𝑛) = 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) ∗ 𝑀𝑋𝑛→𝑓𝑘(𝑋𝑛) 
 

                                                                  = � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛)
𝑓𝑟⊆𝑁(𝑋𝑛)

                                                 (4.24) 

 
                                                  𝑏𝑥𝑛(𝑋𝑛) ∝ � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛)                                               (4.25)
𝑓𝑟⊆𝑁(𝑋𝑛)

 
Here 𝑏𝑥𝑛(𝑋𝑛) is normalized from 𝐵𝑥𝑛(𝑋𝑛). Now the task is to compute 𝑏𝑥𝑛(𝑋𝑛). It can be 
seen that 𝑏𝑥𝑛(𝑋𝑛) is the multiplication of a number of distributions, as it’s discussed before, 
make computation or approximation on 𝑏𝑥𝑛(𝑋𝑛) directly is not reasonable. EP algorithm can 
do approximation iteratively.  
 
Assume: 
 

                             � 𝑀�𝑓𝑟→𝑋𝑛(𝑋𝑛) 
𝑓𝑟⊆𝑁(𝑋𝑛)

∝ ℎ(𝑋) ≈ 𝑏𝑥𝑛(𝑋𝑛) ∝ � 𝑀𝑓𝑟→𝑋𝑛(𝑋𝑛) 
𝑓𝑟⊆𝑁(𝑋𝑛)

         (4.26) 

 
Here, 𝑀�𝑓𝑟→𝑋𝑛(𝑋𝑛) is the sub-approximation for each message.  
 
As in EP algorithm, do iteration until all messages converged. In each iteration, choose one 
message to refine by EP algorithm as discussed in section 4.1.3: 
 
For example, move 𝑀�𝑓𝑘→𝑋𝑛(𝑋𝑛) in figure 4.2: 
 

𝑋𝑛 𝑓𝑘 𝑓2 

𝑓3 

𝑓1 

𝑋2 

𝑋3 

𝑋1 


  CHAPTER 4. EXPECTATION PROPAGATION 

22 

                                                               ℎ\𝑓𝑘(𝑋) ∝
ℎ(𝑋)

𝑀�𝑓𝑘→𝑋𝑛(𝑋𝑛)
                                                   (4.27) 

 
                                           ℎ(𝑋) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷�ℎ\𝑓𝑘(𝑋) ∗ 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛)�                               (4.28) 

 
                                                               𝑀�𝑓𝑘→𝑋𝑛(𝑋𝑛) =
𝐾𝑖 ∗ ℎ(𝑋)
ℎ\𝑓𝑘(𝑋)                                                 (4.29) 

 
After all messages converged, then 𝑏𝑥𝑛(𝑋𝑛) can be obtained: 
 

                                                   𝑏𝑥𝑛(𝑋𝑛) ≈ ℎ(𝑋) ∝ � 𝑀�𝑓𝑟→𝑋𝑛(𝑋𝑛) 
𝑓𝑟⊆𝑁(𝑋𝑛)

                                (4.30) 

 
Obviously, the example in figure 4.1 in section 4.2 can also be translated into message 
computing where approximation and computation are made for 𝑏𝑥𝑛(𝑋𝑛). 
 

4.3.2 Message computing for chain-like model with Expectation Propagation 

In last section, one thing should be noticed that, according to the sum-product algorithm, the 
message: 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) in formula (4.28) is equal to: 
 

𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) = � 𝑓𝑘
\{𝑋𝑛}

�{𝑋𝑚 = 𝑥𝑚}𝑋𝑚⊆𝑁(𝑓𝑘)� ∗ � 𝑀𝑋𝑚→𝑓𝑘(𝑋𝑚)
𝑋𝑚⊆𝑁(𝑓𝑘)\{𝑋𝑛}

 
                                   = � 𝑓𝑘
\{𝑋𝑛}

(𝑋1,𝑋2,𝑋3,𝑋𝑛) ∗ 𝑀𝑋1→𝑓𝑘(𝑋1) ∗ 𝑀𝑋2→𝑓𝑘(𝑋2) ∗ 𝑀𝑋3→𝑓𝑘(𝑋3) 

(4.31) 
 
From formula (4.31), it can be seen that in a chain-like model, more messages 
(𝑀𝑋1→𝑓𝑘(𝑋1),𝑀𝑋2→𝑓𝑘(𝑋2),𝑀𝑋3→𝑓𝑘(𝑋3) in this case) need to be computed or approximated 
before the message: 𝑀𝑓𝑘→𝑋𝑛(𝑋𝑛) can be used to make the approximation. Hence, in a 
chain-like model [8] (optical communication) the EP algorithm will be modified from the one 
discussed in beginning of this chapter. 
 
Take the optical communication model with one span for an instance. 

 
Figure 4.3 Factor graph for optical communication model with one span 

𝑃(𝑋1) 
  

𝑋1 𝑃(𝑋3|𝑋2) 
  

𝑃(𝑋2|𝑋1) 
  

𝑋2 𝑋3 

𝑓1 𝑓2 𝑓3 


  CHAPTER 4. EXPECTATION PROPAGATION 

23 

 
Figure 4.3 can be simplified as: 

 
Figure 4.4 Simplified factor graph for optical communication model with one span 

 
In figure 4.4, the direction from 𝑓1 to 𝑓3 is define as ‘forward’ and message passing in this 
direction is written as: 𝑚𝑚𝑚_𝑓(𝑋𝑛). While the opposite direction is ‘backward’ and the 
message is: 𝑚𝑚𝑚_𝑏(𝑋𝑛). 
 
In figure 4.4, the target is to get the belief of 𝑋1: 
 
                                                     𝐵𝑋1(𝑋1) = 𝑚𝑚𝑚_𝑓(𝑋1) ∗ 𝑚𝑚𝑚_𝑏(𝑋1)                                     (4.32) 
 
Take figure 4.4 for an instance, the EP process can be carried out as follows: 
 
1. Initialize the forward and backward messages passing through the factor graph. 

 
2. Do a loop until all the messages converged: 

 
 Make approximations for the messages of the last function block in the end of the 

model. 
 
 Move to the backward function block and obtain the corresponding messages block 

by block until finish the message approximation for the second function block of the 
model (block 𝑓2). 

 
 From block 𝑓2, move to the forward function block and approximate the messages 

block by block until reach the last function block in the end of the model. 
 

3. Calculate the belief of 𝑋1: 
 

                                              𝐵𝑋1(𝑋1) = 𝑚𝑚𝑚_𝑓(𝑋1) ∗ 𝑚𝑚𝑚_𝑏(𝑋1)                                     (4.33) 
 

This process can also be seen in Figure 4.4. What’s more, according to the algorithm 
discussed above, it can be seen that the two outgoing messages computed in each function 
block can update the backward and forward incoming message of its left and right function 

𝑃(𝑋1) 
  

𝑃(𝑋3|𝑋2) 
  

𝑃(𝑋2|𝑋1) 
  

𝑓1 𝑓2 𝑓3 
𝑿𝟏 𝑿𝟐 𝑿𝟑 

𝑚𝑚𝑚_𝑓(𝑋1) 

𝑚𝑚𝑚_𝑓(𝑋2) 

𝑚𝑚𝑚_𝑓(𝑋3) 

𝑚𝑚𝑚_𝑏(𝑋1) 

𝑚𝑚𝑚_𝑏(𝑋2) 

𝑚𝑚𝑚_𝑏(𝑋3) 


  CHAPTER 4. EXPECTATION PROPAGATION 

24 

block respectively in the model. Hence, the messages passing through the model can be 
refined block by block in iterations.  
 
The method for message computation and approximation for each function block is described 
below: 

 
Figure 4.5 Message computation and approximation for one function block 

 
According to figure 4.5, the task is to get the outgoing messages 𝑚𝑚𝑚_𝑏(𝑋𝑛−1)  and 
𝑚𝑚𝑚_𝑏(𝑋𝑛−1), while the incoming messages 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) and 𝑚𝑚𝑚_𝑏(𝑋𝑛) are given. 
 
Take the outgoing message 𝑚𝑚𝑚_𝑓(𝑋𝑛) for an example: 
 
According to the sum-production algorithm discussed in section 3.2.2, the belief of Xn can 
be calculated as: 
 

                                 𝐵𝑋𝑛(𝑋𝑛) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛−1            (4.34) 

  
A projection can be made on BXn(Xn), which results an approximated distribution b�(Xn) 
and a normalizing factor Kn. 
 
                                                        𝑏�(𝑋𝑛) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷�𝐵𝑋𝑛(𝑋𝑛)�                                          (4.35) 
 
The outgoing message of Xn can be computed as: 
 

                                                              𝑚𝑠𝑠�_𝑓(𝑋𝑛) =
𝐾𝑛 ∗ 𝑏�(𝑋𝑛)
𝑚𝑚𝑚_𝑏(𝑋𝑛)

                                                      (4.36) 

 
The outgoing message 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) can be obtained in the same way. One thing should be 
mentioned that the algorithm above is possible to be simplified in some particular case, which 
will be discussed in the next chapter.

 
𝑃(𝑋𝑛|𝑋𝑛−1) 
  

𝑓𝑛 
𝑿𝒏−𝟏  𝑿𝒏 

𝑚𝑚𝑚_𝑓(𝑋𝑛−1) 

𝑚𝑚𝑚_𝑓(𝑋𝑛) 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) 

𝑚𝑚𝑚_𝑏(𝑋𝑛) 


  CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 

25 

Chapter 5               
Implementation of Expectation 
Propagation 

5.1 Optical communication model with linear block 

In this section, the optical communication system model involving linear (AWGN) function 
block will be discussed. The details of the message approximation for linear block will be 
talked about. The linear system model and the corresponding factor graph for two spans are as 
follows: 

 
Figure 5.1 System model with only linear block 

 
Figure 5.2 Factor graph for model with only linear block 
 

5.1.1 System setup 

As discussed in chapter 4, the initialization should be done before the main process will be 
carried out. For this system, choose two dimensional Gaussian distribution from the 
exponential family as the approximation term for the messages: 
 

                                          𝑟̃𝑖(𝑋) = 𝐶𝑖 ∗ 𝑒𝑒𝑒 �−
1
2

(𝑋 −𝑀𝑖) 𝛴𝑖−1 (𝑋 −𝑀𝑖)𝑇�                            (5.1) 

 
Obviously, the message passing through factor graph is the mean and covariance matrix of the 
Gaussian distribution. 
 

T     R 

N N N N 

𝑃(𝑎) 
  

𝑎 𝑅1 𝑋1 𝑃(𝑅1|𝑋1) 
 

𝑃(𝑋1|𝑎) 

𝑓1 𝑓2 𝑓3 


  CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 

26 

Initial all the forward incoming messages of each function block to have: 
 

𝑀𝑀𝑀𝑀:𝑀 = [0,0],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �∞ 0
0 ∞� 

 
Considering the implementation, make ∞ to be 100. What’s more, the back incoming 
message from the observation R to the last function block in the end of the model can be seen 
as a Gaussian distribution with: 
 

𝑀𝑀𝑀𝑀:𝑀 = [𝑋𝑅 ,𝑌𝑅],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �0 0
0 0� 

 
5.1.2 Message approximation for linear block 

 Figure 5.3 Message computation and approximation for linear function block 
 
According to figure 5.3, the task is to get the outgoing message: 𝑚𝑚𝑚_𝑏(𝑋𝑛−1)  and 
𝑚𝑚𝑚_𝑓(𝑋𝑛)  by using the incoming message: 𝑚𝑚𝑚_𝑓(𝑋𝑛−1)  and 𝑚𝑚𝑚_𝑏(𝑋𝑛). And the 
function 𝑓𝑛 is: 
 

                                           𝑓𝑛(𝑋𝑛,𝑋𝑛−1) = 𝐾 ∗ 𝑒𝑒 𝑝 �−
1
𝑁0

�|𝑋𝑛 − 𝑋𝑛−1|�2�                              (5.2) 

 
𝒎𝒎𝒎_𝒇(𝑿𝒏): For the reason that 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) has:  
 

𝑀𝑀𝑀𝑀:𝑀 = [0,0],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �∞ 0
0 ∞� 

 
This makes the Gaussian distribution actually to be a uniform distribution. Consider that the 
function 𝑓𝑛 is Gaussian-like, therefore the message should be still a uniform distribution 
after passed the function. Hence, in this case: 
 
                                                                 𝑚𝑚𝑚_𝑓(𝑋𝑛) = 𝑚𝑚𝑚_𝑓(𝑋𝑛−1)                                                  (5.3) 

 
𝒎𝒎𝒎_𝒃(𝑿𝒏−𝟏): Refer to the formula (4.34): 
 

𝑃(𝑋𝑛|𝑋𝑛−1) 
  

𝑓𝑛 
𝑿𝒏−𝟏  𝑿𝒏 

𝑚𝑚𝑚_𝑓(𝑋𝑛−1) 

𝑚𝑚𝑚_𝑓(𝑋𝑛) 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) 

𝑚𝑚𝑚_𝑏(𝑋𝑛) 


  CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 

27 

                                 𝐵𝑋𝑛−1(𝑋𝑛−1) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛   

 
Because the components involved in the formula above are all Gaussian-like, hence the 
message can be computed directly without any approximation. 
 

                                             𝑚𝑚𝑚_𝑏(𝑋𝑛−1) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛                              (5.4) 

 
According to formula (5.1) and (5.2) 
 

                                    𝑚𝑚𝑚_𝑏(𝑋𝑛−1) = �𝑁(𝑋𝑛;𝑋𝑛−1,𝑁0) ∗ 𝑁(𝑋𝑛;𝑀𝑛,𝛴𝑛) 𝑑𝑥𝑛                      (5.5) 

 
With formula (4.4), one can obtain: 
 

                                  𝑚𝑚𝑚_𝑏(𝑋𝑛−1) = �𝑁(𝑋𝑛−1;𝑀𝑛,𝑁0 + 𝛴𝑛) ∗ 𝑁(𝑋𝑛;𝑀, Σ) 𝑑𝑥𝑛                 (5.6) 

 
                                                                = 𝑁(𝑋𝑛−1;𝑀𝑛,𝑁0 + 𝛴𝑛) 
 

5.2 Optical communication model with linear and nonlinear block 

This section will introduce the nonlinear block (SPM) into the system discussed above. Monte 
Carlo method will be discussed to solve the message approximation for the nonlinear block. 
Finally, solutions will be given respectively for the optical communication system with single 
and dual polarization. The system model and factor graph for one span are as given below: 

 
Figure 5.4 System model with nonlinear and linear block 
 

Figure 5.5 Factor graph for model with linear and nonlinear block 

T    R 

N N N N 

NL NL 

 
𝑃(𝑎) 
  

𝑎 𝑅1 𝑋1 𝑃(𝑅1|𝑋1) 
 

𝑃(𝑋1|𝑎) 

𝑓1 𝑓2 𝑓3 


  CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 

28 

 
In figure 5.5, 𝑓2 and 𝑓3 represent the nonlinear and linear block respectively in one span. 
 

5.2.1 System setup 

The initialization for the system with nonlinear block is the same as in section 5.1.1. 
 

5.2.2 Message approximation for nonlinear system with single polarization 

Consider the picture in figure 5.3 again, the function block 𝑓𝑛 now is: 
 
                                   𝑓𝑛(𝑋𝑛,𝑋𝑛−1) = 𝛿�𝑋𝑛 − 𝑋𝑛−1 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑋𝑛−1||2)�                (5.7) 
 
𝒎𝒎𝒎_𝒇(𝑿𝒏): As discussed in section 5.1.2, the incoming message: 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) is actually 
a uniform distribution. Meanwhile the effect of function 𝑓𝑛 is a rotation for the incoming 
message according to message computation method discussed in chapter 3. Hence, formula 
(5.3) can be still applied in this case.  
 
𝒎𝒎𝒎_𝒃(𝑿𝒏−𝟏): Refer to the formula (4.34): 
      

⎩
⎨

⎧ 𝐵𝑋𝑛−1(𝑋𝑛−1) = �𝑃(𝑋𝑛|𝑋𝑛−1) ∗ 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗𝑚𝑚𝑚_𝑏(𝑋𝑛)𝑑𝑥𝑛

𝑃(𝑋𝑛|𝑋𝑛−1) = 𝛿�𝑋𝑛 − 𝑋𝑛−1 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ||𝑋𝑛−1||2)�

 
               𝐵𝑋𝑛−1(𝑋𝑛−1)  = 𝑚𝑚𝑚_𝑓(𝑋𝑛−1) ∗  𝑚𝑚𝑚_𝑏 �𝑋𝑛−1 ∗ exp �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑋𝑛−1|�2��   (5.8)  

 
According to the EP algorithm discussed in section 4.2.2, 𝐵𝑋𝑛−1(𝑋𝑛−1) should be projected 
into a Gaussian distribution:  
 
                              𝑁(𝑋𝑛−1;𝑀𝑛−1, Σ𝑛−1) ~ 𝑏�(𝑋𝑛−1) = 𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷𝑷�𝐵𝑋𝑛−1(𝑋𝑛−1)�              (5.9) 
 
In formula (5.10), the mean: 𝑀𝑛−1 and covariance: Σ𝑛−1 matrix of 𝐵𝑋𝑛−1(𝑋𝑛−1) should be 
calculated for the approximation term: 𝑏�(𝑋𝑛−1). In this case, the mean and covariance matrix 
is not easy to figure out mathematically, hence, the importance sampling and Monte Carlo 
integration [9, 10] are introduced to deal with the problem.  
 
Assume: 

            𝐵𝑋(𝑋)  = 𝑚𝑚𝑚_𝑓(𝑋) ∗  𝑚𝑚𝑚_𝑏 �𝑋 ∗ exp �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑋|�2�� 


  CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 

29 

                                                 = 𝑁1(𝑋) ∗ 𝑁2 �𝑋 ∗ 𝑒𝑒𝑒 �𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ �|𝑋|�2��                                         

                                                  = 𝑁1(𝑋) ∗ 𝑄(𝑋)                                                                                  (5.10) 
 
1. Draw 𝐿 samples from 𝑄(𝑋): �𝒙(𝟏), , , , , ,𝒙(𝒍)� 

 
 Draw L samples from 𝑁2(𝑋): �𝒚(𝟏), , , , , ,𝒚(𝒍)� 

 Rotate the samples:  𝒙(𝒍) = 𝒚(𝒍) ∗ 𝑒𝑒𝑒 �−𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ��𝒚(𝒍)��
2
�                          (5.11) 

 
2. Compute weights for samples: �𝒙(𝟏), , , , , ,𝒙(𝒍)�: 

 
                                                                    𝑤(𝑙) = 𝑁1�𝒙(𝒍)�                                                      (5.12) 
 

3. Normalize the weights: 
 

                                                                    𝑣(𝑙) =
𝑤(𝑙)

∑ 𝑤(𝑙)
𝐿

                                                        (5.13) 

 
The set: �𝒙(𝒍),  𝑣(𝑙)�𝒍=𝟏
𝑳

 is called the representation of  𝐵𝑋(𝑋). 

 
4. Calculate the mean and covariance by Monte Carlo integration: 

 
⎩
⎪
⎨

⎪
⎧ 𝒎� = �𝑣(𝑙)

𝐿

𝑙=1

∗ 𝑥(𝑙)

𝒗� = �𝑣(𝑙)
𝐿

𝑙=1

∗ �𝑥(𝑙) −𝒎� � ∗ �𝑥(𝑙) −𝒎� �
𝑇

                              (5.14) 

 
After the mean and covariance matrix have been approximated the message: 𝑚𝑚𝑚_𝑏(𝑋𝑛−1) 
can be made as in formula (4.36). 
 

5.2.3 Message approximation for nonlinear system with dual polarization 

In the system with dual polarization, two signals are transmitted simultaneously. The rotation 
phenomena caused by the nonlinear block for each signal is not only dependent on the signal 
itself but also on the other one. The formulas are given below: 
 

                            𝑋1𝑛 = 𝑋1𝑛−1 ∗ 𝑒𝑒 𝑝 � 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑋1𝑛−1|�2 + �|𝑋2𝑛−1|�2 � �               (5.15) 

                            𝑋2𝑛 = 𝑋2𝑛−1 ∗ 𝑒𝑒 𝑝 � 𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ � �|𝑋1𝑛−1|�2 + �|𝑋2𝑛−1|�2 � �               (5.16) 


  CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 

30 

 
Figure 5.6 Message computation and approximation for nonlinear function block, dual polarization 
 

According to figure 5.6, it is obvious that the analysis method and the result of the message: 
𝒎𝒎𝒎_𝒇(𝑿𝟏𝒏) is the same as in the last section. The only difference is in the message: 
𝒎𝒎𝒎_𝒃(𝑿𝟏𝒏−𝟏) when processing the importance sampling and Monte Carlo integration in 
step 1: 
 
1. Draw 𝐿 samples from 𝑄(𝑋): �𝒙(𝟏), , , , , ,𝒙(𝒍)� 

 
 Draw L samples from 𝑁2(𝑋): �𝒚𝟏(𝟏), , , , , ,𝒚𝟏(𝒍)� 

 Draw L samples from 𝑁2
2(𝑋2): �𝒚𝟐(𝟏), , , , , ,𝒚𝟐(𝒍)� 

 
 Rotate the samples:   

 
                𝒙𝟏(𝒍) = 𝒚𝟏(𝒍) ∗ 𝑒𝑒𝑒 �−𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ ���𝒚𝟏
(𝒍)��

2
+ ��𝒚𝟐(𝒍)��

2
��                 (5.17) 

 
The following steps are the same as discussed in the last section. And the message: 
𝒎𝒎𝒎_𝒇(𝑿𝟐𝒏), 𝒎𝒎𝒎_𝒃(𝑿𝟐𝒏−𝟏) can be obtained in the same way above. 
 

5.3 Simplification on EP algorithm 

From the discussion on initialization and message computation and approximation method 
above, it can be easily computed that all the forward messages passing through the system 
have: 
 

𝑀𝑀𝑀𝑀:𝑀 = [0,0],𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶:𝛴 = �∞ 0
0 ∞� 

 
Therefore, the messages (mean and covariance matrix) will be converged in only one way 
from the end to the beginning of the model without iteration. Hence, the EP algorithm 
discussed in section 4.3 is simplified: 

𝑃(𝑋1𝑛|𝑋1𝑛−1) 
  

𝑓𝑛 
𝑿𝟏𝒏−𝟏  𝑿𝟏𝒏 

𝑚𝑚𝑚_𝑓(𝑋1
𝑛−1) 

𝑚𝑚𝑚_𝑓(𝑋1
𝑛) 𝑚𝑚𝑚_𝑏(𝑋1

𝑛−1) 

𝑚𝑚𝑚_𝑏(𝑋1
𝑛) 

𝑚𝑚𝑚_𝑏(𝑋2
𝑛) = 𝑁2

2(𝑋2) 


  CHAPTER 5. IMPLEMENTATION OF EXPECTATION PROPAGATION 

31 

 
4. Initialize the forward and backward messages. 

 
5. Make approximations for the messages of the last function block in the end of the model. 

Move to the backward function block and obtain the corresponding messages block by 
block until finish the message approximation for the second function block of the model 
(block 𝑓2). 
 

6. Calculate the belief of 𝑋𝑛: 
   

                                              𝐵𝑋𝑛(𝑋𝑛) = 𝑚𝑚𝑚_𝑓(𝑋𝑛) ∗ 𝑚𝑚𝑚_𝑏(𝑋𝑛)                                     (5.18) 
 
Note that the algorithm can be easily extended to dual polarization system.
 
  
  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

32 

Chapter 6                      
Data analysis and comparison 

6.1 Introduction to ML, SBP, BP method 

In the following sections of this chapter, the results from the EP implementation will be 
analyzed together with some other methods such as ML (Maximum Likelihood), SBP 
(Stochastic Back Propagation) [11] and BP (Back Propagation) [11]. This section will give a 
short introduction of these technologies. Moreover, some simulation parameters will be also 
mentioned in the end. 
 

6.1.1 ML, SBP and BP 

ML: ML (Maximum Likelihood) is a common used method for the receiver design. It simply 
compares the ‘distance’ between the observation 𝒓𝑵 and every possible original signal 𝐬, 
finds the smallest ‘distance’ and makes decision: 
 

𝒔� = 𝑎𝑎𝑎𝑚𝑚𝑚 ‖𝒓𝑵 − 𝒔‖2, 𝒔 ∈ 𝛺2                                          (6.1)  
 

SBP: SBP (Stochastic Back Propagation) [11] is a new method developed for solving the 
problem in optical communication system. It is based on the factor graphs and particle 
technology which draws a number of samples (particles) for each observation in the end. 
Then all the particles are propagated backward to the beginning of the model and try to 
compensate for the distortions when passing through the system model. Finally, Monte Carlo 
integration is applied. And the result is compared with all possible original signals in order to 
make a decision. 
 
BP: BP (Back Propagation) is similar to SBP, the main difference is that BP does not consider 
the AWGN in the model and does not use particles while SBP dose.  
 

𝒔� = 𝑎𝑎𝑎𝑚𝑚𝑚 �𝒓𝑵 − 𝒔 ∗ 𝑒𝑒𝑒 (𝑗 ∗ 𝛾𝐿𝑒𝑒𝑒 ∗ 𝑁 ∗ ||𝒔||2)�2, 𝒔 ∈ 𝛺2            (6.2) 

 
6.1.2 Simulation parameters 

During the simulation, the system has: 16-QAM constellation, 22 spans, 𝑁0 = 4.89 ∗
10−7 𝑊/𝐻𝐻 for AWGN block, 𝛾 = 1.25𝑊−1𝑘𝑘−1  and 𝐿𝑒𝑒𝑒 = 17.36 𝑘𝑘  for the 
nonlinear block, 𝑁𝑁𝑁 = 100 for importance sampling. 


  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

33 

 
6.2 System with linear block 

The figure below presents the result SER along with the input power 𝑃𝑖𝑖 in dBm for the 
linear system equipped with EP and ML receiver.  

 
Figure 6.1 SER as a function of 𝑃𝑖𝑖 with 16-QAM 

 
The figure shows that the two curves coincide with each other. The EP and ML receiver has 
the same performance with each other. This is reasonable, because in EP process, the 
observation 𝒓𝑵 is just the mean matrix of the first message. According to the message 
approximation method for linear block in section 5.1, the mean of the message won’t change 
while passing through the factor graph while the covariance of the new message is the 
covariance of the old message plus 𝑵𝟎 (covariance of AWGN). 
 
Hence, the final message used to make the decision has the mean matrix which is equal to 𝒓𝑵 
and covariance matrix which is equal to the sum of 𝑵𝟎 (covariance of AWGN). Therefore, 
the decision from EP is expected to be the same with the result from ML receiver which 
measures the ‘distance’ between the observation 𝒓𝑵 and all possible original signals. 
 
One more comment in this case can be given is that the process of EP is more efficient than 
SBP. The reason is that in SBP a number of samples should be generated and transmitted back 
through the model. For each linear block, every sample should be added an AWGN in order to 
compensate for the distortion caused by the linear block. And finally Monte Carlo integration 

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1
10

-4

10
-3

10
-2

10
-1

10
0

S
E

R

Pin (dBm)

 
EP
ML


  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

34 

has to be used. While for EP, only the mean and covariance are transmitted and changed in the 
factor graph.  
 

6.3 System with linear and nonlinear block 

This section will firstly show how the message is changed when passing through the factor 
graph. And the influence of the number of samples in Monte Carlo in nonlinear block will be 
also discussed. The comparison between EP and other methods in SER is given at last. 
 

6.3.1 Message changing through the factor graph 

It has already been mentioned that the message passing though the factor graph is 
approximated into Gaussian distributions. The figures below displays how the message 
(Gaussian distribution) is changed from the end to the beginning of the factor graph in one 
polarized system. 
 

(1)                                 (2) 

 
(4)                                 (3) 

-0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1
-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

x

y

-0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08
-0.1

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

x

y

-0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08
-0.1

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

x

y

-0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08
-0.1

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

x

y


  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

35 

 
(5)                                 (6) 
 

Figure 6.2 Message changing from the end to the beginning of the model 
  
The Observation and original signal in the case of figure 6.2 are: 
 

𝒓𝑵 = [0.0906,−0.0093], 𝒔 = [−0.0634, 0.0634] 
 
From the figure, one can see that the approximated Gaussian distribution finally reach the 
original signal step by step. Actually, a correct decision can be just made from the fifth 
subplot of figure 6.2 for the reason that the value of the Gaussian distribution for the original 
signal is already the maximum one among the 16-QAM constellation.  
 
Note that the whole process has 44 steps for 22 spans. Figure 6.2 only shows 6 steps to 
roughly present the way EP works. 
 

6.3.2 Effect of the number of samples in Monte Carlo 

It has been already discussed that the Monte Carlo method has to be used to make the 
approximation for the messages for each nonlinear block. Hence, the number of samples drew 
by using Monte Carlo is expected to affect the final result. Figure 6.3 shows the relationship 
between the SER and the number of samples for the two polarized system with a fixed input 
power in 4 dBm. 
 
The figure shows that the SER goes down dramatically at first and doesn’t decrease much 
after the number of sample reaches certain value. Because the number of sample in Monte 
Carlo will obviously influence the speed of the process, a ‘cost-effective’ number of samples 
should be chosen when considering the efficiency of the process. Figure 6.3 displays that a 
number around 300 could be a reasonable one for the system. In this thesis, consider the 
condition of the simulation equipment and comparison, the number of 100 is chosen  
 

-0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08
-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1

x

y

-0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06 0.08
-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1

x

y


  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

36 

 
Figure 6.3 SER along with the number of samples for dual polarization 

 
6.3.3 Comparison with other methods 

In this section, a comparison has been made between EP and other methods aiming to analyze 
the different performance of these methods.  
 
Figure 6.4 and 6.5 compare the SER along with the input power between ML, BP, SBP and 
EP method for single polarization and dual polarization respectively. 100 particles (samples) 
are used in SBP and EP in both single and dual polarization. From these two figures, 
following comment could be obtained: 
 
1. All the methods except ML have the lowest SER in the middle area along the X axis and 

the performance roughly get worse while the input power decrease and increase from the 
lowest SER point. 
 

2. All the methods have better performance in single polarization system. 
 

3. EP has a greater degradation in performance than SBP and BP dose when the system 
extends from single polarization to dual polarization. 
 

100 200 300 400 500 600 700 800 900 1000

10
-3

10
-2

S
E

R

Num of samples

 
Two polarization


  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

37 

 
Figure 6.4 SER for single polarization with 16-QAM 

 
Figure 6.5 SER for dual polarization with 16-QAM 

 
-10 -5 0 5 10 15
10

-6

10
-5

10
-4

10
-3

10
-2

10
-1

10
0

S
E

R

Pin (dBm)

 
ML
BP
SBP
EP

-10 -5 0 5 10 15

10
-4

10
-3

10
-2

10
-1

10
0

S
E

R

Pin (dBm)

 
ML
BP
SBP
EP


  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

38 

4. In both single and dual polarization system, SBP and ML have the best and worst results 
respectively. And BP and ML are comparable with each other. 
 

5. The curve of BP method has oscillations in both single and dual polarization.  
 
 
Firstly, because the ML method just neglects the nonlinear effect, so it is not strange that it 
has the worst performance while the input power grows. All of the rest three methods have 
taken nonlinear into consideration. AWGN and nonlinear effect are the main distortions for 
which BP, SBP and EP try to compensate. When the input power is pretty small the AWGN 
contribute the most to the distortion in the system. When the input power is getting bigger, the 
effect of the AWGN becomes smaller and the SER goes down until it reaches the lowest point.  
As the input power continues increasing, the nonlinear effect dominates the distortion and will 
increase the value of SER. That is why the SER curve has ‘V’ shape in the figure. 
 
In dual polarization the rotation caused by the nonlinear block is based on both two signals 
transmitted simultaneously in the system (formula 2.5). Hence, the nonlinear effect in dual 
polarization is stronger than that in single polarization and the obtained worse results. 
 
Moreover, it can be found that the performance of EP obviously has a bigger degradation than 
that of SBP and BP has from the single to dual polarization. The possible reason for this 
phenomenon is that: In the dual polarization system, the nonlinear rotation effect is greater 
than that in the single polarization case. Hence, the Gaussian distribution chosen to 
approximated the message in the system may be no longer a good approximation term. And it 
can be also expected that the performance of EP with Gaussian distribution as the 
approximation term will have a great drop when the system extends to triple or quadruple 
polarization system. 
 
It can be seen from the figure that SBP give out the best performance in SER. The possible 
reason is that: 
 
Compared with BP, SBP has taken care of AWGN while BP doesn’t. In the nonlinear system 
model each AWGN block is followed by a nonlinear block. When the input power is high, the 
interaction between the AWGN and the nonlinear rotation also get bigger, which is supposed 
to be the reason for the poorer performance for BP in high input power compared with SBP 
and EP. Moreover, because of the natural uncertainty of AWGN, the uncertainty of the result 
SER for BP is also expected. That can explain the oscillation happened in the BP curve in the 
figure above. 
 
Compared with BP, both SBP and EP have taken AWGN and nonlinear into consideration. 
Nevertheless, SBP uses particles and makes a stochastic process in the model while EP makes 
approximations block by block, in other words, EP process continues losing the accuracy 
while it makes approximations for the blocks in the model. That is why EP obtains the SER 
curve which has the same shape with SBP but a worse SER.  


  CHAPTER 6. DATA ANALYSIS AND COMPARISON 

39 

 
One more thing should be mentioned is that the strategy of EP is making a balance between 
the accuracy of the result and the efficiency of the process. In the linear system, EP 
successfully makes a more efficient algorithm than SBP does. However, because of the use of 
Monte Carlo method for the nonlinear block, the performance of EP in the nonlinear system is 
more or less the same with SBP when it comes to the efficiency. 
 
 
  CHAPTER 7. CONCLUSION AND FUTURE WORK 

40 

Chapter 7                      
Conclusion and future work  

7.1 Conclusion 

The thesis works on factor graphs and expectation propagation (EP) and try to implement 
these technologies into the coherent optical communication system aiming to deal with the 
linear and nonlinear distortion existed in the system. The results indicate that factor graphs 
and EP works well on both linear (AWGN) and nonlinear system. Moreover, Gaussian 
distribution has also been proved a reasonable approximation term for EP to approximate the 
messages in the system.  
 
Comparison has also been made between EP and other existing methods. Generally, the 
performance of EP is as expected. 
 

7.2 Future work 

According to the discussions in the thesis, the EP algorithm and the message computation 
method could be improved to make a more efficient algorithm. Moreover, besides the linear 
and nonlinear effect, other kinds of distortion and components in the system could also be 
take into consider. 
 

7.2.1 Expectation propagation 

As discussed in chapter 6, the Monte Carlo method used in the message approximation for 
nonlinear block makes EP algorithm less efficient. Hence, more work could be taken on how 
to make message approximation process mathematically or use other methods which can be 
more efficient than Monte Carlo. 
 
What’s more, discussion shows that, as an approximation term, the performance of Gaussian 
distribution degrades a lot in the multiplexed polarization system. Therefore, a better 
distribution could be chosen from the exponential family for the multiplexed polarization 
system.  
 

  CHAPTER 7. CONCLUSION AND FUTURE WORK 

41 

7.2.2 Other problems 

Besides the linear and nonlinear distortion discussed in the thesis, other problems could also 
be taken into consider.  
 
Polarization dependent loss (PDL): PDL is also a significant distortion which influences the 
performance of the system. PDL can be calculated as the ratio of the maximum and minimum 
power amplitude to all the states of the polarization. In the optical system model, PDL exists 
in each span of fiber and the PDL of the system cannot be obtained by just adding all the PDL 
components together but should be considered separately [11]. 
 
More noise: This thesis only takes care of AWGN and nonlinear phase noise. In the future 
work, more noise could be taken into consider for the system, such as photoelectron noise, 
thermal noise and photon noise.  
 
Moreover, the unitary channel [11] and the corresponding equalizer are also suggested to be 
added into the system model.
 
  
  REFERENCES 

42 

References 
 
[1] S. Tan, H. Wymeersch, P. Johannisson, E. Agrell, P. Andrekson, and M. Karlsson, “An 

ML-based detector for optical communication in the presence of nonlinear phase noise,” 
International Conference on Communications, 2010. 

 
[2] Nan Jiang, Yan Gong, Johnny Karout, Henk Wymeersch, Pontus Johannisson, Magnus 

Karlsson, Erik Agrell, Peter A. Andrekson, “Stochastic Backpropagation for Coherent 
Optical Communications,” 37th European Conference and Exhibition on Optical 
Communication (ECOC), 2011.  

 
[3] H. Wymeersch, Iterative Receiver Design. Cambridge University Press, 2007, pp. 36-75. 
 
[4] Scott L. Miller, Donald G. Childers, Probability and Random Processes with 

Applications to Signal Processing and Communications. Elsevier Academic Press, 2004, 
pp. 148-232 

 
[5] Peter Ahrendt, The Multivariate Gaussian Probability Distribution. IMM, Technical 

University of Denmark, 2005. 
 
[6] Thomas P Minka, “A family of algorithms for approximate Bayesian inference,” 

Doctor’s thesis, 2001. 
 
[7] Thomas P Minka, “Expectation Propagation for Approximate Bayesian Inference,” 

Statistics Dept. Carnegie Mellon University, 2002. 
 
[8] Tom Heskes, Onno Zoeter, “Extended Version of “Expectation propagation for 

approximate inference in dynamic Bayesian networks”,” SNN, University of Nijmegen, 
The Netherlands, 2003. 

 
[9] David J.C. MacKay, Information Theory, Inference, and Learning Algorithms. 

Cambridge University Press, 2005, pp. 357–382. 
 
[10] Eric C. Anderson, “Monte Carlo Methods and Importance Sampling,” Lecture Notes for 

Stat 578C, Statistical Genetics, 1999 
 
[11] Yan gong, Nan jiang, “Stochastic Backpropagation for Coherent Optical 

Communication,” Master’s thesis in optical communication systems, 2011 
 
[12] Shuang Wang, “Expectation Propagation Algorithm,” School of Electrical and 

Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, 2011. 
 

	Master of Science Thesis in the Master Degree Programme,      Communication Engineering
	SHENZHOU HOU
	Abstract
	Acknowledgements
	Contents
	Chapter 1                       Introduction
	1.1 Background
	1.1.1 Coherent optical communication system
	1.1.2 Factor graphs and Expectation propagation

	1.2 Purpose
	1.3 Structure of the thesis

	Chapter 2                       Coherent optical communication system
	2.1 Introduction to the coherent optical communication model
	2.2 SPM, AWGN and Polarization
	2.3 The MAP Receiver

	Chapter 3                        Factor graphs
	3.1 Introduction to Factor graphs
	3.1.1 Marginals, Function factorizations, Factor graphs
	3.1.2 Sum-product algorithm in Factor graphs

	3.2 Factor graphs for the optical communication system
	3.2.1 Factorization and Graphs
	3.2.2 Sum-product algorithm
	3.2.3 Challenges


	Chapter 4                   Expectation Propagation
	4.1 Introduction to Expectation Propagation
	4.1.1 Why Expectation Propagation
	4.1.2 Exponential family and Kullback-Leibler divergence
	4.1.3 Expectation Propagation
	4.1.4 Potential problem

	4.2 An example of Expectation Propagation: clutter problem
	4.2.1 The Problem
	4.2.2 Factor Graph
	4.2.3 Processing Expectation Propagation

	4.3 Expectation Propagation for the optical communication system
	4.3.1 Expectation Propagation for message computing
	4.3.2 Message computing for chain-like model with Expectation Propagation


	Chapter 5               Implementation of Expectation Propagation
	5.1 Optical communication model with linear block
	5.1.1 System setup
	5.1.2 Message approximation for linear block

	5.2 Optical communication model with linear and nonlinear block
	5.2.1 System setup
	5.2.2 Message approximation for nonlinear system with single polarization
	5.2.3 Message approximation for nonlinear system with dual polarization

	5.3 Simplification on EP algorithm

	Chapter 6                      Data analysis and comparison
	6.1 Introduction to ML, SBP, BP method
	6.1.1 ML, SBP and BP
	6.1.2 Simulation parameters

	6.2 System with linear block
	6.3 System with linear and nonlinear block
	6.3.1 Message changing through the factor graph
	6.3.2 Effect of the number of samples in Monte Carlo
	6.3.3 Comparison with other methods


	Chapter 7                      Conclusion and future work
	7.1 Conclusion
	7.2 Future work
	7.2.1 Expectation propagation
	7.2.2 Other problems


	References