JESD204 

JESD204 Receiver and Data Reduction
Implementation for an SoC Platform

Master’s thesis in Embedded Electronic System Design

Ammar Shihabi
Lucas Johansson

Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2024


Master’s thesis 2024

JESD204 Receiver and Data Reduction
Implementation for an SoC Platform

Ammar Shihabi
Lucas Johansson

Department of Computer Science and Engineering
Chalmers University of Technology

University of Gothenburg
Gothenburg, Sweden 2024


JESD204 Receiver and Data Reduction Implementation for SoC Platform

Ammar Shihabi
Lucas Johansson

© Ammar Shihabi, Lucas Johansson, 2024.

Supervisor: Lars Svensson, Department of computer science and engineering
Advisors: Jan Andersson, Joaquín España Navarro, Frontgrade Gaisler
Examiner: Per Larsson-Edefors, Department of computer science and engineering

Master’s Thesis 2024
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: AI generated picture of ”System on Chip” from Pixlr

Typeset in LATEX
Gothenburg, Sweden 2024

iv


JESD204 Receiver and Data Reduction Implementation for SoC Platform
Ammar Shihabi, Lucas Johansson
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg

Abstract
Field programmable gate arrays (FPGAs) are currently used within the signal pro-
cessing chain of space systems to transfer data from high-speed sensors. Such sys-
tems employ a number of FPGAs, that are primarily used for tasks such as data
reception and data reduction, which is a critical step as microprocessors often strug-
gle to handle data at such high rates. The FPGAs cause an overall increase in
resource and power usage for the critical space computer systems with limited re-
sources. This thesis presents a prototype implementation of a JESD204B high-speed
serial receiver, together with an investigation of some existing data reduction algo-
rithms that are suitable for hardware implementation.

The methodology used in this project involved constructing an SoC subsystem on an
FPGA to create the signal processing chain needed to obtain high-speed communi-
cation. Formal verification methods were used extensively to verify the functionality
of the receiver. Python was used to explore two different implementations for data
reduction. The receiver RTL demonstrated correct behavioral functionality against
a transmitter in simulation. Although the receiver was successfully implemented on
the FPGA, actual data reception on the hardware was not achieved due to time limi-
tations. The study of the algorithms showed valuable results, making them practical
both for research and in terms of hardware implementation. Future work includes
establishing receiver and transmitter communication to read actual data on hard-
ware, further developing the receiver, and finally implementing the data reduction
algorithms in hardware.

Keywords: JESD204 receiver, data reduction algorithms, FPGA, System On Chip,
ADC, SerDes, Register Transfer Level, Advanced Peripheral Bus, Advanced High-
performance Bus, GRLIB.

v


Acknowledgements
First and foremost, we would like to express our deepest gratitude to our families.
Their unwavering support, infinite love, and endless encouragement have been the
reason for us continuing our journey and this education. Through every challenge
and achievement, they have stood by us, offering strength and inspiration. This
thesis is as much a testament to their dedication and sacrifices as it is to our hard
work. We are extremely grateful for their presence in our lives and for believing in
us every step of the way.

We also extend our sincere thanks to Jan Andersson for giving us this opportunity
to do this project. Special thanks to our technical advisor Joaquín España Navarro
and the others at Gaisler for giving us technical guidelines, as well as inspiring
ideas to enhance the outcome of this project. Finally, a big thanks to our academic
supervisor from Chalmers, Prof. Lars Svensson, as well as our examiner Prof. Per-
Larsson Edefors, who not only gave invaluable counseling when planning this project
but also provided careful reviews and insightful feedback.

Ammar Shihabi, Lucas Johansson, Gothenburg, 2024-09-18

vii


Contents

1 Introduction 1
1.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Purpose and aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Scope and limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Report organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Technical background 3
2.1 Field programmable gate array (FPGA) . . . . . . . . . . . . . . . . 4
2.2 Data converters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.2.1 Analog-to-digital conversion . . . . . . . . . . . . . . . . . . . 4
2.2.2 Digital-to-analog conversion . . . . . . . . . . . . . . . . . . . 4

2.3 Serial peripheral interface (SPI) . . . . . . . . . . . . . . . . . . . . . 5
2.4 AMBA shared bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.4.1 AMBA AHB . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.4.2 AMBA APB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.5 GRLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.6 JESD204 high-speed interface . . . . . . . . . . . . . . . . . . . . . . 7

2.6.1 JESD204B overview . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.2 Deterministic latency . . . . . . . . . . . . . . . . . . . . . . . 9
2.6.3 JESD204B subclasses . . . . . . . . . . . . . . . . . . . . . . . 10

2.7 JESD204B layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.7.1 Physical layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.7.2 Transport layer . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.7.3 Scrambling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.7.4 Data link layer . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.7.5 Application layer . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.8 Data Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.8.1 Run-length encoding . . . . . . . . . . . . . . . . . . . . . . . 19
2.8.2 Moving average . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Methods 21
3.1 Literature study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Define data acquisition use case . . . . . . . . . . . . . . . . . . . . . 21
3.3 Designing the receiver . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Creation of a subsystem . . . . . . . . . . . . . . . . . . . . . . . . . 22

ix


Contents

3.5 Verification through simulation . . . . . . . . . . . . . . . . . . . . . 22
3.6 Hardware verification . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Design and Implementation 25
4.1 Design overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.1.1 Choice of the JESD204 version . . . . . . . . . . . . . . . . . 26
4.1.2 Hardware selection and setup . . . . . . . . . . . . . . . . . . 26

4.2 JESD204B receiver operation . . . . . . . . . . . . . . . . . . . . . . 27
4.3 JESD204B receiver design overview . . . . . . . . . . . . . . . . . . . 29

4.3.1 Link parameter configuration . . . . . . . . . . . . . . . . . . 29
4.3.2 High-speed SerDes IP . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.3 Design of link layer . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.4 Design of transport layer . . . . . . . . . . . . . . . . . . . . . 31

4.4 Testbench setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5 Application layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.5.1 Run-length encoding . . . . . . . . . . . . . . . . . . . . . . . 33
4.5.2 Moving average . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Results 35
5.1 Behavioral simulation of receiver RTL . . . . . . . . . . . . . . . . . . 36
5.2 FPGA implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.2.1 Apbfifo and GRMON . . . . . . . . . . . . . . . . . . . . . . . 37
5.2.2 ADC configuration and SerDes . . . . . . . . . . . . . . . . . 39
5.2.3 JESD204B receiver performance evaluation . . . . . . . . . . . 40

5.3 Application layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3.1 RLE results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.3.2 Moving average results . . . . . . . . . . . . . . . . . . . . . . 44
5.3.3 Algorithm-FPGA interaction . . . . . . . . . . . . . . . . . . . 46

6 Conclusion 47

Bibliography 49

x


Abbreviations and definitions

General
FPGA : Field-Programmable Gate Array
ADC : Analog to Digital Converter
SoC : System on Chip
SerDes : Serializer/De-serializer
RD : Running Disparity
RLE : Run-Length Encoding
SMA : Sliding moving average
BMA : Block moving average
Octet : A group of eight bits
VHDL : Very High-Speed Integrated Circuit Hardware Description Language
RTL : Register-Transfer Level
IP : Intellectual property
SPI : Serial Peripheral Interface
AMBA : Advanced Microcontroller Bus Architecture
APB : Advanced Peripheral Bus
AHB : Advanced High-performance Bus
GRLIB : Gaisler Research IP Library
GRMON : Gaisler Research Monitor
LVDS : Low-Voltage Differential Signaling
PLL : Phase-Locked Loop

JESD204 specific
F : Number of octets per frame on a single lane
K : Number of frames per multiframe
N : Converter resolution
N’ : Bits per sample
CS : Control Bits
T : Tail Bits
LMFC : Local Multi-Frame Clock
Lane : A high-speed serial data channel in the JESD204B interface
Link : A high-speed connection between transmitter and receiver devices
Device clock : Clock signal from which a Tx or Rx must generate its local clocks
Core clock : A clock inside a transmitter or receiver device, used in the implemen-
tation of the JESD204 link
CGS : Code Group Synchronization
ILA : Initial Lane Alignment
Subclass : JESD204B supports three subclasses (0, 1, 2), each providing different
synchronization
SYSREF : Signal for achieving deterministic latency in subclass 1 systems
SYNC∼ : Signal for establishing synchronization between transmitter and receiver,
and deterministic latency in subclass 2 systems

xi


xii


1
Introduction

Contemporary space systems employ field programmable gate arrays (FPGAs) within
the signal processing chain, transferring data from high-speed sensors using analog
to digital converters (ADCs) to the FPGAs, before reaching the microprocessor sys-
tem on chip (SoC) [1]. The FPGA’s primary role in these systems would be to
act as a high-speed serial link to receive sensor data, which is a critical step as the
SoC processor often struggles to handle data at the requisite high rates [2]. The
corresponding large amount of sensor data needs to be handled in the critical space
computer systems with limited resources.

Often, a series of FPGAs are used to perform different tasks, such as data reception
and data reduction. However, the FPGAs used for data reduction come with a
significant overhead in terms of power dissipation and resource usage. In this report,
we explore the feasibility of implementing a high-speed serial link by the means of
a JESD204 receiver as an IP-core which will be directly integrated into the GR765
SoC from Frontgrade Gaisler [3]. Additionally, our work will attempt to investigate
data reduction algorithms on the incoming sensor data on the embedded FPGA
fabric of the SoC. This would eliminate the need for an external FPGA.

1.1 Related work
One relevant example of an existing architecture that could be improved through
the proposed thesis work, is the Laser-Ablation Time-of-Flight Mass Spectrometer
instrument (LMS) designed at the Physikalisches Institut of the University of Bern
[4][5]. This application produces 4 GBps (32 Gbps) in an architecture that is in the
end controlled by a GR712RC 100 MHz dual-core 32-bit processor. The GR712RC
SoC lacks a high-speed serial interface to acquire the data and does not have the
computational capabilities to process a high-bandwidth data stream [6]. To solve
the problem, the GR712RC would today be combined with several FPGAs to do
acquisition and data decimation, where relatively simple operations such as binning
can reduce the data stream from Gbps to kbps.

1


1. Introduction

1.2 Purpose and aim
The goal of this project is to investigate the feasibility of implementing a JESD204
receiver into Gaisler’s SoC. For that, we will study and specify the requirements of
a JESD204 receiver, to establish communication with data converters, with the goal
of developing a receiver prototype. Furthermore, we aim to study different data
reduction algorithms, and perform data reduction on the received data.

The motivation behind this integration is to achieve an overall reduction in resource
and power usage for SoC processors. To interface a JESD204 component, which
is gaining popularity amongst data converter manufacturers [7], with any of the
Gaisler LEON/NOEL processors [8], customers would currently need to implement
their own JESD204 receiver on an external FPGA, increasing resource usage and
power in resource-constrained space applications.

1.3 Scope and limitations
We develop this system for usage within Gaisler’s own SoC and GRLIB, which is
a library containing different SoC IP cores [9]. Verification of the receiver needs to
be done, ideally using a third party transmitter to make sure the standard in the
receiver is as specified by the JESD204 specification. The RTL of the receiver IP
core will be verified using EDA tools, then implemented on a hardware platform by
the means of an FPGA.

The primary focus of this work will be on the implementation of the JESD204
receiver, with the development and complexity of the data reduction algorithms
being considered a secondary priority, since the reduction rate and accuracy of the
data reduction algorithm are not useful without a functional receiver.

An important part of the complete IP core would be the interface between the
JESD204 receiver and the direct memory access (DMA) engine. Writing a new
DMA engine can be beneficial for the new JESD204 receiver. However, this would
introduce extra time and complexity to this project, which is why it will be omitted
from the scope of this thesis.

1.4 Report organization
In the theory chapter, a technical background of the JESD204 standard is first
established. Communication protocols and other modules related to the system
are introduced. Then, we will investigate some existing data reduction algorithms,
with the intent to perform data reduction on received data. In the design and
implementation chapter, the system is described from top to bottom in detail. In
chapter five, the performance of the system is evaluated by means of behavioral
simulation of RTL and hardware implementation. Chapter six concludes our work
with some reflections, and presents ideas for future work surrounding this project.

2


2
Technical background

In this part, necessary technical background for this thesis project is presented. We
will review aspects such as the JESD204 standard with its different layers, data
reduction, serial communication protocol for ADC programming, GRLIB and the
tools used in this project.

For the hardware part, we focus on the establishment of the sensor data reception
from an ADC. An FPGA is used for the receiver implementation and to support
an investigation of algorithmic data reduction, leveraging plenty of available pro-
grammable logic units and memory resources.

Figure 2.1 presents the intended JESD204 receiver IP-core on a Gaisler SoC archi-
tecture.

AMBA AHB

Memory
Controller

AHB/APB
bridge

Leon Processor
Leon Processor

SPARC V8

LEDs &
Switches

I/O ports I2C & 
SPI

UART

AMBA APB

JESD204
Receiver

eFPGA
Data

Reduction

Debug
Unit

GRMON

SerDes

/ /

High-Speed Serial Data
Existing IP-cores

Thesis poject study

Figure 2.1: System overview of the desired JESD204 receiver IP, with data reduction
implementation on FPGA fabric, inside an SoC.

3


2. Technical background

2.1 Field programmable gate array (FPGA)
An FPGA is a reconfigurable integrated circuit that can be configured to mimic
the behavior of any logic circuit. FPGAs are configured via a hardware description
language (HDL), such as Verilog or VHDL. FPGAs are made up of interconnects
and configurable logic blocks (CLBs), which use lookup-tables (LUTs) to imple-
ment desired logic gates. Due to the unique design of FPGAs, the CLBs can be
configured and operate in parallel, making FPGAs a very suitable platform for high-
performance applications [10].

Compared with a design implemented in an application specific integrated circuit
(ASIC), the same design implemented in an FPGA will consume more power [10].
FPGAs carry additional resources and routing to support reconfigurability, which
in turn results in higher power consumption compared to the optimized, fixed paths
in ASICs. However, this trade-off allows designs implemented in FPGAs to be more
versatile and quicker to deploy, as they do not require the extensive fabrication
process associated with ASICs.

2.2 Data converters
Data converters serve as a link between the analog world and the digital world. They
are responsible for the transformation of data between the discrete and continuous
domains [11].

2.2.1 Analog-to-digital conversion
An analog-to-digital converter (ADC) converts continuous-time and continuous-ampl-
itude signals into discrete-time and discrete-amplitude. The main blocks of an ADC
are the sampler and quantizer [11]. The purpose of a sampler is to sample the
continuous signal at certain time intervals, and construct a sampled duplicate of
the original waveform. The distance between two samples is determined by the
ADC sampling frequency. One should opt for a sampling frequency operating at
no less than twice the rate of the frequency present in the input signal, in order to
reduce aliasing. To further avoid aliasing and interference, an anti-aliasing filter is
commonly implemented before the sampler.

The role of the quantizer is to map the captured samples to a fixed set of discrete
digital values, determined by the ADC resolution. Greater resolution leads to a
small quantization error, which is the difference between the original sample and
the output.

2.2.2 Digital-to-analog conversion
A digital-to-analog converter (DAC) consists of a transcoder stage and a reconstruc-
tion stage. The transcoder converts digital samples into analog pulses. The recon-
struction stage merges the analog pulses into a continuous signal, using a sample-

4


2. Technical background

and-hold (S&H) circuit, and a reconstruction filter. The filter will remove high
frequency components, resulting in a more polished output [11].

2.3 Serial peripheral interface (SPI)
Serial communication protocols are needed to configure many ADCs and DACs.
One common protocol that is extensively used in data converter devices is the serial
peripheral interface (SPI).

SPI is a synchronous fully duplex interface, which means that both nodes can send
data at the same time. SPI is one of the most popular and used communication
protocols between controller devices (FPGAs, microcontrollers, etc) and peripheral
circuits such as sensors and actuators but also ADCs and DACs [12]. Since there is
no standard way of specifying an SPI bus, different vendors use different approaches
to accomplish SPI communication.

There are the possibilities of using three or four-wire SPI. Four-wire SPI, which is
the most common, uses CS, SCLK, MOSI and MISO signals. In three-wire protocol,
MISO and MOSI are combined into one bidirectional wire [12] that sends and receives
between the nodes. The idle state of chip select signal is high. To activate communi-
cation with a peripheral, the controller unit pulls CS signal to low. After that, data
is read and sent at serial clock rate.

The clock signal can be generated in several ways. Clock polarity (CPOL) and clock
phase (CPHA) are two parameters to consider when working with SPI protocol, the
choice of which will affect the order of how data is sampled and shifted out. Table 2.1
describes the different clock polarity and phase combinations.

Table 2.1: SPI modes with serial clock.

SPI
mode CPOL CPHA Clock idle state Data

0 0 0 Low Data sampled on rising edge.
Shifted out on falling edge

1 0 1 Low Data sampled on falling edge.
Shifted out on rising edge

2 1 0 High Data sampled on rising edge.
Shifted out on falling edge

3 1 1 High Data sampled on falling edge.
Shifted out on rising edge

As previously mentioned, there is no standard way of using SPI. However, most
of the datasheets for sensors and ADCs use mode 0 with CPOL 0 and CPHA 0. A
possible timing diagram can be found in Figure 2.2. Notice that the first bit in MOSI
specifies if the controller unit wishes to write or read data to a certain register in the
peripheral. Next bit is responsible for indicating data size (few/many bits). The rest
of the data stream contains the address and eventually the data that will be sent to
the peripheral. If a ”read” signal is read from the peripheral, then the current data

5


2. Technical background

of the specified register will be sent back through MISO after successfully reading the
address.

CS

CLK

MOSI rw 0/1 adr0 adr1 D0 D1 Dn

MISO D0 D1 Dn

Figure 2.2: SPI timing diagram. CPOL = 0, CPHA = 0.

2.4 AMBA shared bus
Advanced microcontroller bus architecture (AMBA) is an on-chip interconnect spec-
ification developed by ARM, used for the management of SoC functional block such
as controllers and peripherals [13].

2.4.1 AMBA AHB
Advanced high-performance bus (AHB) is essential in SoC systems. It is designed for
high performance synthesizable designs. Designs such as internal memory devices,
external memory interfaces and high-bandwidth peripherals are common examples
of designs that utilize the AHB protocol [14].

2.4.2 AMBA APB
Advanced peripheral bus (APB), is another protocol which as the name suggest, is
a standard used to handle lower-bandwidth peripheral devices on an SoC. Further-
more, this standard is less complex, also, optimized for minimal power consumption
compared to AMBA AHB [15].

2.5 GRLIB
GRLIB is a complete IP library containing different reusable VHDL design environ-
ments targeting ASICs and different FPGA platforms [9]. Below is a list of the four
categories inside GRLIB.

• Processors: LEON SPARC, NOEL-V RISC-V
• Memory Controllers: DDR2, DDR3, SDRAM, SRAM, NANDflash, QSPI
• Interfaces: SpaceWire, SpaceFibre and WizardLink controller, 32-bit PCI

bridge, 10/100/1000 Mbit Ethernet MAC, USB 2.0 host and device controllers,
CAN, SPI, I2C, UART

• SoC infrastructure: AMBA AHB and APB controllers, AMBA bridges

The idea of having an IP core library with several smaller blocks would be to flexibly
edit and create different SoCs for different applications as can be seen in Figure 2.3.

6


2. Technical background

Figure 2.3: Block diagram of the GR-XC3S-1500 FPGA development board [9].

2.6 JESD204 high-speed interface
To keep up with the newest applications, manufacturers of data converters have
been forced to continuously increase the sampling rates of their products. Addition-
ally, multiple converter channels are typically used in parallel in order to further
increase bandwidth. As a result, existing converter interface methods, such as the
Low Voltage Differential Signaling (LVDS) were under considerable pressure to deal
with the high bandwidth demands. The transmission speed of LVDS is limited to
around 1 Gbps per serial lane, necessitating the use of multiple lanes to achieve high
throughput. Multiple converter channels quickly add up the number of required pins
on the transmitter and receiver devices, increasing the overall area requirement and
complicating PCB routing. With these things in mind, the JEDEC Solid State
Technology Association decided 2006 to create the JESD204 standard, which is a a
high-speed serial interface standard. Nowadays, JESD204 is a well-known standard,
that is widely used across telecommunications, aerospace, and industrial automation
when interfacing with converters [7]. Since its launch, the JESD204 standard has
seen multiple revisions, shown in Figure 2.4.

Revision JESD204 (2006) JESD204A (2008) JESD204B (2011) JESD204C (2017)

Max. serial bit rate 3.125 Gbps 3.125 Gbps 12.5 Gbps 32 Gbps

Lanes / links Single Multiple Multiple Multiple

Deterministic
latency

No No YES YES

Encoding scheme 8B/10B 8B/10B 8B/10B
8B/10B, 64b/66B,
64B/80B

Subclasses No No Subclass 0, 1, 2 Subclass 0, 1, 2

Figure 2.4: JESD204 revisions.

7


2. Technical background

Revision A and B
The initial version of JESD204 supports speeds up to 3.125 Gbps [7]. In 2008,
revision A of the standard was done in order to support interface with multiple
converter channels. Four years later, the standard was upgraded again, to version
JESD204B. This revision is divided into subclass 0, 1 and 2, each of which supports
an upgraded transfer rate of 12.5 Gbps [16].

Revision C
Another modification to the standard, JESD204C, was launched in 2017. JESD204C
offers an even greater bandwidth, up to 32 Gbps. Another focus point of revision
C is the improved link resilience, which is made possible by more advanced error
detection mechanisms. Also, revision C introduces support for a new and more
efficient 64b/66b encoding scheme, compared to the 8b/10b encoding that is used
in JESD204B [17].

2.6.1 JESD204B overview

Device clock

In a JESD204B system, the device clock is the main timing reference for each layer
within a JESD204B transmitter or receiver. Device clocks of different converters and
logic devices do not have to be identical, however, they need to be phase aligned
[18].

Local multiframe clock (LMFC)

Packets are transmitted over a JESD204B link in multiframes. Multiframes con-
sist of frames, which carries sample data, link configuration data and alignment
characters. To synchronize the transmitter and receiver in relation to a multiframe,
the JESD204B uses a LMFC. The LMFC acts as a low frequency cross-device timing
reference, allowing for the link to have a deterministic latency (discussed in subsec-
tion 2.6.2) [18].

SYSREF

SYSREF is an externally applied signal that is active high, and is used in subclass
1 (discussed in subsection 2.6.3) to align the LMFC of all transmitter and receiver
devices. SYSREF could be a one-shot, periodic, or gapped periodic signal [18].

SYNC∼

SYNC∼ is an active low signal used to establish the initial synchronization between
a transmitter and receiver. Moreover, it is used in subclass 2 (discussed in subsec-
tion 2.6.3) to fulfil deterministic latency. The clocks that generate SYNC∼ should be
phase-locked to the LMFC of the receiver device [18].

8


2. Technical background

2.6.2 Deterministic latency
Link latency is typically measured in frame clock or device clock periods, and
defined in a JESD204 link as the time difference between when a converter sample
enters the serial transmitter and when the same sample is output from the serial
receiver. When setting up a JESD204 link, it may be required that the link latency
is deterministic, meaning that one can reliably determine when the next sample will
arrive. The JESD204 standard allows for various ways to achieve deterministic link
latency. Other than the link latency, the overall latency of the system would also
be dependent on the core latency of the converter device [18].

Elastic buffer

A key module for achieving deterministic latency across all lanes in JESD204 systems
is the elastic buffer. Factors such as unequal lengths in the transmission medium or
cross-talk, could cause skew in the arrival of the lanes at the receiver. The purpose of
an elastic buffer is to align all lanes, so that they can be output from the receiver at
the same point in time. Instead of releasing all lanes at the next boundary of LMFC,
the release of the lanes would be delayed by the release buffer delay RBD, specified
by the implementer, to allow for the slower lanes to catch up [18]. An illustration
of the elastic buffer is depicted in Figure 2.5.

CGS

TX

RX

R D D A R D D A R D D A

LMFC

LMFC

Elastic
Buffer

CGS R D D A R D D A R D D A

CGS R D D A R D D A R D D A

L
Lanes

CGS R D D A R D D A R D D A

Multiframe

Lanes
Aligned

Earliest
Arrival

Latest
Arrival

RBD All Lanes
Released

Figure 2.5: Timing diagram illustrating the elastic buffer.

9


2. Technical background

2.6.3 JESD204B subclasses
Subclass 0

Subclass 0 supports backwards compatibility with JESD204A. No additional signals
are introduced in subclass 0, which makes it the relatively simple to implement,
compared to the other subclasses [7].

Subclass 1

Out of the three classes, subclass 1 offers the most stable performance at very high
device clock rates. Subclass 1 is not backwards compatible with JESD204A. It does
however offer support for deterministic latency. To achieve deterministic latency,
the subclass uses an external SYSREF signal. The purpose of SYSREF is to align the
LMFC of the transmitter devices and the receiver device [18].

Subclass 2

Subclass 2 works well with device clock rates up to 500 MHz. The class does
not support compatibility with JESD204A. Similar to subclass 1, subclass 2 also
allows for deterministic latency. Subclass 2 however, uses the SYNC∼ signal for LMFC
alignment. In contrast to the external SYSREF signal, SYNC∼ is already used for code
group synchronization (discussed in subsection 2.7.4) and frame synchronization,
meaning that fewer pins are needed on the transmitter and receiver [18].

2.7 JESD204B layers
The JESD204B is a standard that is built using several layers, each designed with
distinct functionalities. The layers remain the same whether implemented in a
receiver or in a transmitter, differing only in executing the opposite operations. The
layers used in a JESD204 are as follows: Physical layer, transport layer, data link
layer and an application layer that is unique for every implementation. The layers
within the JESD204 standard follow a similar layout to the OSI-model, described
in [19].

2.7.1 Physical layer
The physical layer’s primary role is to interact with the other components outside
the JESD204 standard. It is where the data is serialized/deserialized using (SerDes)
at line rate speed. For a transmitter, the physical layer takes the parallel data frame
and sends it as a serial stream using a serializer. The serial output is sent at Gbps
rates. For the receiver part, the physical layer captures the data and uses it to recover
the serial clock, to later deserialize data from Gbps to Mbps for it to be handled by
an FPGA. The deserialized data in the receiver becomes parallel and spread across
a number of output pins [20]. Additionally, a SerDes contains advanced yet crucial
phased locked loop (PLL) and clock data recovery (CDR) circuitry. CDR enables
clock recovery from the incoming data, ensuring proper synchronization and jitter

10


2. Technical background

reduction. Meanwhile, the PLL is used for high-frequency clock used for serialization,
ensuring phase alignment. In Figure 2.6, a block diagram of the physical layer of
the JESD204 standard is presented.

Serializer

P
er

 fr
am

e 
la

ne
 d

at
a

(p
ar

al
le

l)

FIFO

PLL-based
clock

generator

Mbps

Gbps

 
Transmitter
Tx

Receiver
Rx

Clock data
recovery

De-
serilaizer

FIFO

P
ar

al
le

l o
ut

pu
t

Gbps

Mbps

Interface layer
(IF)

Figure 2.6: Block diagram of a serializer and deserializer for the physical layer.

2.7.2 Transport layer
The transport layer is the layer responsible for data mapping in the JESD204 stan-
dard. Data gets reframed and prepared before being sent from transmitter to re-
ceiver. All control bits and tail bits are added to the data stream. They are later
split into octets and lanes. It is important to note that a transport layer delivers
the samples to the link layer in a JESD204 transmitter. The transport layer in the
receiver does the exact opposite. It cleans the received data from all control bits
and tail bits to get the original content of the data stream [17]. Below is a list of
parameters needed for a reader to understand the transport layer functionality.

• M = Number of converters
• N = Converter resolution
• S = Sample per frame
• FC = Sampling frequency
• CS = Control bits per sample per lane
• N ′ = Number of bits per sample in user data format
• L = Number of Lanes
• F = Number of octets "8 bits" per frame
• K = Frames per multiframe

Designers may want to configure the transmitter and receiver setup to match their
implementation requirements. Figure 2.7 shows what parameters need to be defined
for a transport layer to reframe data.

11


2. Technical background

Receiver

Transmitter

L
M
F
N'
N
K

CS

User
configuration

ADC sample
converter 0

Converter M

..

.. [N : 0] [N : 0] + CS [N : 0] + CS + Tail [Octet 0]
..
F

To
 A

D
C

Data reframing

[N : 0] [N : 0] - CS [N : 0] - CS - Tail [Octet 0]
..
F

Data deframing

Octet Octet Octet Octet
Lane 1

Octet Octet Octet Octet

..
Lane L

Octet Octet Octet Octet
Lane 2

Octet Octet Octet Octet
Lane 1

Octet Octet Octet Octet

..
Lane L

Octet Octet Octet Octet
Lane 2

To
 F

PG
A

Data sample to
application layer

Figure 2.7: Design of a transport layer in Rx (FPGA) and Tx (ADC).

Some of the parameters are hardware dependent, for instance FPGA maximum
clock, or number of converters in the ADC as well as the number of bits in the
ADC. However, by using a high-speed serial link with SerDes, data can be sent split
into multilane links. This is specially handy for GSPS ADCs. Therefore, designers
can increase the data rate and use multiple lanes to catch up with the high speed
of the link. Hence, the number of lanes and octets needs to be calculated for each
JESD204 implementation. Equations 2.1 and 2.2 which can be found in [21] show
how to calculate the number of needed lanes L.

δ = M · N ′ · S · 1.25 · FC (2.1)

L = δ

LaneRate
(2.2)

From this point the number of octets per frame F can be calculated. To determine
this parameter, equation 2.3 can be used.

F = M · S · N ′

8 · L
(2.3)

Taking all these parameters into account, transmitter user data can be formatted in
different ways. Multiple lanes, line, oversampling or the number of octets can affect
how data formation looks like. Generally, one device with M converters produces
N bits per sample. A set of samples contains F octets. Each octet is transmitted
as a group of N ′ bits, where N ′ is N bits plus any additional tail bits T and control

12


2. Technical background

bits CS. The numbering starts in order with converter 0, followed by converter 1
up to converter M -1 as can be seen in Figure 2.8. A new word is created for each
converter, containing data identical to the corresponding. Control bits and tail bits
are yet to be added. At the next stage, additional bits are added to fill up empty
slots when forming the octets.

Converter Device MxN bits

Converter 0 Converter
M-1

Sample 0 Sample S-1 Sample 0 Sample S-1

Octet 0 Octet f-1 Octet 0 Octet f-1

Lane 0 Lane j-1

Control bits appended to each sample

Word 0 Word 1 Word 0 Word 1

Figure 2.8: User data format for general use. This can be used with values as low
as L = 1, S = 1 and M = 1, inspired from the JESD204C standard [17].

The figure shown above describes the structure of the transport layer, which is
applicable for any type of JESD204 implementation. For example, a device with M
= 4, N = 12, S = 1, L = 1 can be used, as illustrated in Figure 2.9. Data coming
from the four converters is mapped into one single lane, following the approach
previously seen in Figure 2.8. Note that the upper 8 bits of the data are getting
mapped to the first octet, and the last 4 bits are mapped to the second octet. The
second octet of each converter is eventually filled up with one control bit and three
tail bits.

Cr0[11-4]
Cr0[3-0]

C
TTT

Cr1[11-4]
Cr1[3-0]

C
TTT

Cr2[11-4]
Cr2[3-0]

C
TTT

Cr3[11-4]
Cr3[3-0]

C
TTT

F = 8 0ctets

Converter 0 Converter 1 Converter 2 Converter 3

Figure 2.9: Grouping control bits and tail bits to octets and converters.

13


2. Technical background

2.7.3 Scrambling

Scrambling is an optional block in the JESD204 standard. It is mainly used to
avoid spectral peaks that would occur by specific data patterns that may affect the
performance of the serial link [18]. Spectral peaks may affect the link by means of
EMI in sensitive applications, DC balance and signal integrity.

Scrambling is issued deterministically. The standard implies that the scrambling
polynomial shall be:

1 + x14 + x15

Introducing scrambling means that Hex values for alignment characters will be dif-
ferent depending on if scrambling is used or not.

2.7.4 Data link layer

Data link layer is a crucial layer where synchronization, data encoding, error detec-
tion and general flow control happens in the standard. It can easiest be described
as the controller due to its variety of tasks.

8b/10b encoding

Dealing with serial high-speed signals introduce some noteworthy challenges com-
parably to traditional serial interfaces such as SPI, UART or I2C. One of these
challenges is the DC biasing, which leads to signal integrity and metastability issues
[22]. This occurs due to the count imbalance between 1’s and 0’s of the data sent
across the link. By having this imbalance, the cable charges up if there are many
consecutive ones in the data stream, leading to uncertainty in reading an incoming
zero signal.

Consequently, the goal of ensuring a relatively constant average voltage is required
for signal integrity. Encoders and decoders can be used to accomplish a balanced
data stream. By following a look up table (LUT), the encoder maps all possible
outcomes of the data stream into corresponding encoded values. Some input values
have two valid codes. To determine which one to use, a running disparity (RD)
checker between ones and zeroes is used as a signal in the algorithm. This signal
acts as an input to the next code encoding to reverse disparity and maintain an
overall DC balance if needed [22].

8b/10b encoding is used in the JESD204 standard to encode data before being trans-
mitted. As the name suggests, it encodes an 8-bit value into 10-bits. In Figure 2.10,
we illustrate a typical implementation of the 8b/10b encoding algorithm.

14


2. Technical background

Slicer
block

Input {7:0}

5b/6b
LUT

3b/4b
LUT

K {7:0}
Running
Disparity
Controller

3b/4b
LUT

Input{7:3}

Input{2:0}

Disparity in

Encoder/
Decoder
controller

Output {9:0}

data[MSB] != RD data[MSB] == RD

check RD

If nr_zeroes[data] != nr_ones[data]

Flip all bits
to

maintain
DC

balance

Keep the
encoded

value from
LUT

Figure 2.10: Block design of the 8b/10b encoding.

While 8b/10b encoding introduces 25% overhead in terms of additional bits, the
benefits still outweigh the drawback. An achieved DC balance, and therefore re-
duced electromagnetic interference (EMI), as well as introduced error detection and
correction mechanisms for increased reliability and integrity, are the benefits of this
encoding algorithm [22]. The 8-bit input is spitted into two encoders, a 3b/4b en-
coder as well as a 5b/6b encoder, that gets merged back again at the end with LSB
first, see Figure. 2.11 for clarification.

H G F E D C B A

LSBMSB

j h g f e i d c b a

5b/6b encoding3b/4b encoding

Figure 2.11: 8b/10b conversion.

The encoded output has three states. It can either have disparity zero, meaning

15


2. Technical background

equal number of 1’s and 0’s. A positive disparity indicates more ones than zeroes,
and vice versa to obtain negative disparity. In Table 2.2, a few examples of each
encoding LUT is found. Note that the RD signal is the determining factor for inputs
with two valid codes.

Table 2.2: Encoding table examples for 5b/6b on the left, and 3b/4b on the right.

input RD= -1 RD = +1
code EDCBA abcdei
D.01 00001 011101 100010
D.03 00011 110001
D.28 11100 001110

Input RD = -1 RD = +1
code HGF fghj
D.01 001 1001
D.03 011 1100 0011
D.06 110 0110

As part of the 256 possible data characters, the 8b/10b encoding reserve some char-
acters as so called special comma characters, that are used for control (Kx.y) . Here
the x represents the 5-bit group, and the y value corresponds to the 3-bit group.
Table 2.3 describes the important comma characters used in the JESD204 standard.

Table 2.3: Comma characters for special control.

/K/ k28.5 code group synchronization
/R/ K28.0 Start of multiframe
/Q/ K28.4 Start of configuration data
/A/ K28.3 End of multiframe
/F/ K28.7 Frame alignment synchronization

As described before, error detection and correction comes automatically when using
the 8b/10b encoding algorithm. Using the control characters and having the ability
to check the LUTs for an expected encoded value, designers have the ability for
increased reliability and check for

• Disparity errors.
• Not in table code errors.
• Wrong control character/control character in wrong position.
• Code group synchronization error.

Code group synchronization (CGS)

Code Group Synchronization (CGS) is a crucial operation in the JESD204 standard.
When a JESD204 receiver is ready to collect data, it issues a synchronization re-
quest by asserting the SYNC∼ signal to the transmitter. Immediately upon reading
the SYNC∼ signal from the receiver, the transmitter transmits a number of /K/ char-
acters through the data stream. When the receiver has read the required amount
of /K/ characters, SYNC∼ gets deasserted again, indicating a synchronized transmit-
ter and receiver. Figure 2.12 visualizes the CGS process written in the JESD204B
standard. [18]

16


2. Technical background

sync_request

synchronizedwait for next LMFC

lane/frame
synchronization

RxTx

Send initial lane
alignment

or user data

/K28.5/

DATA

SYNC~

SYNC~

DATA

Figure 2.12: State machine of CGS.

Initial lane alignment (ILA)

The CGS and the initial frame synchronization (discussed in section 4.2) are suc-
ceeded by the ILA. The purpose of ILA is to synchronize the lanes of a multi-lane
link. The format of the ILA is shown in Figure 2.13. The length of the ILA is always
four multiframes long, where the second multiframe will include link configuration
data [18].

R D D A R Q C D D A

Multiframe

K K K K C R D D A R D D AD D A A AR RD D D D

Figure 2.13: Initial Lane Alignment.

Frame alignment and correction

Apart from the previously discussed /K/ characters, there are also various alignment
characters that are inserted into the serial data stream by the transmitter under
certain conditions [18]. The /R/ character is used to mark the beginning of a multi-
frame, and /A/ is used to mark the end of a multiframe. /A/ and /R/ characters are
always embedded in the ILA, to align the multiframes. A new frame starts once
every F octets. The /A/ character is also inserted into the serial data stream by the
transmitter, if the last octet of the last frame in a data multiframe, is equal to the
last octet of the previous frame. In this case, the /A/ character would replace the
octet in the last frame of the multiframe. This would happen even if the last octet
of the previous frame was an /A/ character as specified in the standard [18].

According to the same standard, the /Q/ character is used exclusively during the ILA,
and will mark the beginning of the receiver parameter data. The receiver parameter

17


2. Technical background

data is always placed in the second multiframe of the ILA.

Additionally, there is an /F/ character. This character is used similarly to the /A/
character, however, only on the last octet of the current frame, which is not the last
frame of the multiframe. Another key difference from the /A/ character is that for
the /F/ character, if the last octet from the previous frame was an /F/ character, the
octet replacement would not be performed.

2.7.5 Application layer
The application layer is the highest level of abstraction in the JESD204 protocol. It
is the layer responsible for further processing the received data stream from previous
JESD204 layers [18]. In the case of this master thesis, the application layer consists
of the data reduction algorithm, where reduced data size is the goal before further
processing or storage.

2.8 Data Reduction
Data reduction is the process of reducing the amount of information to make it easier
to store in terms of physical space. Technology in the 1800s had yet to be developed
to transmit human voice over long distances. Engineers could nevertheless transmit
short electric signals through wires. Morse code was one of first methods used for
long-distance communication, using an encoding scheme to encode letters into dashes
and dots. Morse code is viewed as one of the earliest communication methods, which
uses a form of data compression [23]. This allowed telegraph operators to compress
human language to binary format and restore them to full format at the other
station.

In the current age of computers, data compression is used increasingly, and can be
found on most of the files that can be found on modern computers [23]. JPEG,
PNG, GIF, MP3 and MP4 are examples of file formats commonly used on a daily
basis, that apply various data compression algorithms.

Data compression algorithms like the above mentioned are divided into two primary
categories: lossy and lossless. While some experts in the computer science filed
indicate that it is mathematically impossible to have a totally lossless algorithm,
lossless algorithms show no noticeable change in the data over time [23]. Lossless
algorithms transform the input data into a more compact representation in a differ-
ent format, ensuring efficient storage. PDF, SVG, ZIP and GZIP are examples of
lossless compression algorithms using techniques from Lempel-Ziv and Huffman to
primary reduce redundancy on the input data [23] [24].

Lossy compression is the process of reducing input size by permanently removing
some information. The size of the output file produced by such algorithms is signifi-
cantly compact. However, data quality and purity may be affected and downgraded
over time. MP3, MP4 and JPEG are examples of lossy algorithms. Such algorithms
can be used on the hardware side if area and power dissipation are two major con-
siderations.

18


2. Technical background

2.8.1 Run-length encoding
Run-length encoding (RLE) is a type of lossless data compression in which data
with redundant occurrences are stored as a single data value and count, rather than
its original sequence. This algorithm is most useful when data is repeated inside the
data stream, hence producing high compression ratio [24]. The compression ratio
for varying input data should therefore be minimal.

RLE can be used in various applications, like image and video compression. It can
be used to minimize the size of the incoming image matrix. The algorithm would
read the binary pixels of each row and group them if a sequence of redundant values
is located. Therefore, RLE is a well suited for hardware implementation due to the
fact that it doesn’t require complex logic or large storage. Only a number of logic
gates and flip flops to perform the counting sequence are required. Figure 2.14 shows
how RLE can be implemented to compress binary images.

Binary Image

0 1 1 1 1 1 0

RLE encoder

0 5 1 0

Row reader

Figure 2.14: Block diagram of RLE algorithm used on binary image compression.

2.8.2 Moving average
While not traditionally known to be a data compression algorithm, moving average
is a statistical model that is used to calculate the average of a fixed-sized window
of samples. Therefore, moving average can be seen as a lossy data compression
algorithm. In digital signal processing, moving average is often used to smoothen
out noise or fluctuations in time series data [25].

The implementation of moving average on hardware can be very cheap depending
on the desired accuracy. For low accuracy applications, a shift register with a small
window size and the simple arithmetic operation as seen in equation 2.4 are required.
If higher accuracy is needed then a larger shift register is required, which is costly
on hardware in terms of area, due to the linear scaling.

y[n] = 1
N

N−1∑
i=0

X[n − i] (2.4)

In the moving average formula, y[n] represents the moving average at n-th point in
time. The input sample is symbolized by x[n], and N represents the sample window.

19


2. Technical background

20


3
Methods

The method to be used in this project are divided into the following six sections:
literature study, define data acquisition use case, designing the receiver, creation of
a subsystem, verification through simulation, and, finally hardware verification.

3.1 Literature study
In order to set the specification requirements of a JESD204 receiver, literature study
of the different versions provided by JEDEC is required. Such study looks into the
evolution and the differences between the various JESD204 versions. The main goal
behind this study is for us to gain knowledge about the pros and cons of each version
implementation, to later be developed on a Gaisler’s platform.

Another source of valuable information can be found by studying academic pa-
pers, industry publications and official documentations of hardware supporting the
JESD204 standard.

3.2 Define data acquisition use case
To be able to understand the needs of a JESD204 receiver, we need to engage with
the company to get insight of their specific requirements and use cases for a JESD204
implementation is needed. This can also include discussions with customers and the
need to get external data for simulation of the receiver, as well as obtaining a sensor
for reading real input data to test on hardware implementation.

3.3 Designing the receiver
Designing the receiver will start by evaluating the above mentioned JESD204 ver-
sions, and selecting the most suitable version based on the company’s use case.
Thereafter, we should define the specification requirements of the selected standard,
by the means of data rate, number of converters, lane configurations, clocking alter-
natives and protocol features.

21


3. Methods

RTL coding guidelines
All RTL modules designed within this project follow the ”two-process method”,
advocated by Jiri Gaisler [26]. The two-process method is appropriate for any single-
clock design, where the aim of the method is to improve readability of complex
designs. Additionally, simulation time is improved by optimizing the use of variables
over signals within VHDL processes, as variable assignments are significantly faster
than signal assignments [26].

Designs following the two-process method use only two processes per entity. The
purpose of using two processes is to separate the logic from the registers. One
process is fully combinational, and contains all the non-synchronous logic, and the
other process is clocked, and contains all registers. Signals and ports are declared
using record types, which simplifies maintainability. A block schematic of the two-
process method is shown is Figure 3.1.

comb. proc.

seq. proc.

r = rin

Q = fq(D,r)

rin = fr(D,r)

Q

rin

clk

D

r

Figure 3.1: Jiri Gaisler’s two process method.

3.4 Creation of a subsystem
GRLIB is a large set of IP-cores. Some IP-cores require larger hardware utilization,
and therefore, make the EDA tools run significantly slower. This means that devel-
oping the receiver on a complete SoC system that contains blocks such as LEON5
processor, Ethernet and the Xilinx memory interface, might not be an efficient ap-
proach. Instead, this project might be its own minimal SoC that runs on the same
AMBA shared bus, but with a reduced number of IP blocks.

3.5 Verification through simulation
The company uses VHDL exclusively for RTL writing. Simulation of the RTL code is
done using Mentor modelsim 10.6a, making it easier to spot typing and behavioral
errors at early stages of the design. Due to the large folder hierarchy of a SoC,
makefiles are used to compile and verify different parts of the system.

22


3. Methods

3.6 Hardware verification
The designed subsystem is to be synthesized and implemented on an FPGA using
Xilinx Vivado 2019.2, to later be connected to an ADC to read incoming data. A
receiver’s foundational part is the physical layer, which serves as a port for incoming
data. It is worth planning a robust infrastructure on how verification of the hardware
needs to be carried out. Oscilloscope and logic analyzers can be used to measure
the incoming clocks. Additionally, a signal generator might come in handy to give
stimuli inputs to the ADC.

GRMON

Finally, there is Gaisler Research monitor (GRMON), which is a Gaisler-made debug
environment used for SoC designed based on the GRLIB IP library [27]. GRMON
facilitates read and write accesses to internal registers and memory blocks of IP-
cores, which are interconnected via the APB and AHB bus architectures. GRMON
can be used via a tcl environment or using a GUI, with supported debug interfaces
via JTAG, UART and Ethernet.

23


3. Methods

24


4
Design and Implementation

In this chapter, the system is introduced in detail. We will discuss the design
decisions made for the receiver and the data reduction algorithm. Furthermore, we
describe the hardware components used in the project. Finally, specifications of the
key parameters in the system are presented. Figure 4.1 shows a simple overview of
the design. Employing a separate subsystem for the receiver IP development offers
many advantages over working from a complete SoC. As stated previously in the
method chapter, some IP-cores are large, and extend the synthesis time of the EDA
tools. Additionally, creating the receiver inside a subsystem minimizes the likelihood
of introducing unexpected errors associated with integrating several blocks into one
design.

ADC

JESD204 Rx

Deserializer

AMBA
APB/AHB

Bridge

AMBA AHB

AHB

SPI ADC
Controller Debug

Support Unit

APB

Memory
Controller

JTAG Debug
Link

GRMON

High-Speed
 Serial Data

Configuration

JESD204 Tx

Serializer

SoC sybststem

SPI Interface

Figure 4.1: Simple overview of the designed subsystem on FPGA.

25


4. Design and Implementation

4.1 Design overview
In this section, a more sophisticated overview of the JESD204 standard and hardware
selection is mentioned. We will examine each component of the minimally designed
SoC, referred to as the subsystem, in greater detail.

4.1.1 Choice of the JESD204 version
After studying the different revisions of the JESD204 protocol and the infrastructure
of GRLIB, we chose revision JESD204B. One reason for that is the similarity with
the existing high-speed serial link IP provided by Gaisler, called GRHSSL [28]. Addi-
tionally, the selected version had to be compatible with the JESD204 transmitter of
the ADC, so the receiver IP can be verified together with a third party transmitter.
Revision C supports faster lane rates and uses the more effective 64b/66b encoding
compared to the 8b/10b encoding in revision B, which introduces 25% overhead.
However, SpaceFibre inside GRHSSL uses 8b/10b encoding to maintain running
disparity, and since it is the default decoding scheme for revision B, an encoding
scheme does not have to be implemented from scratch. Additionally, revision B has
been around for longer than revision C, meaning that documentation and previous
implementations are more accessible for B, compared to C. This is also visible in
the market for data converters, where a clear majority applies JESD204B [29]. Ad-
vantages for B over A includes higher data rates and the support for deterministic
latency.

4.1.2 Hardware selection and setup
A very important factor when selecting the hardware, was to acquire compatible
hardware with reasonable delivery time. The FPGA used within this project is the
Kintex UltraScale XCKU040 on the KCU105 board, from Xilinx [30]. This was a
straightforward choice for us because of two reasons: the availability of that card at
the company’s inventory, but most essentially the GRHSSL IP was developed and
implemented on that same FPGA, which has been previously verified to achieve a
successful SerDes communication up to 6.25 Gbps.

The ADC evaluation board selected is the ADC32J44EVM from Texas Instruments
[31]. The ADC sample rate needed to be sufficient for the onboard serializer to
be able to generate a serial lane rate of 3.2 Gbps. The system designed includes
two converters, indicating the use of a dual port configuration. This setup allows
for a larger set of debugging opportunities, but also enables further development
using multi-lanes and oversampling. A critical factor is the generation of the data.
The evaluation board has test patterns stored on the memory chip, which can be
routed to the JESD204B transmitter on the chip, eliminating the need to insert data
through the analog channels.

In Figure 4.2, an overview of the hardware setup is presented. Note that the data
stream is moving at speeds of Gbps, meaning that some verification infrastructure
needs to be developed, in order to verify that ADC data is being fetched on the

26


4. Design and Implementation

FPGA’s physical layer (SerDes). The design methodology seen in the figure below,
shows a FIFO module, underneath the SerDes. This FIFO stores the incoming high
speed data and provides an AMBA APB interface, which allows us to monitor the
data via the GRMON. The main purpose of the so called APBFIFO is to act as a
buffer between the slower read-rate of the FPGA compared to the faster write-rate
of the SerDes data. Being able to read actual ADC data via the APBFIFO , indicates
a functional JESD204B physical layer.

2.
5 

G
B

P
S

capability
reg

Data
reg

GRMON
PC

monitor
apbfifo.vhdl

Const
"DEADBEEF"

SerDes

JESD204B Tx

JESD204B
Rx

LMK04828 
ADC clock

SerDes ref clockC
or

e 
cl

oc
k

Data in

ADC32J44EVM

UltraScale KCU105 FPGA

In1

In2

Memory: Test
patterns

SPI interface

Figure 4.2: Overview of hardware setup.

4.2 JESD204B receiver operation
As previously discussed, the trigger for link synchronization is the SYNC∼ assertion.
According to the standard [18], the CGS on the receiver side is divided into three
states, shown in Figure 4.3. In the initialization state, the receiver checks for the
recognition of four /K/ characters. The receiver would then enter the check state,
where it has to detect four new 8b/10b characters, before it enters normal operation
mode. If the receiver detects an illegal character, it will remain in the check state
until four valid characters are detected. In the case of identifying three invalid char-
acters, the receiver has to re-assert the SYNC∼ signal to re-initialize synchronization.

Once SYNC∼ is deasserted, the initial frame synchronization will align the edge
of the frame with the next non /K/ character. The process of the initial frame
synchronization is presented in Figure 4.4. Inside the FS_DATA state, after it
detects the ILA sequence, the receiver performs alignment checking on the incoming

27


4. Design and Implementation

multiframes, as explained in subsection 2.7.4. In the event of a synchronization error,
a SYNC∼ assertion or the reception of four consecutive /K/ characters, the receiver
will return to CS_INIT and FS_INIT, and launch a new CGS.

CS_INIT

INVALID

Vcounter == 4

CS_CHECK
VALIDCS_DATA

Kcounter == 4
Icouter = 0;
Vcounter = 0;

Kcounter < 4

Icounter == 3

Vcounter < 4 & Icounter < 3

reset

Icounter = 0;
Vcounter = 0;
sync_request = 1;
if (K_received & VALID) then

Kcounter ++;
else

Kcounter = 0;
end if;

sync_request = 0;
Kcounter = 0;
if INVALID then

Icounter ++;
Vcounter = 0;
else if VALID then

Vcounter ++;
end if;

Figure 4.3: Code Group Synchronization FSM.

FS_INIT

FS_CHECK FS_DATA

reset

Ocounter = 0;

sync_request | K_received

Kcounter == 4

! (sync_request | K_received)

Kcounter = 0;
Ocounter = Ocounter + 1 mod F;
CHECK_ALIGHMENT

K_received & Kcounter < 4

K_received

! K_received

! K_received

Kcounter ++;
Ocounter = Ocounter + 1 mod F;
CHECK_ALIGNMENT

Figure 4.4: Initial frame synchronization FSM.

28


4. Design and Implementation

4.3 JESD204B receiver design overview
Figure 4.5 shows an overview of the JESD204B receiver design. As previously ex-
plained, the receiver is divided into three layers. The first layer deserializes the
high-speed data, and routes the data in parallel at a lower rate to the link layer.
The link layer checks for alignment characters, and reports back to the transmitter
via the SYNC∼ interface, if any errors are detected. The SYNC∼ interface is also used
to synchronize the transmitter with the receiver, and allows for deterministic latency.
Valid data is then mapped from the link layer to the transport layer, where control
bits and tail bits are extracted from the octets, and the original sample data is then
ready for use in an application. Scrambler is not enabled in this project. The reason
for this is that the selected ADC board operates with a JESD204B throughput at
lower bit rates compared to the maximum bit rate specified in the standard. This
means that spectral peaks occurring due to higher frequency are not an issue, that
otherwise must be handled by a scrambler.

High speed
 serial data

Deserilizer

Physical layer

8b/10b
decoding

Frame/lane
alignment

Data link layer

Sync request

Data deframing
and cleaning

Transport layer

Descrambler
(optional)

Application layer

Data reduction
algorithm

SYNC~

Figure 4.5: Layers of the JESD204 receiver.

4.3.1 Link parameter configuration
The standard supports multi-lane and multi-sample setup. For simplicity, we will
proceed with one lane and one sample per frame cycle. The selected JESD204B
ADC family supports two link configuration setups, presented in Table 4.1.

Table 4.1: Link parameters and interface rates.

L M F S
Min. ADC

sampling rate
(MSPS)

Min. SerDes
frequency
(Mbps)

Max. ADC
sampling rate

(MSPS)

Max. SerDes
frequency

(Gbps)

Serialization
factor

2 2 2 1 16 300 160 3.2 20X
1 2 4 1 10 400 80 3.2 40X

Given that the model chosen for the above mentioned ADC family provides a max-
imum sampling frequency of 125 MSPS, this results in a maximum sampling rate
of 125 and 62.5 respectively for the two different configurations. The second setup

29


4. Design and Implementation

from Table 4.1 was selected, yielding a serial lane rate of 2.5 Gbps, according to
equations 2.1 and 2.2. Below, a calculation of the expected lane rate is presented.

2 · 16 · 1 · 1.25 · 62.5 · 106

1
= 2.5 Gbps (4.1)

Octets per frame (F) is calculated using Equation 2.3, which resulted in 4 octets
per frame, to be able to uphold a lane rate of 2.5 Gbps, for our link. Calculation is
shown below.

2 · 1 · 16
8 · 1

= 4 octets/frame (4.2)

A lane rate of 2.5 Gbps is below the limit of 6.25, which is the maximum frequency
that the SpaceFibre has been tested with on the UltraScale FPGA. Deterministic
latency between transmitter and receiver is to be established via the SYNC∼ signal,
following subclass 2 of the standard.

4.3.2 High-speed SerDes IP
A proper physical layer is essential to be able to establish efficient data transmission
between the FPGA and the ADC. Dealing with high-speed data introduces jitter,
synchronization and phase alignment issues, which affects signal integrity. Therefore,
PLL and CDR circuits are used in the SerDes architecture as previously mentioned.
The UltraScale FPGAs Transceiver IP from Xilinx, is an IP that utilizes the on-
chip transceiver on the intended FPGA used in this project [32]. The IP is quite
configurable to support various industry standards that require a physical layer.
However, having this large set of variation means that configuring and establishing
the SerDes involves some trial and error.

4.3.3 Design of link layer
Our implemented link layer module follows a state transition flow, similar to Figure
4.6. The purpose of the link layer is to pass the incoming octets to the transport layer,
and maintain link synchronization between the receiver device and the transmitter
device. The link layer consists of four states; idle, cgs, ila and data. In the cgs state,
the receiver will look for four consecutive /K/ characters before it enters ila. In ila,
the receiver will scan every multiframe for start and end characters, fetch possible
configuration data in the second multiframe, and proceed to the data state if all
multiframes have valid start and end characters. If the receiver at any point fails
to detect valid start or end characters, we issue a re-synchronization request, and
re-enter the idle state. Otherwise, after a successful reception of four multiframes
at correct positions, the receiver enters the data state, where it will remain for as
long as there are no alignment errors, and no re-synchronization requests.

30


4. Design and Implementation

idle

cgs

ila

data

synced=false

synced=false

synced=false

synced=false

kcounter=4

mcounter=4
fcounter=k
ocounter=f
din=/A/

cgs
if (kcounter = 4) then
  state = ila
  syncn = 1
  kcounter = 0
else if (din = K28.5) then
  syncn = 0
  kcounter += 1
else
  syncn = 0
  kcounter = 0
end if

data
if (wrong_position_count = 4) then
  state = ila
  synced = false
end if

if din = /A/ OR /F/ then
  if (LAST OCTET OF A FRAME)
    PERFORM CHARACTER REPLACEMENT
    wrong_position_counter = 0
  else
    wrong_position_counter += 1
  end if
end if

ila
if (mcounter = 1) then
  if !(first_octet = /K/ AND last_octet = /A/) then
    state = idle
    synced = false
  end if

else (if mcounter) = 2
  if !(first_octet = /K/ AND last_octet = /A/) then
    state = idle
    synced = false
  end if
  if (second_octet = /Q/) then
    FETCH CONFIGURATION BYTES
  end if

else (if mcounter = 3) then
  if !(first_octet = /K/ AND last_octet = /A/) then
    state = idle
    synced = false
  end if

else (if mcounter = 4) then
  if !(first_octet = /K/ AND last_octet = /A/) then
    state = idle
    synced = false
  else
    state = data
  end if
end if

Figure 4.6: Link layer FSM.

The 8b/10b encoding scheme is a module that is already implemented in the existing
GRHSSL. Therefore, we excluded the development of such decoder inside the link
layer of the JESD204B receiver. Instead, the link layer will only consist of the
controller that controls the flow of the receiver, as previously explained with the
state machines above.

4.3.4 Design of transport layer
Valid data is transmitted at octet rate from the link layer to the transport layer.
The role of the transport layer is to filter out control bits and tail bits from each
octet, and merge the data from F octets, in order to reconstruct the original 14-
bit samples, that were sampled by the data converter. Refer to Figure 2.9, for an
illustration of the octet structure.

The transport layer should route the valid data out to the application layer when two
conditions are fulfilled: data is valid from the link layer, and the data is de-framed
and merged successfully in the transport layer.

31


4. Design and Implementation

4.4 Testbench setup
Behavioral verification methods are required to be able to verify that the desired
output is obtained. It is a very important step when dealing with complex and large
systems that are integrated into other major blocks such as SoC systems. In order
to verify the designed JESD204B receiver, a VHDL testbench was implemented.
Since the receiver dictates the transmitter state transitions, a dummy ADC, which
is sensitive to the SYNC∼ signal was needed. This dummy ADC acts similarly to the
transmitter in the actual hardware.

The intended input to the testbench were 14-bits signals generated via python and
saved into a file. This input is then automatically fed into the testbench from an
external file. The purpose of this setup is to mimic a real world scenario with a large
set of data coming from a data converter. Following this approach, the data could
also come from a data dump from a physical sensor, for a more authentic simulation.

The receiver implementation is able to assert synchronization requests, but also
catch data from the dummy ADC. The outputted results of the JESD204B receiver
is saved into a log file that can be used for verification, comparing it to a golden
reference file. The log file can also be used into a high level application layer. A clock
generation module was provided into the testbench, to generate a SYNC∼ detection
clock, as well as a sample clock used for octet sampling. An overview of the designed
testbench environment is shown in Figure 4.7.

Testbench

JESD204B_top

Transport
layer

Data link
layer

Clock
generation

Dummy
ADC
(Tx)

sine
wave
file

8b

Sync

Figure 4.7: The testbench used to verify the behavioral RTL of the receiver.

4.5 Application layer
The intention of the data reduction algorithm is to be implemented on the FPGA
fabric of the upcoming SoCs. We have chosen two data reduction algorithms to be
investigated, keeping in mind how well they translate into hardware implementation,
especially in terms of area. RLE and moving average are two applications that
can be used as the receiver application layer, which are well-suited for hardware
implementation. They can be implemented using flip flops, shift registers and basic

32


4. Design and Implementation

arithmetic operations. The algorithms read the output from the JESD204B as input,
to later perform either RLE or moving average. In this investigation, the algorithms
in the application layer were implemented and studied in software only, due to time
limitation.

4.5.1 Run-length encoding

The key here is to find consecutive 1’s or 0’s then group them together. For that
a counter is used to keep track of the current and previous index in a given word.
Below is a pseudo code that describes the implemented RLE algorithm.

Algorithm 1 Run-Length Encoding (RLE) Algorithm
1: count = 1
2: currentChar = data[0]
3: for i from 1 to len(data) - 1 do
4: if data[i] = currentChar then
5: count = count + 1
6: else
7: if count > 1 then
8: append (count, currentChar) to compressedData
9: else

10: append currentChar to compressedData
11: end if
12: count++
13: currentChar = data[i]
14: end if
15: end for
16: if count > 1 then
17: append (count, currentChar) to compressedData
18: else
19: append currentChar to compressedData
20: end if

4.5.2 Moving average

The algorithm is built upon reading a sampling window, where the samples are
summed together. The average is then computed through the sample window. The
sampling window can be built in hardware through shift registers, meanwhile the
other operations can be accomplished by simple arithmetic in hardware. Figure 4.8
shows the implementation of moving average.

33


4. Design and Implementation

Register(n-1) Register(n-2) Register(n-M)
X(n)

+

x

1/M

Y(n)

Figure 4.8: Block diagram of moving average algorithm.

34


5
Results

The following chapter presents the results obtained throughout the project. Initially,
we discuss the outcomes from the behavioral simulation of the receiver, highlighting
both the output and the synchronization request initialization. Subsequently, we
present the results from the FPGA implementation, including the ADC configura-
tion and SerDes setup, as well as the outcomes from the JSED204B implementation.
Finally, we present the results from our investigation of the selected data reduction
algorithms.

35


5. Results

5.1 Behavioral simulation of receiver RTL
Simulation of our IP, containing the link layer and transport layer was performed
using a sine wave of 14-bits resolution as stimuli, to represent ADC samples. The
samples were packaged into octets and fed into the IP. A successful synchroniza-
tion sequence of CGS and ILA can be seen in Figure 5.1a. One can clearly see
the sequence of receiving K28.5 characters during CGS (0xBC), followed by four
multiframes during the ILA state, where each multiframe starts with /R/ and ends
with /A/ characters (0x1C and 0x7C). The output from the transport layer is shown
in Figure. 5.1b. SYSREF

(a) CGS and ILA.

(b) Output from transport layer.

Figure 5.1: Behavioral simulation results.

As seen in Figure 5.1b, noticeable spikes are present on the sine wave output, fur-
ther shown in Figure 5.2. The spikes occur inside the link layer, and come as a
consequence of receiver character replacement.

Figure 5.2: Sine wave spikes from the receiver output.

As stated earlier, 8b/10b encoding and decoding were excluded from our testbench
environment, resulting in all inputs being 8 bits long. This meant that our receiver
misinterpreted data values to be control characters. An actual transmitter however,
would encode each 8-bit octet into a 10 bit word. The way 8b/10b encoding encodes
the data octets, makes it impossible for a 10-bit data word to be identical to any of
the 10-bit control characters. If an 8b/10b encoder and decoder would have been
implemented on the transmitter and the receiver side, we would not expect to see
any spikes on the output.

A working re-synchronization mechanism inside the receiver was verified, by inserting
/A/ and /F/ characters at faulty positions into the data stream, to replicate a scenario
where the transmitter and the receiver are out of phase. Figure 5.3 illustrates a re-
synchronization request from the receiver, after it has received a certain amount
consecutively misplaced alignment characters.

36


5. Results

Figure 5.3: Re-synchronization process. The receiver suspects misalignment after
detecting faulty placed comma characters. The receiver issues a synchronization
request and goes through ILA state.

5.2 FPGA implementation
The FPGA was connected to the ADC through the FPGA Mezzanine Card (FMC)
connector. Signals such as: SPI pins for ADC configuration, core and SerDes clocks,
SYSREF , SYNC∼ and most importantly the data pins are routed through the FMC
connector. Figure 5.4 shows the hardware parts used in the projects.

Figure 5.4: FPGA and ADC connected together on FMC connector.

5.2.1 Apbfifo and GRMON
The APBFIFO, used for storing the SerDes samples was successfully synthesized and
implemented on the FPGA. Inside the JESD204B wrapper module, two APBFIFO
registers are found. One of them is reporting a constant value to simplify the
verification of AMBA communication with our JESD204B receiver. The second
register reports values coming from the data stream of the SerDes through the same
AMBA interface. Being able to read the registers indicates a working connection
between the receiver and the AMBA bus. As expected, when doing a read request on
GRMON at startup, the real data register only shows zeroes, while the capability

37


5. Results

register reports the given constant value. Figure 5.5 shows the GRMON debug
monitor output, verifying that it was able to detect the receiver module on the
AMBA APB bus.

Figure 5.5: GRMON read access result. IP-cores shown are detected on the AMBA
bus.

38


5. Results

5.2.2 ADC configuration and SerDes
Regarding ADC configuration, the registers of the ADC chip were successfully con-
figured. We could verify the configuration by reading the registers before and after
configuration. The evaluation board has both an ADC chip, and an LMK04828
clock generation chip. The LMK04828 chip is responsible for generating the SerDes
clock, core clock to FPGA, core clock to the ADC chip and an external clock to
an output pin for debugging. The chip is configured using the same SPI interface
on the evaluation board. The LMK04828 chip can be configured using a larger set
of parameters compared to the ADC. Many of them are used for PLL and VCO
configuration. This led to an iterative workflow to set up the clocks and getting the
board in operation.

The ADC contains a set of stored patterns in the memory that can be sent to
the receiver via the SerDes. Therefore, the first registers to be configured on the
LMK04828 chip were the registers controlling the ADC clock as well as the external
pinout clock. The external debug clock was successfully generated and captured via
an oscilloscope using the external pinout on the evaluation board. Afterwards, the
registers controlling the ADC clock, receiver core clock and receiver SerDes clock
needed to be configured and generated. There has to be a specific relationship
between the clocks to obtain a correct SerDes communication. Since the clocks are
transmitted via the FMC connector, they are complicated to measure and debug.

Our strategy was to, through trial and error, configure the registers of the evaluation
board, and inspect the incoming data stream after the FPGA SerDes using APBFIFO
module. However, due to limitation in time, and the large set of configurations
for the LMK chip, we were unable to verify the correct clocks and capture any
SerDes data on the FPGA board. Looking back at our work methodology, we would
have divided the tasks so more time could be spent on configuring the hardware.
According to our technical supervisor, a physical layer typically requires notable
time and extensive experimentation to achieve proper functionality.

39


5. Results

5.2.3 JESD204B receiver performance evaluation

The simulated JESD204B receiver was synthesized and implemented in Vivado
2019.2. In this section we present the obtained implementation results. Area, tim-
ing and power are presented for the created SoC subsystem, as shown in 5.6. The
results are presented for the subsystem with and without the receiver initialization.

System wrapper

SoC Subsystem

SerDes IP

AHB/APB
interfaces

JTAG

RAM
memory

timer

UART

JESD204B

Link layer

Transport
layer

apbfifo

Figure 5.6: Design hierarchy of the implemented subsystem with JESD204B.

Utilization

From the Vivado resource usage report, presented in Table 5.1 and Table 5.2, it was
seen that the addition of the SerDes, APBFIFO and the JESD204B receiver, resulted
in a total increase of resource-utilization of around 50%, for the whole design.

Table 5.1: Subsystem without JESD204B.

Name CLB LUTs CLB Registers CARRY8 F7 Muxes BRAM
system_wrapper 680 609 12 2 2
subsystem 679 603 12 2 2
ahbctrl 57 20 0 0 0
apbctrl 86 88 0 2 0
jtag 123 194 0 0 0
uart 272 179 8 0 0
gptimer 121 106 4 0 0
ahbram 20 16 0 0 2
total 2038 1815 36 6 6

40


5. Results

Table 5.2: Subsystem including JESD204B receiver and APBFIFO.

Name CLB LUTs CLB Registers CARRY8 F7 Muxes BRAM
system_werapper 956 993 18 13 3
serdes_ip 99 212 4 0 0
subsystem 848 763 14 13 3
ahbctrl 58 20 0 0 0
apbctrl 135 89 0 13 0
jtag 122 194 0 0 0
uart 277 179 8 0 0
jesd_wrapper 119 159 2 0 1
apbfifo 54 87 2 0 1
jesd_top 65 72 0 0 0
gptimer 118 106 4 0 0
ahbram 20 16 0 0 2
total 2871 2890 52 39 10

From the above tables, it is visible that APBCTRL is significantly bigger when the
receiver is connected to the subsystem. The reason is that the JESD204B contains
the APBFIFO module, which has an APB interface that is connected to the APB
controller. This addition causes a significant resource increase in CLB LUTs.

Timing

The main clock in the subsystem is the AMBA clock at 100 MHz. From the timing
report provided by Vivado, no timing violations are seen. Tables 5.3 and 5.4 show
the timing report for the SoC subsystem with and without the receiver.

Table 5.3: SoC subsystem No JESD204B is connected.

Setup Hold Pulse Width
Worst Negative Slack: 6.283 ns Worst Hold Slack: 0.016 ns Worst Pulse Width Slack: 0.500 ns
Total Negative Slack: 0.000 ns Total Hold Slack: 0.000 ns Total Pulse Width Negative Slack: 0.000 ns

Table 5.4: SoC subsystem, including JESD204B receiver and APBFIFO.

Setup Hold Pulse Width
Worst Negative Slack: 5.375 ns Worst Hold Slack: 0.024 ns Worst Pulse Width Slack: 0.486 ns
Total Negative Slack: 0.000 ns Total Hold Slack: 0.000 ns Total Pulse Width Negative Slack: 0.000 ns

Power

Power dissipation of the design is evaluated using the Vivado tool. Again, power is
presented for the subsystem with and without the receiver design. The subsystem
without a receiver and APBFIFO, would have an estimated power consumption of
604 mW. With the inclusion of the receiver and APBFIFO, the total on-chip power
consumption reached 977 mW. See Figure 5.7 for a detailed power breakdown.

41


5. Results

Subsystem Subsystem with JESD receiver
0

200

400

600

800

1000

1200

Po
w

er
(m

W
)

Dynamic Static GTH

(a) On-chip power (W).

0

50

100

150

200

250

300

Subsystem Subsystem with JESD receiver

Po
w

er
 (m

W
)

Clocks Signals Logic BRAM MMCM I/O

(b) Dynamic power breakdown (W).

Figure 5.7: Subsystem power consumption, including JESD204B receiver.

As seen in Figure 5.7, there was not a significant change in static power after the
addition of our receiver prototype to the subsystem. A notable increase in dynamic
power was seen after the receiver was added, but the majority of the power increase
was caused by the on-board transceiver, housing the SerDes.

5.3 Application layer
It is crucial to note that the application layer of the JESD204B receiver varies for
each implementation. Specifically, this master thesis focuses on studying two distinct
data reduction algorithms as the application layer. The results of these algorithms
are presented here. Various stimuli inputs were utilized to evaluate the suitability
of each algorithm for different applications. It is important to highlight that all
tests were conducted using Python, and no hardware implementation was carried
through.

42


5. Results

5.3.1 RLE results

In theory, RLE is expected to achieve the best data reduction when the data demon-
strates repetitive patterns. By contrast , RLE is likely to produce minimal reduction
for inputs characterized by randomness and infrequent occurrences. To thoroughly
evaluate the RLE algorithm across different contexts, it is important to utilize a
diverse dataset that encapsulates various characteristics of input data.

The intuition behind this approach was to evaluate the algorithms performance
across different scenarios. Therefore, the test data was categorized into four cat-
egories: Inputs with repeating data, inputs with less repeating data, random or
complex inputs to ensure real-world data evaluation, and finally, inputs with differ-
ent sizes to understand how RLE scales with image size. A set of binary images were
initially selected to stimulate the RLE algorithm, which can be shown as results in
Figure 5.8.

(a) Angled pattern (b) Big astronaut (c) Earth

(d) Horizontal pattern (e) Small astronaut (f) Sweden map

Figure 5.8: Binary image inputs to the RLE algorithm.

The resulting output after running the RLE algorithm was saved into a log file that
was then compared to the reference input. To determine the effectiveness of data
reduction, we selected two benchmarks that quantified the performance of RLE.

The first benchmark; average word size after reduction, provides insight into the
efficiency of the reduction process. The second benchmark, average reduction rate,
is calculated by dividing the original file size by the compressed file size, offering
a clear ratio of size reduction. Table 5.5 shows the corresponding information and
benchmark values for the different inputs.

43


5. Results

Table 5.5: RLE reduction results for different input types and sizes.

Input
description

Input
type

Input size
(pixels)

Average word size
after reduction

Average
reduction rate

Sine wave Signal 14x3072 10.33 1.48
Triangle wave Signal 16x100,000 12.47 1.32
Map of Sweden Image 88x88 8.61 11.18
Angled pattern Image 88x88 34 2.6
Horizontal pattern Image 88x88 3 29.3
Earth Image 250x250 31 16.54
Big astronaut Image 488x467 31.4 31.4
Small astronaut Image 88x88 12.23 11.48
Audio recording Signal 16x191,000 9.77 1.75

One can clearly see that RLE gave the highest reduction rate when the rows of the
inputs were following a repeated pattern. One obvious example is Figure 5.8 (d)
which features horizontal lines, with every line containing identical binary values.
Additionally, inputs with larger pixels that are similar in nature yielded a higher
reduction rate. This is expected, because longer rows of binary values have a sta-
tistically higher probability of containing repeated values. Example of that are the
two inputs for big astronaut and small astronaut. From the table, one can see that
the input sizes are 88 compared to 488.

RLE was also evaluated using signals, specifically a sine wave and an audio recording.
However, the reduction rate for these signals was noticeably low. Signals tend to have
a high information content with fewer repeating patterns, which is what RLE relies
on for reduction. Another challenge with signals is their resolution, as they typically
have a wordlength between 8-32 bits, which can further limit the effectiveness of
RLE. All things considered, RLE provided a solid lossless reduction for binary input
images. The expected reduction rate is estimated to be between 1 and 8.

5.3.2 Moving average results
The moving average algorithm was implemented in two ways: The first one is called
sliding moving average (SMA), which calculates the average of a sampling window,
and increments one sample at a time. The second approach is called block mov-
ing average (BMA). This second approach runs the same algorithm, with the only
difference being that it calculates the average of a sample window, then jumps k
number of samples and calculates the moving average of that block, providing a
downsampled result. This makes it an excellent method to accomplish frequency
reduction for signals.

Measuring reduction rate is not as applicable for SMA in the same way as it is for
RLE. SMA relies on frequently calculating averages of a window sample, and fading
away the odd values in that window sample to become as close as possible to other
neighbouring signals. The results of the SMA showed a significant granular and
detailed smoothing for noisy inputs. The noisy sine wave output obtained from the

44


5. Results

receiver was tested with SMA and a sample window of 100 was selected. Figure 5.9
shows the results. As expected, a nice and clean output was obtained after running
SMA algorithm on the noisy input.

Figure 5.9: A filtered sine wave using sliding moving average.

The goal of the BMA is to downsample a signal, which is particularly useful when
an application runs at a slower rate compared to the input rate. An average is
calculated once every k samples, resulting in a lossy type of downsampling and data
reduction. Figure 5.10 shows the reconstruction of a sine and a triangle wave using
the BMA algorithm.

(a) Sine wave input. (b) Sine wave output.

(c) Triangle wave input. (d) Triangle wave output.

Figure 5.10: BMA algorithm performed on a sine wave and triangle wave signals.

45


5. Results

By its nature, BMA is a lossy algorithm, which would not be optimal for systems
where data loss is intolerable. Running BMA on such systems corresponds in loss
of details and reduced quality. The audio recording was evaluated using BMA;
however, SMA proved to be the better choice for the audio file, due to the complex
and non-periodic characteristic of speech and music.

5.3.3 Algorithm-FPGA interaction
The theoretically studied BMA, SMA, and RLE algorithms offer distinct advantages
and trade-offs in performance. They will however be implemented on FPGA fabric,
resulting in different area affects on the hardware.

BMA on FPGA

BMA is particularly effective for downsampling signals, with a significant data reduc-
tion capabilities. However, its primary drawback is its lossy nature when applied to
complex signals that contain critical information. This makes BMA less suitable for
applications where signal integrity is of great importance. The simplicity of BMA’s
arithmetic operations translates well to FPGA implementation using flip-flops and
a small number of shift registers depending on the downsampling resolution.

SMA on FPGA

On the other hand, SMA offers precise and detailed granular smoothing for input
signals, making it more appropriate for complex and non-periodic signals that carry
essential information. Despite its advantages in maintaining signal integrity, the
increased complexity of SMA means it will occupy more FPGA resources in terms
of shift registers compared to BMA. SMA achieves a noticeably less data reduction
compared to BMA.

RLE on FPGA

RLE provides a robust lossless reduction primary for input images, with an expected
reduction rate typically ranging between 1 and 8 for larger sets of inputs. The algo-
rithm achieves the highest reduction rate with inputs that have repeated patterns.
However, it performs poorly with signals like sine waves and audio recordings due
to their pseudo random information, resulting in fewer repeating patterns. Imple-
menting RLE on FPGA requires control logic gates to detect repeated values, with
flip-flops to manage the counts.

In conclusion, the selection between BMA, SMA, RLE or any other algorithm for the
receiver application layer should be application specific to each space mission. This
means balancing trade-offs between performance, signal integrity, data reduction,
and hardware area. Integrating these algorithms into the FPGA fabric allows the
application layer of the JESD204B receiver to dynamically adapt to changes without
making modifications to the JESD204 receiver IP-core that can be part of any SoC.

46


6
Conclusion

In this project, we aimed to investigate the attainability of integrating a JESD204B
receiver into an SoC architecture, by the means of a minimal SoC referred to as a
subsystem on an FPGA. The investigation comprised of a comprehensive study of
the JESD204 standard, as well as preparatory calculations of the parameters for a
link between a JESD204B receiver and transmitter.

A prototyped JESD204B receiver containing a link layer and a transport layer was
developed and verified in simulation. According to our initial time plan, we esti-
mated that the receiver IP would be quick to implement. However, we later realized
that the scope was much larger than anticipated, leading to adjustments in our time
planning. Despite that, the development of the subsystem prototype went according
to redefined plan. It is however worth noting that the prototype and the testbench
environment were relatively simple, lacking certain functionalities that need to be
addressed in future development. Implementing a fully-featured JESD204 receiver
IP, which supports the wide range of configuration options would take a significant
amount of time.

A third party transmitter was part of an ADC evaluation board, which was config-
ured via an SPI interface. An UltraScale XCKU040 FPGA was the platform for
our JESD204B receiver implementation, and the link was to be established through
an FMC connector. The foundation for a physical layer in the receiver was created,
where much time was spent on trying to set up communication over the JESD204B
link between the transmitter and the receiver. This phase required extensive debug-
ging, which involved the GRMON debugger, and the created APBFIFO to capture
SerDes data.

Ultimately, after several iterations of trial and error, communication over the con-
figured JESD204B link could not be established. In hindsight, the project timelines
should have been more conservatively estimated to make room for potentially un-
foreseen technical challenges. It was anticipated that the setup of the ADC could
be accomplished within a relatively short time-frame, which in reality proved to
be overly optimistic, and very time-consuming. Even if a working JESD204B link
was not achieved, a solid infrastructure for further development of the JESD204B
receiver was created, which will potentially simplify future work surrounding this
project.

47


6. Conclusion

Two data reduction algorithms were studied and developed in software with the
aim of using them inside the application layer of the JESD204B: RLE; and Moving
Average, which includes both SMA and BMA. Both algorithms are well-suited for
hardware implementation. The designer should be aware of the trade-offs between
the algorithms performance and area utilization when implemented in hardware. In
terms of performance, RLE is recommended when the inputs are known to contain
repeated patterns. For the moving average: SMA is best when a smoother output
is desired, meanwhile BMA is best used for downsampling.

During the planning phase, we allocated the same amount of time for the data
reduction algorithms and the receiver IP. Upon reflection, it would have been more
advantageous to spend more time on the receiver due its complexity, but also because
the algorithms were only studied in software, not implemented in hardware, and
required less time that planned.

In future work, the system can be improved in several ways: first and foremost is to
read ADC data and to establish JESD204 communication on hardware. Regarding
the RTL and the receiver itself, the prototype is a simplified version of what to
expect from a complete JESD204B receiver IP. Therefore, one essential part of the
future work is to add several features to the receiver and include it on GRLIB.
The limitations are the currently designed transport layer, which only works for
the exact same setup proposed in the results section. For the link layer, the 8b/10b
decoder from GRHSSL needs to be integrated together with the controller. This will
eliminate the misinterpretation of actual data and control characters. The current
testbench can be improved by making it self checking, and not being dependent on
the wave window for verification.

48


Bibliography

[1] Using FPGA in space applications everything you need to know, vorago. [On-
line]. Available: https://www.voragotech.com/blog/fpga- in- space-
applications.

[2] T. Nguyen, C. MacLean, M. Siracusa, D. Doerfler, N. J. Wright, and S. Williams,
“FPGA-based HPC accelerators: An evaluation on performance and energy
efficiency,” Concurrency and Computation: Practice and Experience, vol. 34,
no. 20, e6570, Sep. 10, 2022, issn: 1532-0626, 1532-0634. doi: 10.1002/cpe.
6570. [Online]. Available: https: / / onlinelibrary . wiley . com/ doi / 10 .
1002/cpe.6570 (visited on 04/11/2024).

[3] Gr765 octa-core processor, Gaisler. [Online]. Available: https://www.gaisler.
com/index.php/products/components/gr765.

[4] Lms instrument, Universität Bern. [Online]. Available: https://www.space.
unibe.ch/research/research_groups/space_science_group/science/
lms_instrument/index_eng.html.

[5] A. Riedo, S. Meyer, B. Heredia, et al., “Highly accurate isotope composition
measurements by a miniature laser ablation mass spectrometer designed for
in situ investigations on planetary surfaces,” Planetary and Space Science,
vol. 87, pp. 1–13, 2013, issn: 0032-0633. doi: https://doi.org/10.1016/j.
pss.2013.09.007. [Online]. Available: https://www.sciencedirect.com/
science/article/pii/S0032063313002341.

[6] Gr712rc dual-core leon3ft sparc v8 processor, Gaisler. [Online]. Available: https:
//www.gaisler.com/index.php/products/components/gr712rc.

[7] H. Saheb and S. Haider, “Scalable high speed serial interface for data con-
verters: Using the JESD204B industry standard,” in 2014 9th International
Design and Test Symposium (IDT), Algeries, Algeria: IEEE, Dec. 2014, pp. 6–
11, isbn: 978-1-4799-8200-4. doi: 10 . 1109 / IDT . 2014 . 7038577. [Online].
Available: http://ieeexplore.ieee.org/document/7038577/ (visited on
03/20/2024).

[8] Gaisler, “Gaisler processors,” 2023. [Online]. Available: https://www.gaisler.
com/index.php/products/processors.

[9] Grlib vhdl ip library, Gaisler. [Online]. Available: https://www.gaisler.com/
index.php/products/ipcores/soclibrary.

[10] A. Amara, F. Amiel, and T. Ea, “Fpga vs. asic for low power applications,”
Microelectronics Journal, vol. 37, no. 8, pp. 669–677, 2006, issn: 1879-2391.
doi: https://doi.org/10.1016/j.mejo.2005.11.003. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0026269205003927.

49

https://www.voragotech.com/blog/fpga-in-space-applications
https://www.voragotech.com/blog/fpga-in-space-applications
https://doi.org/10.1002/cpe.6570
https://doi.org/10.1002/cpe.6570
https://onlinelibrary.wiley.com/doi/10.1002/cpe.6570
https://onlinelibrary.wiley.com/doi/10.1002/cpe.6570
https://www.gaisler.com/index.php/products/components/gr765
https://www.gaisler.com/index.php/products/components/gr765
https://www.space.unibe.ch/research/research_groups/space_science_group/science/lms_instrument/index_eng.html
https://www.space.unibe.ch/research/research_groups/space_science_group/science/lms_instrument/index_eng.html
https://www.space.unibe.ch/research/research_groups/space_science_group/science/lms_instrument/index_eng.html
https://doi.org/https://doi.org/10.1016/j.pss.2013.09.007
https://doi.org/https://doi.org/10.1016/j.pss.2013.09.007
https://www.sciencedirect.com/science/article/pii/S0032063313002341
https://www.sciencedirect.com/science/article/pii/S0032063313002341
https://www.gaisler.com/index.php/products/components/gr712rc
https://www.gaisler.com/index.php/products/components/gr712rc
https://doi.org/10.1109/IDT.2014.7038577
http://ieeexplore.ieee.org/document/7038577/
https://www.gaisler.com/index.php/products/processors
https://www.gaisler.com/index.php/products/processors
https://www.gaisler.com/index.php/products/ipcores/soclibrary
https://www.gaisler.com/index.php/products/ipcores/soclibrary
https://doi.org/https://doi.org/10.1016/j.mejo.2005.11.003
https://www.sciencedirect.com/science/article/pii/S0026269205003927


Bibliography

[11] F. Maloberti, Data Converters. Dordrecht: Springer, 2008, 440 pp., isbn: 978-
0-387-32485-2.

[12] P. Dhaker, “Introduction to SPI Interface,” Analog Dialogue, vol. 52, 2018.
[13] AMBA, ARM. [Online]. Available: https://developer.arm.com/Architectures/

AMBA.
[14] AMBA AHB protocol specification, ARM. [Online]. Available: https://developer.

arm.com/documentation/ihi0033/latest/.
[15] AMBA APB protocol specification, ARM. [Online]. Available: https://developer.

arm.com/documentation/ihi0024/latest/.
[16] F. Zhang, High-Speed Serial Buses in Embedded Systems. Singapore: Springer

Singapore, 2020, isbn: 9789811518676 9789811518683. doi: 10.1007/978-
981-15-1868-3. [Online]. Available: http://link.springer.com/10.1007/
978-981-15-1868-3 (visited on 03/20/2024).

[17] JESD204C Standard, 2021. [Online]. Available: https://www.jedec.org/.
[18] JESD204B Standard, 2011. [Online]. Available: https://www.jedec.org/.
[19] C. Panek, Networking Fundamentals. Indianapolis: John Wiley & Sons, Inc,

2019, isbn: 978-1-119-65074-4.
[20] A. Athavale, High-Speed Serial I/O Made Simple. Xilinx, Inc, 2005, ch. 3,

20-25.
[21] J. Harris, “Understanding JESD204B link parameters,” Planet analog, 2013.

[Online]. Available: https : / / www . planetanalog . com / understanding -
jesd204b-link-parameters/.

[22] “8b10b Encoder/Decoder MegaCore Function (ED8B10B) Data Sheet,” Al-
tera Corporation, 2001.

[23] T. Biscontini, “Data compression.,” Salem Press Encyclopedia of Science, 2023.
[Online]. Available: https://search.ebscohost.com/login.aspx?direct=
true&amp;db=ers&amp;AN=87321445&amp;site=eds-live&amp;scope=
site&amp;authtype=guest&amp;custid=s3911979&amp;groupid=main&
amp;profile=eds.

[24] A. Birajdar, H. Agarwal, M. Bolia, and V. Gupte, “Image compression using
run length encoding and Lempel Ziev Welch method,” in 2019 Global Confer-
ence for Advancement in Technology (GCAT), 2019, pp. 1–6. doi: 10.1109/
GCAT4