Accelerating Embedded Code of Simulink with Pipeline-Friendly Code Synthesis and Parallel Computing
Type
Master's thesis
Program
Computer systems and networks (MPCSN), MSc
Published
2024
Author
Zhang, Yeben
Abstract
The model-based design (MBD) software development method is becoming more
common in the automotive engineering industry due to the growing importance of
software in vehicle development. With this approach, automotive engineers create
control models visually, and then a code generator in the MBD tool automatically
produces the code based on the visual model.
However, code generators often have to keep the generated code compatible across
different platforms, so its performance cannot be guaranteed to be optimal. To
improve performance, this thesis first examines existing performance optimization
schemes and then proposes two optimization schemes that build on them.
Modern processors are structured around pipelines, which consist of multiple stages.
Every instruction must go through each stage. Ideally, each instruction can be
processed at every stage in just one CPU cycle. When an instruction needs longer
than one cycle at a stage, the pipeline stalls, typically because of a data hazard.
A data hazard occurs when the execution of the current instruction relies on a data
value produced by the previous instruction.
The first method targets CPUs with a single pipeline and analyzes the pipeline state
to identify the cause of a pipeline stall. In a pipeline, if the next instruction
does not depend on the result of the previous instruction, it does not have to wait
for the previous instruction to finish executing. Executing such an instruction
first is referred to as out-of-order execution, and the thesis uses it to reduce
pipeline stalls.
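
The reordering idea described above can be illustrated with a small sketch. It is not the algorithm from the thesis or from Mercury. It is a generic greedy reordering under stated assumptions: statements are given in single-assignment form as hypothetical `(dest, sources)` pairs, every source is either an external input or written by an earlier statement, and a "hazard" is counted whenever a statement reads the value written by the statement directly before it.

```python
def count_adjacent_hazards(stmts, order):
    """Count statements that read the value written by their immediate predecessor."""
    return sum(
        1
        for prev, cur in zip(order, order[1:])
        if stmts[prev][0] in stmts[cur][1]
    )


def reorder(stmts):
    """Greedy, dependency-preserving reordering of (dest, sources) statements."""
    writer = {dest: i for i, (dest, _) in enumerate(stmts)}
    # deps[i] holds the indices of the statements whose results statement i reads.
    deps = [{writer[s] for s in srcs if s in writer} for _, srcs in stmts]
    emitted, order = set(), []
    while len(order) < len(stmts):
        # A statement is ready once everything it reads has been emitted.
        ready = [i for i in range(len(stmts))
                 if i not in emitted and deps[i] <= emitted]
        last = order[-1] if order else None
        # Prefer a ready statement that does not read the last emitted result.
        independent = [i for i in ready if last not in deps[i]]
        pick = (independent or ready)[0]
        order.append(pick)
        emitted.add(pick)
    return order


# Hypothetical example: two short dependent chains feeding one sum.
example = [
    ("a", ("x", "y")),  # 0: a = x + y
    ("b", ("a",)),      # 1: b = a * 2
    ("c", ("u", "v")),  # 2: c = u + v
    ("d", ("c",)),      # 3: d = c * 2
    ("e", ("b", "d")),  # 4: e = b + d
]
```

On this example the original order has three back-to-back dependencies, while `reorder(example)` interleaves the two chains and leaves only the final, unavoidable one.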
Mercury, an approach from the academic literature, builds on this idea. It proposes
an algorithm that minimizes data hazards by reordering code statements at the code
level. However, the evaluation reveals a flaw in this method: it degrades the
performance of complex models that rely heavily on the cache to store intermediate
variables. The thesis investigates the cause of this negative optimization. When
the data size of these variables exceeds the cache capacity, the reordered code
runs slower than the original Simulink-generated code. This thesis improves the
Mercury algorithm so that it can also optimize the performance of code running on
embedded devices with limited cache resources.
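
One way to see why statement reordering can backfire is to count how many intermediate values are waiting for a later reader at the same time. The sketch below is only an illustrative proxy, not a cache model and not the analysis used in the thesis. It assumes the same hypothetical `(dest, sources)` statement form, and it treats a value as live from the statement that writes it until the last statement that reads it.

```python
def max_live_temporaries(stmts, order):
    """Peak number of computed values that still have a pending reader."""
    pos = {stmt_index: p for p, stmt_index in enumerate(order)}
    def_pos = {dest: pos[i] for i, (dest, _) in enumerate(stmts)}
    last_use = {}
    for i, (_, srcs) in enumerate(stmts):
        for s in srcs:
            if s in def_pos:
                last_use[s] = max(last_use.get(s, -1), pos[i])
    peak = 0
    for p in range(len(order)):
        # A value is live after position p if it is written at or before p
        # and some statement placed after p still reads it.
        live = sum(1 for v, d in def_pos.items() if d <= p < last_use.get(v, -1))
        peak = max(peak, live)
    return peak


# Hypothetical example: four independent "compute, then store" pairs.
pairs = [
    ("t0", ("x0",)), ("y0", ("t0",)),
    ("t1", ("x1",)), ("y1", ("t1",)),
    ("t2", ("x2",)), ("y2", ("t2",)),
    ("t3", ("x3",)), ("y3", ("t3",)),
]
original = list(range(8))               # each temporary is consumed immediately
interleaved = [0, 2, 4, 6, 1, 3, 5, 7]  # no back-to-back dependency at all
```

Here `max_live_temporaries(pairs, original)` is 1 and `max_live_temporaries(pairs, interleaved)` is 4: the hazard-free order keeps four times as many intermediates alive, which is the kind of pressure that can exceed a small cache.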
The second approach is based on the multi-core architecture of modern processors. It
decomposes a task system into several small tasks and assigns these tasks to multiple
cores at a fine granularity, based on their data dependencies and execution times.
This method minimizes barrier wait times and reduces overall execution time.
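
Assigning small tasks to cores by dependency and execution time can be sketched with classic greedy list scheduling. This is a textbook technique used here for illustration, not the scheduling algorithm of the thesis. The task names and durations are hypothetical, and the sketch ignores communication and synchronization costs between cores.

```python
def schedule(tasks, num_cores=3):
    """Greedy list scheduling.

    tasks maps a task name to (duration, [names of tasks it depends on]).
    Returns (assignment, makespan), where assignment maps each task name
    to (core index, start time).
    """
    finish = {}                   # task name -> finish time
    core_free = [0] * num_cores   # time at which each core becomes idle
    assignment = {}
    remaining = dict(tasks)
    while remaining:
        ready = [n for n, (_, deps) in remaining.items()
                 if all(d in finish for d in deps)]
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        # Longest ready task first; the name breaks ties deterministically.
        name = min(ready, key=lambda n: (-remaining[n][0], n))
        duration, deps = remaining.pop(name)
        data_ready = max((finish[d] for d in deps), default=0)
        # Pick the core on which this task can start earliest.
        core = min(range(num_cores),
                   key=lambda c: (max(core_free[c], data_ready), c))
        start = max(core_free[core], data_ready)
        finish[name] = start + duration
        core_free[core] = finish[name]
        assignment[name] = (core, start)
    return assignment, max(finish.values(), default=0)


# Hypothetical task graph: one producer, three independent consumers, one join.
tasks = {
    "read":   (2, []),
    "pid_a":  (4, ["read"]),
    "pid_b":  (3, ["read"]),
    "filter": (3, ["read"]),
    "merge":  (1, ["pid_a", "pid_b", "filter"]),
}
```

With three cores, `schedule(tasks)` finishes this graph in 7 time units instead of the 13 a single core would need, because the three consumers run side by side after `read` completes.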
In the experiments, the thesis runs tests on hardware to compare the new methods with
existing ones. The hardware used for the tests is an Infineon development board,
which is commonly used in the automotive industry to control specific functions in
vehicles. This board is equipped with three TriCore processors, each with a 6-stage
pipeline to process software instructions. The first method aims to improve the
performance of a single task running on a single processor by addressing pipeline
stalls. The task in the thesis is a control program consisting of two
proportional-integral-derivative (PID) controllers, a type of controller that is
widely used in industrial systems. The program has two inputs. The first input is
the expected temperature of the controlled object, and the second is the temperature
detected by the sensor. The output of the controller is the voltage of the air
conditioning compressor. Depending on the control parameters, the control program
outputs different voltages to stabilize the temperature of the controlled object at
the expected temperature. Compared to Mercury, the thesis method achieves the same
optimization when the cache has sufficient capacity. When the cache resources are
insufficient, the thesis method achieves about a 20% improvement over Mercury. The
second approach increases the degree of parallelism across multiple tasks and reduces
the processing time when multiple tasks are scheduled onto the three processors.
Compared to the existing coarse-grained scheduling method, the fine-grained method
of the thesis achieves about a 4% improvement.
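
For readers unfamiliar with the benchmark, a single discrete PID step can be sketched as follows. This is a generic textbook form, not the Simulink model from the thesis. The gains, the sample time, and the sign convention (error = expected temperature minus measured temperature) are assumptions chosen only for illustration, and how the two controllers in the thesis are connected is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float                 # proportional gain (hypothetical value)
    ki: float                 # integral gain (hypothetical value)
    kd: float                 # derivative gain (hypothetical value)
    dt: float                 # sample time in seconds
    integral: float = 0.0     # accumulated error
    prev_error: float = 0.0   # error from the previous step

    def step(self, expected, measured):
        """Return the control output for one sample."""
        error = expected - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


controller = PID(kp=2.0, ki=0.5, kd=0.0, dt=1.0)
first = controller.step(expected=22.0, measured=20.0)
second = controller.step(expected=22.0, measured=21.0)
```

Each step is a short chain of dependent arithmetic (error, then integral and derivative, then the weighted sum), which is why this kind of code is a natural test case for pipeline-stall optimizations.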
Subject/keywords
Computer architecture, pipeline, data hazard, out of order, multi-core, parallelization