Accelerating Embedded Code of Simulink with Pipeline-Friendly Code Synthesis and Parallel Computing

Master's Thesis
Computer systems and networks (MPCSN), MSc
Zhang, Yeben
The model-based design (MBD) software development method is becoming more common in the automotive industry as software plays a growing role in vehicle development. With this approach, automotive engineers create control models graphically, and a code generator in the MBD tool then produces code automatically from the model. However, code generators must keep the generated code compatible with many different platforms, so its performance on any given platform cannot be guaranteed to be optimal. To improve performance, this thesis first examines existing optimization schemes and then proposes two new schemes that build on them.
Modern processors are built around pipelines consisting of multiple stages, and every instruction passes through each stage. Ideally, each stage processes an instruction in a single CPU cycle. When an instruction cannot advance after one cycle, the pipeline stalls. Stalls are typically caused by data hazards, which occur when the current instruction depends on a data value that a previous instruction has not yet produced. The first method analyzes the pipeline state to identify the causes of stalls on a CPU with a single pipeline. If the next instruction does not depend on the result of the previous one, it does not have to wait for that instruction to finish; executing such an independent instruction first is known as out-of-order execution, and it can reduce pipeline stalls. Mercury, an approach from academic research, builds on this idea: it proposes an algorithm that reduces data hazards by reordering statements at the source-code level. However, an analysis of its evaluation reveals flaws in this method.
In particular, Mercury degrades the performance of complex models that rely heavily on the cache to store intermediate variables. The thesis investigates the causes of this negative optimization: when the total size of these variables exceeds the cache capacity, the reordered code runs slower than the original Simulink-generated code. The thesis therefore improves the Mercury algorithm so that it can also optimize code running on embedded devices with limited cache resources.
The second approach builds on the multi-core architecture of modern processors. It decomposes a task system into several small tasks and assigns them to multiple cores at a fine granularity, based on their data dependencies and execution times. This minimizes barrier wait times and reduces the overall execution time.
In the experiments, the proposed methods are evaluated on hardware and compared with existing ones. The hardware is an Infineon development board of a kind commonly used in the automotive industry to control specific vehicle functions. The board is equipped with three TriCore processors, each with a 6-stage pipeline for processing instructions. The first method aims to improve the performance of a single task running on a single processor by reducing pipeline stalls. The test task is a control program built from two proportional-integral-derivative (PID) controllers, a type of controller widely used in industrial systems. The program has two inputs: the desired temperature of the controlled object and the temperature measured by a sensor. Its output is the voltage applied to the air-conditioning compressor. Depending on the control parameters, the program outputs different voltages to hold the temperature of the controlled object at the desired value.
Compared with Mercury, the improved method achieves the same optimization when the cache has sufficient capacity and about a 20% improvement when cache resources are insufficient. The second approach increases parallelism across tasks and reduces the processing time when multiple tasks are scheduled onto the three processors. Compared with the existing coarse-grained scheduling method, the fine-grained method achieves about a 4% improvement.
Computer architecture, pipeline, data hazard, out-of-order, multi-core, parallelization