Fault-tolerant scheduling of real-time parallel DAG tasks on multiprocessors
| dc.contributor.author | Bodin, Gustaf | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Jonsson, Jan | |
| dc.contributor.supervisor | Pathan, Risat | |
| dc.date.accessioned | 2025-09-10T13:30:45Z | |
| dc.date.issued | 2024 | |
| dc.date.submitted | ||
| dc.description.abstract | The imperative of maximizing hardware utilization compels innovation and a strive towards finding more efficient solutions in real-time systems. The directed acyclic graph (DAG) model of parallel tasks is commonly used to represent the data dependencies of real-world applications. Providing techniques for tolerating and recovering from hardware and software faults for such a model is paramount to its usability in critical systems. Sectors like avionics and automotive require fault tolerance to ensure certification and guarantees of safe operation, which is why this combination is important. This thesis considers transient faults. Examining all possibilities of faults occurring in a DAG task is computationally expensive. Therefore, developing efficient methods for bounding the worst-case makespan under faulty conditions is a non-trivial problem and one which is examined in this thesis. A fault-aware schedulability test for a taskset can be derived from finding the number of processors required to meet each task’s deadline in the taskset. This thesis introduces six novel fault-aware schedulability tests that explicitly account for the runtime overhead of using fault recovery through node re-execution. Further, a workconserving scheduler is assumed and the federated scheduling technique is employed to address the problem of guaranteeing the schedulability of DAG tasksets. To bound the worst-case interference caused by re-executed nodes, the tests employ new analytical techniques and build upon existing fault-unaware scheduling techniques for efficient scheduling of DAG tasks. To evaluate the effectiveness of the proposed tests, a simulation framework was developed that is capable of generating random DAG tasks whose structure and computational load reflect that of real-world applications. Simulation results indicate that exploiting structural information of multiple long paths of a DAG task significantly enhances the power of the proposed tests in determining schedulability, regardless of the variation in the simulation parameters. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310459 | |
| dc.language.iso | eng | |
| dc.relation.ispartofseries | CSE 24-196 | |
| dc.setspec.uppsok | Technology | |
| dc.subject | real-time scheduling, parallel tasks, fault tolerance, federated scheduling, work-conserving scheduling, computer science | |
| dc.title | Fault-tolerant scheduling of real-time parallel DAG tasks on multiprocessors | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Computer systems and networks (MPCSN), MSc |
