A Fault-tolerant Distributed Library for Embedded Real-time Systems

Type
Master's thesis
Programme
Computer systems and networks (MPCSN), MSc
Published
2020
Authors
Gudmandsen, Johanna
Hashem, Hashem
Abstract
A distributed embedded control system (DECS) may have functionality that is safety-critical and time-sensitive, meaning that a malfunction could have devastating consequences. To meet these requirements, a system must fulfill real-time constraints and guarantee correct functionality even in the presence of faults. In this thesis we present a software library providing clock synchronization, real-time scheduling and fault-tolerant decision making. It is intended for DECSs communicating via a controller area network (CAN). To achieve fault-tolerant decision making, we propose an early-stopping fault-tolerance algorithm that tolerates up to t faults in a system of 2t + 1 nodes. We further propose an adaptation of this algorithm for real-world applications in which there may be an interval of correct values rather than the single correct value assumed by the base solution. The result is a lightweight and efficient library. The clock synchronization requires one message and has a precision comparable to other known solutions, but it is not fault-tolerant. The scheduler runs in O(n²) time and uses a non-preemptive rate-monotonic policy. It can handle up to 63 user-defined tasks and has a worst-case task delay of 2.5 ms for the lowest-priority task in a system with 60 tasks, assuming a task execution time of 0. Its drawback is that it cannot handle mixed-criticality task sets. Our proposed algorithm uses properties inherent in CAN to rectify faults in the value domain efficiently. Due to the early-stopping property of the algorithm, bus utilization increases linearly with the number of faults. We conclude that while the library is practical and efficient, fault-tolerant clock synchronization and fault handling in the time domain are necessary improvements before the library can be used in production systems.
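The abstract does not spell out the decision algorithm, so the following is only a minimal C sketch, not the thesis's early-stopping algorithm, of why 2t + 1 nodes are enough to mask up to t value-domain faults when correct nodes may report slightly different values within an interval. It assumes integer readings, that all 2t + 1 values have already been received (the CAN exchange is not modeled), and no early stopping. The function name `fault_tolerant_median` and the bound `MAX_NODES` are hypothetical. The idea: after sorting, the median cannot fall outside the range of correct values, because that would require t + 1 values on one side of the correct range, i.e. more than t faulty nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on the number of nodes for this sketch. */
#define MAX_NODES 64

/* Insertion sort: simple and adequate for a small set of node reports. */
static void sort_values(int *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/*
 * Hypothetical helper: return the median of n = 2t + 1 reported values.
 * If at most t reports are faulty (arbitrary values), the median lies
 * between the smallest and largest correct values. The input array is
 * not modified.
 */
int fault_tolerant_median(const int *reports, size_t n)
{
    int buf[MAX_NODES];
    assert(n > 0 && n <= MAX_NODES && n % 2 == 1); /* n must be 2t + 1 */
    memcpy(buf, reports, n * sizeof buf[0]);
    sort_values(buf, n);
    return buf[n / 2];
}
```

For example, with t = 2 and reports {100, 101, 102, -5000, 9999}, where two nodes are faulty, the sketch returns 101, which lies within the correct range [100, 102]. A median is used here instead of an exact majority vote because correct nodes need not report identical values, which is the situation the interval adaptation addresses.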
Subject/keywords
Byzantine fault tolerance, Real-time scheduling, CAN, Distributed systems, Embedded control systems