Preserving Semantics of Multi-Threaded Programs During Cross-ISA Dynamic Binary Translation
Ladda ner
Publicerad
Författare
Typ
Examensarbete för masterexamen
Master's Thesis
Master's Thesis
Modellbyggare
Tidskriftstitel
ISSN
Volymtitel
Utgivare
Sammanfattning
Dynamic Binary Translation (DBT) is a method used to emulate programs on platforms on which they cannot execute natively. In the past, DBTs either did not emulate multi-core programs or did not parallelize their execution. This is no longer the case, as modern processors are often multi-core, necessitating better scaling in DBTs. Renode [1] is one such DBT that is able to emulate multi-core programs using parallel execution. However, Renode — like many other DBTs — fails to correctly emulate the semantics of certain atomic instructions. In particular, emulation of the RISC-V instructions Load-Reserved (LR) and Store-Conditional (SC) is currently incorrect. These semantics are paramount for program correctness.
In this thesis, we improve Renode’s correctness by applying the Hash table-based Store-Test (HST) — a scheme proposed by Zhao et al. [2] — to correctly emulate LR/SC instructions. Using model checking, we find that implementing HST as described by Zhao et al. in Renode results in a race condition. We show how to remediate this race condition in Renode.
Furthermore, we compare the performance of two HST implementations: one written directly in an intermediate representation (IR) similar to assembly, the other written in C using helper functions. Previous work suggests that IR is faster due to less runtime overhead, which we show holds in this case. We find that the IR implementation is 34% faster than helpers in microbenchmarks and 6–18% faster in the PARSEC [3] benchmark suite.
Our IR implementation of HST in Renode improves both correctness and scalability. We show that our implementation can boot Linux on an embedded platform with multi-core emulation enabled, which Renode in its current state (current Renode) cannot do due to correctness issues. Moreover, our implementation scales well when current Renode does not: in an 8-thread microbenchmark of LR/SC, our implementation is 15.6x faster than current Renode. We find that this scalability can be achieved with as little as 8 KiB of extra memory usage.
Beskrivning
Ämne/nyckelord
DBT, HST, Renode, Atomics, LR/SC, LL/SC, Cross-ISA, Qemu.
