Preserving Semantics of Multi-Threaded Programs During Cross-ISA Dynamic Binary Translation
Loading...
Download
Date
Authors
Type
Examensarbete för masterexamen
Master's Thesis
Master's Thesis
Model builders
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Dynamic Binary Translation (DBT) is a method used to emulate programs on platforms on which they cannot execute natively. In the past, DBTs either did not emulate multi-core programs or did not parallelize their execution. This is no longer the case, as modern processors are often multi-core, necessitating better scaling in DBTs. Renode [1] is one such DBT that is able to emulate multi-core programs using parallel execution. However, Renode — like many other DBTs — fails to correctly emulate the semantics of certain atomic instructions. In particular, emulation of the RISC-V instructions Load-Reserved (LR) and Store-Conditional (SC) is currently incorrect. These semantics are paramount for program correctness.
In this thesis, we improve Renode’s correctness by applying the Hash table-based Store-Test (HST) — a scheme proposed by Zhao et al. [2] — to correctly emulate LR/SC instructions. Using model checking, we find that implementing HST as described by Zhao et al. in Renode results in a race condition. We show how to remediate this race condition in Renode.
Furthermore, we compare the performance of two HST implementations: one written directly in an intermediate representation (IR) similar to assembly, the other written in C using helper functions. Previous work suggests that IR is faster due to less runtime overhead, which we show holds in this case. We find that the IR implementation is 34% faster than helpers in microbenchmarks and 6–18% faster in the PARSEC [3] benchmark suite.
Our IR implementation of HST in Renode improves both correctness and scalability. We show that our implementation can boot Linux on an embedded platform with multi-core emulation enabled, which Renode in its current state (current Renode) cannot do due to correctness issues. Moreover, our implementation scales well when current Renode does not: in an 8-thread microbenchmark of LR/SC, our implementation is 15.6x faster than current Renode. We find that this scalability can be achieved with as little as 8 KiB of extra memory usage.
Description
Keywords
DBT, HST, Renode, Atomics, LR/SC, LL/SC, Cross-ISA, Qemu.
