Preserving Semantics of Multi-Threaded Programs During Cross-ISA Dynamic Binary Translation

Publicerad

Typ

Examensarbete för masterexamen
Master's Thesis

Modellbyggare

Tidskriftstitel

ISSN

Volymtitel

Utgivare

Sammanfattning

Dynamic Binary Translation (DBT) is a method used to emulate programs on platforms on which they cannot execute natively. In the past, DBTs either did not emulate multi-core programs or did not parallelize their execution. This is no longer the case, as modern processors are often multi-core, necessitating better scaling in DBTs. Renode [1] is one such DBT that is able to emulate multi-core programs using parallel execution. However, Renode — like many other DBTs — fails to correctly emulate the semantics of certain atomic instructions. In particular, emulation of the RISC-V instructions Load-Reserved (LR) and Store-Conditional (SC) is currently incorrect. These semantics are paramount for program correctness. In this thesis, we improve Renode’s correctness by applying the Hash table-based Store-Test (HST) — a scheme proposed by Zhao et al. [2] — to correctly emulate LR/SC instructions. Using model checking, we find that implementing HST as described by Zhao et al. in Renode results in a race condition. We show how to remediate this race condition in Renode. Furthermore, we compare the performance of two HST implementations: one written directly in an intermediate representation (IR) similar to assembly, the other written in C using helper functions. Previous work suggests that IR is faster due to less runtime overhead, which we show holds in this case. We find that the IR implementation is 34% faster than helpers in microbenchmarks and 6–18% faster in the PARSEC [3] benchmark suite. Our IR implementation of HST in Renode improves both correctness and scalability. We show that our implementation can boot Linux on an embedded platform with multi-core emulation enabled, which Renode in its current state (current Renode) cannot do due to correctness issues. Moreover, our implementation scales well when current Renode does not: in an 8-thread microbenchmark of LR/SC, our implementation is 15.6x faster than current Renode. We find that this scalability can be achieved with as little as 8 KiB of extra memory usage.

Beskrivning

Ämne/nyckelord

DBT, HST, Renode, Atomics, LR/SC, LL/SC, Cross-ISA, Qemu.

Citation

Arkitekt (konstruktör)

Geografisk plats

Byggnad (typ)

Byggår

Modelltyp

Skala

Teknik / material

Index

item.page.endorsement

item.page.review

item.page.supplemented

item.page.referenced