A Comparative Study of the Cache Coherence and Moving-Computation-to-Data Approaches
Master's thesis
As multicore computers gain widespread use, the main challenge ahead is how to design multicore systems that scale to hundreds of processor cores. One of the problems is how to implement the popular shared memory model efficiently on future many-core systems. A critical mechanism for realizing shared memory is cache coherence, which allows multiple copies of a single memory block to be distributed across the caches attached to each core. Unfortunately, it is challenging to scale cache coherence mechanisms to hundreds of cores, due to the latencies associated with communicating values and due to the complexity of the mechanism. An interesting alternative, considered in a recent research project, is to allow only a single copy of a shared data structure and to force the computations that manipulate that data structure to execute on the core that owns it. While some tentative ideas for how to envision such a system have been identified, it is not clear how to implement it efficiently, or whether its performance will be competitive with standard cache coherence solutions.

This thesis project implements a single-copy shared memory model on a Tilera system with 64 cores. In the project, a run-time system is designed and implemented, parallel applications are mapped to it, and the performance of the system is measured and compared with that of a system employing cache coherence. The single-copy memory model performs better than the coherence-based shared memory model as long as the shared data is smaller than the primary cache: in the single-copy model, the core responsible for modifying the shared data encounters a cache hit on every access after the first, whereas in the coherence-based model, cores must fetch the shared data from remote memory on each access.
However, the coherence-based approach is expected to perform better once the shared data exceeds the primary cache size, since bringing data from a remote cache costs fewer CPU cycles than bringing it from main memory. From experimental results based on the critical sections of different applications (a linear equation solver and the Radiosity application from the SPLASH-2 benchmark suite), I found that the single-copy memory model shows a significant performance improvement over the coherence-based shared memory model: the core that modifies the shared data structure encounters cache hits most of the time while executing the critical section, and these hits save a substantial number of CPU cycles.
Computer and Information Science