"Gaining insights into multicore cache partitioning: bridging the gap
between simulation and real systems"

Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang, and P. Sadayappan

Proceedings of the 14th International Symposium on High Performance Computer 
Architecture (HPCA-14), Salt Lake City, Utah, February 16-20, 2008.


Cache partitioning and sharing are critical to the effective
utilization of multicore processors. However, almost all existing
studies have been conducted by simulation, which often has several
limitations, such as excessive simulation time, absence of OS
activities, and proneness to simulation inaccuracy.  To address these
issues, we have taken an efficient software approach to supporting
both static and dynamic cache partitioning in the OS through memory
address mapping. We have comprehensively evaluated several
representative cache partitioning schemes with different optimization
objectives, including performance, fairness, and quality of service
(QoS).  Our software approach makes it possible to run the SPEC
CPU2006 benchmark suite to completion. Besides confirming important
conclusions from previous work, we are able to gain several insights
from whole-program executions that would be infeasible to obtain
through simulation. For example, having one program give up some cache
space to help another may improve the performance of both programs for
certain workloads, due to reduced contention for memory bandwidth.  Our
evaluation of previously proposed fairness metrics also reaches
conclusions significantly different from those of a simulation-based study.

The contributions of this study are threefold. (1) To the best of our
knowledge, this is a highly comprehensive execution- and
measurement-based study on multicore cache partitioning. This paper
not only confirms important conclusions from simulation-based studies,
but also provides new insights into dynamic behaviors and interaction
effects.  (2) Our approach provides a unique and efficient option for
evaluating multicore cache partitioning. The implemented software
layer can be used as a tool in multicore performance evaluation and
hardware design.  (3) The proposed schemes can be further refined for
OS kernels to improve performance.