TR-11-1.pdf

"ULCC: a user-level facility for optimizing shared cache performance  
on multicores",  

Xiaoning Ding, Kaibo Wang, and Xiaodong Zhang 

Proceedings of the 16th ACM SIGPLAN Annual Symposium on Principles and Practice  
of Parallel Programming (PPoPP 2011), San Antonio, Texas, February 12-16, 2011.

Abstract

Scientific applications face serious performance challenges on multicore
processors, one of which is caused by access contention in
last level shared caches from multiple running threads. The contention
increases the number of long latency memory accesses,
and consequently increases application execution times. Optimizing
shared cache performance is critical to significantly reducing the
execution time of multi-threaded programs on multicores. However,
there are two unique problems to be solved before implementing
cache optimization techniques on multicores at the user level. First,
available cache space for each running thread in a last level cache
is difficult to predict due to access contention in the shared space,
which makes cache conscious algorithms for single cores ineffective
on multicores. Second, at the user level, programmers are not
able to allocate cache space at will to running threads in the shared
cache, so data sets with strong locality may not be allocated
sufficient cache space, and cache pollution can easily happen.
To address these two critical issues, we have designed ULCC
(User Level Cache Control), a software runtime library that enables
programmers to explicitly manage and optimize last level
cache usage by allocating proper cache space for different data
sets of different threads. We have implemented ULCC at the user
level based on a page-coloring technique for last level cache usage
management. By means of multiple case studies on an Intel
multicore processor, we show that with ULCC, scientific applications
can achieve significant performance improvements by fully
exploiting the benefit of cache optimization algorithms and by partitioning
the cache space accordingly to protect frequently reused
data sets and to avoid cache pollution. Our experiments with various
applications show that ULCC can improve application performance
by nearly 40% by reducing cache contention and pollution in
shared caches.
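
As a rough illustration of the page-coloring idea mentioned in the abstract, the sketch below shows how a physical page maps to a cache "color" and how colors could be divided between data sets. It is not ULCC's API: the function names (num_page_colors, page_color, split_colors), the cache geometry (4 MB, 16-way, 4 KB pages), and the share values are all assumptions chosen for the example, and it assumes a physically indexed last level cache.

```python
def num_page_colors(cache_bytes, ways, page_bytes):
    """Number of page colors = bytes per cache way / page size.

    Pages with the same color compete for the same group of cache sets.
    """
    way_bytes = cache_bytes // ways
    return way_bytes // page_bytes


def page_color(phys_addr, page_bytes, colors):
    """Color of the physical page containing phys_addr."""
    page_frame_number = phys_addr // page_bytes
    return page_frame_number % colors


def split_colors(colors, shares):
    """Divide colors 0..colors-1 into contiguous ranges, one per data set.

    shares maps a (hypothetical) data set name to an integer weight.
    The last data set receives whatever colors remain after rounding.
    """
    total = sum(shares.values())
    names = list(shares)
    result = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = colors
        else:
            end = start + colors * shares[name] // total
        result[name] = list(range(start, end))
        start = end
    return result


if __name__ == "__main__":
    # Assumed geometry: 4 MB, 16-way shared cache with 4 KB pages.
    colors = num_page_colors(4 * 1024 * 1024, 16, 4096)
    print("colors:", colors)
    print("color of page at 0x12345000:", page_color(0x12345000, 4096, colors))
    # Give a frequently reused data set 3/4 of the cache and confine a
    # streaming data set to the remaining 1/4 to limit pollution.
    plan = split_colors(colors, {"reused": 3, "streaming": 1})
    print({name: len(c) for name, c in plan.items()})
```

Under these assumptions, backing a data set only with physical pages of its assigned colors confines it to the corresponding fraction of the cache, which is how a weakly reused data set can be kept from evicting a frequently reused one.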

ULCC is open source software.