TR-98-2.ps.Z

A memory-layout oriented run-time technique for locality optimization 

Y. Yan, X. Zhang and Z. Zhang   

Proceedings of the 1998 International Conference on Parallel Processing
(ICPP'98), August 1998, pp. 189-196. 
 
Abstract
--------

Exploiting locality at run time complements compiler-based
optimization for applications with dynamic memory access patterns.
This paper proposes a memory-layout oriented approach to exploiting
cache locality for parallel loops at run time on Symmetric
Multi-Processor (SMP) systems.
Guided by application-dependent hints and the target cache
architecture, it reorganizes and partitions a parallel loop at
run time by shrinking and partitioning the loop's memory access
space. The resulting task partitions minimize data sharing among
partitions and maximize data reuse within each partition. Tasks
in the partitions are scheduled adaptively and in a
locality-preserving way so that execution stays balanced; this
minimizes application execution time by trading off load balance
against locality.
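As a rough illustration of partitioning a loop by its memory access
space, the Python sketch below groups the iterations of an irregular
loop of the form y[targets[i]] += x[i] by the block of y that each
iteration writes, after shrinking the access space to the index range
actually touched at run time. It then assigns contiguous runs of
blocks to processors so that per-processor iteration counts stay close
to equal. This is not the paper's algorithm: the loop shape, the
function names (partition_by_block, assign_contiguous,
run_partitioned) and the block_elems parameter standing in for a
cache-sized block are all hypothetical, and the adaptive run-time
rescheduling described above is not modeled.

```python
from collections import defaultdict


def partition_by_block(targets, block_elems):
    """Group loop iterations by the block of the output array they write.

    Hypothetical loop shape: iteration i performs y[targets[i]] += x[i].
    Returns a list of (block_id, [iteration indices]) sorted by block_id.
    """
    if block_elems <= 0:
        raise ValueError("block_elems must be positive")
    if not targets:
        return []
    # "Shrink" the access space: measure offsets from the lowest index
    # actually touched, so empty leading regions form no partitions.
    lo = min(targets)
    groups = defaultdict(list)
    for i, t in enumerate(targets):
        # block_elems is a stand-in for a cache-sized block (an assumption).
        groups[(t - lo) // block_elems].append(i)
    return sorted(groups.items())


def assign_contiguous(partitions, num_procs):
    """Give each processor a contiguous run of blocks with similar work.

    Blocks stay in address order, so neighbouring blocks tend to land on
    the same processor (locality); the split points follow the ideal
    equal share of iterations (load balance).
    """
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    assignment = [[] for _ in range(num_procs)]
    total = sum(len(its) for _, its in partitions)
    if total == 0:
        return assignment
    share = total / num_procs
    done = 0
    for block_id, its in partitions:
        # Place the block on the processor whose ideal range contains
        # the midpoint of this block's work; this index never decreases.
        mid = done + len(its) / 2
        proc = min(int(mid / share), num_procs - 1)
        assignment[proc].append(block_id)
        done += len(its)
    return assignment


def run_partitioned(x, targets, y_len, partitions, assignment):
    """Execute the loop processor by processor (sequentially here).

    Each block is owned by exactly one processor, so no two processors
    would write the same block of y if these ran in parallel.
    """
    y = [0.0] * y_len
    iters_of = dict(partitions)
    for proc_blocks in assignment:
        for block_id in proc_blocks:
            for i in iters_of[block_id]:
                y[targets[i]] += x[i]
    return y


if __name__ == "__main__":
    targets = [0, 1, 9, 8, 2, 17, 16, 3, 10, 11]
    x = [1.0] * len(targets)
    parts = partition_by_block(targets, block_elems=8)
    procs = assign_contiguous(parts, num_procs=2)
    print(procs)  # [[0], [1, 2]]
    print(run_partitioned(x, targets, 18, parts, procs))
```

Because every block of y belongs to a single processor, the partitions
share no written data, while iterations inside a block reuse the same
small region of memory. Keeping blocks in address order favors
locality over perfect balance: in the usage example the two processors
receive 4 and 6 iterations rather than 5 each.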

Using simulation and measurement, we show that our run-time approach
achieves performance comparable to compiler optimizations for two
applications whose load balance and cache locality can be well
optimized by tiling and other program transformations. Our
experimental results further show that our approach significantly
improves memory performance for applications with dynamic memory
access patterns, which are usually hard for compilers to optimize.