Adaptively scheduling parallel loops in distributed shared-memory systems   

Y. Yan, C. Jin and X. Zhang  

IEEE Transactions on Parallel and Distributed Systems, 
Vol. 8, No. 1, 1997, pp. 70-81. 

Using runtime information of load distributions and processor affinity,
we propose an adaptive scheduling algorithm and its variations from
different control mechanisms. The proposed algorithm applies different
degrees of aggressiveness to adjust loop scheduling granularities,
aiming at improving the execution performance of parallel loops by
making scheduling decisions that match the real workload distributions
at runtime. To verify the effectiveness of the algorithm and its
variations, we implemented them on the KSR-1 and on the Convex
Exemplar. We experimentally 
compared the performance of our algorithm and its variations
with several existing scheduling algorithms on the two machines.
The kernel application programs we used for performance
evaluation were carefully selected for different classes of
parallel loops. Our results show that using runtime information
to adaptively adjust scheduling granularity is an effective way to
handle loops with a wide range of load distributions when no prior
knowledge of the execution can be used. The overhead caused by
collecting runtime information is insignificant in comparison with the
performance improvement. Our experiments show that
the adaptive algorithm and its five   
variations outperformed the existing scheduling algorithms.