TR-02-2.pdf

Dynamic Cluster Resource Allocations for Jobs with Known and Unknown
Memory Demands

Li Xiao, Songqing Chen, and Xiaodong Zhang 
IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, 2002,
pp. 223-240. 

Abstract

The cluster system we consider for load sharing is a compute farm: a
pool of networked server nodes providing high-performance computing
for CPU-intensive, memory-intensive, and I/O-active jobs in batch
mode.  Existing resource management systems mainly aim to balance
CPU load among server nodes.  As CPU chips advance rapidly,
improvements in memory and disk access speed lag significantly behind
improvements in CPU speed, increasing the penalty for data movement,
such as page faults and I/O operations, relative to normal CPU operations.
To reduce the memory resource contention caused by page faults
and I/O activities, we have developed and examined load sharing policies
that consider effective use of global memory in addition to CPU load
balancing in clusters.  We study two types of application workloads:
(1) workloads whose memory demands are known in advance or predictable,
and (2) workloads whose memory demands are unknown and change
dynamically during execution.  Besides using workload traces with known
memory demands, we have also instrumented the kernel to collect
different types of workload execution traces that capture dynamic
memory access patterns.  Through several groups of trace-driven
simulations, we show that our proposed policies effectively improve
overall job execution performance by making good use of both CPU and
memory resources for workloads with known and unknown memory demands.