TR-94-03-01.pdf

X. Zhang and Y. Yan,
"Comparative Modeling and Evaluation of CC-NUMA and
COMA on Hierarchical Ring Architectures,"

IEEE Transactions on Parallel and Distributed Systems,
Vol. 6, No. 12, 1995.

Abstract 
-------- 
Parallel computing performance on scalable shared-memory
architectures is affected by the structure of the interconnection
network linking processors to memory modules and by the
efficiency of the memory/cache management system.
Cache-Coherent Non-Uniform Memory Access (CC-NUMA) and
Cache-Only Memory Access (COMA) are two effective memory systems,
and the hierarchical ring is an efficient hardware
interconnection network.
This paper focuses on comparative performance modeling and evaluation
of CC-NUMA and COMA on a hierarchical ring shared-memory
architecture. We present analytical models of the two
memory systems for comparative evaluation.
Extensive performance measurements of data migration
were conducted on the KSR-1, a COMA hierarchical ring
shared-memory machine.
The experimental results support the analytical models, and we
present practical observations and comparisons of the two
cache-coherent memory systems.
Our analytical and experimental results show that a COMA system
balances the workload well. However, the overhead of frequent
data movement may offset the gains obtained from improved load
balance.
We believe these performance results can be further generalized
to the two memory systems on hierarchical network architectures.
Although a CC-NUMA system may not automatically balance the
load at the system level, it gives the user the option of
explicitly managing data locality to potentially improve
performance.
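The trade-off described above, where better load balance competes with the cost of moving data, can be illustrated with a deliberately simple toy calculation. This is not the analytical model from the paper. It is a hypothetical sketch that assumes migration shifts every processor's excess work down to the mean load, that each migrated unit costs a fixed overhead `cost_per_unit`, and that this overhead is spread evenly across processors. The function names and the sample loads are illustrative only.

```python
# Hypothetical toy model (NOT the paper's analytical model): compares a
# static-placement system, where load imbalance persists, with a
# migrating system that evens out the load but pays a per-unit cost.


def static_completion_time(loads):
    """Completion time when data stays where it was placed: the most
    heavily loaded processor determines the finish time."""
    return max(loads)


def migrating_completion_time(loads, cost_per_unit):
    """Completion time when excess work migrates until every processor
    holds the mean load. Assumption: each migrated unit costs
    `cost_per_unit`, and the total overhead is spread evenly."""
    p = len(loads)
    mean = sum(loads) / p
    moved = sum(max(0.0, w - mean) for w in loads)
    return mean + cost_per_unit * moved / p


def break_even_cost(loads):
    """Per-unit migration cost at which both toy models finish together,
    i.e. the c solving mean + c * moved / p == max(loads)."""
    p = len(loads)
    mean = sum(loads) / p
    moved = sum(max(0.0, w - mean) for w in loads)
    if moved == 0:
        # Already balanced: no data moves, so no cost can make migration worse.
        return float("inf")
    return (max(loads) - mean) * p / moved


if __name__ == "__main__":
    loads = [10, 2, 2, 2]  # one heavily loaded processor, three light ones
    print("static:", static_completion_time(loads))
    for c in (1, 4, 6):
        print(f"migrating, cost {c}:", migrating_completion_time(loads, c))
    print("break-even cost:", break_even_cost(loads))
```

With these made-up loads, the static case finishes at 10, while the migrating case finishes at 5.5, 10.0, and 13.0 for per-unit costs of 1, 4, and 6. Migration wins only while the cost stays below the break-even value of 4, which is the sense in which data-movement overhead can offset the load-balancing gain. The toy ignores the CC-NUMA option of placing data explicitly, which could narrow the static case's imbalance.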