Technical Report Series
		
	Computer and Information Science Research Center
	Department of Computer and Information Science
		The Ohio State University
		    2015 Neil Avenue,
		Columbus, OH  43210-1277

**********************NOTE*********************

	Abstracts for 1996

NOTE:	some titles may be available ONLY via paper (This is noted after
	the title/author section.)

If you have trouble accessing a file, please send email to
tfletch@cis.ohio-state.edu

1996


TR67	Object Selection Based on Oscillatory Correlation  by DeLiang Wang

   One of the classical topics in neural networks is winnter-take-all (WTA), 
which has been widely used in unsupervised (competitive) learning, cortical 
processing, and attentional control.  Because of global connectivity, WTA 
networks, however, do not encode spatial relations in the input, and thus 
cannot support sensory and perceptual processing where spatial relations are 
important.  We propose a new architecture that maintains spatial relations 
between input features.  This selection network builds on LEGION (Locally 
Excitatory Globally Inhibitory Oscillator Networks) dynamics and slow 
inhibition.  In an input scene with many objects (patterns), the network 
selects the largest object.  This system can be easily adjusted to select 
several largest objects, which then alternate in time.  We further show that 
a two-stage selection network gains efficiency by combining selection with 
parallel removal of noisy regions.  The network is applied to select the 
most salient object in real images.  As a special case, the selection network 
without local excitation gives rise to a new form of oscillatory WTA. 


TR66	Time Constrained Bandwidth Smoothing for Interactive Video-On-
	Demand Systems  by Wu-chi Feng 

   The use of a client-side buffer in the delivery of compressed prerecorded 
video can be an effective tool for removing the burstiness required of the 
underlying server and network by smoothing the bandwidth requirements for 
continuous delivery.  Given a fixed-size smoothing buffer, several bandwidth 
smoothing algorithms have been introduced in the literature that are provably 
optimal under certain constraints, typically requiring large buffer residency 
times to realize their optimal properties.  The large buffer residency times, 
however, make providing VCR functions harder to support.  In this paper, we 
introduce two new algorithms that, in addition to the size of the client 
side buffer, use a time constraint as a parameter in the bandwidth plan 
creation, making the plans more amenable to supporting VCR interativity.  Our 
results show that the buffer residency times can be reduced, while still 
allowing the bandwidth allocation to be smoother for continuous video delivery.


TR63	Inheriting and Modifying Behavior  by Neelam Soundararajan and 
	Stephen Fridella

   In many languages, the mechanism of protected members is critical to 
making inheritance work.  It allows access to some of the internals of a 
base class B to the designer of a derived class D, while denying it to 
clients of B.  We develop an axiomatic approach in which class behavior is 
specified in terms of an abstract specification AND a concrete specification, 
the former for use by a client, the latter containing information about how 
the protected members change, for use by a client, the latter containing 
information about how the protected members change, for use by a derived 
class designer.  We present proof rules that allow us to establish that a 
class meets both specifications. 


TR62	Logical Firewalls:  A Mechanism for Security in Future Networking 
	Environments  by Jun Xu and Mukesh Singhal 

   Although current firewalls serve the purpose of current network security 
needs, they have several drawbacks and are not suitable for future networking 
environments.  This paper examines the state of the art of the firewall 
technology and points out deficiencies in the current designs.  It advocates 
software firewalls for solving the security problems in future networking 
environements.  It explains the design of such software firewalls, called 
logical firewalls. 

Keywords:  Firewalls, network security, logical firewalls, protected group, 
	   protected network, protected host. 


TR61	Efficient Schemes for LImited Directory-based DSMs Using 
	Multidestination Message Passing  by Donglai Dai and 
	Dhabaleswar K. Panda 

   Many limited directory schemes have been proposed in the literature to 
build cost-effective DSM systems.  However, all these schemes are based on 
networks supporting only point-to-point (unicast) message passing.  New 
generation networks are providing architectural support to implement 
collective communication operations (broadcast, multicast, reduction, etc.) 
with reduced latency.  This paper explores the impact of such architectural 
support in designing efficient and more cost-effective limited directory 
schemes.  The study is carried out for womrhole k-ary n-cube networks 
supporting multidestination message passing mechanism.  The study in this 
paper is carried out along two major directions.  First, a variation of 
the diriB scheme to work with multidestination-based broadcast and 
gather operations is proposed.  This scheme is defined as m_diriB
scheme.  Next, two coarse vector schemes (m_diriCB1 and m_diriCB2) are 
proposed to work efficiently with selective broadcast operations with 
multidestination messages.  These schemes are evaluated for 32 and 64 
processor systems with SPLASH2 benchmarks and compared with limited 
directory schemes using unicast message passing ((u_diriB and u_diriCV).  It 
is shown that m_dir1B (with a single pointer can reduce the program execution 
time up to a factor of 3.47 compared to u_dir1B scheme.  Similary 
m-Idr1B scheme performs closer to u_dir3CV scheme (with three pointers) and 
fully-mapped scheme.  For 1-3 pointers, the m_diriCB1 and m_diriCB2 are shown 
to further reduce program execution time compared to the m_diriB scheme.  For 
1 pointer, m_dir1CB1 performs better than the u_dir1CV scheme by a factor of 
up to 1.66.  These results indicate that limited directory schemes with only 
a single pointer can be designed to work efficiently on new generation 
wormhole networks with multidestination message passing.  Such results provide 
strong guidelines to build future scalable DSM systems in a cost-effective 
manner. 

Keywords:  Distributed shared memory, wormhole routing, directory-based 
	   protocols, cache coherency, meshes, parallel architecture, 
	   interprocessor communication, multidestination mechanism, and 
	   path-based routing. 


TR60	A Coherent Family of Analyzable Graphical Representations for 
	Object-Oriented Software  by Mary Jean Harold and Gregg Rothermel 

   Many software engineering tools and techniques rely on graphical 
representations of software, such as control flow graphs, program dependence 
graphs, or system dependence graphs.  To apply these tools and techniques to 
object-oriented software, we cannot necessarily use existing graphical 
representatios of procedural software.  Representations of object-oriented 
software, like those for procedural software, must depict individual 
procedures (methods) and entire programs; however, they must also depict 
classes and their interactions, and account for the effects of inheritance, 
polymorphism, and aggregation.  These representations should facilitate the 
use of existing program analysis tools and techniques, rather than requiring 
us to create new techniques.  A system for constructing and managing these 
representatinos should take advantage of the code reuse inherent in object-
oriented software, by reusing representational components.  In this paper, 
we describe a coherent family of graphical representations of object-oriented 
software that meets the fore-going requirements, and we outline the 
architecture of an efficient, extensible system for constructing and managing 
those representations. 


TR59	Implementing Multidestination Worms in Switch Based Parallel Systems: 
	Architectural Alternatives and their Impact 
	by Craig B. Stunkel, Rajeev Sivaram, and Dhabaleswar K. Panda 

   Multidestination message  passing has been  proposed  as an attractive
mechanism for  efficiently implementing multicast and other collective
operations on direct networks.   However,  applying this mechanism  to
switch  based  parallel systems  is non-trivial.   In  this  paper  we
propose  alternative switch    architectures  with  differing   buffer
organizations to  implement   multidestination worms on  switch  based
parallel   systems.  First  we  discuss     issues  related to    such
implementation    (deadlock-freedom,  replication mechanisms,   header
encoding, and routing).  Next, we  demonstrate how an existing central
queue based switch architecture supporting unicast message passing can
be  enhanced fairly easily  to  accommodate  multidestination  message
passing.  Similarly, implementing  multidestination worms  on an input
buffer  based  switch   architecture   is  discussed.    Both    these
implementations are evaluated against each  other as well as against a
software-based scheme   using   the   central queue    organization.   
Simulation experiments  under a range  of traffic (multiple multicast,
bimodal, varying degree of  multicast, and message length)  and system
size are used for  evaluation.  The study demonstrates the superiority
of  the   central queue  based    switch  architecture for   efficient
implementation of  multidestination worms.  Such an implementation can
perform well  without saturating for applied  loads of 0.8 or greater. 
It also  indicates that under bimodal  traffic the central-queue based
hardware multicast implementation  affects background  unicast traffic
less adversely compared to a  software-based multicast implementation. 
Thus, multidestination message  passing    can easily be  applied   to
switch-based parallel systems to deliver good collective communication
performance.


Keywords: Parallel   computer  architecture,  wormhole  routing,
cut-through  routing, multicast, broadcast, collective  communication,
multistage interconnection networks, performance evaluation.


TR58	Analysis of Hierarchical Reliable Multicast Transport Protocols for 
	Dissemination and Collaborative Communications   
	by Walid Mostafa and Mukesh Singhal 

   Sender-initiated reliable multicast protocols rely on positive acknowledge-
ments (ACKs) that lead to an ACK implosion as the number of receivers 
increases.  Receiver-initiated protocols rely on negative acknowledgements 
(NAKs) that result in longer delays to detect missing packets by a receiver. 
Hierarchical approaches realize scalable communication by distributing 
error control functions (e.g., packet loss detection and retransmission) 
among a hierarchy of nodes.  In this report we derive analytical expressions 
for the throughput of different node types in hierarchical multicast control 
trees and analyze their sensitivity to configuration parameters.  We also 
derive expressions for the maximum throughput of collaborative multicast 
communication among R nodes where each node is actively involved in sending 
and receiving data.  We show that protocol variations that improve the 
throughput of 1xR dissemination communications can degrade the throughput of 
RxR collaborative communications. 

Keywords:  Hierarchical Reliable Multicast, Transport Protocols, Restricted 
	   Multicast, Collaborative Communication, and Dissemination. 


TR57	A Distributed Fault-Detection and Recovery Protocol for Reliable 
	Multicast Collaborative Communications   by Walid Mostafa and 
	Mukesh Singhal 

   Reliable multicast transport protocols support dissemination communication 
and their throughputs are either limited by the sender or by intermediate 
nodes that consolidate acknowledgements and retransmissions.  RMSP is a 
portable reliable multicast protocol that supports collaborative communication 
at the session-layer level and provides higher throughput by maintaining two 
transport connections per node irrespective of the number of nodes.  However, 
the throughput only of RMSP may degrade if one or more nodes or the 
centralized connection manager that monitors them fails.  In this report, 
we propose a fully distributed fault-tolerant protocol, DFT-RMSP, that 
detects and isolates faulty nodes and reinstates the multicast collaborative 
communications.  We derive analytic expresions for throughput degradation and 
analyze its sensitivity to node-failure rates, network delay, and multicast 
group size.  We show tha DFT-RMSP provides less down-time and achieves 
graceful throuput degradation under high failure rates compared with RMSP, 
simultaneous node failures.  

Keywords:  Distributed Faul-Tolerance, Reliable Multicast, Session Protocols, 
	   and Collaborative Communication 


TR56	Distributed Dynamic Carrier Allocation in Mobile Cellular 
	Networks:  Search vs. Update  by Xuefeng Dong and Ten H. Lai 

   There are two approaches for distributed dynamic carrier allocation 
algorithms:  update and search.  This paper investigates the fundamental 
differences between the two approaches.  We present two simple, representative 
schemes, namely the basic update scheme and the basic search scheme.  We will 
argue that for two reasons the update approach is more appropriate for mobile 
cellular networks that support real-time communications.  First, it is hard to 
support carrier reallocations with the search approach.  Second, the search 
approach usually incurs longer delay during carrier acquisitions than the 
update approach. 
   We then propose a new update scheme that is more efficient than the basic 
update scheme in terms of message complexity and carrier acquisition delay. 
Our simulation results show that, compared to the basic update scheme, the 
new update scheme not only cuts the average carrier acquisition delay by 
60% to 95%, but also reduces the average number of messages required per 
carrier acquisition by almost 50%. 


TR55-DIR	This entry is a number of slides used in lecturing on 
		ATM.  Each set of slides has its own filename within the 
		subdirectory 1996/TR55-DIR.  The abstracts for them appear 
		below: 

-----------------------------------------------------------------------
 o ATM Forum/96-1294: Performance of TCP over ABR on ATM backbone and
   with various VBR traffic patterns (October 1996) tcp_vbr.ps

We extend our earlier study of buffer requirements of TCP over ABR in
two directions. First, we study the performance of TCP over ABR in an
ATM backbone. On the backbone, the TCP queues are at the edge router
and not inside the ATM network. The router requires buffer equal to
the sum of the receiver window sizes of the participating TCP
connections. Second, we introduce various patterns of VBR background
traffic. The VBR background introduces variance in the ABR capacity
and the TCP traffic introduces variance in the ABR demand. Some simple
switch schemes are unable to keep up with the combined effect of
highly varying demands and highly varying ABR capacity. We present our
experiences with refining the ERICA+ algorithm to handle these
conditions.

-----------------------------------------------------------------------
 o ATM Forum/96-1270: Tutorial Paper on ABR Source Behavior (October
   1996) atm96-1270.ps 


-----------------------------------------------------------------------

 o ATM Forum/96-1267: ABR Switch Algorithm Testing: A Case Study with
   ERICA (October 1996): atm96-1267.ps

This contribution discusses the requirements of ATM Available Bit Rate
switch algorithms, and demonstrates how each of these requirements can
be tested. As a case study, the performance of the ERICA switch
algorithm is evaluated and the effect of some features and options
of the algorithm is examined. The requirements tested include:
efficiency, delay, fairness, transient and steady state performance,
scalability, and adaptation to variable capacity and various source
traffic models. The performance of the algorithm is evaluated for
various configurations and background traffic patterns.

-----------------------------------------------------------------------
 o ATM Forum/96-1269: Performance of TCP over UBR+ (October 1996) : atm96-1269.ps

ATM switches respond to UBR congestion by dropping cells when their
buffers become full. TCP connections using the UBR service experience
low throughput and low fairness. For 100% TCP throughput each switch
needs buffers equal to the sum of the window sizes of all the TCP
connections. Intelligent drop policies can improve the performance of
TCP over UBR with limited buffers. The UBR+ service consists of
enhancements to UBR for intelligent drop. We found that Early Packet
Discard (EPD) improves throughput but does not improve
fairness. Selective packet drop based on per-connection buffer
occupancy improves fairness. The Fair Buffer Allocation scheme further
improves both throughput and fairness.
-----------------------------------------------------------------------
o ATM Forum/96-1172: ERICA Switch Algorithm: A Complete Description
(August 1996) : atm96-1172.ps

The ERICA switch algorithm has been discussed extensively in TM group
in the past. However, over the last two years, the algorithm has been
substantially modified. This contribution describes the current
version of ERICA switch algorithm in complete detail. The algorithm
achieves both efficiency and fairness, and exhibits a fast transient
response. The development of the algorithm is traced, and the new
approaches it uses to achieve its objectives are highlighted. Several
design and implementation aspects of the algorithm are examined. In
addition, several enhancements to the algorithm are proposed and
studied. 

-----------------------------------------------------------------------
 o ATM Forum/96-0517: Buffer Requirements for TCP over ABR (April
 1996) : af_abr22.ps

In our previous study, it was shown that cell loss due to limited
buffering may degrade throughput considerably. The key question is how
much buffering is required to avoid cell loss. This contribution
attempts to answer that question.

We show that the maximum buffers required at the switch is
proportional to the maximum round trip time of any VC through the
link. The number of round-trips depends upon the the switch algorithm
used. With our ERICA (modified) switch algorithm, we found that the
buffering required is independent of the number of TCP sources. These
observations are valid even when VBR or two- way traffic is used. We
substantiate our arguments with simulation results.

Our simulations are carried out with a modified version of the ERICA
algorithm. We also give a brief description of the modifications to
ERICA which include avoidance of unnecessary spikes, correct counting
of bursty sources and enhanced fairness.

-----------------------------------------------------------------------
 o ATM Forum/96-0518: Performance of TCP over UBR and buffer
   requirements (April 1996): af_ubr22.ps

We study TCP throughput and fairness over UBR for several buffer and
maximum window sizes. TCP performs best when there is no loss. The
performance is severely degraded under loss (regardless of UBR or
ABR). Therefore, we study the switch buffer requirements for zero
loss. The required buffers are found to be the function of number of
VCs and their round-trip times.
-----------------------------------------------------------------------
o ATM Forum/96-0177: TBE and TCP/IP traffic (February 1996) :
af_rl6b2.ps

The effect of TCP traffic over ATM ABR is studied with the ERICA
switch algorithm. ABR implements its rate-based traffic control at
switches (via ER algorithms like ERICA) and at sources (via source
rules, such as, Rule 6, which uses TBE parameter). TCP implements its
own traffic controls via slow start window control. We study the
interaction between the two mechanisms. In particular, this
contribution concentrates on the effect of Transient Buffer Exposure
(TBE) parameter and Source End System Rule 6 on TCP/IP connections in
a Wide Area Network (WAN).

-----------------------------------------------------------------------
 o ATM Forum/96-0178: Comments on "Use-it or Lose-it" (Annex E of
   TM4.0) (February 1996) : af_rl5b2.ps

Annex E of T4.0 describes several optional NIC implementations for the
use-it or lose-it policies. A minor change in one of the options will
result in significant performance improvements.

-----------------------------------------------------------------------
 o ATM Forum/95-1660: A Fix for Source End System Rule 5 (December
 1995) : af_rl52.ps

In the October 1995 meeting of the ATM Forum, it was found that source
end system rule 5 does not handle ACR retention problem correctly. It
causes undesirable oscillations in many cases and significantly
reduces the performance of the bursty sources. In this contribution,
we analyze the solution that was proposed in October 1995 meeting and
finally present a better solution. While the case of small bursts
needs to be tackled, the rule works well for other simple
configurations.

-----------------------------------------------------------------------
 o ATM Forum/95-1661: More Strawvote Comments: TBE vs Queue sizes
   (December 1995) : af_rl62.ps

We analyze the role of Source End System rule 6 in depth. We find that
the queue lengths can grow considerably in some cases even if the
value of TBE (a.k.a. CIF) is small.

-----------------------------------------------------------------------
 o ATM Forum/95-1346: ERICA+: Extensions to the ERICA Switch Algorithm
   (October 1995) : af_erc22.ps

Several enhancements to ERICA are described. The resulting scheme,
ERICA+ provides congestion avoidance with 100% utilization. Even at
100% utilization, the queue lengths are bounded. The scheme gives
better control of end-to-end delay compared to other schemes and keeps
link utilization of expensive links high despite idle periods in the
input load. The scheme responds well to VBR traffic by estimating the
ABR capacity and employing a scheduling strategy between ABR and
VBR. Simulation results on a set of representative configurations are
presented.

-----------------------------------------------------------------------
 o ATM Forum/95-1345: Bursty ABR Sources (October 1995) : af_brst2.ps

Bursty traffic introduces idle times in the traffic pattern and this
has interesting effects on the source rules.  We examine this through
a definition of a new bursty model and a new metric.

-----------------------------------------------------------------------
 o ATM Forum/95-1344: New Source Rules and Satellite links (October
   1995) : af_cif2.ps

The effect of new CIF and ICR formulae on satellite networks is
studied. Source rule 5 (TOF dectease) seems to have mostly negative
effect on performance. Rescheduling, on the other hand, has positive
effect.

-----------------------------------------------------------------------
 o ATM Forum/95-1343: Straw-Vote comments on TM 4.0 R8 (October 1995):
  af_stvc2.ps

Several problems with Xrm and ICR computation using RTT and use of rule 5 are pointed out. 

-----------------------------------------------------------------------
 o ATM Forum/95-973R1: Out-of-Rate RM Cell Issues and Effect of Trm,
   TOF, and TCR (August 1995): af_trm2.ps

In this contribution, we examine the transient response time in going
from a very low rate to a high rate. The frequency of feedback is low
for low rate sources and this affects transients. Parameter Trm can
improve the frequency of feedback. The rise is also controlled by the
Tof parameter which may trigger an unnecessary decrease. We also
observe that out-of-rate cells are not optional for NICs because of
the ACR=0 case. We propose corrections to the End System pseudo code
to correct these and other discrepancies. Note: Most of these
corrections have been adopted in revision 8 of the TM 4.0 spec.

-----------------------------------------------------------------------
 o ATM Forum/95-972R1: Parameter Values for Satellite Links (August
   1995) : af_xrm2.ps

In this contribution, we examine the effect of the source parameters
Xrm and XDF on long delay paths like satellite paths. We observe low
throughput in the absence of congestion even for a simple one source
configuration. We propose an increase in the length of the Xrm
parameter from 8 bits to 32 bits to account for large bandwidth-delay
product links. Note: this proposal was accepted in the revision 8 of
TM version 4.0

-----------------------------------------------------------------------
 o ATM Forum/95-0467: Simulation Results for ERICA Switch Algorithm
   with VBR+ABR traffic (April 1995): af_vbr2.ps

This contribution discusses the issues raised by the presence of VBR
and presents simulation results for the ERICA switch algorithm.

-----------------------------------------------------------------------
 o ATM Forum/95-0178R1: A Sample Switch Algorithm, (February 1995): af_swco2.ps

Describes the switch algorithm developed at OSU. The algorithm is
fully compatible with the latest source and switch requirements and
has been thoroughly tested. The text to be included in the TM document
appendix is presented.

-----------------------------------------------------------------------
 o ATM Forum/95-0179: Simulation Results for the Sample Switch
   Algorithm (February 1995): af_swsi2.ps

Simulation results for various configuration for the sample switch
algorithm are presented. 

-----------------------------------------------------------------------
 o ATM Forum/94-1175R1: Current Default Proposal: Unresolved Issues
   (November 94): af_iss22.ps

Problems with the default baseline text as circulated after the
October interim meeting are pointed out.

-----------------------------------------------------------------------
 o ATM Forum/94-1173: Transient Performance of EPRCA and EPRCA++ (November 94)
: af_trans.ps

Transient performance of EPRCA and EPRCA++ (a modification of EPRCA+)
is compared. 

-----------------------------------------------------------------------
 o ATM Forum/94-0987: BECN: Why we need a timestamp or sequence nuber
   in the RM Cell (October 1994): af_becn.ps

BECN cells and FECN cells can arrive at the source out of order and
the effect of a BECN can be nullified by subsequent FECN cells that
actually bring older information. Knowing which RM Cell represents
newer information helps get full advantage of BECN. This can be done
by having a sequence number or time stamp field in the RM cell. The
Ordered BECN option of the OSU scheme is described and the effect of
BECN with and without this order is shown.

-----------------------------------------------------------------------
 o ATM Forum/94-0988: Simulation Results: The EPRCA+ Scheme (October 1994)
: af_osuc2.ps


Simulation results comparing EPRCA and EPRCA+ are presented. In
particular, EPRCA+ has fast transient performance, offers congestion
avoidance, has few parameters, and is parameter insensitive.

-----------------------------------------------------------------------
 o ATM Forum/94-0882: Rate Based Schemes: Mistakes to Avoid (September
   1994): af_9409-mistakes2.ps

One key advantage of the rate-based approach in general and the
explicit-rate approach in particular is that it provides considerable
freedom to switch manufacturers in terms of specific algorithms to
use. However, it is easy to misdesign a scheme unless the overload is
measured properly and the feedback is related to the control
properly. This contribution points out examples of such mistakes.

-----------------------------------------------------------------------
 o ATM Forum/94-0883: The OSU Scheme for Congestion Avoidance using
   Explicit Rate Indication (September 1994): af_osu2.ps

An explicit rate indication scheme for congestion avoidance in ATM
networks is proposed. The sources monitor their load and provide the
information periodically to the switches. The switches, in turn,
compute the load level and ask the sources to adjust their rates up or
down. The scheme achieves high link utilization, low delay, fair
allocation of rates among contending sources, provides quick
convergence and works for bursty traffic.


TR54	Source Behavior for ATM ABR Traffic Management: An Explanation 
	by Raj Jain, Shivkumar	Kalyanaraman, Sonia Fahmy and Rohit Goyal 

   This paper is a tutorial on TM4.0 source behavior.  The Available Bit 
Rate (ABR) service has been developed to support data applications over 
Asynchronous Transfer Transfer Mode (ATM) networks.  The network continuously 
monitors its traffic and provides feedback to the source end systems. 


TR53	ABR Switch Algorithm Testing:  A Case Study with ERICA 
	by Raj Jain, Sonia Fahmy, Shivkumar Kalyanaraman, and Rohit Goyal 

   This contribution discusses the requirements of ATM Available Bit Rate 
switch algorithms, and demonstrates how each of these requirements can be 
tested.  As a case study, the performance of the ERICA switch algorithm [1] 
is evaluated and the effect of some features and options of the algorithm is 
examined.  The requirements tested include:  efficiency, delay, fairness, 
transient and steady state performance, scalability, and adaptation to 
variable capacity and various source traffic models.  The performance of the 
algorithm is evaluated for various configurations and background traffic 
patterns.


TR52	Performance of TCP over UBR+ 
	by Rohit Goyal, Raj Jain, Shiv Kalyanaraman, and Sonia Fahmy 

   ATM switches respond to UBR congestion by dropping cells when their 
buffers become full.  TCP connections using the UBR service experience low 
throughput and low fairness.  For 100% TCP throughput each switch needs 
buffers equal to the sum of the window sizes of all the TCP connections.  
Intelligent drop policies can improve the performance of TCP over UBR with 
limited buffers.  The UBR+ service consists of enhancements to UBR for 
intelligent drop.  We found that Early Packet Discard (EPD) improves 
throughput but does not improve fairness.  Selective packet drop based on 
per-connection buffer occupancy improves fairness.  The Fair Buffer Allocation 
scheme further improves both throughput and fairness. 


TR51	ERICA Switch Algorithm:  A Complete Description  by Raj Jain, Shiv 
	Kalyanaraman, Rohit Goyal, Sonia Fahmy, and Ram Viswanathan 

   The ERICA switch algorithm has been discussed extensively in TM group in 
the past.  However, over the last two years, the algorithm has been 
substantially modified.  This contribution describes the current version of 
ERICA switch algorithm in complete detail. 
   The algorithm achieves both efficiency and fairness, and exhibits a fast 
transient response.  The development of the algorithm is traced, and the 
new approaches it uses to achieve its objectives are highlighted.  Several 
design and implementation aspects of the algorithmn are examined.  In 
addition, several enhancements to the algorithm are proposed and studied. 


TR50	Performance of TCP over ABR on ATM backbone and with various VBR 
	traffic patterns   by Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, 
	Rohit Goyal, and Jianping Jiang 

   We extend our earlier studies of buffer requirements of TCP over ABR [10, 
11,12] in two directions.  First, we study the performance of TCP over ABR in 
an ATM backbone.  On the backbone, the TCP queues are at the edge router 
and not inside the ATM network.  The router requires buffer equal to the sum 
of the receiver window sizes of the participating TCP connections.  Second, we 
introduce various patterns of VBR background trafic.  The VBR background 
introduces variance in the ABR capacity and the TCP traffic introduces 
variance in the ABR demand.  Some simple switch schemes are unable to keep up 
with the combined effect of highly varyijng demands and highly varying ABR 
capacity.  We present our experiences with refining the ERICA + switch scheme 
to handle these conditions. 


TR49	Range Image Segmentation Using a LEGION Network
	by Xiuwen Liu and DeLiang Wang 

   This paper presents a neural network approach for range image segmentation. 
We use a locally excitatory globally inhibitory oscillator network (LEGION) 
as the framework for range image segmentation.  Each oscillator in the LEGION 
network has excitatory lateral connections to the oscillators in its 
neighborhood as well as a connection with a global inhibitor.  The feature 
detector associated with each oscillator estimates the surface normal and 
curvature at its corresponding pixel location.  The lateral connection 
between two oscillators is established based on a similarity measure of their 
feature vectors.  The emegent behavior of the LEGION network gives rise to the 
segmentation result.  Unlike other methods, our scheme needs no assumption 
about the underlying structures in image data and no prior knowledge regarding 
the number of regions.  Experimental results using real range images are 
presented. 


TR48	Dynamic Carrier Allocation Strategies for Mobile Cellular Networks 
	by Xuefeng Dong and Ten H. Lai 

   In dynamic carrier allocation (DCA) strategies, carriers are assigned 
priorities.  Based upon how priorities are assigned to carriers, DCA 
strategies are classified into three classes:  static-priority, dynamic- 
praiority, and hybrid-priority strategies.  In this paper, we lay out a 
theoretical foundation for DCA strategies:  we first develop the concept of 
optimal reuse pattern and then propose a model for resource planning as well 
as three guiding principles for priority assignment in static- and hybrid-
priority DCA strategies.  These theoretical results will shed light on the 
strength and weakness of existing DCA strategies and lead us to developing 
more efficient strategies.  Specificially, we propose three new DCA strategies:
Column-Wise static-priority (CWSP), Column-Wise hybrid-priority (CWHP), and 
Two-Step dynamic-priority (TSDP).  The CWSP amd CWHP strategies are obtained 
from the Geometric strategy by following our guiding principles for priority 
assignment.  The TSDP strategy, on the other hand, was derived from the 
Nanda-Goodman strategy by applying our notion of optimal reuse patterns.  It 
not only attempts to reuse carriers as compactly as possible (just as other 
dynamic-priority strategies do), but also goes one step further by bringing 
the actual carrier reuse pattern closer to the optimal pattern. 
   The performance of the three new DCA strategies are simulated under both 
uniform and non-uniform traffic distributions.  The simulation results show 
that each one of the three new DCA strategies outperforms existing strategies 
in its own class, and the TSDP strategy has the best performance among all 
DCA strategies. 


TR47	Relaxation Oscillators with Time Delay Coupling 
	by Shannon R. Campbell and DeLiang Wang 

   We study networks of relaxation oscillators coupled with time delay 
synapses.  A pair of oscillators is analyzed and shown to attain loosely 
synchronous solutions for a wide range of initial conditions and time delays. 
Simulations of one and two dimensional oscillator networks indicate that 
locally coupled oscillators are also loosely synchronous.  Desynchronous 
solutions are possible when system parameters are varied.  To characterize 
loosely synchronous networks, we introduce a measure of synchrony, the 
maximum time difference between any two oscillators.  In locally excitatory 
globally inhibitory oscillator networks with time delays, we find that 
desynchronous solutions for different groups of oscillators are maintained, 
and the number of groups that can be segregated is related to the maximum time 
difference within each group.  To examine the maximum time difference, we 
display its histograms for oscillator networks in one and two dimensions.  
Also, a range of initial conditions is given so that the maximum time 
difference is contained as the system evolves. 

Keywords:  neural networks, relaxation oscillators, time delays, biological 
	   oscillations, synchronization 


TR46	Efficient Scatter Communication in Wormhole k-ary n-cubes with 
	Multidestination Message Passing 
	by Mohammad Banikazemi and Dhabaleswar K. Panda 

   This paper presents efficient ways to implement one-to-all scatter 
communication on k-ary n-cubes with multidestination message- passing. Earlier 
researchers have applied multidestination message-passing to support non-
personalized communications.  In this paper, for the first time in the 
literature, we demonstrate that multidestination message-passing can also be 
used to support efficient personalized communication.  Current parallel 
systems use either a sequential tree-based (ST) scheme or a binomial tree-
based (BT) scheme with unicast message-passing to implement scatter.  First, 
we show that the ST scheme achieves the lower bound just for a narrow range 
of message lengths.  Next, we show that the BT scheme has a better overall 
performance compared to ST but, it can never achieve the lower bound. Finally, 
we proposed a sequential multidestinatin tree-based (SMT) scheme using 
multidestination messages which minimizes the scatter latency as well as 
achieves the lower bound.  This scheme allows data for multiple destinations 
to be grouped together and sent as a single message using multidestination 
multicast/broadcast worms.  All three schemes are evaluated for a range of 
system sizes (k,n), communication start-up times (ts), propagation times (tp), 
and message lengths (l).  For small message sizes on 8x8 torus, it is shown 
that the SMT scheme can reduce the scatter latency up to a factor of 8 
compared to the ST scheme and up to a factor of 1.6 compared to the BT scheme. 
For a range of system sizes and (ts/tp) ratios, it is shown that the SMT 
scheme performs better than the ST and BT schemes for a wide range of 
message lengths.  Thus, these results demonstrate significant potential to 
be applied to design scalable collective communication libraries for current 
and future generation wormhole systems. 

Keywords:  Collecive communication, scatter, personalized communication, 
	   wormhole routing, k-ary n-cubes, path-based routing, and 
	   multidestination message passing. 


TR45	Reliable Hardware Barrier Synchronization Schemes  
	by Rajeev Sivaram, Craig B. Stunkel and Dhabaleswar K. Panda 

   Barrier synchronization is a crucial operation for parallel systems.  Many 
schemes have been proposed in the literature to achieve fast barrier through 
software, hardware, or a combination of these mechanisms.  However, none of 
these schemes emphasize fault-tolerant barrier operations.  In this paper, we 
describe inexpensive support that can be added to network switches for 
achieving reliable hardware-based barrier synchronization while recovering 
from lost or corrupted messages.  Necessary modifications to the switch 
architecture and the associated fault-tolerant message-passing protocols are 
presented.  Thes protocols are optimized for the no-fault case while providing 
means to detect the failure of any step of the operation and to recover from 
it.  The proposed scheme is evaluated with and without specialized support 
at the network interface and compared with similar approaches using software-
based schemes.  It promises significant potential to applied to switch-based 
parallel systems, espeically the emerging networks of workstations. 

Keywords:  Parallel computer architecture, collective communication, barrier 
	   synchronization, interprocessor communication, fault-tolerance, 
	   and networks of workstations. 


TR44	Classes as Assertions    by Neelam Soundararajan 

   How do we formally specify the relation between a base class and a derived 
class?  This question has two parts, a syntactic one, and a semantic one.  The 
syntactic part is of course the easier of the two and the answer to that part 
is the standard contra/co- variance requirement on the arguments and result 
of any base class method redefined in the derived class.  Our concern in 
the current paper is with the semantic part of the question, i.e., how do we 
specify the behavioral relation between the base class and the derived class? 
We show that the standard answer - which is the semantic counterpart of 
contra/co-variance - is too rigid, and does not allow some natural and common 
forms of inheritance.  We then propose a more flexible way to specify the 
relation, and show how different types of behavioral relations between base 
classes and derived classes may be specified using our notation. 


TR43	An Efficient Coterie-Based Mutual Exclusion Scheme With Fault-
	Tolerance Capability  by Guohong Cao and Mukesh Singhal 

   The performance of a mutual exclusion algorithm is measured by the 
number of messages exchanged per critical section execution and the delay 
between successive executions of the critical section.  There is a message 
complexity and synchronization delay trade-off in mutual exclusion algorithms. 
Maekawa's algorithm reduces message complexity to O(Square root of N); 
however, it increases the synchronization delay to 2T.  This paper presents 
an algorithm which reduces the synchronization delay to T (average message 
delay) and still has the low message complexity O(K) (K is the size of the 
quorum, which can be as low as log N)>  A correctness proof and detailed 
performance analysis are provided.  The paper identifies one possible 
deadlock in Maekawa's algorithm and proposes a fix for it. 

Key words:  Coteries, mutual exclusion, fault-tolerance. 


TR42	RMSP:  A Reliable Multicast Session Protocol for Collaborative 
	Continuous-Feed Applications  by Walid Mostafa and Mukesh Singhal 

   This report presents a portable reliable multicast session protocol for 
collaborative continuous-feed applications.  The protocol supports NxN group 
multicast communications at the session layer and runs on the top of 
existing reliable transport protocols (e.g., TCP).  It does not assume 
multicast support at the network layer.  The protocol is portable as a 
network-layer-independent library that is suitable for both LAN and WAN 
environments.  It multiplexes messages fropm different group members into 
transport packets to reduce the transport overhead per message.  It requires 
only two transport-level connections per node, irrespective of the multicast 
group size, to reduce the overhead of acknowledgements and retransmissions 
per node.  We compare the throughput of this protocol with the throughput of 
general sender-initiated and receiver-initiated reliable multicast transport 
protocols. 

Keywords:  Reliable Multicast, Transport Protocols, Session Multiplexing, 
	   Group Communication, and Distributed Systems. 


TR41	Alleviating Consumpton Channel Bottleneck in Wormhole-Routed k-ary 
	n-cube Systems   by Debashis Basak and Dhabaleswar K. Panda 

   NOTE:  This tech report is substantially different from a tech report of 
the same name published in 9/95, TR36. 

   This paper identifies performance degradation in wormhole routed k-ary 
n-cube networks due to limited number of router-to-processor consumption 
channels at each node.  Many recent research in wormhole routing have 
advocated the advantages of adaptive routing and virtual channel flow control 
schemes to deliver better network performance.  This paper indicates that 
the advantages associated with these schemes can not be realized with limited 
consumption capacity.  To alleviate such performance bottleneck, a new network 
interface design using multiple consumption channels is proposed.  To match 
virtual multiplexing on network channels, we also propose each consumption 
channel to support multiple virtual consumption channels.  The impact of 
message arrival rate at a node on the required number of consumption channel 
is studied analytically.  It is shown that wormhole networks with higher 
routing adaptivity, dimensionality, degree of hot-spot traffic, and number 
of virtual lanes have to take advantage of multiple consumption channels to 
deliver better performance.  The interplay between system topology, routing 
algorithm, nuymber of virtual lanes, messaging overheads, and communication 
traffic is studied through simulation to derive the effective number of 
consumption channels required in a system.  Using the on-going technological 
trend, it is shown that wormhole-routed systems can use up to 2-4 consumption 
channels per node to deliver better system performance. 

Keywords:  Parallel computer architecture, wormhole routing, k-ary n-cube, 
	   consumption Channel, virtual Channel, deterministic routing, 
	   adaptive routing, hot-spot traffic, performance evaluation, 
	   and interprocessor communication. 


TR40	Multicast on Irregular Switch-based Networks with Wormhole Routing 
	by Ram Kesavan, Kiran Bondalapati, and D.K. Panda 

   This paper presents efficient multicasting with reduced contention on 
irregular networks with switch-based wormhole interconnection and unicast 
message passing.  First, it is proved that for an arbitrary irregular network 
with a typical deadlock-free, adaptive routing, it may not be possible to 
create an ordered list of nodes to implement an arbitrary multicast in a 
contention-free manner with minimal number of communication steps.  Next, 
three different multicast algorithms are proposed with their respective node 
orderings to reduce contention:  switch-based ordering (SO), switch-based 
hierarchical ordering (SHO), and chain concatenation ordering (CCO).  A 
variation of a binomial tree-based communication pattern with unicast message 
passing is used on the above ordered lists to implement multicast.  The 
proposed multicast algorithms are compared with each other as well as with 
their naive random ordering (RO) algorithm for a range of system sizes, switch 
sizes, message lengths, degrees of connectivity, destination set sizes, and 
communication start-up times.  With larger system size and larger destination 
set size, the CCO algorithm is shown to be the best to implement multicast 
with reduced contention and minimum latency.  The SHO algorithm emerges as the 
second best.  Such results related to multicast on irregular networks are the 
first of their kind in the wormhole literature.  Thus, these results 
demonstrate significant potential to be applied to current and future 
generation NOW systems with irregular interconnection. 

Keywords:  Collective communication, Multicase, Broadcast, Wormhole Routing, 
	   Irregular Networks, Link Concention, and Networks of Workstations. 


TR39	An Efficient Protocol for Call Setup and Path Migration in IEEE 
	802.6 Based Personal Communication Networks  by Xuefeng Dong and 
	Ten-Hwang Lai 

   Recently, DQDB (IEEE 802.6) MAN has been proposed as a component of the 
Personal Communication Networks, in which base statinos of wireless infra-
structures are connected by a number of DQDBs which in turn are connected 
by bridges.  We propose a protocol for call setup and path migration in a 
cluster of DQDBs.  To pursue simplicity, efficiency, and fastness in path 
setup, we use a link-state-like routing method for path selection, and a 
source-routing-based protocol for path establishment.  The feasibility of 
the source routing protocol in a DQDB cluster is guaranteed by our novel 
labeling scheme.  Each label represents the routing direction from one 
DQDB to another such that a path can be represented by a sequence of labels. 
Because of the locality property of the labels, a few bits are enough to 
denote the labels unambiguously.  Using multi-bit labels instead of multi-bye 
physical addresses, a path can be specified in a 53-octet DQDB slot. 


TR38	Dependency Sequences and Hierarchical Clocks:  Efficient Alternatives 
	to Vector Clocks for Mobile Computing Systems  by Ravi Prakash and 
	Mukesh Singhal 

   Vector clock has been used to capture causal dependencies between 
processes in distributed computing systems.  It is not suitable for mobile
computing systems due to (i) lack of scalability:  its size is equal to the 
number of nodes, and  (ii) its inability to cope with fluctuatinos in the 
number of nodes.  This paper presents two efficient alternatives to vector 
clock, namely, sets of dependency sequences, and hierarchical clock.  Both 
the alternatives are scalable, and are immune to fluctuations in the 
number of nodes in the systems. 


TR37	Multitolerance (Extended Abstract)  by Anish Arora and 
	Sandeep S. Kulkarni 

   Multitolerance refers to the ability of a system to tolerate multiple 
classes of faults, each in a possibly different way.  Achieving this ability 
is essential, not only because increasingly systems are subject to multiple 
classes of faults, but because designing a uniform type of tolerance to all 
their fault-classes is often inappropriate or inefficient. 
	In this paper, we give a formal definition of what it means for a 
system to be multitolerant and we propose a method for the design of 
multitolerant systems.  Our method simplifies the design task by proceeding 
in a stepwise fashion:  In each step, tolerance to a new fault-class is added 
while preserving tolerance to the previously simple theory is exploited to 
ensure mutual interference-freedom among the detectors, correctors, and 
underlying system.  We demonstrate the method by designing novel, fully 
distributed, multitolerant programs. 

Keywords:  multiple fault-tolerances, design methods, detectors, correctors, 
	   interference-freedom, distributed systems software 


TR36	Failure Recovery based on Quasi-Syhnchronous Checkpointing in 
	Mobile Computing Systems   by D. Manivannan and M. Singhal 

   Mobile computing systems are expected to revolutionize the way computers 
are used.  Mobile hosts have small memory, a relatively slow processor and 
low power batteries, and commmunicate over low bandwidth wireless communica- 
tion links.  In this paper, we address the problem of failure recovery in 
mobile computing systems.  Any recovery method for mobile computing systems 
should take into consideration energy and communication bandwidth constraints 
under which mobile computers have to operate.  Synchronous checkpointing is 
not suitable for mobile systems since it involves high communication cost 
over a low bandwidth network.  Asynchronous checkpointing is not suitable 
because multiple checkpoints need to be stored in the stable storage and also 
some or all of the checkpoints taken may be useless for constructing 
consistent global checkpoints.  In this paper, we propose a low-overhead 
recovery algorithm based on a quasi-synchronous checkpointing algorithm for 
mobile computing systems.  The checkpointing algorithm preserves process 
autonomy by allowing them to take checkpoints asynchronously and uses 
communication-induced checkpoint coordination for the progression of the 
recovery line which helps bound rollback propagation during a recovery.  Thus, 
it has the easeness and low overhead of asynchronous checkpointing and the 
recovery time advantages of synchronous checkpointing.  The checkpointing 
algorithm ensures the existence of a recovery line consistent with any 
checkpoint of any process all the time.  The recovery algorithm exploits 
this feature to restore the system to a state consistent with the latest 
checkpoint of a failed process, in the event of a failure.  It uses 
selective pessimistic message logging at the receiver end to handle the 
messages lost due to rollback. 

Keywords:  Distributed checkpointing, mobile computing, failure recovery, 
	   fault-tolerance. 


TR35	Benefits of Processor Clustering in Designing Parallel Systems: 
	When and How?  by Debahis Basak and Dhabaleswar K. Panda 
	NOTE: A similar title was used for 1995/TR41 but this is a new 
	report. 

   Advances in multiprocessor interconnect technology are leading to high 
performance networks.  however, software overheads associated with message 
passing are limiiting the processors to get maximum performance from these 
networks, leading to under-utilization of network resources.  Several 
research studies are ongoing for designing messaging protocols and hardware 
to reduce such overheads.  However, with such protocols messaging overheads 
cannot be eliminated.  Even though the overheads can be lowered, these will 
continue to be reasonably high compared to the network speed.  In this 
paper we take an alternative approach and analyze on how to exploit 
emerging processor-cluster technology for designing high-performance and 
cost-effective parallel systems given that messaging overheads can limit 
maximum performance.  Though processor-clusters are being used in some 
systems in an ad hoc manner, there is no formal analysis in the literature 
to show when and how processor clusters benefit in designing high performance 
and scalable systems.  In this paper we present a design-space-graph 
framework for analyzing and solving this problem by considering processor-
clustering, messaging overheads, and network performance in an integrated 
manner.  Our analysis establishes the following three design guidelines.  
With messaging overheads constraining performance in a system, processor 
clustering can be used to build  a) an equal-sized system with a smaller 
netowrk or  b) a larger system with an equal-sized network.  With messaging 
overheads not being a constraint, a combination of processor clustering and 
wider channels can be used to build a range of larger-sized systems.  These 
guidelines are validated through simulation and experimentation.  All these 
guidelines lead to designing cost-effective and scalable parallel systems 
while delivering high performance. 

Keywords:  parallel architectures, interconnection networks, clustered 
	   architectures, hierarchical architectures, scalability,
	   messaging overheads. 


TR34	Modeling Modular Software Structure for Human Understanding 
	by  Stephen H. Edwards 

   People form internal mental models of the things they interact with in 
order to understand those interactions.  This psychological insight has been 
used by the human-computer interaction (HCI) community to build software 
systems that are more intuitive for end users, but it has only been informally 
applied to the problems of software designers, programmers, and maintainers. 
Conventional programming languages still do little to help client programmers 
develop good mental models of software subsystems. 
   To address this problem, we have developed the Abstract and Concrete 
Temnplates and Instances (ACTI) model of modular, parameterized software 
subsystems.  This model of software structuer addresses the needs of human 
software engineers who must reason about collections of interacting software 
parts during design, maintenance, and evolution. 
   ACTI is different from other module systems and models of software in 
several ways.  In ACTI, a subsystem never has any implicit dependencies, and 
never depends directly on any external definitions - all external 
dependencies are described through an explicit interface.  In addition, a 
subsystem specitifcation is meaningful by iteself, even without respect to 
any implementation.  Finally, a subsystem is more than just a collection of 
types and operations; it also includes:  an explicit model of behavior, an 
explicit model of all external dependencies, a collection of definitions used 
to contstruct and describe these models, and (potentially complex) substruc-
ture.  There are strong parallels between ACTI and other research on the 
understanding of modularly structured physical devices, particularly 
Functional Representation. 

Keywords:  Mental model, model-based specification, interfaces, bindings, 
	   generics. 


TR33	Quasi-Synchronous Checkpointing:  Models, Characterization, and 
	Classification  by D. Manivannan and Mukesh Singhal 

   Checkpointing algorithms are classified as synchronous and asynchronous 
in the literature.  In synchronous checkpointing, processes synchronize their 
checkpointing activities so that a globally consistent set of checkpoints is 
always maintained in the system.  Synchronizing checkpointing activity 
involves message overhead and process execution may have to be suspended 
during the checkpointing coordination, resulting in performance degradation. 
In asynchronous checkpointing, processes take checkpoints without any 
coordination with others.  Asynchronous checkpointing provides maximum 
autonomy for processes to take checkpoints; however, some of the checkpoints 
taken may not lie on any consistent global checkpoint, thus making the 
checkpointing efforts useless.  Asynchronous checkpointing algorithms in the 
literature can reduce the number of useless checkpoints by making processes 
take communication induced checkpoints, besides asynchronous checkpoints. 
We call such algorithms, quasi-synchronous.  Quasi-synchronous checkpointing 
algorithms are attractive because they improve the performance without 
introducing undesirable effects.  in this paper, we present a theoretical 
framework for characterizing and classifying such algorithms.  The theory not
only helps to classify and characterize the quasi-synchronous checkpointing 
algorithms, but also helps analyzing the properties and limitations of the 
algorithms belonging to each class; it also provides guidelines for 
designing and evaluating such algorithms.  This classification also sheds 
light on some open problems that remain to be solved. 

Keywords:  Causality, distributed checkpointing, consistent global checkpoint, 
	   failure recovery, fault-tolerance, zigzag paths. 


TR32	Syntax-Directed Construction of Program Dependence Graphs 
	by Mary Jean Harrold and Gregg Rothermel

   We present an algorithm that constructs program dependence graphs as 
a program is parsed.  For programs that contain only structured transfers 
of control, our algorithm does not require explicit control flow or 
postdominator information to compute exact control dependencies.  For 
programs that contain explicit transfers of control, our algorithm can 
determine whether these transfers of control are used in a structured way, 
and if so, compute control dependencies without explicit control flow or 
postdominator information.  When transfers of control are ill-behaved, our 
algorithm adjusts the control dependence information computed during the 
parse, to obtain exact control dependencies.  For many programs, our 
algorithm provides savings in time and memory, because it does not require 
prior computation of control flow or postdominator information.  However, 
our algorithm also calculates control flow information during the parse, 
and incorporates this information into the program dependence graphs that 
it constructs; the resulting graphs have a wider range of applicability 
than graphs that do not contain this information. 

Keywords:  control dependence, control flow, data flow, program dependence
	   graphs, software testing. 


TR31	Covariance, Contravariance, and Synchronization Constraints 
	by Neelam Soundararajan 

   Object Orientation and concurrent programming are a natural match.  
Objects correspond to processes in a concurrent program; a message from one 
object to another invoking a method of the latter, corresponds naturally 
to (message passing) interaction between processes in a concurrent program.  
Despite this close correspodence, progress in developing an OO approach to 
concurrency has been limited.  One important problem has been the so-called 
inheritance anomaly which is concerned with how and how easily synchronization 
constraints specified in a base class may be modified in a derived class.  Our 
concern in the current paper is more basic.  We address the question of what 
types of modifications in the synchronization constraints inherited from the 
base class ought to be permitted to be made in the derived class.  We identify 
the modifications that are reasonable, especially when the problem is viewed 
from the point of view of a client of the class.  We point out that when 
viewed from this perspective, some of the current proposals for expressing 
synchronization constraints allow precisely the wrong types of modifications 
to be made in the derived classes.  We also analyze the relation between this 
question and the problems that inheritance encounters already in the 
sequential case, such as the covariance versus contravariance conflict. 


TR30	Flexible Object Reconstruction from Temporal Image Series 
	by Jeremy Loomis, Zhaohua Ding, Xiuwen Liu, Kikuo Fujimura, and 
	Hideo Ishikawa 

   The problem considered is flexible object reconstruction from a temporal 
series of orthogonal silhouette image pairs.  The subject of study is a 
growing plant.  Three approaches are examined, namely, (i) a bottom-up method,
(ii) a spatial model-based method, and (iii) a spatiotemporal model-based 
method and their experimental results are presented.  For all the three 
approaches, the plant is encoded by free-form splines, but each uses different 
construction methods.  Merits of the approaches are examined in terms of 
robustness and accuracy. 

Keywords:  Shape representation and recovery, image sequence analysis, 
	   nonrigid motion, model-based vision. 


TR29	Hardware Assisted Volume Rendering of Unstructured Grids by 
	Incremental Slicing  by Roni Yagel, David M. Reed, Asish Law, 
	Po-Wen Shih, and Naeem Shareef 

   Some of the more important research results in computational science rely 
on the use of simulation methods that operate on unstructured grids.  
However, these grids, composed of a set of convex polyhedra, introduce 
exceptional problems with respect to data visualization.  Volume rendering 
techniques, originally developed to handle rectangular grids, show 
significant promise for general use with unstructured grids as well.  The 
main disadvantage of this approach, compared to isosurfaces, particles or 
other visualization tools is its non-interactive performance.  We describe 
an efficient method for rendering unstructured grids that is based on 
incremental slicing and hardware polygon rendering.  For a given view 
direction, the grid vertices are transformed to image space using available 
graphics hardware.  We then incrementally compute the 2D polygon-meshes 
that result from letting a set of equidistant planes, parallel to the 
screen plane, intersect (slice) the transformed grid.  Finally, we use the 
graphics hardware to render (interpolate-fill) the polygon-meshes and 
composite them in a front-to-back order.  We show that, in addition to being 
much faster than existing methods, our approach also provides adaptive control 
and progressive image generation. 


TR28	Potential Networking Applications of Global Positioning 
	Systems (GPS)  by Gopal Dommety and Raj Jain

   Global Positioning System (GPS) Technology allows precise determination of 
location, velocity, direction, and time.  The price of GPS receivers is 
falling rapidly and the applications are growing.  PCMCIA receivers that can 
connected to any notebook personal computers are available for $300-400 to 
end consumers. 
   The main goal of this study was to survey current applications of GPS to 
distributed systems and networks.  While GPS is appearing in the computer 
magazines very often and while many computer companies have announced GPS 
related efforts, most such efforts are in providing navigational guidance to 
drivers.  Digitized city maps along with a GPS sensor on a mobile computer 
provide directions to drivers.  A number of consortiums have been formed by 
companies such as IBM, Apple, Toshiba, Sony, and others.
   Currently, applications of GPS for distributed computation and networking 
are limited to measuring delays in Wolter and Golderman's DA-30 network 
analyzers and in clock synchronization in synchronous optical networks 
(SONET) used in telecommunication networks. 
   We have identified twenty new applications of GPS for distributed computing 
and networking.  These applications include circuit switching using syn-
chronized clocks, synchronous slotted systems, clock synchronization in 
distributed systems, database synchronization, connectionless real-time 
communication, one-way delay, delay based routing, time to live, resource 
location, location adaptive protocols, home vs office vs car, electronic 
fence, handoffs in wireless networks, prescheduled handoffs based on velocity 
and direction, adaptive transmission power control algorithms, directional 
antennas, temporary cell partitioning for congestion avoidance, peer-to-peer 
routing with limited range receivers, email delivery based on geographic 
location, distributed robot control and navigation, and equipment location 
marking for maintenance crew. 
   There two main obstacles to applications of GPS.  First is that GPS antennas
must point to open sky.  They cannot be used directly inside a building.  
Unless this is avoided by new antenna designs or by rebroadcasting GPS data 
inside a building, the use of GPS techniques is limited to improving the 
performance of distributed systems rather than proper operation.  This is 
similar to cache memories in computer systems.  Systems can operate without 
cache but perform better with cache.  The second obstacle is the pseudo-
random noise introduced in the GPS signals by the defense department to 
disallow other governments from using the full precision of GPS.  This is 
called selective availability.  Fortunately, this can be easily overcome 
by differential GPS techniques.
   Detailed lists of GPS products, current applications, addresses of 
manufacturers, and sources for further information are included in this report.


TR27	Classification and Local Error Estimation of Interpolation and 
	Derivative Filters for Volume Rendering 
	by Torsten Moeller, Raghu Machiraju, Klaus Mueller and Roni Yagel 

   We describe a new method for analyzing, classifying, and evaluating filters,
that can be applied to inerpolation filters, first derivative filters, and 
high order derivative filters.  Our analysis is based on the Taylor series 
expansion of a convolution sum and some assumptions on the behavior of the 
data function.  These assumptions are reasonable based on the frequency 
behavior of common sampled volume data, which can be approximated by the 
1/[(omega)^r]  in frequency space.  As one result of our analysis, we 
derive the need and the method for normalization of derivative filter 
coefficients.  As an example, we demonstrate the utilizatino of our methods
to the analysis of the class of cardinal cubic filters.  Since our technique 
is not restricted to interpolation filters, we can show that the Catmull-Rom
spline filter and its derivative are the most accurate reconstruction and 
derivative filter among this class of filters.  We show that the derivative 
filter has a much higher impact on the rendered volume than the interpolation 
filter.  We demonstrate the use of these optimal filters for accurate 
interpolation and gradient estimation in volume rendering.


TR26	Multiple Multicast with Minimized Node Contention on Wormhole k-ary 
	n-cube Networks   by Ram Kesavan and D.K. Panda 

   This paper presents a new approach to minimize node contention while 
performing multiple multicast/broadcast on wormhole k-ary n-cube networks 
with overlapped destination sets.  The existing multicast algorithms in the 
literature deliver poor performance underr multiple multicast because these 
algorithms have been designed with only single multicast in mind.  The new 
algorithms introduced in this paper do not use any global knowledge about 
the respective destination sets of the concurrent multicasts.  Instead, only 
local information and source-specific partitioning approach are used.  For 
systems supporting unicast message-passing a new SPUmesh (Source-Partitioned-
Umesh) algorithm is proposed and shown to be superior than the conventional 
Umesh algorithm [14] for multiple multicast.  Two different algorithms, 
SQHL (Source-Quadrant Hierarchical Leader) and SCHL (Source-Centered 
Hierarchical Leader), are proposed for systems with multidestination message-
passing and shown to be superior than the HL scheme [18].  All these 
algorithms perform  1) 5-10 times faster than the existing algorithms under 
multicast and  2) as fast as existing algorithms under single multicast.  
Furthermore, the SCHL scheme demonstrates that the latency of multiple 
multicast can, in fact, be reduced as the degree of multicast increases 
beyond a certain number.  Such results related to multiple multicast are the 
first of their kind in the wormhole literature.  Thus, these algorithms 
demonstrate significant potential to be used for designing fast and scalable 
collective communication libraries on current and future generation 
wormhole systems. 

Keywords:  Collective communication, multicast, broadcast, wormhole routing, 
	   meshes, k-ary n-cube, unicast and multidestination message passing, 
	   and path-based routing. 


TR25	A Safe, Efficient Regressin Test Selection Technique
	by Gregg Rothermel and Mary Jean Harrold 

   Regression testing is an expensive but necessary maintenance activity 
performed on modified software to provide confidence that changes are 
correct and do not adversely affect other portions of the software.  A 
regression test selection technique chooses, from an existing test set, 
tests that are deemed necessary to validate modified software.  Most 
regression test selection techniques depend on a particular test adequacy 
criterion or require prior knowledge of where code has been modified.  We 
present a new technique for regression test selection that is neither 
adequacy-based, nor requires prior knowledge of modifications.  Our 
algorithms construct control flow graphs for a procedure or program and 
its modified version, and use these graphs to select tests, from the original 
test set, that execute changed code.  We prove that under certain conditions, 
the set of tests our algorithms select includes every test, from the original 
test suite, that can expose faults in the modified procedure or program.  
Thus, under these conditions, the algorithms are safe.  Moreover, although 
our algorithms may select some tests that cannot expose faults, they are at 
least as precise as other safe regression test selection algorithms.  Unlike 
many other regression test selection algorithms, our algorithms handle all 
language constructs, and all types of program modifications.  We have 
implemented our algorithms; initial empirical studies indicate that our 
technique can significantly reduce the cost of regresion testing modified 
software. 

Keywords:  software maintenance, regression testing, selective retest, 
	   regression test selection. 


TR24	Reducing Cache Invalidation Overheads in Wormhole Routed DSMs 
	Using Multidestination Message Passing  
	by Donglai Dai and D.K. Panda

   Directory-based distributed shared memory (DSM) systems have drawn high 
interests in parallel computing research and industry in recent years.  
Current generation systems are limited to using point-to-point messages for 
cache invalidation requests and associated acknowledgments.  Such an 
approach incurs a large number of control messages, heavy network traffic, 
and high occupancy at home nodes.  In this paper we present a new approach 
to reduce cache invalidation overheads.  We introduce a set of multi-
destination-based reservation and gather worms to distribute invalidation 
requests and collect acknowledgments.  Six different grouping schemes are 
investigated in generating such multidestination worms in order to 
implement fully-mapped directory-based cache coherence on networks using 
deterministic (e-cube) or adaptive (turn-model) wormhole routing.  These 
schemes are evaluated for different applications and 2D mesh configurations 
through execution-driven simulation.  The interplay among overall execution 
time, the number of invalidation messages, total network traffic, routing 
adaptivity is studied.  The results indicate that up to 15% reduction in 
overall execution time can be achieved by using multidestination messages.  
It is observed that turn-model routing with a density-dependent column 
grouping leads to maximum benefits.  The results demonstrate that current 
and future DSM systems can take advantage of these schemes to deliver 
better performance. 

Keywords:  Distributed shared memory systems, wormhole routing, directory- 
	   based protocols, cache coherence, meshes, multidestination 
	   mechanism, and path-based Routing. 


TR23	Analyzing Regression Test Selectino Techniques
	by Gregg Rothermel and Mary Jean Harrold 

   Regression testing is a necessary but expensive maintenance activity aimed 
at showing that code has not been adversely affected by changes.  Regression 
test selection techniques reuse tests from an existing test suite to test a 
modified program.  Many regression test selection techniques have been 
proposed; however, it is difficult to compare and evaluate these techniques 
because they have different goals.  This paper outlines the issues relevant to 
regression test selection techniques, and uses these issues as the basis for 
a framework within which to evaluate the techniques.  We illustrate the 
application of our framework by using it to evaluate existing regression test 
selection techniques.  The evaluation reveals the strengths and weakness 
of existing techniques, and highlights some problems that future work in this 
area should address. 

Keywords: software maintenance, regression testing, selective retest, 
	  regression test selection. 


TR22	A Dynamic Approach to Location Management in Mobile Computing Systems
	by  Ravi Prakash  and  Mukesh Singhal

   Managing location information of mobile nodes is an important issue in 
mobile computing systems.  There is a trade-off between location update 
effort (when a node moves) and node finding effort.  In this paper we present 
a dynamic location management strategy that has the following features:  
(i) all location servers need not maintain location information about every 
mobile node  (ii) a coterie based approach is adopted for location update 
and find,  (iii) every node move does not result in location updates,  (iv) 
location updates are done at a subset of location servers,  (v) a subset of 
location servers, corresponding to a mobile node, for location update and find 
operations is dynamic,  (vii) the dynamic nature of these sets helps alleviate 
situations of heavy on some location servers, when a large number of mobile 
nodes are concentrated in a small geographical area.  Thus, location 
management is done efficiently, and responsibility is shard fairly among 
location servers. 

Keywords:  mobile computing, location management, hashing. 


TR21	Building Efficient LImited Directory-Based DSMs:  A Multidestination 
	Message Passing Based Approach  by Donglai Dai and D.K. Panda 

   A cost-effective distributed shared memory (DSM) system typically uses 
a limited directory protocol to enforce cache coherence.  This paper 
presents a new family of protocols, called Limited directory with Region-
based Broadcast (Limited-RB), to efficiently implement cache coherence in 
wormhole routed DSM systems.  This protocol family uses multidestination-
based cache invalidation mechanisms to distribute invalidation requests to 
and collect the associated acknowledgments from separate regions.  As a 
result, a write invalidation can be accomplished with fewer messages, less 
network traffic, and reduced occupancy at home nodes.  These reductions 
contribute to decreasing invalidation latency and improving overall system 
performance.  Directory organizatino under this new protocol is developed 
for 2D systems with e-cube routing and evaluated through simulations for a 
set of applications.  The results indicate that with a small directory 
storage, the Limited-RB protocol using a directory storage equivalent to 3 
pointers outperforms the LimitLESS protocol using 5 pointers.  The results 
demonstrate that faster and less expensive DSM systems can be built using the 
new approach. 

Keywords:  Distributed shared memory systems, wormhole routing, directory-
	   based protocols, cache coherency, meshes, multidestination 
	   mechanism, and path-based routing. 


TR20	A Foundation for Designing Deadlock-free Routing Algorithms in 
	Wormhole Networks  by D.N. Jayasimha, D. Manivannan, Jeff A. May, 
	Loren Schwiebert, and Stephen L. Hary

   This paper provides necessary and sufficient conditions for deadlock-free 
unicast and multicast routing with the path-based routing model in 
interconnection networks which use the wormhole as switching technique.  The 
theory is developed around three central concepts:  channel waiting, False 
Resource Cycles, and valid destination sets.  The first two concepts are 
suitable extensions to those developed for unicast routing by two authors of 
this paper; the third concept has been developed by Lin and Ni.  The necessary 
and sufficient conditions can be applied in a straightforward manner to prove 
deadlock freedom and to find more adaptive routing algorithms for collective 
communication.  The latter point is illustrated by developing two routing 
algorithms for multicast communication in 2-D mesh architectures.  The first 
algorithm uses fewer resources (channels) than a proposed algorithm in 
literature but achieves the same adaptivity.  The second achieves full 
adaptivity for multicast routing. 


TR19	Universal Constructs in Distributed Computations 
	Ajay D. Kshemkalyani and Mukesh Singhal 

   This paper presents two classes of universal constructs that occur in 
distributed computations and explores their properties.  It first examines a 
pair of universal constructs termed IO and OI intervals that occur at 
nodes in distributed computations.  These universal constructs are used as 
building blocks to design two global universal constructs, termed segments 
and paths, that occur across nodes in distributed computations.  These 
constructs are a generalization of causal chains in a distributed computation. 
While a causal chain only captures the causal relation, it turns out that 
message chains in a distributed computation that do not capture causality play 
a significant role in the analysis of a distributed computation.  We show 
that a number of key concepts and structures characterizing distributed 
computations are special cases of and can be expressed in terms of the two 
proposed constructs. 

Key words: Causality, Crown, Distributed computation, Path, Segment, Zigzag
	   path. 


TR17	Stepwise Design of Tolerances in Barrier Computations
	Sandeep S. Kulkarni and Anish Arora

   We design a multitolerant program for synchronizing the phases of 
processes in a distributed system.  The tolerances of our program enable 
the system processes to:  (i) compute all phases correctly in the presence 
of faults that corrupt variables in a detectable manner, and (ii) compute 
at most a bounded number of phases incorrectly in the presence of faults 
that corrupt variable sin an undetectable manner.
   Our design proceeds in a stepwise fashion:  In each step, the states of 
the program are augmented - by adding new variables - and the actions of the 
program are transformed - by restricting old actions and adding new actions. 
We show that the program resulting from each step not only provides added 
functionality but also provides added (i) masking tolerance to the detectable 
corruptions of the newly added variables, and (ii) time-optimal nonmasking 
tolerance to undetectable corruptions of the newly added variables.  Since 
the program thus tolerates the corruptions of all of its variables, it thus 
automatically possesses stabilizing tolerance. 

Keywords:  multitolerance; masking, nonmasking, and stabilizing tolerance; 
	   detectable and undetectable faults; stepwise design; distributed 
	   systems. 


TR16	Finding Consistent Global Checkpoints in a Distributed Computation 
	D. Manivannan, Robert H.B. Netzer, and Mukesh Singhal

   Finding consistent global checkpoints of a distributed computation is 
important for analyzing, testing, or verifying properties of these 
computations.  In this paper we present a theoretical foundation for 
finding consistent global checkpoints.  Given an arbitrary set S of local 
checkpoints, we prove exactly which sets of the local checkpoints can be 
combined with S to build consistent global checkpoints, and we present an 
algorithm for finding all such global checkpoints.  The minimal and maximal 
consistent global checkpoints are presented as special cases.  The results 
are based on the notion of zigzag paths introduced by Netzer and Xu [4].  
We also present a method for finding zigzag paths using the rollback-
dependency graph introduced by Wang [17, 16]. 

Keywords:  casuality, distributed checkpointing, consistent global 
	   checkpoint collection, failure recovery, fault tolerance 


TR15	Efficient Broadcast and Multicast on Multistage Interconnection 
	Networks using Multiport Encoding by Rajeev Sivaram, Dhabaleswar K.
	Panda, and Craig B. Stunkel

   This paper proposes a new approach to implement fast multicast and 
broadcast in multistage interconnection networks (MINs) with multiport encoded 
multidestination worms.  For a MIN with k x k switches and n stages such 
worms use n header flits each.  One flit is used for each stage of the 
network and it indicates the output ports to which a multicast message needs 
to be replicated.  A multiport encoded worm with (d1, d2 . . ., dn with di 
in the closed interval 1 to k) degrees of replication for the respective 
stages is capable of covering (d1 x d2 x . . . x dn) destinations with a 
single communication start-up.  Grouping algorithms are presented to derive 
the associated multiport encoded worms for any multicast with an arbitrary set 
of destinations.  Using these worms a multinomial tree-based scheme is 
proposed to implement the multicast.  This scheme significantly reduces 
broadcast/multicast latency compared to schemes using unicast messages [27].  
Simulation studies indicate that improvement in broadcast/multicast latency up 
to a factor of 8 is feasible using the new approach.  Interestingly, this 
approach is able to implement multicast with reduced latency as the number 
tof destinations increases beyond a certain number.  Compared to implementing 
unicast messages this approach requires little additional logic at switches. 
Thus, the scheme demonstrates significant potential for implementing 
efficient collective communication operations on current and future MIN-
based systems. 

Keywords:  Parallel architectures, multistage interconnection networks, 
	   interprocessor communication, collective communication, broadcast 
	   and multicast algorithms, wormhole routing, and virtual cut-through.


TR14	Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus 
	Networks:  A Diagonal-Propagation Approach  by Yu-Chee Tseng, Ting-
	Hsien Lin, Sandeep K.S. Gupta, and Dhabaleswar K. Panda

   All-to-all personalized communication, or complete exchange, is at the 
heart of numerous applications in parallel computing.  Several complete 
exchange algorithms have been proposed in the literature for wormhole 
meshes.  However, these algorithms, when applied to tori, can not take 
advantage of wrap-around interconnections to implement complete exchange 
with reduced latency.  In this paper, a new diagonal-propagation approach 
is proposed to develop a set of complete exchange algorithms for 2D and 3D 
tori.  This approach exploits the symmetric interconnections of tori and 
allows to develop a communication schedule consisting of several contention-
free phases.  These algorithms are indirect in nature and they use message 
combining to reduce the number of phases (message start-ups).  It is shown 
that these algorithms effectively use the bisection bandwidth of a torus 
which is twice that for an equal sized mesh, to achieve complete exchange 
in time which is almost half of the best known complete exchange time on 
an equal sized mesh.  The effectiveness of these algorithms are verified 
through simulation studies for varying system and technological parameters. 
It is also demonstrated that synchronous implementation of these algorithms 
(by introducing barriers between phases) leads to reduced latency for complete 
exchange with large messages.  Asynchronous implementations are shown to be 
better for smaller messages. 

Key Words:  Collective communication, complete exchange, distributed memory 
	    systems, interprocessor communication, parallel computing, torus, 
	    wormhole routing. 


TR13	Multitolerance in Distributed Reset
	by Sandeep S. Kulkarni and Anish Arora

   We design in this paper a multitolerant, distributed program that enables 
a process in a distributed system to reset the state of all system processes 
as and when necessary.  More specifically, our distributed program exhibits 
two tolerances:  (i) in the presence of any finite number of fail-stop and 
repair faults of processes, it is "masking" tolerant, by which we mean that 
it completes every distributed reset correctly; (ii) in the presence of 
transient faults, it is "stabilizing" tolerant, by which we mean that upon 
starting from an arbitrary state it eventually reaches a state from where it 
completes every distributed resset correctly.  These tolerances are designed 
one by one:  We first augment a fault-intolerant distributed reset program 
so that it becomes masking tolerant to fail-stops and repairs.  We then 
augment the resulting program so that it becomes stabilizing tolerant to 
transients, while still retaining its masking tolerance to fail-stops and 
repairs.  Of special note is the fact that our program admits a bounded space 
implementation. 

Keywords:  robustness, distributed systems, masking, nonmasking, and 
	   stabilizing fault-tolerance. 


TR12	Performance Study of Buffer Management Schemes under Multicasting
	Traffic in ATM Switching Nodes  by Khalid H. Sheta and Mukesh 
	Singhal

   Multimedia applications in broadband ISDN present various types and 
classes of traffics such as video teleconferencing, broadband telephony and 
large file transfers.  Multicasting is important in multimedia applications. 
Since cell loss in a multicast packet will cause the retransmission of the 
entire packet causing excessive load in the network, the ATM switch should 
provide for the reduction of cell loss of multicast traffic.  The reduction 
inn cell loss of a particular type of traffic should be done in coordination 
with the QoS parameters required from other classes within each traffic type. 
Buffering schemes are used in ATM switches to achieve the required levels of 
QoS.  In this paper, we develop a new buffering scheme which considers traffic 
types that include multicasting and nonmulticasting components with two 
existing schemes, No Priority and Push Out.  We compare the new buffering 
scheme with two existing schemes, No Priority and Push Out.  We show that 
the new buffering scheme performs better than the two other schemes by 
reducing cell loss of multicast cells.  We also introduce a new cell 
scheduling policy that produces lower delay for the delay sensitive class. 
The new buffering and scheduling schemes provide the ability to tune the cell 
loss and delay of each class of cells within each traffic type. 

Key words:  Multicast, ATM switching and Buffer Schemes. 


TR11	An Optimal Ray Traversal Scheme for Visualizing Colossal Medical 
	Volumes  by Asish Law and Roni Yagel 

   Modern comptuers are unable to store the complete data of high resolution 
medical images in main memory.  Even on secondary memory (disk), such 
large datasets are sometimes stored in a compressed form.  At rendering time, 
parts of the volume are requested by the ray tracing algorithm and are 
loaded from disk.  If one is not careful, the same regions may be (decom-
pressed and) loaded to memory several times.  Instead, a coherent algorithm 
should be designed that minimizes this thrashing and optimizes the time 
and effort spent to (uncompress and) load the volume.  We present an 
algorithm that divides the volume into cubic cells, each (compressed and) 
stored on disk, in contrast to the more common slice-based storage.  At 
rendering time, each cell is allocated a queue of rays.  For a sequence of 
images, all rays are spawned and queued at the cells they intersect first.  
Cells are loaded, one at a time, in front-to-back (FTB) order.  A loaded 
cell is rendered by all rays found in its queue.  We analyze the algorithm 
in detail and demonstrate its advantages over existing ray casting volume 
rendering. 


TR10	The Active-Ray Approach to Rendering on Distributed Memory 
	Multiprocessors  by Asish Law and Roni Yagel

   Object dataflow is a popular approach used in parallel rendering.  The 
data representing the 3D scene is statically distributed among processors 
and objects are fetched and cached only on demand.  Most previous methods 
were implemented on shared memory architectures and exploited only object-
space coherency to reduce cache misses.  In this paper, we propose an 
efficient model for object dataflow on distributed memory machines.  The 
'Active Ray Tracing' algorithm is introduced and its ray-stacking mechanism 
is used to support latency hiding by postponing computation on inactive 
rays.  We also optimize memory usage by letting objects migrate and replicate 
at different processors rather than the common static assignments.  Our cache-
only-memory approach uses a distributed directory scheme to trace the location 
of objects at other nodes.  A mechanism to minimize network congestion was 
implemented which optimizes channel utilization.  Unlike previous methods, 
our approach can benefit from temporal coherence and effecatively eliminates 
communication costs in successive frames.  We implemented a volume ray 
casting instance of the algorithm on the Cray T3D and achieved higher 
efficiency and scalability than existing algorithms.  We achieve interactive 
frame rates of 20 Hz for 128 (to the third power)  volume, and 3 Hz for 256 
(to the third power)  volume on 128 processors. 

Keywords:  parallel rendering, distributed systems, CRAY T3D, ray tracing 
	   volume rendering. 


TR09	Wavelet Based Feature Driven Identification and Enhancement of 
	Medical Images  by Raghu Machiraju, Ajeetkumar Gaddipati, and 
	Roni Yagel 

   Much information in an image or a volume is concentrated in a few regions.
These regions are dominated by discontinuities in the intensity of the image. 
Much has been done to analyze and represent images by multiresolution 
approaches.  Wavelet based approaches are gaining popularity although they 
are far from bein extremely viable.  In this paper we present a wavelet based 
image analysis algorithm that identifies features persisting through the 
scales.  This is achieved through the notion of sub-band combining which 
essentially computes a function of the sub-band (Details) information at 
each point.  This sub-band information is obtained through the use of the 
wavelet transform.  We then show how the method can be employed to enhance 
or smooth features in an image.  Also we allude to the use of this method 
to compression and volume rendering. 


TR08	Reconstruction Error Characterization and Control:  A Sampling 
	Theory Approach  by Raghu Machiraju and Roni Yagel 

   Reconstruction is prerequisite whenever a discrete signal needs to be 
resampled as a result of transformation such as texture mapping, image 
manipulation, volume slicing, and rendering.  We present a new method for the 
characterization and measurement of reconstruction error in spatial domain. 
Our method uses the Classical Shannon's Sampling Theorem as a basis to develop 
error bounds.  We use this formulation to provide, for the first time, an 
efficient way to guarantee an error bound at every point by varying the size 
of the reconstruction filter.  We go further to support position-adaptive 
reconstruction and data-adaptive reconstruction which adjust filter size to 
the location of reconstruction point and to the data values in its vicinity. 
We demonstrate the effectiveness of our methods with 1D signals, 2D signals 
(images), and 3D signals (volumes). 


TR07	Multimedia Database Systems:  Challenges and Opportunities 
	by Yelena Yesha and Mukesh Singhal

   Since multimedia systems typically require storage, retrieval, and 
manipulation of huge amounts of information, database systems play a key role 
in the design of high-performance multimedia systems.  This article 
examines the issues in the design of multimedia database systems and 
explores the current state of the art. 

Key Words:  Multimedia systems, multimedia databases, information 
	    systems. 


TR06	The Intractability of Face Reachability in 3D Visit-Once Piecewise- 
	Constant Derivative Systems  by John F. Kolen and Feng Zhao 

   We consider a class of three-dimensional dynamical systems whose state 
spaces are defined by piecewise constant vector fields and whose trajectories 
never return to a state-space region once they exit the region.  We 
demonstrate that the reachability problem for this class of systems is 
decidable while the computation is provably intractable.  We prove the 
intractability (i.e., PSPACE-completeness) via reduction of satisfiability 
of quantified boolean formulas to this reachability problem.  This result 
extends the work of Asarin, Maler, and Pnueli (1995) that proves computational 
undecidability for three-dimensional constant-derivative systems. 


TR05	Designing Processor-cluster Based Systems:  Interplay Between 
	Cluster Organizations and Collective Communication Algorithms 
	by D. Basak and D.K. Panda 

   Advancements in VLSI has made it attractive to package multiple processors 
into a single multichip or a board module.  There is an increasing trend 
towards using such processor-clusters in large multiprocessor design.  Past 
research on designing processor-cluster based systems has focused mainly in 
studying the packaging technologies affecting the inter-cluster network.  
To make processor-cluster based multiprocessor design more attractive, there 
is a strong need to understand the details about the topology inside the 
cluster, its memory organization, and the impact of this organization on 
system performance.  In this paper we focus on such aspects of processor-
cluster design with an overall objective to support a logically shared address 
programming model.  We analyze the communication costs for accessing inter-
cluster and intra-cluster memories under different cluster organizations.  The 
merits of these organizations are evaluated based on the performance of 
collective communication algorithms, which occur frequently in applications. 
In this paper we focus on implementing the broadcast collective communication 
algorithm, Umesh, on clustered systems.  Our results indicate that cluster 
organizations like bus and crossbar which allow memory inside a cluster to be 
accessed without messaging overheads, outperform other organizations because 
of faster intra-cluster access.  We also demonstrate that such faster access 
can be exploited to design better algorithms on clustered systems.  We propose 
a new algorithm - clus_mesh for broadcasting on clustered meshes.  For 
reasonably faster communication within clusters, this algorithm can outperform 
the existing umesh algorithm by up to 20%. 

Keywords:  parallel architectures, interconnection networks, clustered 
	   architectures, hierarchical architectures, collective 
	   communication, broadcasting. 


TR04	Spatial Aggregation:  language and applications  
	by Christopher Bailey-Kellogg, Feng Zhao, and Kenneth Yip. 

   This paper describes the spatial aggregation language and its 
applications.  Spatial aggregation is a framework for organizing computations
around image-like, analogue representations of physical processes in data 
interpretation and control tasks; it transforms a numerical input field to 
successively higher-level descriptions by applying a small, identical set 
of operators to each layer given a metric, neighborhood relation and equiv-
alence relation. 
   The spatial aggregation language provides two abstract data types (ADTs) - 
neighborhood graph and field - and a set of interface operators for con-
structing the transformations of the field.  The language consists of a 
library of component implementations from which a user can mix-and-match 
and specialize for a particular application.  The modular design of the ADTs
supports language extensions and user control over tradeoffs such as 
efficiency vs. generality.  We illustrate the use of the language with 
several examples ranging from region growing in image analysis to trajectory 
grouping in dynamics interpretation.  Programs for these different task 
domains can be written in a modular, concise fashion in the spatial 
aggregation language. The language allows users to isolate and express 
important computational ideas while hiding low-level details. 

Content Areas:  Qualitative Reasoning, geometric/spatial reasoning, 
		programming language, ontologies, applications. 


TR03	An Optimal Algorithm for Generalized Causal Message Ordering
	by Ajay D. Kshemkalyani and Mukesh Singhal

   This paper presents an optimal algorithm for enforcing causal message 
ordering.  The algorithm works with non-FIFO channels and allows a process 
to multicast to arbitrary and dynamically changing process groups.  The 
algorithm achieves optimality by transmitting the bare minimum causal 
dependency information and using an encoding scheme to represent and 
transmit this information.  The algorithm is shown to be optimal both in 
space complexity of message overheads and in space complexity of message 
logs. 

Key Words:  Causal message ordering, distributed systems, optimal 
	    synchronization, concurrency. 


TR02	The OSU Scheme for Congestion Avoidance in ATM Networks Using 
	Explicit Rate Indication  by Raj Jain, Shiv Kalyanaraman and 
	Ram Viswanathan 

   An explicit rate indication scheme for congestion avoidance in computer and 
telecommunication networks is proposed.  The sources monitor their load and 
provide the information periodically to the switches.  The switches, in turn, 
compute the load level and ask the sources to adjust their rates up or 
down.  The scheme achieves high link utilization, fair allocation of rates 
among contending sources and provides quick convergence.  A backward 
congestion notification option is also provided.  The conditions under which 
this option is useful are indicated. 


TR01	An Information System to Support Collaborative Brain-Tumor Research 
	by  Sandra A. Mamrak, John Boyd, Ivan, Ordonez, Sandra L. 
	Cottingham, and Allan J. Yates

   A Neuro-Oncology Information System (NOIS) has been developed to support 
brain tumor research.  The system has been adopted for use by a group of 
researchers at several centers in the U.S.A. who are collaborating to input, 
store, and retrieve information concerning tissue specimens, 
corresponding clinical information, and eventual research results.  The 
NOIS project is targeted to support medical research, rather than clinical 
treatment or organ registry.  Thus, the context in which the NOIS is used 
is characterized by an ever-changing set of needs and requirements.  The 
NOIS has been implemented to continually accommodate these changing needs. 
In particular, the project follows the software life-cycle approach to 
motivate and guide the continuing design, implementation and evaluation of 
the system. 

Keywords for indexers:  Brain Neoplasms, Computer Communications Networks, 
Computing Methodologies, Glioma, Information Systems, Medical Informatics 
Applications, Medical Informatics Computing, Medical Oncology, Medical 
Research, Tissue Banks