Technical Report Series Computer and Information Science Research Center Department of Computer and Information Science The Ohio State University 2015 Neil Avenue, Columbus, OH 43210-1277 **********************NOTE********************* Abstracts for 1997 NOTE: some titles may be available ONLY via paper (This is noted after the title/author section.) If you have trouble accessing a file, please send email to tfletch@cis.ohio-state.edu 1997 TR56 A Methodology for Detecting Violation of Real-Time Constraints in Secure Electronic Commerce Transactions by Srividhya Subramanian and Mukesh Singhal In electronic commerce applications, it is often critical to transact within a short stipulated time. In such applications, it is financially damaging to the customer if the transaction does not complete within the stipulated time. In this paper, we describe various situations in the stock market (one of the biggest electronic commerce markets) requiring real-time transactions. We propose a methodology to enhance any electronic commerce protocol into a real-time-aware electronic commerce protocol. The methodology consists of two parts: the first part adds a timestamp to some messages in the original protocol, and the second part augments the protocol with a mechanism to detect violation of real-time constraints. We show that the methodology does not violate any of the useful properties present in existing electronic commerce protocols. As a result, the methodology may be successfully applied to any existing electronic commerce protocol. The methodology is illustrated by applying it to an existing well-known electronic commerce protocol, namely, the NetBill protocol. Keywords: Electronic commerce, real-time, transaction, security, atomicity, privacy, anonymity, stock market. 
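The two-part methodology above (timestamping messages, then checking for constraint violations) can be sketched as follows. The message fields, function names, and deadline check are hypothetical illustrations, not the actual TR56 protocol messages.

```python
import time

# Hypothetical illustration: names and message structure are assumptions,
# not the TR56 protocol itself.
def stamp(message, now=None):
    """Part one of the methodology: attach a send timestamp to a message."""
    message = dict(message)
    message["ts"] = time.time() if now is None else now
    return message

def violates_deadline(message, deadline_s, now=None):
    """Part two: detect a real-time constraint violation -- the transaction
    fails its constraint if more than deadline_s elapsed since the stamp."""
    now = time.time() if now is None else now
    return (now - message["ts"]) > deadline_s

order = stamp({"op": "buy", "symbol": "XYZ"}, now=100.0)
assert not violates_deadline(order, deadline_s=5.0, now=104.0)
assert violates_deadline(order, deadline_s=5.0, now=106.0)
```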
TR55 An Empirical Investigation and Comparison of Program Spectra by Mary Jean Harrold, Gregg Rothermel, Rui Wu, and Liu Yi A variety of expensive software maintenance and testing tasks require a comparison of the behaviors of program versions. Program spectra have recently been proposed as a heuristic for use in performing such comparisons. To assess the potential usefulness of spectra in this context, we conducted an experiment that examined the relationship between program spectra and program behavior (measured in terms of program failure behavior), and empirically compared several types of spectra. This paper reports the results of that experiment. TR54 Accurate Low-Contrast 3D Cone-Beam Reconstruction With Algebraic Methods by Klaus Mueller, Roni Yagel, and John J. Wheller This paper examines the use of the Algebraic Reconstruction Technique (ART) and related methods to reconstruct 3D objects from a relatively sparse set of cone-beam projection data. Although ART has been widely used for cone-beam reconstruction of high-contrast objects, e.g., in computed angiography, we are interested in the more challenging low-contrast case, which represents a little-investigated scenario for ART. Preliminary experiments indicate that for cone angles greater than 20°, traditional ART produces reconstructions with strong aliasing artifacts, obliterating much object detail. By analyzing the reconstruction process using signal processing principles, we reveal that the source of these artifacts is the non-uniform sampling and correction of the reconstruction grid by the cone-beam rays. To eliminate these errors, we devise a new way of computing the weights of the reconstruction matrix. This new method is better suited to cone-beam reconstruction and replaces the usual constant-size interpolation filter by one whose size and amplitude depend on the source-voxel distance. Doing so enables us to generate alias-free reconstructions at little extra cost. 
An alternative analysis reveals that Simultaneous ART (SART) also has great potential to produce reconstructions without cone-beam-related artifacts, although at greater computational cost. Putting everything together, we thoroughly investigate the influence of various ART parameters, such as volume initialization, the relaxation coefficient λ, the correction scheme, and the number of iterations, on reconstruction quality. Using a 3D version of the Shepp-Logan phantom, we find that ART typically requires only three iterations to render a reconstruction result close to the optimum (given proper parameter settings). Thus we conclude that ART is potentially no costlier than Filtered Backprojection (FBP) techniques, particularly if one considers the fact that ART requires only a fraction of FBP's projection set. TR53 Fast Implementations of Algebraic Methods for 3D Reconstruction from Cone-Beam Data by Klaus Mueller, Roni Yagel, and John J. Wheller The prime motivation of this work is to devise techniques that make the Algebraic Reconstruction Technique (ART) and related methods more efficient for routine clinical use, while not compromising their accuracy. In particular, we strive to push the overall cost of an ART reconstruction as close as possible to the theoretical cost of a reconstruction obtained with Filtered Backprojection (FBP). While we focus mostly on fast implementations of ART-type methods in the context of 3D cone-beam reconstruction, different portions of the material presented here are also applicable to speeding up reconstruction from fan-beam and parallel-beam data. It was shown in previous research that three iterations are sufficient to obtain a high-quality reconstruction from low-contrast cone-beam data. 
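For reference, the core update these two reports build on is the textbook Kaczmarz-style ART iteration, in which each ray equation corrects the current volume estimate, scaled by the relaxation coefficient. This sketch is the generic iteration only, not the cone-beam-specific weighting scheme the papers devise.

```python
import numpy as np

# Generic Kaczmarz-style ART: A is the system (weight) matrix, b the measured
# projections, lam the relaxation coefficient. An assumption-level sketch,
# not the papers' cone-beam weighting.
def art(A, b, lam=1.0, iterations=3):
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        for i in range(A.shape[0]):          # one correction per ray equation
            a = A[i]
            x += lam * (b[i] - a @ x) / (a @ a) * a
    return x

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
truth = np.array([2.0, -1.0])
x = art(A, A @ truth, lam=1.0, iterations=3)
assert np.allclose(x, truth, atol=1e-6)      # consistent system: ART recovers it
```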
Based on the observation that ART typically requires only half the projections of FBP, we conclude that if the overall cost of ART's projection-backprojection operations could be cut in half, one could obtain an ART implementation whose cost is close to the theoretical cost of FBP. To achieve this goal, we first survey existing projection algorithms and find that these algorithms either lack accuracy or speed, or are not suitable for cone-beam reconstruction. We hence devise a new and more accurate extension to the splatting algorithm, a well-known voxel-driven projection method. We also describe a new 3D ray-driven projector that is considerably faster than the voxel-driven projector, at the same time more accurate, and perfectly suited to the demands of cone-beam reconstruction. We then devise caching schemes for both ART and Simultaneous ART (SART). These schemes minimize the number of redundant computations for projection and backprojection and, at the same time, are very memory-conscious. We conclude that with caching and the proposed fast projection algorithm, the computational cost of ART can be reduced to a level close to FBP's computational effort. TR52 Fast Volume Rendering of Curvilinear Data Sets Using a Splatting Approach by Torsten Moeller, Roger Crawfis, and Nelson Max We develop a new algorithm that visualizes curvilinear grids quickly and accurately. Our algorithm is based on the concept of splatting. In order to reconstruct the underlying function, we use a Gaussian reconstruction kernel. In the computational domain, the sampling of the data field is regular, and we therefore have spherical reconstruction kernels. However, the mapping into physical space is non-linear, and the spherical reconstruction kernels from computational space are mapped to what we call "bean shapes," which are usually not ellipsoids. We approximate these bean shapes with ellipsoids and project them to the screen. 
Our algorithm is similar to Mao's splatting algorithm, but is much more efficient and less computationally involved. It also more faithfully represents the underlying data. TR51 Elm: An Algorithm to Approximate a t-Expected Partition by Christopher Jermaine and Renee J. Miller We have previously proposed a new metric of association between values, called unexpectedness, that is based on information content. This paper describes Elm, a novel, tree-based algorithm that builds a model of the data such that the information content, or unexpectedness, of any trend present in the actual data but not in the model is bounded. TR50 Efficient Collective Communication on Heterogeneous Networks of Workstations by Mohammad Banikazemi, Vijay Moorthy, and Dhabaleswar K. Panda Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects, NOW environments are gradually being redefined as Heterogeneous Networks of Workstations (HNOW) environments. This paper presents a new framework to implement collective communication operations (as defined by the Message Passing Interface (MPI) standard) efficiently for the emerging HNOW environments. We first classify different types of heterogeneity in HNOW and then focus on one important characteristic: the communication capabilities of workstations. By taking this characteristic into account, we propose a new Speed-Partitioned Ordered Chain (SPOC) approach to order the participating nodes of a given collective communication operation in an optimal fashion. Such an ordering allows flexibility to implement collective communication operations with lower latency. Using the new approach, a set of tree-based algorithms is proposed for frequently used collective communication operations such as broadcast, multicast, and barrier synchronization. 
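The intuition behind speed-aware ordering can be illustrated with a toy chain broadcast: in a forwarding chain, the last node never sends, so placing the fastest senders first (and the slowest node last) shortens the time until every node has the message. The per-node send costs and the plain-chain model below are assumptions for illustration, not the actual SPOC algorithm.

```python
# Toy model: each node forwards the message to the next node in the chain,
# so the message reaches the final node after the sum of the send costs of
# all nodes except the last. Illustrative only; not the SPOC algorithm.
def chain_broadcast_latency(send_costs):
    t = 0.0
    for cost in send_costs[:-1]:   # the last node in the chain never sends
        t += cost
    return t

slow_first = [10.0, 10.0, 1.0, 1.0]
fast_first = sorted(slow_first)    # speed-aware: fastest senders go first
assert chain_broadcast_latency(fast_first) < chain_broadcast_latency(slow_first)
```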
All these algorithms are compared with the standard MPI implementations on experimental as well as simulated testbeds. On an existing 16-node HNOW environment with SGI workstations and an ATM interconnect, our approach reduces the latency of broadcast, multicast, and barrier synchronization operations by a factor of up to 1.4 compared to the existing MPICH implementation. On a 64-node simulated testbed, our approach promises to reduce the latency of the broadcast operation by a factor of up to 4.8 compared to the MPI implementations. Improvements up to a factor of 3.3 and 2.3 are observed for multicast and barrier synchronization, respectively. Thus, these results demonstrate significant potential for designing scalable collective communication libraries for current and future generation HNOW environments. Keywords: Network of Workstations (NOW), Heterogeneous Network of Workstations (HNOW), Collective Communication, Broadcast, Multicast, Barrier Synchronization. TR49 Protocols for Secure, Atomic Transaction Execution in Electronic Commerce by Srividhya Subramanian and Mukesh Singhal Computer networks are an efficient, inexpensive, convenient, and fast mode of communication and information transfer. There is widespread demand for the ability to buy and sell goods, especially electronic goods, using this medium. Selling goods over computer networks is also referred to as electronic commerce. A protocol for electronic commerce transactions is the series of steps that two parties must follow in order to transact business successfully. Various issues arise while designing electronic commerce protocols: how can one party be prevented from disappearing with the money/goods in the middle of a transaction? How can a party establish its identity without revealing personal information to unknown and untrusted parties? How can we prevent a third party from "stealing" money from the network? 
All these and other problems make designing an electronic commerce protocol both challenging and interesting. In this paper, we first present a protocol for electronic commerce transactions between an anonymous customer and a merchant whose identity is public. Tampering with the protocol, either by the involved parties or by an intruder, does not cost the merchant or the customer more than some additional computation. Furthermore, under all circumstances, the transaction is either completed (the right amount of money is exchanged for the right goods) or aborted (there is no transfer of money or goods). In other words, the protocol ensures anonymity, security, and atomicity in business transactions. We also propose a protocol for electronic auctions. The auction protocol has the same properties of anonymity, security, and atomicity as the transaction protocol. We extend these two protocols to apply to real-time transactions where goods (such as stocks, bonds, options, etc.) must be transferred within a stipulated period of time. Keywords: Electronic commerce, transaction, security, atomicity, privacy, anonymity, auction, real-time. TR48 A Comprehensive Survey of Join Techniques in Relational Databases by Yuping Yang and Mukesh Singhal The equijoin between two relations is one of the basic operations in relational databases, and a large volume of research has been devoted to it. However, in recent years there has not been a survey that objectively compares the relative performance of a wide spectrum of join techniques. This survey compares the performance and practicality of various join techniques. The main criterion for performance comparison is disk I/O. For practicality, the criteria used are ease and flexibility of implementation. 
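As background for one of the three join families the survey compares, a minimal in-memory hash equijoin (build a hash table on one relation, probe it with the other) can be sketched as follows; the relation and key names are illustrative.

```python
from collections import defaultdict

# Minimal hash equijoin sketch: build phase on relation r, probe phase on s.
# An on-disk implementation (the survey's setting) would partition to buckets
# that fit in memory; this sketch shows only the core idea.
def hash_join(r, s, r_key, s_key):
    buckets = defaultdict(list)
    for row in r:                               # build phase
        buckets[row[r_key]].append(row)
    out = []
    for row in s:                               # probe phase
        for match in buckets.get(row[s_key], []):
            out.append({**match, **row})
    return out

emp = [{"dept": 1, "name": "a"}, {"dept": 2, "name": "b"}]
dept = [{"dept": 1, "dname": "x"}]
assert hash_join(emp, dept, "dept", "dept") == [{"dept": 1, "name": "a", "dname": "x"}]
```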
When comparing join techniques, each join technique is evaluated at its full potential: if other techniques are available to enhance a join technique while retaining its main philosophy, the latter techniques are applied. The main contribution of the paper is that it confirms the belief that no dramatic performance improvement can be made to the three major join techniques (nested loops, sort-based, and hash-based) of relational databases. The future of join performance improvement in relational database systems lies in more radical approaches: parallel joins, join indexes, composite indexes, and layered databases. Key words: Relational database, query execution, join, join index, equijoin, band join, selection, index, access path, disk I/O, performance. TR47 Fast Join Execution Using Summary Information in Large Databases by Mukesh Singhal and Yuping Yang It is well known that query execution in relational databases is not fast, and join execution is generally the most expensive operation in query execution. All three major join methods, namely, nested loops, sorting, and hashing, have been perfected to an extent that any further improvement in these methods will enhance the performance of join execution only marginally. Even with the advent of parallel databases, the enhancement in the performance of join execution is limited: join execution is often I/O-bound, and an increase in the number of processors can have only a marginal effect after a certain point. The use of a join index to speed up join execution is also limited due to its huge storage requirement. In this paper, we propose to use summary information to expedite query execution, especially join execution, in large databases. Summary information is stored in a summary database which forms the upper layer of a multi-layer database. The size of the summary database is much smaller than the size of a join index. 
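The layered-lookup idea can be illustrated with a toy two-level filter: a coarse summary of which value groups appear in each partition is consulted first, so the main-layer join only scans partition pairs that can actually produce matches. The grouping rule and partition layout below are illustrative assumptions, not the paper's signature scheme.

```python
# Toy two-layer lookup. The grouping function plays the role of a coarse
# summary ("signature"); names and layout are assumptions for illustration.
def build_summary(partitions, group):
    return {pid: {group(row) for row in rows} for pid, rows in partitions.items()}

def candidate_partitions(summary_r, summary_s):
    """Only partition pairs whose summaries share a group can contribute joins."""
    return [(pr, ps) for pr, gr in summary_r.items()
                     for ps, gs in summary_s.items() if gr & gs]

group = lambda row: row["k"] // 10              # coarse value group
R = {"r0": [{"k": 3}, {"k": 7}], "r1": [{"k": 25}]}
S = {"s0": [{"k": 26}], "s1": [{"k": 41}]}
pairs = candidate_partitions(build_summary(R, group), build_summary(S, group))
assert pairs == [("r1", "s0")]                  # the summary prunes 3 of 4 pairs
```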
An interesting feature of our approach is that it allows the full set of query processing operations, including the join, to be carried out in the summary database. The results of the query processing in the summary database are used to narrow down the search efforts in the main relational database and can greatly expedite the overall query execution in the main database. The main contribution of this paper is that it proposes a mechanism to perform joins using summary information for performance enhancement of query execution in relational databases. We present a performance analysis that shows that the proposed technique can greatly speed up query execution, especially join execution. Key words: relational database, large database, multiple layer, join, signature, grouping, performance. TR46 Effective Use of Virtual Channels in Wormhole Routed Distributed Shared Memory Systems by Donglai Dai and D.K. Panda Multiple virtual channels have been studied extensively and implemented as a powerful mechanism for obtaining high performance in modern interconnection networks. However, the effective use of virtual channels in distributed shared memory (DSM) systems is non-trivial. In this paper we present two simple techniques for exploiting the promising performance benefit of virtual channels in DSM systems. First, we propose to use an unequal number of virtual channels in the request and reply networks to harness the performance benefit of the inherently unbalanced traffic in DSM systems. Next, we apply the virtual channel mechanism to injection and consumption channels to alleviate the performance bottleneck at the network interface. In addition, issues related to virtual channels are taken into account in designing non-FIFO coherence protocols. These new design strategies are evaluated through simulation using a set of representative benchmark applications. 
The effect of critical system parameters (such as node cache line size, switch routing delay, and network topology) on the new designs is also studied. Overall, our results show that with an appropriate non-FIFO cache coherence protocol, the virtual channel mechanism can considerably reduce the time for shared memory accesses and the overall execution time of DSM applications. This study demonstrates that a carefully coordinated design strategy using the virtual channel mechanism can significantly improve the performance of distributed shared memory systems. Keywords: Parallel architecture, distributed shared memory systems, performance modelling, interconnection networks, wormhole routing, topology, network interface, directory-based protocols, and cache coherence. TR45 HIPIQS: A High-Performance Switch Architecture using Input Queuing by Rajeev Sivaram, Craig B. Stunkel, and Dhabaleswar K. Panda Switch-based interconnects are used in a number of application domains, including parallel system interconnects, local area networks, and wide area networks. However, very few switches have been designed that are suitable for more than one of these application domains. Such a switch must offer both extremely low latency and very high throughput for a variety of message sizes. While some architectures with output queuing have been shown to perform extremely well in terms of throughput, their performance can suffer when used in systems where a significant portion of the packets are extremely small. On the other hand, architectures with input queuing offer limited throughput, or require fairly complex and centralized arbitration that increases latency. In this paper we present a new input-queue-based switch architecture called HIPIQS (High-Performance Input-Queued Switch). It offers low latency for a range of message sizes, and provides throughput comparable to that of output queuing approaches. Furthermore, it allows simple and distributed arbitration. 
HIPIQS uses a dynamically allocated multi-queue organization, pipelined access to multi-bank input buffers, and small cross-point buffers to deliver high performance. Our simulation results show that HIPIQS can deliver performance close to that of output queuing approaches over a range of message sizes, system sizes, and traffic patterns. The switch architecture can therefore be used to build high-performance switches that are useful both for parallel system interconnects and for building computer networks. Keywords: Parallel architectures, switch/router design, high-speed interconnection networks, networks of workstations. TR44 On Consistent Checkpointing in Distributed Systems by Guohong Cao and Mukesh Singhal Consistent checkpointing simplifies failure recovery and eliminates the domino effect in case of failure by preserving a consistent global checkpoint on stable storage. However, the approach suffers from the high overhead associated with the checkpointing process. Two approaches are used to reduce this overhead: one is to minimize the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. These two approaches were pursued independently until the Prakash-Singhal algorithm combined them: the Prakash-Singhal algorithm forces only a minimum number of processes to take checkpoints, and it does not block the underlying computation. In this paper, we identify two problems in their algorithm and prove that there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. Based on this proof, we present an efficient algorithm that neither forces all processes to take checkpoints nor blocks the underlying computation during checkpointing. Correctness proofs are also provided. 
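The "minimum number of processes" idea can be sketched as a transitive dependency walk: only processes the initiator has (directly or indirectly) received messages from since their last checkpoints need to take new ones. This toy deliberately omits the non-blocking machinery that is the paper's actual contribution; the process names and dependency map are illustrative.

```python
# Toy sketch of minimal-process coordinated checkpointing: collect the
# transitive closure of message dependencies from the initiator. Names and
# the dependency representation are assumptions, not the paper's algorithm.
def processes_to_checkpoint(initiator, depends_on):
    """Return every process the initiator transitively depends on
    (i.e., has received messages from since the last checkpoint)."""
    need, stack = set(), [initiator]
    while stack:
        p = stack.pop()
        if p not in need:
            need.add(p)
            stack.extend(depends_on.get(p, []))
    return need

deps = {"P1": ["P2"], "P2": ["P3"], "P4": []}   # P1 received from P2, P2 from P3
assert processes_to_checkpoint("P1", deps) == {"P1", "P2", "P3"}   # P4 is spared
```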
TR43 Teaching the Essential Role of Mathematical Modeling in Understanding and Reasoning about Objects by Murali Sitaraman, Bruce Weide, Tim Long, and Wayne Heym Recognizing the advantages of object-based software, CS instructors have increasingly emphasized design and reuse of "objects" in CS1/CS2. But unless students also learn how to understand objects and operations abstractly, and how to reason about the behavior of object-based programs, they cannot possibly write correct non-trivial software. We describe how we present formal yet simple mathematical models of objects and operations, and how we use them to teach reasoning about object-based program behavior. The techniques we teach are both suitable for introductory courses and scalable to industrial-strength software. TR42 Providing Intellectual Focus to CS1/CS2 by Tim Long, Bruce Weide, Paolo Bucci, Dave Gibson, Joe Hollingsworth, Murali Sitaraman, and Steve Edwards First-year computer science students need to see clearly that computer science as a discipline has an important intellectual role to play and that it offers deep philosophical questions, much like the other hard sciences and mathematics; that CS is not "just programming". An appropriate intellectual focus for CS1/CS2 can be built on the foundations of systems thinking and mathematical modeling, as these principles are manifested in a component-based software paradigm. We outline some of the main technical features of this approach to CS1/CS2 and report preliminary observations from our experience with it. TR41-DIR Conceptual Program Editors: Design and Formal Specifications Ph.D. Dissertation by Paolo Bucci When a programmer uses a program editor, his/her view of what programs are and how they are built is affected by the editor itself. This is a special case of a more general phenomenon; namely, that all artifacts have an impact on the way users think of them and of the activities they perform with them. 
Research in human-machine interaction has explored the ramifications of this insight with respect to how it affects the design of tools and their user interfaces. In this dissertation we investigate the implications of this work for the design of program editors. We note that existing program editors convey a low-level, often inconsistent and incomplete conceptual model of programs and program construction. This hides the high-level conceptual view of programs, imposes excessive responsibilities on the programmer, and tends to make the programming task harder and more error-prone. We propose a new class of program editors that we call "conceptual editors." These editors are based on a specific, precise, high-level conceptual model of programs and program construction, and the editor and its user interface can be designed and built to convey the model in a consistent fashion. A conceptual editor, by design, supports, promotes, and even enforces an "appropriate" conceptual view of software through this model. To show the feasibility of this approach to program editors, we formally specify an editor for a small but significant (in terms of key concepts) subset of a programming language. By example, this demonstrates a precise description of a conceptual model and of a corresponding editor's capabilities. We also discuss how this design could be extended and generalized to an editor for a full-fledged programming language. TR40-DIR Texture Segmentation Using Gaussian Markov Random Fields and Neural Oscillator Networks by Erdogan Cesmeli and DeLiang L. Wang An image segmentation method is proposed based on texture analysis. The method is composed of two parts. The first part determines a novel set of texture features based on Gaussian Markov Random Fields (GMRF). Unlike other GMRF-based methods, our method is not limited by a fixed set of texture types. The second part is a 2D array of locally excitatory globally inhibitory oscillator networks (LEGION). 
The coupling strengths between neighboring oscillators are determined by texture feature differences. When LEGION runs, the oscillators corresponding to the same texture tend to oscillate in synchrony, whereas different texture regions tend to correspond to different phases. In simulations, a large system of differential equations is solved using a recently proposed method for integrating relaxation oscillator networks. Results on real texture images are provided to demonstrate the performance of our method. Keywords: texture segmentation, neural networks, Gaussian Markov random fields, relaxation oscillators, LEGION. TR39 A Movie Approximation Technique for the Implementation of Fast Bandwidth Smoothing Algorithms by Wu-chi Feng, Ming Liu, and Chi Chung Lam Bandwidth smoothing algorithms have been shown to be effective in reducing the network resource requirements for the delivery of compressed video streams. For stored video, a large number of bandwidth smoothing algorithms have been introduced that are optimal under certain constraints. These algorithms, however, require access to all the frame sizes to achieve their optimal properties, which, in turn, can limit the responsiveness of video-on-demand systems, especially under VCR functions. In this paper, we introduce a movie approximation technique for the representation of the frame sizes, reducing the complexity of the bandwidth smoothing algorithms as well as the amount of data that must be transmitted and the computation that must be performed prior to the start of playback. Our results show that the proposed approximation technique can accurately approximate the frame sizes with a small number of linear segments without affecting the performance measures that the bandwidth smoothing algorithms are attempting to achieve by more than 1%. 
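An approximation in this spirit can be sketched as a greedy piecewise-linear fit to the cumulative frame-size curve: each segment is extended as long as every covered point stays within a tolerance of the straight line from the segment's start. The tolerance rule and greedy strategy are illustrative assumptions, not the paper's algorithm.

```python
# Greedy piecewise-linear approximation of a cumulative frame-size curve.
# Illustrative sketch only; the error criterion is an assumption.
def linear_segments(cum_sizes, tol):
    """Extend each segment while all points stay within tol of the line
    from the segment start to the current point; return (start, end) pairs."""
    segs, start = [], 0
    for end in range(1, len(cum_sizes)):
        slope = (cum_sizes[end] - cum_sizes[start]) / (end - start)
        ok = all(abs(cum_sizes[j] - (cum_sizes[start] + slope * (j - start))) <= tol
                 for j in range(start, end + 1))
        if not ok:                      # close the segment just before this frame
            segs.append((start, end - 1))
            start = end - 1
    segs.append((start, len(cum_sizes) - 1))
    return segs

cum = [0, 10, 20, 30, 100, 170, 240]    # a constant rate, then a faster one
assert len(linear_segments(cum, tol=1.0)) == 2   # two linear pieces suffice
```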
In addition, we show that implementations of this technique speed up execution times by 100 to 1400 times, with typical approximated bandwidth smoothing plan calculations for full-length movies requiring less than 20 milliseconds. Evaluations using both Motion-JPEG and MPEG video clips are provided. Keywords: bandwidth smoothing, video-on-demand, MPEG, video networking. TR38-DIR Perceiving without Learning: from Spirals to Inside/Outside by Ke Chen and DeLiang L. Wang Since it was first proposed by Minsky and Papert (1969), the spiral problem has been well known in neural networks. It has received much attention as a benchmark for various learning algorithms. Unlike previous work that emphasizes learning, we approach the problem from a generic perspective that does not involve learning. We point out that the spiral problem is intrinsically connected to the inside/outside problem proposed by Ullman (1984, 1996). We propose a solution to both problems based on oscillatory correlation using a time-delay network. Our simulation results match human performance, and we interpret human limitations in terms of synchrony and time delays, both biologically plausible. As a special case, our network without time delays can always distinguish these figures regardless of shape, position, size, and orientation. We conjecture that visual perception will be effortful if local activation cannot be rapidly propagated, as synchrony would not be established in the presence of time delays. Keywords: Visual perception, the spiral problem, inside/outside relations, oscillatory correlation, synchronization, desynchronization, LEGION, time delays TR37 Route Optimization in Mobile ATM Networks by Gopal Dommety, Malathi Veeraraghavan, and Mukesh Singhal Location management schemes that propose setting up the connection to the home location of a mobile and then rerouting the connection to the mobile's current location can result in suboptimal connection paths. 
In this paper, we propose an algorithm for optimizing the route of a connection that becomes suboptimal due to location-based reroutes in mobile ATM (Asynchronous Transfer Mode) networks based on the PNNI (Private Network-to-Network Interface) standard. This algorithm uses the hierarchical route information of the connection and the summarized topology and loading information of the network to determine a "crossover node" such that adjusting the connection from that crossover node results in an optimally routed connection. This route optimization procedure can be executed during mobile connection setup, yielding a one-phase mobile location and connection setup scheme. We also demonstrate how this route optimization procedure can be executed after a connection is set up to a mobile, yielding the second phase of two-phase mobile location and connection setup schemes. A comparative performance analysis of one- and two-phase connection setup schemes is presented. The measures of comparison are call setup delay and the amount of network resources allocated to a connection. The one-phase scheme incurs higher call setup delays, but results in an optimal route during connection setup. On the other hand, two-phase schemes incur lower call setup delays, but result in connections with suboptimal routes prior to the execution of the route optimization phase. TR35 Simulation of Modern Parallel Systems: A CSIM-Based Approach by Dhabaleswar K. Panda, Debashis Basak, Donglai Dai, Ram Kesavan, Rajeev Sivaram, Mohammad Banikazemi and Vijay Moorthy Components of modern parallel systems are becoming quite complex, with many features and variations. An integrated modeling of these components (interconnection network, messaging layer, programming model, and computation-communication characteristics of applications) is essential to derive design guidelines for next-generation parallel systems. 
Most of the current simulation-based modeling platforms do not support such integrated modeling. This paper presents our effort at The Ohio State University towards integrated modeling of parallel systems. The basic features of our CSIM-based Wormhole-routed Multiprocessor Simulator (WORMulSim) are outlined. We present the techniques used in our simulator to model different network components (such as switches, links, wormhole/cut-through switching techniques, routing protocols, and network interfaces), a messaging layer with basic communication primitives, the distributed shared memory programming model, and the computation-communication characteristics of applications. Some sample performance measures of our simulator on current generation workstations are reported to demonstrate the feasibility of integrated modeling with low computational overhead. TR34 Fine-Grain Multitolerant Barrier Synchronization by Sandeep S. Kulkarni and Anish Arora We design a multitolerant program for synchronizing the phases of concurrent processes. The tolerances of the program enable processes to (i) compute all phases correctly in the presence of faults that corrupt process state in a detectable manner, and (ii) compute only a minimum possible number of phases incorrectly before resuming correct computation in the presence of faults that corrupt process state in an undetectable manner. The program is fine-grain in the sense that each process action either updates the state of that process or involves communication with one of two neighboring processes. TR33 Flat Location Management Scheme for PCNs by Gopal Dommety, Malathi Veeraraghavan, and Mukesh Singhal This paper presents a "flat" mobile location management scheme for Personal Communication Networks (PCNs) with the goal of reducing both mobile tracking and location costs. Mobile tracking costs are reduced by the use of one-level tracking for mobiles in their "home area", and a two-level tracking approach for mobiles visiting other areas. 
Mobile location costs are reduced by using an improved mobile location strategy, which requires a one-hop message exchange for mobiles in their home area, and a two-hop message exchange for mobiles visiting other areas. Analytical models are set up to analyze the performance of the flat scheme and compare it with that of the other approaches. The measures of comparison used are the computation cost (database accesses), the communication cost (signaling messages), the average total cost, and the mobile location cost (call setup delay). For all these measures, the proposed flat scheme performs better than the location management scheme defined in the IS-41 standard. Results also show that for medium and high values of the call-to-mobility ratio (> 0.07, i.e., fewer than 14 moves per call), the flat scheme performs better than other approaches proposed in the literature.

TR32 Compositional Design of Multitolerant Repetitive Byzantine Agreement by Sandeep S. Kulkarni and Anish Arora We illustrate in this paper a compositional and stepwise method for designing programs that offer a potentially unique tolerance to each of their fault-classes. More specifically, our illustration is a design of a repetitive agreement program that offers two tolerances: (a) it masks the effects of Byzantine failures and (b) it is stabilizing in the presence of transient and Byzantine failures.

TR31 A Multiple Layered Signature Database Architecture for Mobile and Internet Environments by Yuping Yang and Mukesh Singhal The performance of very large relational databases is one of the major stumbling blocks to their deployment in mobile and Internet environments. Indexing, hashing, and sorting are still the main techniques for speeding up query execution in actual database implementations. Join indices and materialized views have been proposed but are not widely accepted due to their large storage requirements.
Recently, a general multi-layer database concept based on concept hierarchies has been proposed to improve the performance of query execution. However, there is still much to be done to expand this idea into an implementable database architecture. New, implementable techniques that can significantly speed up query execution, especially when there are a large number of users, yet do not demand an excessive amount of extra storage space beyond the database itself, are in demand. This paper proposes a multiple-layer database architecture based on the signature method, in implementable detail, compares this new architecture with existing query-execution speed-up techniques, and gives a performance analysis of the new architecture. Keywords: Relational database, very large database, multi-layer, join, query processing, signature, grouping, performance, mobile Internet.

TR30 Performance Study of Real-Time Scheduling Techniques under Multicasting Traffic in an ATM Multiplexer by Khalid H. Sheta and Mukesh Singhal Multimedia applications in broadband ISDN present various types and classes of traffic streams, i.e., connections such as audio, video, and file transfers. Output multiplexers at an ATM switching node are responsible for scheduling cells belonging to these traffic connections. A cell loss in a traffic stream due to a cell missing its deadline results in the loss of the entire packet that should have been delivered to the destination node. Multicasting is an important application in broadband networks. A cell loss on a multicast stream at an ATM multiplexer affects not only one node but a group of nodes in the multicasting group. This packet loss leads to inconsistency in the information received among different members of the multicast group. Moreover, if the lost packet is retransmitted by the root node of the multicast tree, the retransmission results in an excessive load on the network.
In this paper we address the problem of scheduling connections that carry real-time multicasting/non-multicasting and non-real-time traffic. Traffic on real-time connections has a deadline: each cell has a deadline associated with it (in terms of cell transmission time slots) before which the cell has to be scheduled at the multiplexer. We propose a new scheduling technique that makes use of the information available at multicast connection setup time to give appropriate priority to multicast traffic. In particular, we use the multicast tree size (number of nodes) and the multicast subtree sizes at each multiplexer. We show how this information, coupled with cell arrival time, can be used to proportionally weigh the actual cell deadline of multicast cells to give them priority over other classes of real-time non-multicast and non-real-time cells. We assume an On-Off traffic source model for real-time connections containing two types of deadlines, tight and relaxed. We compare, through extensive simulation, our technique with two well-known real-time scheduling techniques. We show how the new technique outperforms the other techniques by reducing cell loss for multicast connections while attempting to maintain the same or better cell loss for non-multicasting connections over a wide range of parameter values. We also show that the new technique does not increase the delay of non-real-time connections compared to the other scheduling techniques. Finally, we discuss how the new technique can be incorporated into an ATM switching node. Key words: Multicast, Real-Time, Scheduling, ATM Switches.

TR29 Once-and-Forall Management Protocol (OFMP) by Sandeep S. Kulkarni and Anish Arora OFMP is a hierarchical network management protocol that enables group operations to be executed on the management information bases of all nodes in the group. As long as no faults occur, OFMP ensures that all nodes execute their local operation exactly once in each group operation.
If "immediately detectable" faults occur, it ensures masking fault-tolerance; i.e., all non-failed nodes execute their local operation exactly once in each group operation. And if "eventually detectable" faults occur, it ensures stabilizing fault-tolerance; i.e., it eventually converges to a state from where all non-failed nodes execute their local operation exactly once in each subsequent group operation. Of special note is the ability of OFMP to detect, using only a bounded amount of memory, whether nodes have executed in the same group operation, in a manner that masks immediately detectable faults and stabilizes from eventually detectable faults.

TR28 A Certificate Path Generation Algorithm for Authenticated Signaling in ATM Networks by Jun Xu and Mukesh Singhal The ATM Forum specifies public key cryptography as the default ATM authentication mechanism and directory services like X.509 as the infrastructure for public key distribution and certification. Authenticated signaling, widely acknowledged as a necessary security feature of ATM networks, requires the signaling message to be authenticated with a digital signature; the called party needs to obtain the public key of the calling party and a proof of the calling party's ownership of that public key. In X.509, the standard form of such a proof is a chain of public key certificates, called the certificate path between two parties. The certificate exchange protocol (CEP), proposed by the ATM Forum, requires that another bi-directional connection be established between the two parties to exchange public keys and certificate paths before an authenticated connection can be set up, which is not an ideal approach. We propose an algorithm, embedded into ATM routing protocols, that generates a certificate path inside a signaling message on-the-fly as the signaling message travels through the ATM network.
In this approach, all that a calling party needs to know for authentication purposes is its own public key certificate, and the ATM network builds the rest of the certificate path for it. Related issues, such as the distribution of public key certificates and the optimization of the CA hierarchy, are also addressed in this paper. Keywords: ATM, P-NNI, authenticated signaling, certificate path.

TR27-DIR Slice Based Volume Rendering by David M. Reed and Roni Yagel Some of the more important research results in computational science rely on the use of simulation methods that operate on unstructured grids. However, these grids, composed of polyhedra, introduce exceptional problems with respect to data visualization. Volume rendering techniques, originally developed to handle rectangular grids, show significant promise for general use with unstructured grids as well. The main disadvantage of this approach, compared to isosurfaces, particles, or other visualization tools, is its non-interactive performance. This paper presents an efficient method for rendering unstructured grids that is based on incremental slicing and hardware polygon rendering. The grid is sliced with a set of parallel planes, resulting in a polygonal mesh. Look-up tables are used to generate the polygons based on which edges of a polyhedron are intersected. The graphics hardware is then used to render (interpolate-fill) the polygon meshes and composite them in visibility order. The main advantages of this algorithm are that it avoids an explicit sort to produce a depth ordering and can take advantage of polygon rendering hardware. These two properties of the algorithm allow it to produce good-quality images quickly on machines with polygon rendering hardware.

TR26-DIR Segmentation of Medical Images Using LEGION by Naeem Shareef, DeLiang L.
Wang, and Roni Yagel Advances in visualization technology and specialized graphics workstations allow clinicians to virtually interact with anatomical structures contained within sample medical image datasets. A hindrance to the effective use of this technology is the difficult problem of image segmentation. In this paper, we utilize a recently proposed oscillator network called LEGION (Locally Excitatory Globally Inhibitory Oscillator Network), whose ability to achieve fast synchrony with local excitation and desynchrony with global inhibition makes it an effective computational framework for grouping similar features and segregating dissimilar ones in an image. We identify four key computational components that determine its dynamics. Then, we describe an image segmentation algorithm that uses these LEGION components and an adaptive scheme to choose tolerances. We show results of applying the algorithm to 2D and 3D (volume) CT and MRI medical image datasets, and compare our results with manual segmentation. LEGION's computational and architectural properties make it a promising approach for real-time medical image segmentation using multiple features.

TR25 Scheduling Fork-Join Computations on Distributed-Memory Multiprocessor Systems by Khalid H. Sheta, Mukesh Singhal and Phillip E. Krueger In this paper, we develop two analytic models for scheduling fork-join computations on distributed-memory multiprocessor systems. The first model allows arriving tasks to multiplex processors with existing tasks, while the other model does not allow multiplexing. Both models assume that tasks block for I/O and that the overhead of rescheduling a task to another processor is prohibitive. We compare the performance of both models in terms of overall job completion time. We show that multiplexing has a performance advantage over non-multiplexing over a wide range of conditions. We point out how the results can be used in the design of an adaptive multiprocessor scheduler.
Key words: distributed-memory multiprocessors, job scheduling, fork-join computations, performance analysis.

TR24 Phase-Space Nonlinear Control Toolbox: The Maglev Experience by Feng Zhao, Shiou C. Loh, and Jeff A. May We describe the Phase-Space Nonlinear Control Toolbox, a suite of computational tools for synthesizing and evaluating control laws for a broad class of nonlinear dynamical systems. The Toolbox comprises computational algorithms for identifying optimal control reference trajectories in the phase space of dynamical systems and experimental methods for evaluating the performance of the control laws. These algorithms combine knowledge of the geometric theory of modern nonlinear dynamical systems with efficient computational methods for geometric reasoning and graph search; they define the properties of controllability and robustness in terms of phase-space geometric structures and exploit phase-space neighborhood adjacencies to obtain computational efficiency. Compared to traditional analytic control design methods, the phase-space based control synthesis and evaluation rely on high-performance computational techniques and are applicable to physical systems operating in large nonlinear regimes. Using a proof-of-concept physical experiment for stabilizing a nonlinear magnetic levitation system, we have successfully demonstrated the feasibility of the phase-space control technology.

TR23 An Introduction to RESOLVE/Ada95 by David S. Gibson (No abstract per se exists: this is the first paragraph of the Preface.) This report introduces RESOLVE/Ada95, a discipline for building high-quality software components using the 1995 revision of Ada. RESOLVE/Ada95 is an approach to implementing software components in Ada95 using the principles of RESOLVE. RESOLVE is three things. First, RESOLVE is a framework for software component engineering.
RESOLVE is also a language which includes two integrated sub-languages: a model-based formal specification language and an imperative, sequential programming language. Finally, RESOLVE is a discipline with detailed design principles which guide software engineers in the development of high-quality software components and systems. The RESOLVE discipline has been applied to the Ada and C++ programming languages. This report demonstrates how the RESOLVE discipline may be applied to Ada95.

TR22 Stepwise Design of Tolerances in Barrier Computations by Sandeep S. Kulkarni and Anish Arora We design in a stepwise manner a multitolerant computation for synchronizing the phases of concurrent processes. The tolerances of the computation enable processes to (i) compute all phases correctly in the presence of faults that corrupt process state in a detectable manner, and (ii) compute only a minimum possible number of phases incorrectly before resuming correct computation in the presence of faults that corrupt process state in an undetectable manner. Keywords: multitolerance; detectable and undetectable faults; stepwise design; concurrent systems.

TR21 Multicasting on Switch-based Irregular Networks using Multi-drop Path-based Multidestination Worms by Ram Kesavan and Dhabaleswar K. Panda This paper presents a novel concept of multi-drop path-based multidestination message passing on switch-based irregular networks. First, the multi-drop mechanism is defined with an associated header encoding scheme, and this mechanism is used to develop path-based multidestination worms. Next, a method is proposed to identify valid multidestination paths on arbitrary irregular networks with typical deadlock-free routing. Then, the deadlock-free property of multi-drop path-based worms is emphasized. Using the above concepts, three multicast algorithms are proposed: multi-drop path-based greedy (MDP-G), multi-drop path-based less-greedy (MDP-LG), and multi-drop path-based binomial (MDP-B).
The proposed algorithms are compared with each other and with the best unicast-based algorithm, CCO [5], for a range of system and technological parameters. The MDP-LG scheme is shown to be the best for implementing multicast with reduced latency. Keywords: Interconnection network, collective communication, multicast, broadcast, wormhole routing, cut-through routing, networks of workstations, and path-based routing.

TR20 Multicasting in Irregular Networks with Cut-Through Switches using Tree-Based Multidestination Worms by Rajeev Sivaram, Dhabaleswar K. Panda and Craig B. Stunkel Multidestination message passing has been proposed as a mechanism for achieving efficient multicast in regular direct and indirect networks. The application of this technique to parallel systems based on irregular networks has, however, not been studied. In this paper we propose two schemes for performing multicast using multidestination worms on irregular networks and examine the extent to which multidestination message passing can improve multicast performance. For each of the schemes we propose solutions to the associated problems, such as methods for encoding and decoding multidestination headers, alterations to the setup algorithm run by the switches, logic to perform header manipulation, etc. We perform extensive simulations to evaluate our schemes under a variety of changing parameters: software startup overhead per message, system size, switch size, message length, and degree of connectivity. Our results establish that even a very naive multicasting algorithm using multidestination message passing can result in considerable performance gains over the best unicast-based multicasting schemes. Keywords: Parallel computer architecture, cut-through routing, multicast, broadcast, irregular networks, collective communication.

TR19 How Can We Design Better Networks for DSM Systems? by Donglai Dai and Dhabaleswar K.
Panda Most DSM research in recent years has ignored the impact of the interconnection network altogether. Similarly, most interconnection network research has focused on better network designs using synthetic (uniform/non-uniform) traffic. Neither of these trends leads to concrete guidelines for designing better networks for the emerging Distributed Shared Memory (DSM) paradigm. In this paper, we address these issues by taking a three-step approach. First, we propose a comprehensive parameterized model to estimate the performance of an application on a DSM system. This model takes into account all key aspects of a DSM system: application, processor, memory hierarchy, coherence protocol, network, etc. Next, using this model we evaluate the impact of different network design choices (link speed, link width, topology, ratio of router delay to physical link delay) on the overall performance of DSM applications and establish guidelines for designing better networks for DSM systems. Finally, we use simulations of the SPLASH2 benchmark suite to validate our design guidelines. Some of the important design guidelines established in this paper are: 1) better performance is achieved by increasing link speed instead of link width, 2) changing the topology of a network under a constant bisection bandwidth constraint is not at all beneficial, and 3) network contention experienced by short messages is crucial to overall performance. These guidelines, together with several others, lay a good foundation for designing better networks for current and future generation DSM systems. Keywords: Parallel architecture, distributed shared memory systems, performance modeling, interconnection networks, wormhole routing, topology, directory-based protocols, and cache coherence.
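To illustrate the kind of parameterized estimate TR19 describes, the following sketch builds a remote-reference latency from network parameters and folds it into an execution-time estimate. This is our own toy model with hypothetical parameter names and values, not the model from the report; it merely shows why, in such a model, raising link speed can beat widening links.

```python
# Toy parameterized DSM performance model (hypothetical; not TR19's actual model).

def remote_latency_ns(hops, router_delay_ns, link_delay_ns,
                      msg_flits, link_width_bytes, flit_bytes=4):
    # Cut-through style: per-hop router + link delay, plus message serialization
    # over the link (wider links push more of a flit per link cycle).
    per_hop = router_delay_ns + link_delay_ns
    serialization = msg_flits * (flit_bytes / link_width_bytes) * link_delay_ns
    return hops * per_hop + serialization

def exec_time_us(instructions, cpi, clock_ghz, remote_miss_rate, latency_ns):
    # Execution time = local computation + stalls on remote misses.
    compute_ns = instructions * cpi / clock_ghz
    stall_ns = instructions * remote_miss_rate * latency_ns
    return (compute_ns + stall_ns) / 1000.0

# Compare doubling link speed (halving link delay) vs doubling link width:
lat_base  = remote_latency_ns(4, 40, 20, msg_flits=32, link_width_bytes=2)
lat_speed = remote_latency_ns(4, 40, 10, msg_flits=32, link_width_bytes=2)
lat_width = remote_latency_ns(4, 40, 20, msg_flits=32, link_width_bytes=4)
```

With these (made-up) numbers the faster link wins because it shortens both the per-hop delay and the serialization term, whereas a wider link shortens only serialization, echoing design guideline 1 of the abstract.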
TR18 Performance Evaluation of Smoothing Algorithms for Transmitting Prerecorded Variable-Bit-Rate Video by Wu-chi Feng and Jennifer Rexford The transfer of prerecorded, compressed video requires multimedia services to support large fluctuations in bandwidth requirements on multiple time scales. Bandwidth smoothing techniques can reduce the burstiness of a variable-bit-rate stream by prefetching data at a series of fixed rates, simplifying the allocation of resources in video servers and the communication network. Given a fixed client-side prefetch buffer, several bandwidth smoothing algorithms have been introduced that are provably optimal under certain constraints. This paper presents a comprehensive performance evaluation of bandwidth smoothing algorithms based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video. Due to the scarcity of available trace data, we have constructed a video capture testbed and generated a collection of twenty full-length, motion-JPEG encoded video clips. Using these video traces and a range of client buffer sizes, we investigate the interplay between the performance metrics through simulation experiments. The results highlight the unique strengths and weaknesses of each bandwidth smoothing algorithm and pinpoint promising directions for future research.

TR17 Aristotle: A System for Research on and Development of Program-Analysis-Based Tools by Mary Jean Harrold and Gregg Rothermel Aristotle provides program analysis information and supports the development of software engineering tools. Aristotle's front end consists of parsers that gather control-flow, local data-flow, and symbol table information for C and Java programs.
Aristotle tools use the data provided by the front end to perform a variety of tasks, such as data-flow and control-dependence analysis, data-flow testing, regression test selection, graph construction, and graph viewing. Parsers and tools use database access routines to store information in, and retrieve it from, a data repository. Users can view analysis data textually or graphically. A user interface provides menu-driven access to tools; many tools can also be invoked directly from application programs. Most of Aristotle's components function on both single procedures and entire programs. We use Aristotle as a platform for developing and experimenting with program analysis, maintenance, and testing tools. KEY WORDS: program analysis, software engineering, CASE, experimental systems.

TR16 Constructing Pairwise Disjoint Paths with Few Links by Himanshu Gupta and Rephael Wenger Let P be a simple polygon and let {(ui, ui')} be m pairs of distinct vertices of P where, for all distinct i and j less than or equal to m, there exist pairwise disjoint paths connecting ui to ui' and uj to uj'. We wish to construct m pairwise disjoint paths in the interior of P connecting ui to ui', for i = 1, . . . , m, with a minimal total number of line segments. We give an approximation algorithm which in O(n log m + M log m) time constructs such a set of paths using O(M) line segments, where M is the number of line segments in the optimal solution.

TR15 Shape Reconstruction from Contours using Isotopic Deformation by Kikuo Fujimura and Eddy Kuo A method for shape reconstruction from contours is presented using isotopic deformation. The reconstructed shape is free of self-intersection and it can incorporate given feature correspondences. The method automatically adds vertices, where necessary, to satisfy these requirements. A new method for handling bifurcation is also proposed, which can handle cases that are problematic for some other algorithms.
The proposed method is suitable for terrain modeling, since reconstructed shapes generated by the method do not have overhangs. The running time of the algorithm depends on the complexity of the input scene and is shown to be worst-case optimal for the class of tasks defined. Experimental results are included to illustrate the feasibility of the approach. Keywords: shape reconstruction, terrain modeling, isosurface, isotopy.

TR14 How Much Does Network Contention Affect Distributed Shared Memory Performance? by Donglai Dai and Dhabaleswar K. Panda Many research results in recent years have focused on the design of distributed shared memory (DSM) systems. However, most of these results are centered around either the careful design of node controllers or cache coherence protocols. While evaluating these designs, simplified models of networks (constant network latency or average latency based on the network size) are typically used. Such models completely ignore network contention and thus do not capture the latency of remote memory references accurately. While this approach may seem reasonable for designing better node architectures or coherency protocols, it serves poorly for understanding and solving the network latency bottleneck problem. In order to help network designers design better networks for DSM systems, in this paper we focus on two major goals: 1) to estimate the impact of network link contention and network interface contention on the overall performance of DSM applications and 2) to study the impact of critical architectural parameters on network contention. We achieve these goals by evaluating a set of applications from the widely used SPLASH2 suite on a closely integrated processor and network simulator for a CC-NUMA system. The simulator models the processor, cache, and memory access references at the instruction level and models the network at the flit transfer level.
A series of three network models is proposed to isolate network link contention and network interface contention. For an 8 x 8 wormhole system, the study indicates that network contention can degrade performance by up to 59.8%. Of this, up to 7.2% is caused by network interface contention alone. The study indicates that network contention becomes dominant for DSM systems using small caches, wide cache lines, low degrees of associativity, high processing node speeds, high memory speeds, low network speeds, or small network widths. Keywords: Parallel architecture, distributed shared memory systems, interconnection networks, wormhole routing, directory-based protocols, cache coherence, and meshes.

TR13 Interaction Refinement in Object-Oriented Systems (Extended Abstract) by Neelam Soundarajan An OO designer typically starts with a high-level idea of the interactions between the key objects in the system. As the design progresses, the designer refines these interactions by identifying the exact operations of each object that other objects will invoke and the order of invocations, or by introducing new objects with appropriate operations to mediate the required interactions between existing objects, etc. These refinements are usually not recorded, so the rationale behind the design may be lost. We motivate the notion of interaction refinement with a few examples, provide a precise definition of the concept, and develop a formalism that can be an important tool in recording and validating OO designs.

TR12 Refining Interactions in a Distributed System (Extended Abstract) by Neelam Soundarajan We identify a new type of refinement that we call interaction refinement that is of great value in designing distributed systems. An interaction refinement step is one in which a system designer decides to implement a given "high-level" interaction as a particular sequence of "low-level" interactions.
For example, a particular high-level interaction may be specified as "identify-client", involving a "client process" and a "server process". The designer may decide to implement this interaction as a series of low-level interactions between the two processes, with the client process first sending the server process an id, followed by a password, perhaps allowing the client several attempts at getting the password right, etc. Here the designer is not refining the internal structure of either the client process or the server process; rather, she is deciding to implement the "identify-client" interaction between the two processes in terms of a sequence of low-level communications between them. By contrast, other types of refinement that are normally considered are concerned with refining the internal structure of one or more of the processes, the interactions between the processes being taken as given. We motivate the idea of interaction refinement with simple and natural examples, present a precise definition of the concept, and develop a preliminary formalism that can be used to establish the correctness of systems developed using interaction refinement. We also compare our work with previous work that allows one to deal with restricted kinds of interaction refinement.

TR11 Can Scatter Communication Benefit from Multidestination Message Passing? by Mohammad Banikazemi and Dhabaleswar K. Panda Multiple research directions in the past have emphasized the benefits of multidestination message passing for supporting efficient non-personalized communication like broadcast and multicast. In this paper, we investigate whether personalized communications like scatter can benefit from multidestination message-passing support, if available in a system. In order to use such support, we propose a new approach where personalized data for a set of destinations can be grouped together and sent as a single multidestination broadcast/multicast worm.
This approach is shown to use a smaller number of communication steps while achieving the lower bound for scatter communication. Using this approach, we propose two schemes, sequential multidestination tree-based (SMT) and hierarchical multidestination tree-based (HMT), to implement one-to-all scatter in k-ary n-cube wormhole systems. The performance of these two schemes is compared with two commonly used unicast-based schemes, sequential tree-based (ST) and binomial tree-based (BT). Analytical and simulation studies are carried out to determine the scatter latency under the four schemes for k-ary n-cube systems with varying communication start-up times (ts), propagation times (tp), and message lengths (l, the size of the data being sent to each destination). The study indicates that for l less than or equal to ts/ktp, scatter is best implemented by the HMT scheme. Similarly, the SMT scheme is shown to perform best when ts/ktp < l and l is less than or equal to ts/tp. For l > ts/tp, both ST and SMT perform better than the other algorithms. Simulation results on an 8 x 8 system with ts = 10.0 microseconds show that the HMT and SMT schemes reduce scatter latency by up to factors of 30 and 8, respectively, compared to the ST scheme for small message lengths. These results confirm that scatter communication can in fact take advantage of multidestination message passing support. Keywords: Parallel architecture, collective communication, scatter, personalized communication, wormhole routing, k-ary n-cubes, path-based routing, and multidestination message passing.

TR10 Optimal Multicast with Packetization and Network Interface Support by Ram Kesavan and Dhabaleswar K. Panda Most multicast algorithms proposed in the literature assume that an arbitrarily long message is communicated as a single packet. Modern networks typically limit the size of the largest packet, and long messages are packetized and transmitted.
Such networks also provide network interface support at the nodes, typically including a coprocessor and memory, to implement the lower layers of the communication protocol. Such network interfaces can be programmed to support efficient multicasting and to eliminate the software overhead incurred by the higher layers in absorbing and retransmitting packets. In this paper, we present an optimal multicast algorithm for systems with such smart network interface support for packetization. Two implementations of the smart network interface, First-Child-First-Served (FCFS) and First-Packet-First-Served (FPFS), are studied and compared. It is shown that the FPFS network interface support is more practical and efficient. Next, the multicast latency is modeled for the FPFS implementation. The concept of a k-binomial tree is introduced and proved to be optimal for multicasting with such network interface support. A method to construct contention-free k-binomial trees on contention-free orderings of the nodes is also shown. The performance of the optimal k-binomial tree is compared with the standard binomial tree for a range of multicast set sizes and numbers of packets. For a 64-node system, with parameters ts = 12.5 microseconds (message send overhead), tr = 12.5 microseconds (message receive overhead), and tstep = 7.2 microseconds (overhead of preparation and transfer of a packet by the network interface), the k-binomial tree is shown to be up to 3 times better than the binomial tree for 24-packet-long messages. The k-binomial tree shows increased benefits as the number of packets increases. Thus, these results demonstrate significant potential for application to current and future generation high performance systems, including MPPs and NOWs, where network interface support for multicast is provided. Keywords: Interconnection Network, Collective communication, Multicast, Broadcast, Wormhole Routing, Link Contention, Networks of Workstations, Network Interface Support, and Packetization.
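For context on what such latency models look like, here is a toy estimate of binomial-tree multicast latency with packetization. It is our own simplified sketch, not TR10's k-binomial analysis: in a binomial tree the set of informed nodes doubles each round, and we charge each round the send overhead, per-packet interface time, and receive overhead, borrowing the parameter names ts, tr, and tstep from the abstract.

```python
import math

# Toy latency model for binomial-tree multicast of a packetized message.
# (Illustrative only; not the k-binomial model of TR10.)

def binomial_rounds(n):
    # Rounds needed to reach n nodes when the informed set doubles each round.
    return math.ceil(math.log2(n))

def binomial_multicast_us(n, packets, ts=12.5, tr=12.5, tstep=7.2):
    # Each round: send overhead + per-packet interface work + receive overhead.
    per_round = ts + packets * tstep + tr
    return binomial_rounds(n) * per_round
```

For a 64-node system and a 24-packet message this gives 6 rounds; the k-binomial tree of the paper improves on this by overlapping per-packet work across the tree rather than serializing whole messages round by round.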
TR09 Hybrid Algorithms for Complete Exchange in 2D Meshes by N.S. Sundar, D.N. Jayasimha, D.K. Panda, and P. Sadayappan (No Abstract - this is the first paragraph of the introduction): The all-to-all personalized (complete) exchange is a global collective communication operation in distributed memory multicomputers in which every processor sends a unique block of data to every other processor in the system. This pattern arises in many important problems such as multidimensional FFT, matrix transposition, and sorting. It is one of the collective communication primitives defined by the MPI [11] message passing interface standard. Since complete exchange is communication intensive, its effective implementation in wire-sparse topologies like 2D meshes is a challenging problem.

TR08 Accurate Methods for the Voxelization of Planar Objects Vassily Filippov and Roni Yagel The process of generating discrete surfaces in volumetric representation, termed voxelization, is confronted with topological considerations as well as accuracy and efficiency requirements. We define the topological notions of separability and minimality and introduce a method for converting planar objects into voxel representation. Through geometric principles, our method provides topological consistency as well as numerical accuracy.

TR07 A Medium Access Control Protocol for Voice and Data Integration in Receiver-Oriented DS-CDMA PCNs Yibin Yang, Junfeng He, and Ming T. Liu Primarily used in military communications in the past, code division multiple access (CDMA) has recently been found attractive for personal communications as well. As a large number of mobile hosts are supported within a cell and a wide range of services are provided, one of the most important issues in a CDMA personal communication network is how to control the uplink access to the shared wireless spectrum.
In this paper, we address this issue in a realistic situation where the receiver-oriented transmission protocol is employed and the packet loss due to multiple access interference (MAI) cannot be ignored. We propose a medium access control protocol for voice and data integration that solves the problems of code assignment and MAI control at the same time. A Markov chain model is used to analyze the protocol, and the analytical results are shown to be very close to simulation results. Based on this model, the effectiveness of the protocol's MAI control is demonstrated and some system design issues are investigated. Keywords: code division multiple access, medium access control, personal communication networks, slotted ALOHA, voice and data integration.

TR06 Efficient Multicast in IP/ATM Networks Walid Mostafa and Mukesh Singhal Real-time transport protocols rely on IP to disseminate audio/video packets from a sender to one or more receivers. In IP over ATM broadband networks, studies have shown that the effective bandwidth is reduced by at least 10% and that larger protocol payload sizes yield higher throughput. However, a dropped cell corrupts the IP packet it belongs to and reduces the rate of successfully received data, or goodput. This results in quality-of-service degradation for multimedia traffic, particularly for intra-frame encoded video traffic that is sensitive to packet loss. A key observation for multimedia applications is that whether to aim for maximum throughput or for optimum goodput is determined by the end-to-end quality-of-service requirements. In this report, we derive analytic expressions for the maximum throughput and the packet-loss probability, which provide a better understanding of the goodput of real-time transport protocols. We also derive an analytic expression for the optimum PDU size that maximizes the goodput with respect to cell-loss probability and the number of ATM switches.
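The derived expressions themselves are not reproduced in the abstract, but a toy model in the same spirit illustrates the trade-off: a larger PDU amortizes header overhead across more cells, yet a single lost cell corrupts the whole packet. The 40-byte header overhead and the assumption of independent per-switch cell loss are ours; the 48-byte cell payload and 53-byte cell size are standard ATM.

```python
import math

def goodput(payload, p_cell, switches, header=40):
    """Toy goodput model for IP over ATM: the packet occupies
    ceil((payload + header) / 48) cells, each of which must survive
    every switch; one lost cell corrupts the entire packet."""
    cells = math.ceil((payload + header) / 48)
    p_deliver = (1 - p_cell) ** (cells * switches)
    # useful payload bytes per wire byte, times delivery probability
    return (payload / (cells * 53)) * p_deliver

def optimum_pdu(p_cell, switches, max_payload=65536, step=48):
    """Scan payload sizes for the goodput-maximizing PDU."""
    return max(range(step, max_payload, step),
               key=lambda n: goodput(n, p_cell, switches))
```

Consistent with the abstract's finding, the optimum in this toy model shrinks as the cell-loss probability or the number of switches grows: `optimum_pdu(1e-3, 5)` is far below `optimum_pdu(1e-6, 5)`.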
We show that optimum PDUs yield higher goodput than the default MTU for IP over ATM, particularly as the cell-loss probability and/or the number of ATM switches increases. For reliable multicast, we also derive analytic expressions for the expected total number of transmissions per packet and for the goodput of reliable multicast. Keywords: Multimedia Dissemination, Reliable Multicast, Transport PDU, IP over ATM.

TR05 Online Smoothing for Delayed Transmission of Live Video Jennifer Rexford and Wu-chi Feng (No real abstract exists for this TR; the following is the first paragraph of the Introduction.) Many emerging applications rely on the efficient transmission of live or pre-recorded video. Although effective compression techniques can substantially reduce the storage and bandwidth requirements, compressed video streams typically exhibit significant burstiness that complicates the provisioning of resources along the path from the video server to the client site. To reduce the burstiness and the peak bandwidth requirements, a multimedia server can smooth the outgoing stream by transmitting a sequence of video frames at an averaged rate. Recently, several bandwidth smoothing techniques have been introduced that reduce the server and network resource requirements for transmitting pre-recorded video. By capitalizing on prior knowledge of the frame sizes for the entire video, these techniques can smooth traffic on a large time scale by prefetching frames into a buffer in advance of bursts of large frames. Given a fixed-size client buffer, these algorithms can minimize the peak bandwidth of the video stream while also optimizing other important performance metrics.

TR04 Refining Interactions in a Distributed System (Extended Abstract) Neelam Soundarajan We identify a new type of refinement, which we call interaction refinement, that is of great value in designing distributed systems.
An interaction refinement step is one in which a system designer decides to implement a given 'high-level' interaction as a particular sequence of 'low-level' interactions. For example, a particular high-level interaction may be specified as 'identify-client', involving a 'client process' and a 'server process'. The designer may decide to implement this interaction as a series of low-level interactions between the two processes, with the client process first sending the server process an id, followed by a password, perhaps with the client allowed several attempts at getting the password right, etc. Here the designer is not refining the internal structure of either the client process or the server process; rather, she is deciding to implement the 'identify-client' interaction between the two processes in terms of a sequence of low-level communications between them. By contrast, the other types of refinement that are normally considered are concerned with refining the internal structure of one or more of the processes, the interactions between the processes being taken as given. We motivate the idea of interaction refinement with simple and natural examples, present a precise definition of the concept, and develop a preliminary formalism that can be used to establish the correctness of systems developed using interaction refinement. We also compare our work with previous work that allows one to deal with restricted kinds of interaction refinement.

TR03 On the Specification, Inheritance, and Verification of Synchronization Constraints by Neelam Soundarajan Object-orientation and distributed systems are a natural match. Objects correspond to processes in a distributed program; the invocation of a method of one object by another object corresponds naturally to a message being passed between the corresponding processes in the distributed program. Despite this close correspondence, progress in developing an OO approach to concurrency has been limited.
One important problem has been the so-called inheritance anomaly, which is concerned with how, and how easily, synchronization constraints specified in a base class may be modified in a derived class. Our concern in the current paper is slightly different. We are interested in developing ways to abstractly specify these synchronization constraints, and ways to verify them. In other words, we are interested in what these synchronization constraints do, and this is, of course, the critical question from the point of view of the users of these objects. We use the mechanism of acceptance sets in our specifications. We develop a proof method to verify that (base as well as derived) classes meet their specifications. We also consider the question of what kinds of modifications of synchronization constraints in derived classes are easy for the clients of the class to deal with. Key phrases: Specification and verification; synchronization constraints; inheritance anomaly.

TR02 A Survey of the Use-It-Or-Lose-It Policies for the ABR Service in ATM Networks by Shiv Kalyanaraman, Raj Jain, Rohit Goyal and Sonia Fahmy The Available Bit Rate (ABR) service has been developed to support data applications over Asynchronous Transfer Mode (ATM). The ABR service uses a closed-loop rate-based traffic management framework in which the network divides available bandwidth among contending sources. The ATM Forum then worked on incorporating open-loop control capabilities to make the ABR service robust to temporary network failures and source inactivity periods. One of the problems addressed was whether the rate allocations of sources should be taken away if the sources do not use them. The proposed solutions, popularly known as the Use-It-or-Lose-It (UILI) policies, have had a significant impact on the ABR service capabilities.
In this paper we survey the design, development, and final shape of these policies, and their impact on the ABR service. Keywords: Asynchronous Transfer Mode (ATM), Available Bit Rate (ABR), traffic management, congestion control.

TR01 Spatial Aggregation: Modeling and Controlling Physical Fields by Chris Bailey-Kellogg and Feng Zhao Many important physical phenomena, such as temperature distribution, air flow, and acoustic waves, are described as continuous, distributed-parameter fields. Controlling and optimizing these physical processes and systems are common design tasks in many scientific and engineering domains. However, the challenges are manifold: distributed fields are conceptually harder to reason about than lumped-parameter models; computational methods are prohibitively expensive for complex spatial domains; and the underlying physics imposes severe constraints on observability and controllability. This paper develops an ontological abstraction and an aggregation-disaggregation mechanism, in a framework collectively known as spatial aggregation (SA), for reasoning about and synthesizing distributed control schemes for physical fields. The ontological abstraction models physical fields as networks of spatial objects. The aggregation-disaggregation mechanism employs a set of data types and generic operators to find a feasible control structure, specifying control placements and associated actions that satisfy given constraints. SA abstracts common computational patterns of control design and optimization into a small number of operators to support modular programming; it builds concise and articulable structural descriptions of physical fields. We illustrate the use of the SA ontological abstraction and operators in an example of regulating a thermal field in industrial heat treatment. Keywords: Qualitative reasoning; Spatial reasoning; Ontologies; Decentralized control; Distributed AI; Programming languages.
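The SA operators themselves are not detailed in the abstract. As a toy illustration of the kind of distributed-parameter field SA reasons about, here is a sketch of a 1D thermal field with control placements modeled as cells held at fixed temperatures; the grid size, the Jacobi relaxation, and the Dirichlet-style actuators are all our assumptions, not the paper's method.

```python
def steady_state(n, actuators, iters=20000):
    """Relax a 1D thermal field of n cells to steady state.
    actuators maps cell index -> held temperature (a control
    placement plus its action); unheld endpoints stay at 0."""
    temps = [0.0] * n
    for _ in range(iters):
        new = temps[:]
        for i in range(1, n - 1):
            new[i] = 0.5 * (temps[i - 1] + temps[i + 1])  # discrete Laplace
        for i, value in actuators.items():
            new[i] = value  # actuator pins its cell's temperature
        temps = new
    return temps

# One heater pinned mid-rod with cold ends: the field relaxes to a
# tent-shaped steady state, linear on each side of the heater.
field = steady_state(21, {0: 0.0, 10: 100.0, 20: 0.0})
```

Even this toy setup exhibits the constraints the abstract mentions: the achievable field shapes are dictated by the physics, so a placement search over actuator positions and set-values is needed to satisfy a target profile.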