Technical Report Series Computer and Information Science Research Center Department of Computer and Information Science The Ohio State University 2015 Neil Avenue, Columbus, OH 43210-1277 **********************NOTE********************* Abstracts for 1997 NOTE: some titles may be available ONLY via paper (This is noted after the title/author section.) If you have trouble accessing a file, please send email to tfletch@cis.ohio-state.edu 1997 TR56 A Methodology for Detecting Violation of Real-Time Constraints in Secure Electronic Commerce Transactions by Srividhya Subramanian and Mukesh Singhal In electronic commerce applications, it is often critical to transact within a short stipulated time. In such applications, it is financially damaging to the customer if the transaction does not complete within the stipulated time. In this paper, we describe various situations in the stock market (one of the biggest electronic commerce markets) requiring real-time transactions. We propose a methodology to enhance any electronic commerce protocol into a real-time-aware electronic commerce protocol. The methodology consists of two parts: the first part adds a timestamp to some messages in the original protocol, and the second part augments the protocol with a mechanism to detect violation of real-time constraints. We show that the methodology does not violate any of the useful properties present in existing electronic commerce protocols. As a result, the methodology may be successfully applied to any existing electronic commerce protocol. The methodology is illustrated by applying it to an existing well-known electronic commerce protocol, namely, the NetBill protocol. Keywords: Electronic commerce, real-time, transaction, security, atomicity, privacy, anonymity, stock market. 
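The two-part methodology above (timestamping messages, then checking for constraint violations) can be sketched as follows. The message fields, function names, and deadline check are hypothetical illustrations, not the actual TR56 protocol messages.

```python
import time

# Hypothetical illustration: names and message structure are assumptions,
# not the TR56 protocol itself.
def stamp(message, now=None):
    """Part one of the methodology: attach a send timestamp to a message."""
    message = dict(message)
    message["ts"] = time.time() if now is None else now
    return message

def violates_deadline(message, deadline_s, now=None):
    """Part two: detect a real-time constraint violation -- the transaction
    fails its constraint if more than deadline_s elapsed since the stamp."""
    now = time.time() if now is None else now
    return (now - message["ts"]) > deadline_s

order = stamp({"op": "buy", "symbol": "XYZ"}, now=100.0)
assert not violates_deadline(order, deadline_s=5.0, now=104.0)
assert violates_deadline(order, deadline_s=5.0, now=106.0)
```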
TR55 An Empirical Investigation and Comparison of Program Spectra by Mary Jean Harrold, Gregg Rothermel, Rui Wu, and Liu Yi A variety of expensive software maintenance and testing tasks require a comparison of the behaviors of program versions. Program spectra have recently been proposed as a heuristic for use in performing such comparisons. To assess the potential usefulness of spectra in this context, we conducted an experiment that examined the relationship between program spectra and program behavior (measured in terms of program failure behavior), and empirically compared several types of spectra. This paper reports the results of that experiment. TR54 Accurate Low-Contrast 3D Cone-Beam Reconstruction With Algebraic Methods by Klaus Mueller, Roni Yagel, and John J. Wheller This paper examines the use of the Algebraic Reconstruction Technique (ART) and related methods to reconstruct 3D objects from a relatively sparse set of cone-beam projection data. Although ART has been widely used for cone-beam reconstruction of high-contrast objects, e.g., in computed angiography, we are interested in the more challenging low-contrast case, which represents a little-investigated scenario for ART. Preliminary experiments indicate that for cone angles greater than 20°, traditional ART produces reconstructions with strong aliasing artifacts, obliterating much object detail. By analyzing the reconstruction process using signal processing principles, we reveal that the source of these artifacts is the non-uniform sampling and correction of the reconstruction grid by the cone-beam rays. To eliminate these errors, we devise a new way of computing the weights of the reconstruction matrix. This new method is better suited to cone-beam reconstruction and replaces the usual constant-size interpolation filter by one whose size and amplitude depend on the source-voxel distance. Doing so enables us to generate alias-free reconstructions at little extra cost. 
An alternative analysis reveals that Simultaneous ART (SART) also has great potential to produce reconstructions without cone-beam-related artifacts, although at greater computational cost. Putting everything together, we thoroughly investigate the influence of various ART parameters, such as volume initialization, the relaxation coefficient λ, the correction scheme, and the number of iterations, on reconstruction quality. Using a 3D version of the Shepp-Logan phantom, we find that ART typically requires only three iterations to render a reconstruction result close to the optimum (given proper parameter settings). Thus we conclude that ART is potentially no costlier than Filtered Backprojection (FBP) techniques, particularly if one considers the fact that ART requires only a fraction of FBP's projection set. TR53 Fast Implementations of Algebraic Methods for 3D Reconstruction from Cone-Beam Data by Klaus Mueller, Roni Yagel, and John J. Wheller The prime motivation of this work is to devise techniques that make the Algebraic Reconstruction Technique (ART) and related methods more efficient for routine clinical use, while not compromising their accuracy. In particular, we strive to push the overall cost of an ART reconstruction as close as possible to the theoretical cost of a reconstruction obtained with Filtered Backprojection (FBP). While we focus mostly on fast implementations of ART-type methods in the context of 3D cone-beam reconstruction, different portions of the material presented here are also applicable to speeding up reconstruction from fan-beam and parallel-beam data. It was shown in previous research that three iterations are sufficient to obtain a high-quality reconstruction from low-contrast cone-beam data. 
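For reference, the core update these two reports build on is the textbook Kaczmarz-style ART iteration, in which each ray equation corrects the current volume estimate, scaled by the relaxation coefficient. This sketch is the generic iteration only, not the cone-beam-specific weighting scheme the papers devise.

```python
import numpy as np

# Generic Kaczmarz-style ART: A is the system (weight) matrix, b the measured
# projections, lam the relaxation coefficient. An assumption-level sketch,
# not the papers' cone-beam weighting.
def art(A, b, lam=1.0, iterations=3):
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        for i in range(A.shape[0]):          # one correction per ray equation
            a = A[i]
            x += lam * (b[i] - a @ x) / (a @ a) * a
    return x

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
truth = np.array([2.0, -1.0])
x = art(A, A @ truth, lam=1.0, iterations=3)
assert np.allclose(x, truth, atol=1e-6)      # consistent system: ART recovers it
```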
Based on the observation that ART typically requires only half the projections of FBP, we conclude that if the overall cost of ART's projection-backprojection operations could be cut in half, one could obtain an ART implementation whose cost is close to the theoretical cost of FBP. To achieve this goal, we first survey existing projection algorithms and find that these algorithms either lack accuracy or speed, or are not suitable for cone-beam reconstruction. We hence devise a new and more accurate extension to the splatting algorithm, a well-known voxel-driven projection method. We also describe a new 3D ray-driven projector that is considerably faster than the voxel-driven projector, at the same time more accurate, and perfectly suited to the demands of cone-beam reconstruction. We then devise caching schemes for both ART and Simultaneous ART (SART). These schemes minimize the number of redundant computations for projection and backprojection and, at the same time, are very memory-conscious. We conclude that with caching and the proposed fast projection algorithm, the computational cost of ART can be reduced to a level close to FBP's computational effort. TR52 Fast Volume Rendering of Curvilinear Data Sets Using a Splatting Approach by Torsten Moeller, Roger Crawfis, and Nelson Max We develop a new algorithm that visualizes curvilinear grids quickly and accurately. Our algorithm is based on the concept of splatting. In order to reconstruct the underlying function, we use a Gaussian reconstruction kernel. In the computational domain, the sampling of the data field is regular, and we therefore have spherical reconstruction kernels. However, the mapping into physical space is non-linear, and the spherical reconstruction kernels from computational space are mapped to what we call "bean shapes," which are usually not ellipsoids. We approximate these bean shapes with ellipsoids and project them to the screen. 
Our algorithm is similar to Mao's splatting algorithm, but is much more efficient and less computationally involved. It also more faithfully represents the underlying data. TR51 Elm: An Algorithm to Approximate a t-Expected Partition by Christopher Jermaine and Renee J. Miller We have previously proposed a new metric of association between values, called unexpectedness, that is based on information content. This paper describes Elm, a novel, tree-based algorithm that builds a model of the data such that the information content, or unexpectedness, of any trend present in the actual data but not in the model is bounded. TR50 Efficient Collective Communication on Heterogeneous Networks of Workstations by Mohammad Banikazemi, Vijay Moorthy, and Dhabaleswar K. Panda Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects, NOW environments are gradually being redefined as Heterogeneous Networks of Workstations (HNOW) environments. This paper presents a new framework to implement collective communication operations (as defined by the Message Passing Interface (MPI) standard) efficiently for the emerging HNOW environments. We first classify different types of heterogeneity in HNOW and then focus on one important characteristic: the communication capabilities of workstations. By taking this characteristic into account, we propose a new Speed-Partitioned Ordered Chain (SPOC) approach to order the participating nodes of a given collective communication operation in an optimal fashion. Such an ordering allows flexibility to implement collective communication operations with lower latency. Using the new approach, a set of tree-based algorithms is proposed for frequently used collective communication operations such as broadcast, multicast, and barrier synchronization. 
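The intuition behind speed-aware ordering can be illustrated with a toy chain broadcast: in a forwarding chain, the last node never sends, so placing the fastest senders first (and the slowest node last) shortens the time until every node has the message. The per-node send costs and the plain-chain model below are assumptions for illustration, not the actual SPOC algorithm.

```python
# Toy model: each node forwards the message to the next node in the chain,
# so the message reaches the final node after the sum of the send costs of
# all nodes except the last. Illustrative only; not the SPOC algorithm.
def chain_broadcast_latency(send_costs):
    t = 0.0
    for cost in send_costs[:-1]:   # the last node in the chain never sends
        t += cost
    return t

slow_first = [10.0, 10.0, 1.0, 1.0]
fast_first = sorted(slow_first)    # speed-aware: fastest senders go first
assert chain_broadcast_latency(fast_first) < chain_broadcast_latency(slow_first)
```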
All these algorithms are compared with the standard MPI implementations on experimental as well as simulated testbeds. On an existing 16-node HNOW environment with SGI workstations and an ATM interconnect, our approach reduces the latency of broadcast, multicast, and barrier synchronization operations by a factor of up to 1.4 compared to the existing MPICH implementation. On a 64-node simulated testbed, our approach promises to reduce the latency of the broadcast operation by a factor of up to 4.8 compared to the MPI implementations. Improvements up to a factor of 3.3 and 2.3 are observed for multicast and barrier synchronization, respectively. Thus, these results demonstrate significant potential for designing scalable collective communication libraries for current and future generation HNOW environments. Keywords: Network of Workstations (NOW), Heterogeneous Network of Workstations (HNOW), Collective Communication, Broadcast, Multicast, Barrier Synchronization. TR49 Protocols for Secure, Atomic Transaction Execution in Electronic Commerce by Srividhya Subramanian and Mukesh Singhal Computer networks are an efficient, inexpensive, convenient, and fast mode of communication and information transfer. There is widespread demand for the ability to buy and sell goods, especially electronic goods, using this medium. Selling goods over computer networks is also referred to as electronic commerce. A protocol for electronic commerce transactions is the series of steps that two parties must follow in order to transact business successfully. Various issues arise while designing electronic commerce protocols: how can one party be prevented from disappearing with the money/goods in the middle of a transaction? How can a party establish its identity without revealing personal information to unknown and untrusted parties? How can we prevent a third party from "stealing" money from the network? 
All these and other problems make designing an electronic commerce protocol both challenging and interesting. In this paper, we first present a protocol for electronic commerce transactions between an anonymous customer and a merchant whose identity is public. Tampering with the protocol, either by the involved parties or by an intruder, does not cost the merchant or the customer more than some additional computation. Furthermore, under all circumstances, the transaction is either completed (the right amount of money is exchanged for the right goods) or aborted (there is no transfer of money or goods). In other words, the protocol ensures anonymity, security, and atomicity in business transactions. We also propose a protocol for electronic auctions. The auction protocol has the same properties of anonymity, security, and atomicity as the transaction protocol. We extend these two protocols to apply to real-time transactions where goods (such as stocks, bonds, options, etc.) must be transferred within a stipulated period of time. Keywords: Electronic commerce, transaction, security, atomicity, privacy, anonymity, auction, real-time. TR48 A Comprehensive Survey of Join Techniques in Relational Databases by Yuping Yang and Mukesh Singhal The equijoin between two relations is one of the basic operations in relational databases, and a large volume of research has been devoted to it. However, in recent years there has not been a survey that objectively compares the relative performance of a wide spectrum of join techniques. This survey compares the performance and practicality of various join techniques. The main criterion for performance comparison is disk I/O. For practicality, the criteria used are ease and flexibility of implementation. 
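As background for one of the three join families the survey compares, a minimal in-memory hash equijoin (build a hash table on one relation, probe it with the other) can be sketched as follows; the relation and key names are illustrative.

```python
from collections import defaultdict

# Minimal hash equijoin sketch: build phase on relation r, probe phase on s.
# An on-disk implementation (the survey's setting) would partition to buckets
# that fit in memory; this sketch shows only the core idea.
def hash_join(r, s, r_key, s_key):
    buckets = defaultdict(list)
    for row in r:                               # build phase
        buckets[row[r_key]].append(row)
    out = []
    for row in s:                               # probe phase
        for match in buckets.get(row[s_key], []):
            out.append({**match, **row})
    return out

emp = [{"dept": 1, "name": "a"}, {"dept": 2, "name": "b"}]
dept = [{"dept": 1, "dname": "x"}]
assert hash_join(emp, dept, "dept", "dept") == [{"dept": 1, "name": "a", "dname": "x"}]
```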
When comparing join techniques, each join technique is evaluated at its full potential: if other techniques are available to enhance a join technique while retaining its main philosophy, the latter techniques are applied. The main contribution of the paper is that it confirms the belief that no dramatic performance improvement can be made to the three major join techniques (nested loops, sort-based, and hash-based) of relational databases. The future of join performance improvement in relational database systems lies in more radical approaches: parallel joins, join indexes, composite indexes, and layered databases. Key words: Relational database, query execution, join, join index, equijoin, band join, selection, index, access path, disk I/O, performance. TR47 Fast Join Execution Using Summary Information in Large Databases by Mukesh Singhal and Yuping Yang It is well known that query execution in relational databases is not fast, and join execution is generally the most expensive operation in query execution. All three major join methods, namely, nested loops, sorting, and hashing, have been perfected to an extent that any further improvement in these methods will enhance the performance of join execution only marginally. Even with the advent of parallel databases, the enhancement in the performance of join execution is limited: join execution is often I/O-bound, and an increase in the number of processors can have only a marginal effect after a certain point. The use of a join index to speed up join execution is also limited due to its huge storage requirement. In this paper, we propose to use summary information to expedite query execution, especially join execution, in large databases. Summary information is stored in a summary database which forms the upper layer of a multi-layer database. The size of the summary database is much smaller than the size of a join index. 
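The layered-lookup idea can be illustrated with a toy two-level filter: a coarse summary of which value groups appear in each partition is consulted first, so the main-layer join only scans partition pairs that can actually produce matches. The grouping rule and partition layout below are illustrative assumptions, not the paper's signature scheme.

```python
# Toy two-layer lookup. The grouping function plays the role of a coarse
# summary ("signature"); names and layout are assumptions for illustration.
def build_summary(partitions, group):
    return {pid: {group(row) for row in rows} for pid, rows in partitions.items()}

def candidate_partitions(summary_r, summary_s):
    """Only partition pairs whose summaries share a group can contribute joins."""
    return [(pr, ps) for pr, gr in summary_r.items()
                     for ps, gs in summary_s.items() if gr & gs]

group = lambda row: row["k"] // 10              # coarse value group
R = {"r0": [{"k": 3}, {"k": 7}], "r1": [{"k": 25}]}
S = {"s0": [{"k": 26}], "s1": [{"k": 41}]}
pairs = candidate_partitions(build_summary(R, group), build_summary(S, group))
assert pairs == [("r1", "s0")]                  # the summary prunes 3 of 4 pairs
```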
An interesting feature of our approach is that it allows the full set of query processing operations, including the join, to be carried out in the summary database. The results of the query processing in the summary database are used to narrow down the search efforts in the main relational database and can greatly expedite the overall query execution in the main database. The main contribution of this paper is that it proposes a mechanism to perform joins using summary information for performance enhancement of query execution in relational databases. We present a performance analysis that shows that the proposed technique can greatly speed up query execution, especially join execution. Key words: relational database, large database, multiple layer, join, signature, grouping, performance. TR46 Effective Use of Virtual Channels in Wormhole Routed Distributed Shared Memory Systems by Donglai Dai and D.K. Panda Multiple virtual channels have been studied extensively and implemented as a powerful mechanism for obtaining high performance in modern interconnection networks. However, the effective use of virtual channels in distributed shared memory (DSM) systems is non-trivial. In this paper we present two simple techniques for exploiting the promising performance benefit of virtual channels in DSM systems. First, we propose to use an unequal number of virtual channels in the request and reply networks to harness the performance benefit of the inherently unbalanced traffic in DSM systems. Next, we apply the virtual channel mechanism to injection and consumption channels to alleviate the performance bottleneck at the network interface. In addition, issues related to virtual channels are taken into account in designing non-FIFO coherence protocols. These new design strategies are evaluated through simulation using a set of representative benchmark applications. 
The effect of critical system parameters (such as node cache line size, switch routing delay, and network topology) on the new designs is also studied. Overall, our results show that with an appropriate non-FIFO cache coherence protocol, the virtual channel mechanism can considerably reduce the time for shared memory accesses and the overall execution time of DSM applications. This study demonstrates that a carefully coordinated design strategy using the virtual channel mechanism can significantly improve the performance of distributed shared memory systems. Keywords: Parallel architecture, distributed shared memory systems, performance modelling, interconnection networks, wormhole routing, topology, network interface, directory-based protocols, and cache coherence. TR45 HIPIQS: A High-Performance Switch Architecture using Input Queuing by Rajeev Sivaram, Craig B. Stunkel, and Dhabaleswar K. Panda Switch-based interconnects are used in a number of application domains, including parallel system interconnects, local area networks, and wide area networks. However, very few switches have been designed that are suitable for more than one of these application domains. Such a switch must offer both extremely low latency and very high throughput for a variety of message sizes. While some architectures with output queuing have been shown to perform extremely well in terms of throughput, their performance can suffer when used in systems where a significant portion of the packets are extremely small. On the other hand, architectures with input queuing offer limited throughput, or require fairly complex and centralized arbitration that increases latency. In this paper we present a new input-queue-based switch architecture called HIPIQS (High-Performance Input-Queued Switch). It offers low latency for a range of message sizes, and provides throughput comparable to that of output queuing approaches. Furthermore, it allows simple and distributed arbitration. 
HIPIQS uses a dynamically allocated multi-queue organization, pipelined access to multi-bank input buffers, and small cross-point buffers to deliver high performance. Our simulation results show that HIPIQS can deliver performance close to that of output queuing approaches over a range of message sizes, system sizes, and traffic patterns. The switch architecture can therefore be used to build high-performance switches that are useful both for parallel system interconnects and for building computer networks. Keywords: Parallel architectures, switch/router design, high-speed interconnection networks, networks of workstations. TR44 On Consistent Checkpointing in Distributed Systems by Guohong Cao and Mukesh Singhal Consistent checkpointing simplifies failure recovery and eliminates the domino effect in case of failure by preserving a consistent global checkpoint on stable storage. However, the approach suffers from the high overhead associated with the checkpointing process. Two approaches are used to reduce this overhead: one is to minimize the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. These two approaches were pursued independently until the Prakash-Singhal algorithm combined them: the Prakash-Singhal algorithm forces only a minimum number of processes to take checkpoints, and it does not block the underlying computation. In this paper, we identify two problems in their algorithm and prove that there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. Based on this proof, we present an efficient algorithm that neither forces all processes to take checkpoints nor blocks the underlying computation during checkpointing. Correctness proofs are also provided. 
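The "minimum number of processes" idea can be sketched as a transitive dependency walk: only processes the initiator has (directly or indirectly) received messages from since their last checkpoints need to take new ones. This toy deliberately omits the non-blocking machinery that is the paper's actual contribution; the process names and dependency map are illustrative.

```python
# Toy sketch of minimal-process coordinated checkpointing: collect the
# transitive closure of message dependencies from the initiator. Names and
# the dependency representation are assumptions, not the paper's algorithm.
def processes_to_checkpoint(initiator, depends_on):
    """Return every process the initiator transitively depends on
    (i.e., has received messages from since the last checkpoint)."""
    need, stack = set(), [initiator]
    while stack:
        p = stack.pop()
        if p not in need:
            need.add(p)
            stack.extend(depends_on.get(p, []))
    return need

deps = {"P1": ["P2"], "P2": ["P3"], "P4": []}   # P1 received from P2, P2 from P3
assert processes_to_checkpoint("P1", deps) == {"P1", "P2", "P3"}   # P4 is spared
```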
TR43 Teaching the Essential Role of Mathematical Modeling in Understanding and Reasoning about Objects by Murali Sitaraman, Bruce Weide, Tim Long, and Wayne Heym Recognizing the advantages of object-based software, CS instructors have increasingly emphasized design and reuse of "objects" in CS1/CS2. But unless students also learn how to understand objects and operations abstractly, and how to reason about the behavior of object-based programs, they cannot possibly write correct non-trivial software. We describe how we present formal yet simple mathematical models of objects and operations, and how we use them to teach reasoning about object-based program behavior. The techniques we teach are both suitable for introductory courses and scalable to industrial-strength software. TR42 Providing Intellectual Focus to CS1/CS2 by Tim Long, Bruce Weide, Paolo Bucci, Dave Gibson, Joe Hollingsworth, Murali Sitaraman, and Steve Edwards First-year computer science students need to see clearly that computer science as a discipline has an important intellectual role to play and that it offers deep philosophical questions, much like the other hard sciences and mathematics; that CS is not "just programming". An appropriate intellectual focus for CS1/CS2 can be built on the foundations of systems thinking and mathematical modeling, as these principles are manifested in a component-based software paradigm. We outline some of the main technical features of this approach to CS1/CS2 and report preliminary observations from our experience with it. TR41-DIR Conceptual Program Editors: Design and Formal Specifications Ph.D. Dissertation by Paolo Bucci When a programmer uses a program editor, his/her view of what programs are and how they are built is affected by the editor itself. This is a special case of a more general phenomenon; namely, that all artifacts have an impact on the way users think of them and of the activities they perform with them. 
Research in human-machine interaction has explored the ramifications of this insight with respect to how it affects the design of tools and their user interfaces. In this dissertation we investigate the implications of this work for the design of program editors. We note that existing program editors convey a low-level, often inconsistent and incomplete conceptual model of programs and program construction. This hides the high-level conceptual view of programs, imposes excessive responsibilities on the programmer, and tends to make the programming task harder and more error-prone. We propose a new class of program editors that we call "conceptual editors." These editors are based on a specific, precise, high-level conceptual model of programs and program construction, and the editor and its user interface can be designed and built to convey the model in a consistent fashion. A conceptual editor, by design, supports, promotes, and even enforces an "appropriate" conceptual view of software through this model. To show the feasibility of this approach to program editors, we formally specify an editor for a small but significant (in terms of key concepts) subset of a programming language. By example, this demonstrates a precise description of a conceptual model and of a corresponding editor's capabilities. We also discuss how this design could be extended and generalized to an editor for a full-fledged programming language. TR40-DIR Texture Segmentation Using Gaussian Markov Random Fields and Neural Oscillator Networks by Erdogan Cesmeli and DeLiang L. Wang An image segmentation method is proposed based on texture analysis. The method is composed of two parts. The first part determines a novel set of texture features based on Gaussian Markov Random Fields (GMRF). Unlike other GMRF-based methods, our method is not limited by a fixed set of texture types. The second part is a 2D array of locally excitatory globally inhibitory oscillator networks (LEGION). 
The coupling strengths between neighboring oscillators are determined by texture feature differences. When LEGION runs, the oscillators corresponding to the same texture tend to oscillate in synchrony, whereas different texture regions tend to correspond to different phases. In simulations, a large system of differential equations is solved using a recently proposed method for integrating relaxation oscillator networks. Results on real texture images are provided to demonstrate the performance of our method. Keywords: texture segmentation, neural networks, Gaussian Markov random fields, relaxation oscillators, LEGION. TR39 A Movie Approximation Technique for the Implementation of Fast Bandwidth Smoothing Algorithms by Wu-chi Feng, Ming Liu, and Chi Chung Lam Bandwidth smoothing algorithms have been shown to be effective in reducing the network resource requirements for the delivery of compressed video streams. For stored video, a large number of bandwidth smoothing algorithms have been introduced that are optimal under certain constraints. These algorithms, however, require access to all the frame sizes to achieve their optimal properties, which, in turn, can limit the responsiveness of video-on-demand systems, especially under VCR functions. In this paper, we introduce a movie approximation technique for the representation of the frame sizes, reducing the complexity of the bandwidth smoothing algorithms as well as the amount of data that must be transmitted and the computation that must be performed prior to the start of playback. Our results show that the proposed approximation technique can accurately approximate the frame sizes with a small number of linear segments without affecting the performance measures that the bandwidth smoothing algorithms are attempting to achieve by more than 1%. 
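An approximation in this spirit can be sketched as a greedy piecewise-linear fit to the cumulative frame-size curve: each segment is extended as long as every covered point stays within a tolerance of the straight line from the segment's start. The tolerance rule and greedy strategy are illustrative assumptions, not the paper's algorithm.

```python
# Greedy piecewise-linear approximation of a cumulative frame-size curve.
# Illustrative sketch only; the error criterion is an assumption.
def linear_segments(cum_sizes, tol):
    """Extend each segment while all points stay within tol of the line
    from the segment start to the current point; return (start, end) pairs."""
    segs, start = [], 0
    for end in range(1, len(cum_sizes)):
        slope = (cum_sizes[end] - cum_sizes[start]) / (end - start)
        ok = all(abs(cum_sizes[j] - (cum_sizes[start] + slope * (j - start))) <= tol
                 for j in range(start, end + 1))
        if not ok:                      # close the segment just before this frame
            segs.append((start, end - 1))
            start = end - 1
    segs.append((start, len(cum_sizes) - 1))
    return segs

cum = [0, 10, 20, 30, 100, 170, 240]    # a constant rate, then a faster one
assert len(linear_segments(cum, tol=1.0)) == 2   # two linear pieces suffice
```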
In addition, we show that implementations of this technique speed up execution times by 100 to 1400 times, with typical approximated bandwidth smoothing plan calculations for full-length movies requiring less than 20 milliseconds. Evaluations using both Motion-JPEG and MPEG video clips are provided. Keywords: bandwidth smoothing, video-on-demand, MPEG, video networking. TR38-DIR Perceiving without Learning: from Spirals to Inside/Outside by Ke Chen and DeLiang L. Wang Since it was first proposed by Minsky and Papert (1969), the spiral problem has been well known in neural networks. It has received much attention as a benchmark for various learning algorithms. Unlike previous work that emphasizes learning, we approach the problem from a generic perspective that does not involve learning. We point out that the spiral problem is intrinsically connected to the inside/outside problem proposed by Ullman (1984, 1996). We propose a solution to both problems based on oscillatory correlation using a time-delay network. Our simulation results match human performance, and we interpret human limitations in terms of synchrony and time delays, both biologically plausible. As a special case, our network without time delays can always distinguish these figures regardless of shape, position, size, and orientation. We conjecture that visual perception will be effortful if local activation cannot be rapidly propagated, as synchrony would not be established in the presence of time delays. Keywords: Visual perception, the spiral problem, inside/outside relations, oscillatory correlation, synchronization, desynchronization, LEGION, time delays TR37 Route Optimization in Mobile ATM Networks by Gopal Dommety, Malathi Veeraraghavan, and Mukesh Singhal Location management schemes that propose setting up the connection to the home location of a mobile and then rerouting the connection to the mobile's current location can result in suboptimal connection paths. 
In this paper, we propose an algorithm for optimizing the route of a connection that becomes suboptimal due to location-based reroutes in mobile ATM (Asynchronous Transfer Mode) networks based on the PNNI (Private Network-to-Network Interface) standard. This algorithm uses the hierarchical route information of the connection and the summarized topology and loading information of the network to determine a "crossover node" such that adjusting the connection from that crossover node results in an optimally routed connection. This route optimization procedure can be executed during mobile connection setup, yielding a one-phase mobile location and connection setup scheme. We also demonstrate how this route optimization procedure can be executed after a connection is set up to a mobile, yielding the second phase of two-phase mobile location and connection setup schemes. A comparative performance analysis of one- and two-phase connection setup schemes is presented. The measures of comparison are call setup delay and the amount of network resources allocated to a connection. The one-phase scheme incurs higher call setup delays, but results in an optimal route during connection setup. On the other hand, two-phase schemes incur lower call setup delays, but result in connections with suboptimal routes prior to the execution of the route optimization phase. TR35 Simulation of Modern Parallel Systems: A CSIM-Based Approach by Dhabaleswar K. Panda, Debashis Basak, Donglai Dai, Ram Kesavan, Rajeev Sivaram, Mohammad Banikazemi and Vijay Moorthy Components of modern parallel systems are becoming quite complex, with many features and variations. An integrated modeling of these components (interconnection network, messaging layer, programming model, and computation-communication characteristics of applications) is essential to derive design guidelines for next-generation parallel systems. 
Most of the current simulation-based modeling platforms do not support such integrated modeling. This paper presents our effort at The Ohio State University towards integrated modeling of parallel systems. The basic features of our CSIM-based Wormhole-routed Multiprocessor Simulator (WORMulSim) are outlined. We present the techniques used in our simulator to model different network components (such as switches, links, wormhole/cut-through switching techniques, routing protocols, and network interfaces), a messaging layer with basic communication primitives, the distributed shared memory programming model, and the computation-communication characteristics of applications. Some sample performance measures of our simulator on current generation workstations are reported to demonstrate the feasibility of integrated modeling with low computational overhead. TR34 Fine-Grain Multitolerant Barrier Synchronization by Sandeep S. Kulkarni and Anish Arora We design a multitolerant program for synchronizing the phases of concurrent processes. The tolerances of the program enable processes to (i) compute all phases correctly in the presence of faults that corrupt process state in a detectable manner, and (ii) compute only a minimum possible number of phases incorrectly before resuming correct computation in the presence of faults that corrupt process state in an undetectable manner. The program is fine-grain in the sense that each process action either updates the state of that process or involves communication with one of two neighboring processes. TR33 Flat Location Management Scheme for PCNs by Gopal Dommety, Malathi Veeraraghavan, and Mukesh Singhal This paper presents a "flat" mobile location management scheme for Personal Communication Networks (PCNs) with the goal of reducing both mobile tracking and location costs. Mobile tracking costs are reduced by the use of one-level tracking for mobiles in their "home area", and a two-level tracking approach for mobiles visiting other areas. 
Mobile location costs are reduced by using an improved mobile location strategy, which requires a one-hop message exchange for mobiles in their home area, and a two-hop message exchange for mobiles visiting other areas. Analytical models are set up to analyze the performance of the flat scheme and compare it with that of the other approaches. The measures of comparison used are the computation cost (database accesses), the communication cost (signaling messages), the average total cost, and the mobile location cost (call setup delay). For all these measures, the proposed flat scheme performs better than the location management scheme defined in the IS-41 standard. Results also show that for medium and high values of the call-to-mobility ratio (> 0.07, i.e., fewer than 14 moves per call), the flat scheme performs better than other approaches proposed in the literature.

TR32 Compositional Design of Multitolerant Repetitive Byzantine Agreement by Sandeep S. Kulkarni and Anish Arora We illustrate in this paper a compositional and stepwise method for designing programs that offer a potentially unique tolerance to each of their fault-classes. More specifically, our illustration is a design of a repetitive agreement program that offers two tolerances: (a) it masks the effects of Byzantine failures and (b) it is stabilizing in the presence of transient and Byzantine failures.

TR31 A Multiple Layered Signature Database Architecture for Mobile and Internet Environments by Yuping Yang and Mukesh Singhal The performance of very large relational databases is one of the major stumbling blocks to their deployment in mobile and Internet environments. Indexing, hashing, and sorting are still the main techniques for speeding up query execution in actual database implementations. Join indices and materialized views have been proposed but are not widely accepted due to their large storage requirements.
Recently, a general multi-layer database concept based on concept hierarchies has been proposed to improve the performance of query execution. However, there is still much to be done to expand this idea into an implementable database architecture. New, implementable techniques that can significantly speed up query execution, especially when there are a large number of users, yet do not demand an excessive amount of extra storage space beyond the database itself, are in demand. This paper proposes a multiple-layer database architecture based on the signature method, in implementable detail, compares this new architecture with existing query-execution speed-up techniques, and gives a performance analysis of the new architecture. Keywords: Relational database, very large database, multi-layer, join, query processing, signature, grouping, performance, mobile Internet.

TR30 Performance Study of Real-Time Scheduling Techniques under Multicasting Traffic in an ATM Multiplexer by Khalid H. Sheta and Mukesh Singhal Multimedia applications in broadband ISDN present various types and classes of traffic streams, i.e., connections such as audio, video, and file transfers. Output multiplexers at an ATM switching node are responsible for scheduling cells belonging to these traffic connections. A cell loss in a traffic stream due to a cell missing its deadline results in the loss of the entire packet that should have been delivered to the destination node. Multicasting is an important application in broadband networks. A cell loss on a multicast stream at an ATM multiplexer affects not only one node but a group of nodes in the multicasting group. This packet loss leads to inconsistency in the information received among different members of the multicast group. Moreover, if the lost packet is retransmitted by the root node of the multicast tree, the retransmission results in an excessive load on the network.
In this paper we address the problem of scheduling connections that carry real-time multicasting/non-multicasting and non-real-time traffic. Traffic on real-time connections has a deadline: each cell has a deadline associated with it (in terms of cell transmission time slots) before which the cell has to be scheduled at the multiplexer. We propose a new scheduling technique that makes use of the information available at multicast connection setup time to give appropriate priority to multicast traffic. In particular, we use the multicast tree size (number of nodes) and the multicast subtree sizes at each multiplexer. We show how this information, coupled with cell arrival time, can be used to proportionally weigh the actual cell deadline of multicast cells to give them priority over other classes of real-time non-multicast and non-real-time cells. We assume an On-Off traffic source model for real-time connections containing two types of deadlines, tight and relaxed. We compare, through extensive simulation, our technique with two well-known real-time scheduling techniques. We show how the new technique outperforms the other techniques by reducing cell loss for multicast connections while attempting to maintain the same or better cell loss for non-multicasting connections over a wide range of parameter values. We also show that the new technique does not increase the delay of non-real-time connections compared to the other scheduling techniques. Finally, we discuss how the new technique can be incorporated into an ATM switching node. Key words: Multicast, Real-Time, Scheduling, ATM Switches.

TR29 Once-and-Forall Management Protocol (OFMP) by Sandeep S. Kulkarni and Anish Arora OFMP is a hierarchical network management protocol that enables group operations to be executed on the management information bases of all nodes in the group. As long as no faults occur, OFMP ensures that all nodes execute their local operation exactly once in each group operation.
If "immediately detectable" faults occur, it ensures masking fault-tolerance; i.e., all non-failed nodes execute their local operation exactly once in each group operation. And if "eventually detectable" faults occur, it ensures stabilizing fault-tolerance; i.e., it eventually converges to a state from where all non-failed nodes execute their local operation exactly once in each subsequent group operation. Of special note is the ability of OFMP to detect, using only a bounded amount of memory, whether nodes have executed in the same group operation, in a manner that masks immediately detectable faults and stabilizes from eventually detectable faults.

TR28 A Certificate Path Generation Algorithm for Authenticated Signaling in ATM Networks by Jun Xu and Mukesh Singhal The ATM Forum specifies public key cryptography as the default ATM authentication mechanism and directory services like X.509 as the infrastructure for public key distribution and certification. Authenticated signaling, widely acknowledged as a necessary security feature of ATM networks, requires the signaling message to be authenticated with a digital signature; the called party needs to obtain the public key of the calling party and a proof of the calling party's ownership of that public key. In X.509, the standard form of such a proof is a chain of public key certificates, called the certificate path between two parties. The certificate exchange protocol (CEP), proposed by the ATM Forum, requires that another bi-directional connection be established between the two parties to exchange public keys and certificate paths before an authenticated connection can be set up, which is not an ideal approach. We propose an algorithm, embedded into ATM routing protocols, that generates a certificate path inside a signaling message on-the-fly as the signaling message travels through the ATM network.
In this approach, all that a calling party needs to know for authentication purposes is its own public key certificate, and the ATM network builds the rest of the certificate path for it. Related issues, such as the distribution of public key certificates and the optimization of the CA hierarchy, are also addressed in this paper. Keywords: ATM, P-NNI, authenticated signaling, certificate path.

TR27-DIR Slice Based Volume Rendering by David M. Reed and Roni Yagel Some of the more important research results in computational science rely on the use of simulation methods that operate on unstructured grids. However, these grids, composed of polyhedra, introduce exceptional problems with respect to data visualization. Volume rendering techniques, originally developed to handle rectangular grids, show significant promise for general use with unstructured grids as well. The main disadvantage of this approach, compared to isosurfaces, particles, or other visualization tools, is its non-interactive performance. This paper presents an efficient method for rendering unstructured grids that is based on incremental slicing and hardware polygon rendering. The grid is sliced with a set of parallel planes, resulting in a polygonal mesh. Look-up tables are used to generate the polygons based on which edges of a polyhedron are intersected. The graphics hardware is then used to render (interpolate-fill) the polygon meshes and composite them in visibility order. The main advantages of this algorithm are that it avoids an explicit sort to produce a depth ordering and can take advantage of polygon rendering hardware. These two properties of the algorithm allow it to produce good-quality images quickly on machines with polygon rendering hardware.

TR26-DIR Segmentation of Medical Images Using LEGION by Naeem Shareef, DeLiang L.
Wang, and Roni Yagel Advances in visualization technology and specialized graphics workstations allow clinicians to virtually interact with anatomical structures contained within sample medical image datasets. A hindrance to the effective use of this technology is the difficult problem of image segmentation. In this paper, we utilize a recently proposed oscillator network called LEGION (Locally Excitatory Globally Inhibitory Oscillator Network), whose ability to achieve fast synchrony with local excitation and desynchrony with global inhibition makes it an effective computational framework for grouping similar features and segregating dissimilar ones in an image. We identify four key computational components that determine its dynamics. Then, we describe an image segmentation algorithm that uses these LEGION components and an adaptive scheme to choose tolerances. We show results of applying the algorithm to 2D and 3D (volume) CT and MRI medical image datasets, and compare our results with manual segmentation. LEGION's computational and architectural properties make it a promising approach for real-time medical image segmentation using multiple features.

TR25 Scheduling Fork-Join Computations on Distributed-Memory Multiprocessor Systems by Khalid H. Sheta, Mukesh Singhal and Phillip E. Krueger In this paper, we develop two analytic models for scheduling fork-join computations on distributed-memory multiprocessor systems. The first model allows arriving tasks to multiplex processors with existing tasks, while the other model does not allow multiplexing. Both models assume that tasks block for I/O and that the overhead of rescheduling a task to another processor is prohibitive. We compare the performance of both models in terms of overall job completion time. We show that multiplexing has a performance advantage over non-multiplexing over a wide range of conditions. We point out how the results can be used in the design of an adaptive multiprocessor scheduler.
Key words: distributed-memory multiprocessors, job scheduling, fork-join computations, performance analysis.

TR24 Phase-Space Nonlinear Control Toolbox: The Maglev Experience by Feng Zhao, Shiou C. Loh, and Jeff A. May We describe the Phase-Space Nonlinear Control Toolbox, a suite of computational tools for synthesizing and evaluating control laws for a broad class of nonlinear dynamical systems. The Toolbox comprises computational algorithms for identifying optimal control reference trajectories in the phase space of dynamical systems and experimental methods for evaluating the performance of the control laws. These algorithms combine knowledge of the geometric theory of modern nonlinear dynamical systems with efficient computational methods for geometric reasoning and graph search; they define the properties of controllability and robustness in terms of phase-space geometric structures and exploit phase-space neighborhood adjacencies to obtain computational efficiency. Compared to traditional analytic control design methods, the phase-space based control synthesis and evaluation rely on high-performance computational techniques and are applicable to physical systems operating in large nonlinear regimes. Using a proof-of-concept physical experiment for stabilizing a nonlinear magnetic levitation system, we have successfully demonstrated the feasibility of the phase-space control technology.

TR23 An Introduction to RESOLVE/Ada95 by David S. Gibson (No abstract per se exists: this is the first paragraph of the Preface.) This report introduces RESOLVE/Ada95, a discipline for building high-quality software components using the 1995 revision of Ada. RESOLVE/Ada95 is an approach to implementing software components in Ada95 using the principles of RESOLVE. RESOLVE is three things. First, RESOLVE is a framework for software component engineering.
RESOLVE is also a language which includes two integrated sub-languages: a model-based formal specification language and an imperative, sequential programming language. Finally, RESOLVE is a discipline with detailed design principles which guide software engineers in the development of high-quality software components and systems. The RESOLVE discipline has been applied to the Ada and C++ programming languages. This report demonstrates how the RESOLVE discipline may be applied to Ada95.

TR22 Stepwise Design of Tolerances in Barrier Computations by Sandeep S. Kulkarni and Anish Arora We design in a stepwise manner a multitolerant computation for synchronizing the phases of concurrent processes. The tolerances of the computation enable processes to (i) compute all phases correctly in the presence of faults that corrupt process state in a detectable manner, and (ii) compute only a minimum possible number of phases incorrectly before resuming correct computation in the presence of faults that corrupt process state in an undetectable manner. Keywords: multitolerance; detectable and undetectable faults; stepwise design; concurrent systems.

TR21 Multicasting on Switch-based Irregular Networks using Multi-drop Path-based Multidestination Worms by Ram Kesavan and Dhabaleswar K. Panda This paper presents a novel concept of multi-drop path-based multidestination message passing on switch-based irregular networks. First, the multi-drop mechanism is defined with an associated header encoding scheme, and this mechanism is used to develop path-based multidestination worms. Next, a method is proposed to identify valid multidestination paths on arbitrary irregular networks with typical deadlock-free routing. Then, the deadlock-free property of multi-drop path-based worms is emphasized. Using the above concepts, three multicast algorithms are proposed: multi-drop path-based greedy (MDP-G), multi-drop path-based less-greedy (MDP-LG), and multi-drop path-based binomial (MDP-B).
The proposed algorithms are compared with each other and with the best unicast-based algorithm, CCO [5], for a range of system and technological parameters. The MDP-LG scheme is shown to be the best for implementing multicast with reduced latency. Keywords: Interconnection network, collective communication, multicast, broadcast, wormhole routing, cut-through routing, networks of workstations, and path-based routing.

TR20 Multicasting in Irregular Networks with Cut-Through Switches using Tree-Based Multidestination Worms by Rajeev Sivaram, Dhabaleswar K. Panda and Craig B. Stunkel Multidestination message passing has been proposed as a mechanism for achieving efficient multicast in regular direct and indirect networks. The application of this technique to parallel systems based on irregular networks has, however, not been studied. In this paper we propose two schemes for performing multicast using multidestination worms on irregular networks and examine the extent to which multidestination message passing can improve multicast performance. For each of the schemes we propose solutions to the associated problems, such as methods for encoding and decoding multidestination headers, alterations to the setup algorithm run by the switches, logic to perform header manipulation, etc. We perform extensive simulations to evaluate our schemes under a variety of changing parameters: software startup overhead per message, system size, switch size, message length, and degree of connectivity. Our results establish that even a very naive multicasting algorithm using multidestination message passing can result in considerable performance gains over the best unicast-based multicasting schemes. Keywords: Parallel computer architecture, cut-through routing, multicast, broadcast, irregular networks, collective communication.

TR19 How Can We Design Better Networks for DSM Systems? by Donglai Dai and Dhabaleswar K.
Panda Most DSM research in recent years has ignored the impact of the interconnection network altogether. Similarly, most interconnection network research has focused on better network designs using synthetic (uniform/non-uniform) traffic. Neither of these trends leads to concrete guidelines for designing better networks for the emerging Distributed Shared Memory (DSM) paradigm. In this paper, we address these issues by taking a three-step approach. First, we propose a comprehensive parameterized model to estimate the performance of an application on a DSM system. This model takes into account all key aspects of a DSM system: application, processor, memory hierarchy, coherence protocol, network, etc. Next, using this model we evaluate the impact of different network design choices (link speed, link width, topology, ratio of router delay to physical link delay) on the overall performance of DSM applications and establish guidelines for designing better networks for DSM systems. Finally, we use simulations of the SPLASH2 benchmark suite to validate our design guidelines. Some of the important design guidelines established in this paper are: 1) better performance is achieved by increasing link speed instead of link width, 2) changing the topology of a network under a constant bisection bandwidth constraint is not at all beneficial, and 3) network contention experienced by short messages is crucial to overall performance. These guidelines, together with several others, lay a good foundation for designing better networks for current and future generation DSM systems. Keywords: Parallel architecture, distributed shared memory systems, performance modeling, interconnection networks, wormhole routing, topology, directory-based protocols, and cache coherence.
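To illustrate the kind of parameterized estimate TR19 describes, the following sketch builds a remote-reference latency from network parameters and folds it into an execution-time estimate. This is our own toy model with hypothetical parameter names and values, not the model from the report; it merely shows why, in such a model, raising link speed can beat widening links.

```python
# Toy parameterized DSM performance model (hypothetical; not TR19's actual model).

def remote_latency_ns(hops, router_delay_ns, link_delay_ns,
                      msg_flits, link_width_bytes, flit_bytes=4):
    # Cut-through style: per-hop router + link delay, plus message serialization
    # over the link (wider links push more of a flit per link cycle).
    per_hop = router_delay_ns + link_delay_ns
    serialization = msg_flits * (flit_bytes / link_width_bytes) * link_delay_ns
    return hops * per_hop + serialization

def exec_time_us(instructions, cpi, clock_ghz, remote_miss_rate, latency_ns):
    # Execution time = local computation + stalls on remote misses.
    compute_ns = instructions * cpi / clock_ghz
    stall_ns = instructions * remote_miss_rate * latency_ns
    return (compute_ns + stall_ns) / 1000.0

# Compare doubling link speed (halving link delay) vs doubling link width:
lat_base  = remote_latency_ns(4, 40, 20, msg_flits=32, link_width_bytes=2)
lat_speed = remote_latency_ns(4, 40, 10, msg_flits=32, link_width_bytes=2)
lat_width = remote_latency_ns(4, 40, 20, msg_flits=32, link_width_bytes=4)
```

With these (made-up) numbers the faster link wins because it shortens both the per-hop delay and the serialization term, whereas a wider link shortens only serialization, echoing design guideline 1 of the abstract.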
TR18 Performance Evaluation of Smoothing Algorithms for Transmitting Prerecorded Variable-Bit-Rate Video by Wu-chi Feng and Jennifer Rexford The transfer of prerecorded, compressed video requires multimedia services to support large fluctuations in bandwidth requirements on multiple time scales. Bandwidth smoothing techniques can reduce the burstiness of a variable-bit-rate stream by prefetching data at a series of fixed rates, simplifying the allocation of resources in video servers and the communication network. Given a fixed client-side prefetch buffer, several bandwidth smoothing algorithms have been introduced that are provably optimal under certain constraints. This paper presents a comprehensive performance evaluation of bandwidth smoothing algorithms based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video. Due to the scarcity of available trace data, we have constructed a video capture testbed and generated a collection of twenty full-length, motion-JPEG encoded video clips. Using these video traces and a range of client buffer sizes, we investigate the interplay between the performance metrics through simulation experiments. The results highlight the unique strengths and weaknesses of each bandwidth smoothing algorithm and pinpoint promising directions for future research.

TR17 Aristotle: A System for Research on and Development of Program-Analysis-Based Tools by Mary Jean Harrold and Gregg Rothermel Aristotle provides program analysis information and supports the development of software engineering tools. Aristotle's front end consists of parsers that gather control-flow, local data-flow, and symbol table information for C and Java programs.
Aristotle tools use the data provided by the front end to perform a variety of tasks, such as data-flow and control-dependence analysis, data-flow testing, regression test selection, graph construction, and graph viewing. Parsers and tools use database access routines to store information in, and retrieve it from, a data repository. Users can view analysis data textually or graphically. A user interface provides menu-driven access to tools; many tools can also be invoked directly from application programs. Most of Aristotle's components function on both single procedures and entire programs. We use Aristotle as a platform for developing and experimenting with program analysis, maintenance, and testing tools. KEY WORDS: program analysis, software engineering, CASE, experimental systems.

TR16 Constructing Pairwise Disjoint Paths with Few Links by Himanshu Gupta and Rephael Wenger Let P be a simple polygon and let {(ui, ui')} be m pairs of distinct vertices of P where, for all distinct i and j less than or equal to m, there exist pairwise disjoint paths connecting ui to ui' and uj to uj'. We wish to construct m pairwise disjoint paths in the interior of P connecting ui to ui', for i = 1, . . . , m, with a minimal total number of line segments. We give an approximation algorithm which in O(n log m + M log m) time constructs such a set of paths using O(M) line segments, where M is the number of line segments in the optimal solution.

TR15 Shape Reconstruction from Contours using Isotopic Deformation by Kikuo Fujimura and Eddy Kuo A method for shape reconstruction from contours is presented using isotopic deformation. The reconstructed shape is free of self-intersection and it can incorporate given feature correspondences. The method automatically adds vertices, where necessary, to satisfy these requirements. A new method for handling bifurcation is also proposed, which can handle cases that are problematic for some other algorithms.
The proposed method is suitable for terrain modeling, since reconstructed shapes generated by the method do not have overhangs. The running time of the algorithm depends on the complexity of the input scene and is shown to be worst-case optimal for the class of tasks defined. Experimental results are included to illustrate the feasibility of the approach. Keywords: shape reconstruction, terrain modeling, isosurface, isotopy.

TR14 How Much Does Network Contention Affect Distributed Shared Memory Performance? by Donglai Dai and Dhabaleswar K. Panda Many research results in recent years have focused on the design of distributed shared memory (DSM) systems. However, most of these results are centered around either the careful design of node controllers or cache coherence protocols. While evaluating these designs, simplified models of networks (constant network latency or average latency based on the network size) are typically used. Such models completely ignore network contention and thus do not capture the latency of remote memory references accurately. While this approach may seem reasonable for designing better node architectures or coherency protocols, it serves poorly for understanding and solving the network latency bottleneck problem. In order to help network designers design better networks for DSM systems, in this paper we focus on two major goals: 1) to estimate the impact of network link contention and network interface contention on the overall performance of DSM applications and 2) to study the impact of critical architectural parameters on network contention. We achieve these goals by evaluating a set of applications from the widely used SPLASH2 suite on a closely integrated processor and network simulator for a CC-NUMA system. The simulator models the processor, cache, and memory access references at the instruction level and models the network at the flit transfer level.
A series of three network models is proposed to isolate network link contention and network interface contention. For an 8 x 8 wormhole system, the study indicates that network contention can degrade performance by up to 59.8%. Of this, up to 7.2% is caused by network interface contention alone. The study indicates that network contention becomes dominant for DSM systems using small caches, wide cache lines, low degrees of associativity, high processing node speeds, high memory speeds, low network speeds, or small network widths. Keywords: Parallel architecture, distributed shared memory systems, interconnection networks, wormhole routing, directory-based protocols, cache coherence, and meshes.

TR13 Interaction Refinement in Object-Oriented Systems (Extended Abstract) by Neelam Soundarajan An OO designer typically starts with a high-level idea of the interactions between the key objects in the system. As the design progresses, the designer refines these interactions by identifying the exact operations of each object that other objects will invoke and the order of invocations, or by introducing new objects with appropriate operations to mediate the required interactions between existing objects, etc. These refinements are usually not recorded, so the rationale behind the design may be lost. We motivate the notion of interaction refinement with a few examples, provide a precise definition of the concept, and develop a formalism that can be an important tool in recording and validating OO designs.

TR12 Refining Interactions in a Distributed System (Extended Abstract) by Neelam Soundarajan We identify a new type of refinement that we call interaction refinement that is of great value in designing distributed systems. An interaction refinement step is one in which a system designer decides to implement a given "high-level" interaction as a particular sequence of "low-level" interactions.
For example, a particular high-level interaction may be specified as "identify-client", involving a "client process" and a "server process". The designer may decide to implement this interaction as a series of low-level interactions between the two processes, with the client process first sending the server process an id, followed by a password, perhaps allowing the client several attempts at getting the password right, etc. Here the designer is not refining the internal structure of either the client process or the server process; rather, she is deciding to implement the "identify-client" interaction between the two processes in terms of a sequence of low-level communications between them. By contrast, other types of refinement that are normally considered are concerned with refining the internal structure of one or more of the processes, the interactions between the processes being taken as given. We motivate the idea of interaction refinement with simple and natural examples, present a precise definition of the concept, and develop a preliminary formalism that can be used to establish the correctness of systems developed using interaction refinement. We also compare our work with previous work that allows one to deal with restricted kinds of interaction refinement.

TR11 Can Scatter Communication Benefit from Multidestination Message Passing? by Mohammad Banikazemi and Dhabaleswar K. Panda Multiple research directions in the past have emphasized the benefits of multidestination message passing for supporting efficient non-personalized communication like broadcast and multicast. In this paper, we investigate whether personalized communications like scatter can benefit from multidestination message-passing support, if available in a system. In order to use such support, we propose a new approach where personalized data for a set of destinations can be grouped together and sent as a single multidestination broadcast/multicast worm.
This approach is shown to use a smaller number of communication steps while achieving the lower bound for scatter communication. Using this approach, we propose two schemes, sequential multidestination tree-based (SMT) and hierarchical multidestination tree-based (HMT), to implement one-to-all scatter in k-ary n-cube wormhole systems. The performance of these two schemes is compared with two commonly used unicast-based schemes, sequential tree-based (ST) and binomial tree-based (BT). Analytical and simulation studies are carried out to determine the scatter latency under the four schemes for k-ary n-cube systems with varying communication start-up times (ts), propagation times (tp), and message lengths (l, the size of the data being sent to each destination). The study indicates that for l less than or equal to ts/ktp, scatter is best implemented by the HMT scheme. Similarly, the SMT scheme is shown to perform best when ts/ktp < l and l is less than or equal to ts/tp. For l > ts/tp, both ST and SMT perform better than the other algorithms. Simulation results on an 8 x 8 system with ts = 10.0 microseconds show that the HMT and SMT schemes reduce scatter latency by up to factors of 30 and 8, respectively, compared to the ST scheme for small message lengths. These results confirm that scatter communication can in fact take advantage of multidestination message passing support. Keywords: Parallel architecture, collective communication, scatter, personalized communication, wormhole routing, k-ary n-cubes, path-based routing, and multidestination message passing.

TR10 Optimal Multicast with Packetization and Network Interface Support by Ram Kesavan and Dhabaleswar K. Panda Most multicast algorithms proposed in the literature assume that an arbitrarily long message is communicated as a single packet. Modern networks typically limit the size of the largest packet, and long messages are packetized and transmitted.
Such networks also provide network interface support at the nodes, typically including a coprocessor and memory, to implement the lower layers of the communication protocol. Such network interfaces can be programmed to support efficient multicasting and to eliminate the software overhead incurred by the higher layers in absorbing and retransmitting packets. In this paper, we present an optimal multicast algorithm for systems with such smart network interface support for packetization. Two implementations of the smart network interface, First-Child-First-Served (FCFS) and First-Packet-First-Served (FPFS), are studied and compared. It is shown that the FPFS network interface support is more practical and efficient. Next, the multicast latency is modeled for the FPFS implementation. The concept of a k-binomial tree is introduced and proved to be optimal for multicasting with such network interface support. A method to construct contention-free k-binomial trees on contention-free orderings of the nodes is also shown. The performance of the optimal k-binomial tree is compared with the standard binomial tree for a range of multicast set sizes and numbers of packets. For a 64-node system, with parameters ts = 12.5 microseconds (message send overhead), tr = 12.5 microseconds (message receive overhead), and tstep = 7.2 microseconds (overhead of preparation and transfer of a packet by the network interface), the k-binomial tree is shown to be up to 3 times better than the binomial tree for 24-packet-long messages. The k-binomial tree shows increased benefits as the number of packets increases. Thus, these results demonstrate significant potential for application to current and future generation high performance systems, including MPPs and NOWs, where network interface support for multicast is provided. Keywords: Interconnection Network, Collective communication, Multicast, Broadcast, Wormhole Routing, Link Contention, Networks of Workstations, Network Interface Support, and Packetization.
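For context on what such latency models look like, here is a toy estimate of binomial-tree multicast latency with packetization. It is our own simplified sketch, not TR10's k-binomial analysis: in a binomial tree the set of informed nodes doubles each round, and we charge each round the send overhead, per-packet interface time, and receive overhead, borrowing the parameter names ts, tr, and tstep from the abstract.

```python
import math

# Toy latency model for binomial-tree multicast of a packetized message.
# (Illustrative only; not the k-binomial model of TR10.)

def binomial_rounds(n):
    # Rounds needed to reach n nodes when the informed set doubles each round.
    return math.ceil(math.log2(n))

def binomial_multicast_us(n, packets, ts=12.5, tr=12.5, tstep=7.2):
    # Each round: send overhead + per-packet interface work + receive overhead.
    per_round = ts + packets * tstep + tr
    return binomial_rounds(n) * per_round
```

For a 64-node system and a 24-packet message this gives 6 rounds; the k-binomial tree of the paper improves on this by overlapping per-packet work across the tree rather than serializing whole messages round by round.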
TR09 Hybrid Algorithms for Complete Exchange in 2D Meshes by N.S. Sundar, D.N. Jayasimha, D.K. Panda, and P. Sadayappan (No Abstract - this is the first paragraph of the introduction): The all-to-all personalized (complete) exchange is a global collective communication operation in distributed memory multicomputers in which every processor sends a unique block of data to every other processor in the system. This pattern arises in many important problems such as multidimensional FFT, matrix transposition, and sorting. It is one of the collective communication primitives defined by the MPI [11] message passing interface standard. Since complete exchange is communication intensive, its effective implementation in wire-sparse topologies like 2D meshes is a challenging problem.

TR08 Accurate Methods for the Voxelization of Planar Objects Vassily Filippov and Roni Yagel The process of generating discrete surfaces in volumetric representation, termed voxelization, is confronted with topological considerations as well as accuracy and efficiency requirements. We define the topological notions of separability and minimality and introduce a method for converting planar objects into voxel representation. Through geometric principles, our method provides topological consistency as well as numerical accuracy.

TR07 A Medium Access Control Protocol for Voice and Data Integration in Receiver-Oriented DS-CDMA PCNs Yibin Yang, Junfeng He, and Ming T. Liu Primarily used in military communications in the past, code division multiple access (CDMA) has recently been found attractive for personal communications as well. As a large number of mobile hosts are supported within a cell and a wide range of services are provided, one of the most important issues in a CDMA personal communication network is how to control the uplink access to the shared wireless spectrum.
In this paper, we address this issue in a realistic situation where the receiver-oriented transmission protocol is employed and the packet loss due to multiple access interference (MAI) cannot be ignored. We propose a medium access control protocol for voice and data integration that solves the problems of code assignment and MAI control at the same time. A Markov chain model is used to analyze the protocol, and the analytical results are shown to be very close to simulation results. Based on this model, the effectiveness of the protocol's MAI control is demonstrated and some system design issues are investigated. Keywords: code division multiple access, medium access control, personal communication networks, slotted ALOHA, voice and data integration.

TR06 Efficient Multicast in IP/ATM Networks Walid Mostafa and Mukesh Singhal Real-time transport protocols rely on IP to disseminate audio/video packets from a sender to one or more receivers. In IP over ATM broadband networks, studies have shown that the effective bandwidth is reduced by at least 10% and that larger protocol payload sizes yield higher throughput. However, a dropped cell corrupts the IP packet it belongs to and reduces the rate of successfully received data, or goodput. This results in quality-of-service degradation for multimedia traffic, particularly for intra-frame encoded video traffic that is sensitive to packet loss. A key observation for multimedia applications is that whether to aim for maximum throughput or for optimum goodput is determined by the end-to-end quality-of-service requirements. In this report, we derive analytic expressions for the maximum throughput and the packet-loss probability, which provide a better understanding of the goodput of real-time transport protocols. We also derive an analytic expression for the optimum PDU size that maximizes the goodput with respect to cell-loss probability and the number of ATM switches.
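The derived expressions themselves are not reproduced in the abstract, but a toy model in the same spirit illustrates the trade-off: a larger PDU amortizes header overhead across more cells, yet a single lost cell corrupts the whole packet. The 40-byte header overhead and the assumption of independent per-switch cell loss are ours; the 48-byte cell payload and 53-byte cell size are standard ATM.

```python
import math

def goodput(payload, p_cell, switches, header=40):
    """Toy goodput model for IP over ATM: the packet occupies
    ceil((payload + header) / 48) cells, each of which must survive
    every switch; one lost cell corrupts the entire packet."""
    cells = math.ceil((payload + header) / 48)
    p_deliver = (1 - p_cell) ** (cells * switches)
    # useful payload bytes per wire byte, times delivery probability
    return (payload / (cells * 53)) * p_deliver

def optimum_pdu(p_cell, switches, max_payload=65536, step=48):
    """Scan payload sizes for the goodput-maximizing PDU."""
    return max(range(step, max_payload, step),
               key=lambda n: goodput(n, p_cell, switches))
```

Consistent with the abstract's finding, the optimum in this toy model shrinks as the cell-loss probability or the number of switches grows: `optimum_pdu(1e-3, 5)` is far below `optimum_pdu(1e-6, 5)`.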
We show that optimum PDUs yield higher goodput than the default MTU for IP over ATM, particularly as the cell-loss probability and/or the number of ATM switches increases. For reliable multicast, we also derive analytic expressions for the expected total number of transmissions per packet and for the goodput of reliable multicast. Keywords: Multimedia Dissemination, Reliable Multicast, Transport PDU, IP over ATM.

TR05 Online Smoothing for Delayed Transmission of Live Video Jennifer Rexford and Wu-chi Feng (No real abstract exists for this TR; the following is the first paragraph of the Introduction.) Many emerging applications rely on the efficient transmission of live or pre-recorded video. Although effective compression techniques can substantially reduce the storage and bandwidth requirements, compressed video streams typically exhibit significant burstiness that complicates the provisioning of resources along the path from the video server to the client site. To reduce the burstiness and the peak bandwidth requirements, a multimedia server can smooth the outgoing stream by transmitting a sequence of video frames at an averaged rate. Recently, several bandwidth smoothing techniques have been introduced that reduce the server and network resource requirements for transmitting pre-recorded video. By capitalizing on prior knowledge of the frame sizes for the entire video, these techniques can smooth traffic on a large time scale by prefetching frames into a buffer in advance of bursts of large frames. Given a fixed-size client buffer, these algorithms can minimize the peak bandwidth of the video stream while also optimizing other important performance metrics.

TR04 Refining Interactions in a Distributed System (Extended Abstract) Neelam Soundarajan We identify a new type of refinement, which we call interaction refinement, that is of great value in designing distributed systems.
An interaction refinement step is one in which a system designer decides to implement a given 'high-level' interaction as a particular sequence of 'low-level' interactions. For example, a particular high-level interaction may be specified as 'identify-client', involving a 'client process' and a 'server process'. The designer may decide to implement this interaction as a series of low-level interactions between the two processes, with the client process first sending the server process an id, followed by a password, perhaps with the client allowed several attempts at getting the password right, etc. Here the designer is not refining the internal structure of either the client process or the server process; rather, she is deciding to implement the 'identify-client' interaction between the two processes in terms of a sequence of low-level communications between them. By contrast, the other types of refinement that are normally considered are concerned with refining the internal structure of one or more of the processes, the interactions between the processes being taken as given. We motivate the idea of interaction refinement with simple and natural examples, present a precise definition of the concept, and develop a preliminary formalism that can be used to establish the correctness of systems developed using interaction refinement. We also compare our work with previous work that allows one to deal with restricted kinds of interaction refinement.

TR03 On the Specification, Inheritance, and Verification of Synchronization Constraints by Neelam Soundarajan Object-orientation and distributed systems are a natural match. Objects correspond to processes in a distributed program; the invocation of a method of one object by another object corresponds naturally to a message being passed between the corresponding processes in the distributed program. Despite this close correspondence, progress in developing an OO approach to concurrency has been limited.
One important problem has been the so-called inheritance anomaly, which is concerned with how, and how easily, synchronization constraints specified in a base class may be modified in a derived class. Our concern in the current paper is slightly different. We are interested in developing ways to abstractly specify these synchronization constraints, and ways to verify them. In other words, we are interested in what these synchronization constraints do, and this is, of course, the critical question from the point of view of the users of these objects. We use the mechanism of acceptance sets in our specifications. We develop a proof method to verify that (base as well as derived) classes meet their specifications. We also consider the question of what kinds of modifications of synchronization constraints in derived classes are easy for the clients of the class to deal with. Key phrases: Specification and verification; synchronization constraints; inheritance anomaly.

TR02 A Survey of the Use-It-Or-Lose-It Policies for the ABR Service in ATM Networks by Shiv Kalyanaraman, Raj Jain, Rohit Goyal and Sonia Fahmy The Available Bit Rate (ABR) service has been developed to support data applications over Asynchronous Transfer Mode (ATM). The ABR service uses a closed-loop rate-based traffic management framework in which the network divides available bandwidth among contending sources. The ATM Forum then worked on incorporating open-loop control capabilities to make the ABR service robust to temporary network failures and source inactivity periods. One of the problems addressed was whether the rate allocations of sources should be taken away if the sources do not use them. The proposed solutions, popularly known as the Use-It-or-Lose-It (UILI) policies, have had a significant impact on the ABR service capabilities.
In this paper we survey the design, development, and final shape of these policies, and their impact on the ABR service. Keywords: Asynchronous Transfer Mode (ATM), Available Bit Rate (ABR), traffic management, congestion control.

TR01 Spatial Aggregation: Modeling and Controlling Physical Fields by Chris Bailey-Kellogg and Feng Zhao Many important physical phenomena, such as temperature distribution, air flow, and acoustic waves, are described as continuous, distributed-parameter fields. Controlling and optimizing these physical processes and systems are common design tasks in many scientific and engineering domains. However, the challenges are manifold: distributed fields are conceptually harder to reason about than lumped-parameter models; computational methods are prohibitively expensive for complex spatial domains; and the underlying physics imposes severe constraints on observability and controllability. This paper develops an ontological abstraction and an aggregation-disaggregation mechanism, in a framework collectively known as spatial aggregation (SA), for reasoning about and synthesizing distributed control schemes for physical fields. The ontological abstraction models physical fields as networks of spatial objects. The aggregation-disaggregation mechanism employs a set of data types and generic operators to find a feasible control structure, specifying control placements and associated actions that satisfy given constraints. SA abstracts common computational patterns of control design and optimization into a small number of operators to support modular programming; it builds concise and articulable structural descriptions of physical fields. We illustrate the use of the SA ontological abstraction and operators in an example of regulating a thermal field in industrial heat treatment. Keywords: Qualitative reasoning; Spatial reasoning; Ontologies; Decentralized control; Distributed AI; Programming languages.
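The SA operators themselves are not detailed in the abstract. As a toy illustration of the kind of distributed-parameter field SA reasons about, here is a sketch of a 1D thermal field with control placements modeled as cells held at fixed temperatures; the grid size, the Jacobi relaxation, and the Dirichlet-style actuators are all our assumptions, not the paper's method.

```python
def steady_state(n, actuators, iters=20000):
    """Relax a 1D thermal field of n cells to steady state.
    actuators maps cell index -> held temperature (a control
    placement plus its action); unheld endpoints stay at 0."""
    temps = [0.0] * n
    for _ in range(iters):
        new = temps[:]
        for i in range(1, n - 1):
            new[i] = 0.5 * (temps[i - 1] + temps[i + 1])  # discrete Laplace
        for i, value in actuators.items():
            new[i] = value  # actuator pins its cell's temperature
        temps = new
    return temps

# One heater pinned mid-rod with cold ends: the field relaxes to a
# tent-shaped steady state, linear on each side of the heater.
field = steady_state(21, {0: 0.0, 10: 100.0, 20: 0.0})
```

Even this toy setup exhibits the constraints the abstract mentions: the achievable field shapes are dictated by the physics, so a placement search over actuator positions and set-values is needed to satisfy a target profile.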