TECHNICAL REPORT SERIES

Computer and Information Science Research Center
Department of Computer and Information Science

The Ohio State University
2015 Neil Avenue,
Columbus, OH 43210-1277

**********************NOTE*********************

Abstracts for 1995 NOTE: some titles may be available ONLY via paper (This is noted after the title/author section.)

If you have trouble accessing a file, please send email to tfletch@cis.ohio-state.edu.

1995

TR55Computer Program Verification: Improvements for Human Reasoning by Wayne David Heym (Ph.D. Dissertation)

To ably create or modify computer programs that behave according to specification, programmers find it necessary to reason about their programs' behavior. We have formalized, in a direct, natural way, the informal pattern of reasoning generally used with programs written in modular, imperative languages. This formal system provides a solid basis against which to check the soundness and (relative) completeness of an informal reasoning method.

Formal proof that a program meets a specification can be done in this new system or in existing systems (e.g., calculating weakest preconditions using Hoare-style axioms or using symbolic execution). Each system prescribes a way to translate the program-specification pair to a mathematical assertion whose truth implies that the program satisfies the specification. Alternative systems are distinguished, however, by how well they fit programmers' informal reasoning methods. Programmers think of the effect that the execution of a given statement will have on variables' values, and they consider what conditions must hold for those values in each branch of the program. The new method is organized accordingly, unlike previous methods, which are organized for the convenience of mathematicians.

TR54Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths by Dhabaleswar K. Panda, Sanjay Singal, and Ram Kesavan

This paper proposes a novel concept of multidestination message passing mechanism for wormhole k-ary n-cube networks. Similar to the familiar car- pool concept, this mechanism allows data to be delivered to or picked-up from multiple nodes with a single message-passing step. Such messages can propagate along any valid path in a wormhole network conforming to the underlying base routing scheme (deterministic or adaptive). The mechanism requires very little modification to the existing routers and can have \ unicast message-passing as a subset operation. A base-routing-conformed- path (BRCP) model to implement this mechanism is presented. The model is illustrated for several commonly-used routing schemes and the associated deadlock-freedom properties are analyzed. Using this model a set of new algorithms are proposed and evaluated to implement broadcasting and multicasting. It is shown that these algorithms can considerably reduce the latency of these collective communication operations compared to the Umesh (unicast-based multicast) [22] and the Hamiltonian-Path-based [20] schemes. A very interesting and counter-intuitive result is presented which shows that a multicast can be implemented with reduced or near-constant latency as the number of processors participating in the multicast increases beyond a certain number. It is also shown that the BRCP model can take advantage of routing-adaptivity to reduce tahe latency of these operations further. These results are the first ones in the wormhole-routing literature to propose broadcasting and multicasting schemes with such reduced latency on networks with deterministic and adaptive routing. The multidestination mechanism and the BRCP model establish a new foundation to provide fast and scalable collective communication support on wormhole-routed systems.

Keywords: Wormhole routing, Collective communication, Path-based Routing, Broadcast, Multicast, k-ary n-cubes, Meshes, Interconnection networks, Deadlock-freedom, and Interprocessor Communication.

TR53Comprehensive Low-overhead Process Recovery Based on Quasi- synchronous Checkpointing by D. Manivannan and M. Singhal

In this paper, we propose a low-overhead recovery algorithm based on a quasi-synchronous checkpointing algorithm. The checkpointing algorithm preserves process autonomy by allowing them to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progression of the recovery line which helps bound rollback propagation during a recovery. Thus, it has the easeness and low overhead of asynchronous checkpointing and the recovery time advantages of synchronous checkpointing. There is no extra message overhead involved during checkpointing and the additional checkpointing overhead is nominal. The checkpointing algorithm ensures the existence of a recovery line consistent with the latest checkpoint of any process all the time. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process, in the event of a failure. The recovery algorithm has no domino effect and a failed process needs only to rollback to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. It uses selective pessimistic message logging at the receiver end to handle the messages lost due to rollback. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps for determining dependency between checkpoints which result in high message overhead during failure-free operation if the number of processes is large.

Keywords: Distributed checkpointing, failure recovery, fault-tolerance.

TR52Hybrid Algorithms for Complete Exchange in 2D Meshes by N. S. Sundar D. N. Jayasimha D. K. Panda P. Sadayappan

Parallel algorithms for several common problems such as sorting and the FFT involve an personalized exchange of data among all the processors. Past approaches to doing this complete exchange have taken either a combining approach or a direct exchange approach. While combining reduces the number of message startups, direct exchange minimizes the volume of data transmitted.

This paper presents a family of hybrid algorithms for wormhole-routed 2D meshes that can effectively utilize the complementary strengths of these two approaches to complete exchange. The performance of hybrid algorithms using Cyclic Exchange and Scott's Direct Exchange are studied using analytical models as well as simulation. The results show that hybrids achieve lower completion times than either pure algorithm for a large range of mesh sizes, data block sizes and message startup costs. The analytical models may be used to select the optimum hybrid for any given combination of system parameters (mesh size, startup time, flit transfer time and barrier cost) and the problem parameter (data block size).

Keywords: complete exchange, mesh interconnection, message combining, direct exchange, hybridizing, modeling, simulation, distributed memory systems.

TR51Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements by Debashis Basak and Dhabaleswar K. Panda

NOTE: This title was used before but this is a new tech report

Pacakging technologies impose various physical constraints on bisection bandwidth and channel width of a system whereas processor and interconnect technologies lead to certain throughput and latency demands on the system performance. Earlier studies in literature have either focused on only flat interconnections or proposed hierarchical/clustered interconnections with limited packaging constraints. Pinout technologies and capacity of packaging modules have been ignored, often leading to configurations which are not design-feasible. In this paper, we propose a new supply-demand framework for multiprocessor system design by considering packaging, processor, and interconnect technologies in an integrated manner. The elegance of this framework lies in its parameterized representation of different technologies. For a given set of technolotical parameters the framework derives the best configuration while considering practicall design aspects like maximum board area, maximum available pinout, fixed channel width, and scalability. In order to build a scalable parallel system with a given number of processors, the framework explores the design space of flat k-ary n-cube topologies and their clustered variations (k-ary n-cube cluster-c) to derive design-feasible configurations with best system performance. The study identifies processor board area, supported channel width, board pinout density, and router pinout as critical parameters and analyzes their impact on deriving design-feasible and best configurations. For a wide range of parameters, it is shown that best configurations are achieved with cluster-based systems with 2-8 processors per cluster and 3D-5D inter-cluster interconnection.

Keywords: Multiprocessor System Design, Scalable Systems, Clustered Architectures, Hierarchical Organization, k-ary n-cube Inter- connection, Packaging Constraints, Parallel Architectures.

TR48Real-Time Communications in FDDI-Based Mobile Networks by Ten-Hwang Lai, Yibin Yang, and Ming-Tsan Liu

With the introduction of a wide variety of portable computing devices recently, many LANs have been extended with wireless infrastructures in efforts to provide them with wireless and/or nomadic access to existing networks. When portable or mobile computing devices are used in real- time computing, a need for mobile real-time communications arises. Given FDDI's popularity as an underlying network for real-time computing, we are motivated to investigate the issue of supporting real-time communications in an FDDI-based mobile network, in which base stations of the wireless infrastructure are connected to an FDDI backbone. In real-time communications, a message sent from a mobile host (MH) to another mobile host (MH) must be delivered in a timely fashion from the source MH to a base station (BS), from the BS to another BS, then from the later BS to the destination MH. We are primarily concerned with the part of communications between two base stations (over the FDDI backbone). We identify a wide range of problems concerning bandwidth allocation, bandwidth management, and QOS guarantee in the face of handoffs. As solutions to these problems, we propose a bandwidth management scheme that allocates synchronous bandwith to arriving calls; a handoff control protocol that makes handoffs transparent to mobile users; an overhead bandwidth management scheme that allocates overhead bandwidth to handoffs; and a strategy to minimize the needed amount of overhead bandwidth. The proposed solutions are compatible with the FDDI standards.

Keywords: real-time communications, mobile networks, FDDI, handoff, network protocols, bandwidth allocation, bandwidth management.

TR47Randomized Quick Hull by R. Wenger

This paper contains a simple, randomized algorithm for constructing the convex hull of a set of n points in the plane with expected running time O(n log h) where h is the number of points on the convex hull.

TR46Distributed Dynamic Fault-Tolerant Channel Allocation for Mobile Computing by Ravi Prakash, Niran Shivaratri, and Mukesh Singhal

Efficient allocation of communication channels is critical for the performance of wireless mobile computing systems. The centralized allocation algorithms proposed in literature are neither robust, nor scalable. Distributed channel allocation schemes proposed in the past are complicated and require active participation of the mobile nodes. These algorithms are unable to dynamically adjust to spatial and temporal fluctuations in channel demand (load). We present a distributed dynamic channel allocation algorithm in which heavily loaded regions are assigned a large number of communication channels, while their lightly loaded neighbors are assigned fewer channels. As the spatial distribution of channel demand changes with time, the spatial distribution of allocated channels adjusts accordingly. The algorithm described in this paper requires minimal involvement of the mobile nodes, thus conserving their limited energy supply. The algorithm is proved to be deadlock free, starvation free and fair. It prevents co-channel interference and can tolerate the failure of mobile as well as static nodes without any significant degradation in service. The algorithm is scalable and imposes low computation and communication overheads, as demonstrated by simulation experiments.

Key words: channel allocation, mobile computing, cellular communication.

TR45 In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing them to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progression of the recovery line which helps bound rollback propagation during a recovery. Thus, it has the easeness and low overhead of asynchronous checkpointing and the recovery time advantages of synchronous checkpointing. There is no extra message overhead involved during check- pointing and the additional checkpointing overhead is nominal. The algorithm ensures the existence of a recovery line consistent with the latest checkpoint of any praocess all the time. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm has no domino effect and a failed process needs only to rollback to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid domino effect, it uses selective pessimistic message logging at the receiver end. The recovery is asynchronous for single process failure. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps for determining dependency between checkpoints since vector timestamps generally result in high message overhead during failure-free operation.

Keywords: Distributed checkpointing, failure recovery, fault-tolerance.

TR44An Efficient Causal Ordering Algorithm for Mobile Computing Environments by Ravi Prakash, Michel Raynal, and Mukesh Singhal

Causal message ordering is required for several distributed applications. In order to preserve causal ordering, only direct dependency information between messages, with respect to the destination process(es), should be sent with each message. By eliminating other kinds of control information from the messages, the communication overheads can be significantly reduced. In this paper we present an algorithm that uses this knowledge to efficiently enforce causal ordering of messages. The proposed algorithm does not require any prior knowledge of the network or communication topology. As computation proceeds, it acquires knowledge of the logical communication topology and is capable of handling dynamically changing multicast communication gropus. With regard to communication overheads, the algorithm is optimal for the broadcast communication case. Its energy efficiency and low bandwidth requirement make it suitable for mobile computing systems. We present a strategy that employs the algorithm for causally ordered multicasting of messages in mobile computing environments.

Key words: Causal message ordering, direct dependency, mobile computing.

TR43A Quasi-synchronous Algorithm for Checkpointing in Distributed Systems by D. Manivannan and M. Singhal

In this paper, we propose a quasi-synchronous algorithm for checkpointing in distributed systems. This algorithm preserves process autonomy by allowing them to take checkpoints asynchronously and uses communication- induced checkpoint coordination for the progression of the recovery line which helps in bounding roll-back propagation during a recovery. Thus, it has the easeness and low overhead of asynchronous checkpointing and the recovery time advantages of synchronous checkpointing. There is no extra message overhead involved during checkpointing and the additional checkpointing overhead is nominal. The algorithm ensures the existence of a recovery line consistent with the latest checkpoint of any process all the time. This property of the algorithm is a highly desirable feature for rollback recovery because a failed process needs only rollback to its latest checkpoint and request other processes to rollback to a consistent checkpoint, thus avoiding domino effect completely. It also helps in garbage collection because after a process has established a recovery line consistent with its latest checkpoint, all the processes can delete the checkpoints that precede the recovery line. Multiple processes can establish recovery lines consistent with their latest checkpoints simultaneously without intruding other processes.

Keywords: distributed checkpointing, global snapshot collection, failure recovery, fault-tolerance.

TR42Modeling and Analysis of Channel Transferability in Mobile Computing Environments by Ravi Prakash and Mukesh Singhal

Mobile computers use wireless channels to communicate with other computers. Efficient channel allocation is at the heart of an efficient mobile computing system. The finite number of channels should be efficiently allocated to maximize throughput and to avoid co-channel interference. Temporal variations in channel demand (load) require channel allocation to adapt dynamially to the changing demand. The presence of load imbalance does not automatically imply the usefulness of channel transfer. In this paper we present a probabilistic analysis of the temporal imbalance in channel demand. The effectiveness of channel transfer depends on the duration for which the imbalance persists. This duration is influenced in part by the velocity of mobile units. The wide variance in velocities of the mobile units can be handled by using a hierarchical cellular layout. Bulk channel transfers can be employed to reduce the overheads of channel transfer and to enable the system to quickly adjust to spatial variations in channel demand. The results of our analysis provide guidelines for the design of dynamic distributed channel allocation strategies.

Keywords: mobile computing, channel allocation, load sharing, transfer potential, transfer window, bulk channel transfer, performance modeling.

TR41Benefits of Processor Clustering in Designing Large Parallel Systems: When and How? by D. Basak, D.K. Panda, and M. Banikazemi

Scaling the size of parallel systems while maintaining the system performance is an important problem. A default scaling approach by using larger networks to interconnect more processors works only up to a limited extent and the derived configurations using this approach are not cost- effective. Recent advents in VLSI and packaging technologies now offer multiple processors on a single multi-chip module. This offers potential to design larger systems by replacing each processor in a base system by a processor-cluster. Similarly, advents in the interconnection network technology now provide the ability to use wider and faster network channels. In this paper we analyze the scaling of systems by taking into account current interconnection and processor-integration technologies. The impact of software overhead to send and receive messages in parallel system on its scalibility is also demonstrated. Our results indicate that systems with high message overhead costs can be scaled in a cost-effective manner by employing larger clusters as computing nodes. This can be achieved without changing the network until the messaging costs are balanced. A system with balanced messaging costs can be further scaled in a more cost-effective manner by using larger clusters and wider network channels.

Keywords: parallel architectures, interconnection networks, clustered architectures, hierarchical architectures, scalability, messaging overheads.

TR40Search in Temporal Domains by Kikuo Fujimura

Best-first search algorithms have been widely used to find a minimum cost path in graph search. To formulate certain problems involving temporal events, it is at times instrumental to use graphs whose edge costs are time- dependent. In such a graph, shortest paths are dependent on time at which traversal in the graph begins. Typically, structural changes in pi sub t occur at discrete points in t, where pi sub t is the shortest path with start time t. In this paper, a best-first algorithm is generalized to handle time-dependent graphs to find shortest paths together with their start time characteristics and some properties of the algorithm are investigated.

Keywords: search algorithms, best-first search, time-dependent graphs.

TR39Using Abstraction Relations to Verify Abstract Data Type Representations by Murali Sitaraman, Bruce W. Weide, and William F. Ogden

In a typical data abstraction situation, the correspondence between the concrete representation and the abstract conceptual value of an ADT variable (object) is a many-to-one function. For example, many different pointer aggregates give rise to exactly the same binary tree. The theoretical possibility that this correspondence generally should be relational has long been recognized, but convincing examples have not been forthcoming. We show how such examples arise naturally in optimization problems.

Keywords: abstract data type, abstraction function, abstraction mapping, abstraction relation, data abstraction, formal specification, program verification, non-determinism, optimization problem, relation.

TR38Representation INheritance: A Safe Form of "White Box" Code Inheritance by Stephen H. Edwards

Inheritance as a programming language mechanism can be used to achieve several different goals, both in terms of expressing relationships between components and in terms of defining new components "by difference" from existing ones. For defining new component implementations in terms of existing implementations, there are several approaches to using "code inheritance." Black box code inheritance allows subclasses to reuse superclass implementations as-is, without direct access to their internals. Alternatively, white box code inheritance allows subclasses to have direct access to superclass implementation details, which may be necessary for the efficiency of some subclass operations.

Unfortunately, white box code inheritance violates the encapsulation protection afforded to superclasses, opening up the possibility for subclasses to interfere with the correct operation of superclass methods. Representation inheritance is proposed as a restricted form of white box code inheritance where subclasses have direct access to superclass implementation details, but are required to respect the representation invariant(s) and abstraction function(s) of their ancestor(s). This preserves the protection that encapsulation would have provided, while allowing the freedom of access that component implementers sometimes desire.

Keywords: inheritance, object oriented, representation invariant, abstraction function, reuse.

TR37Characterizing Observability and Controllability of Software Components by Bruce W. Weide, Stephen H. Edwards, Wayne D. Heym, Timothy J. Long, and William F. Ogden

Two important objectives when designing a specification for a reusable software component are understandability and utility. For a typical component defining a new abstract data type, a significant common factor affecting both of these objectives is the choice of a mathematical model of the (state space of the) ADT, which is used to explain the behavior of the ADT's operations to potential clients. There are subtle connections between the expressiveness of this mathematical model and the functions computable using the operations provided with the ADT, giving rise to interesting issues involving "controllability". Previously we recommended a practical way to test compliance of a proposed design with these informally-defined principles: it should be possible to construct layered implementations of operations to test equality of and to copy variables of an ADT. This paper discusses problems associated with formalizing intuitively-stated observability and controllability principles in accordance with these tests. Although the example we use for illustration is simple, the analysis has implications for the design of reusable software components of every scale and conceptual complexity.

Keywords: abstraction, controllability, design for reuse, formal specification, full abstraction, model-based specification, observability, reuse, software component.

TR36Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-cube Systems by Debashis Basak and Dhabaleswar K. Panda

This paper identifies performance degradation in wormhole routed k-ary n-cube networks due to limited number of router-to-processor consumption channels at each node. Many recent research in wormhole routing have advocated the advantages of adaptive routing and virtual channel flow control schemes to deliver better network performance. However, most of these results are based on infinite message consumption capacity leading to unrealistic design guidelines. This paper indicates that the advantages associated with these schemes can not be realized with limited consumption capacity. To alleviate such performance bottleneck, a new solution using multiple consumption channels is proposed. It is shown that wormhole networks with higher routing adaptivity, dimensionality, degree of hot-spot traffic, and number of virtual networks have to take advantage of multiple consumption channels to deliver better performance. The interplay between system topology, routing algorithm, number of virtual channels/networks, memory bandwidth, physical link bandwidth, and communication traffic is studied to derive an upper bound as well as the optimal number of consumption channels required for a system. Using the on-going technological trend, it is shown that wormhole-routed systems can use up to 2-6 consumption channels per node to deliver better system performance.

Keywords: Wormhole Routing, k-ary n-cube, Consumption Channel, Virtual Channel, Deterministic Routing, Adaptive Routing, Hot-spot Traffic, Performance Evaluation, and Interprocessor Communication.

TR35Priority-Driven Ray Tracing by Roni Yagel and John Meeker

A typical ray tracing algorithm traces a ray through each screen-pixel and spawns secondary rays at ray-object intersection points. Unlike traditional ray tracers which follow these rays recursively, we assign a priority value to each newly spawned ray and insert it into a priority queue. The priority assigned to each ray can be based on a variety of criteria, some of which we explore here. The next ray we trace is always the one with the highest priority in the queue. Occasionally, we trigger display updates when a checkpoint or predefined threshold is reached, providing intermediate images for review and evaluation. Classical ray tracers, once given the rendering specifications, are not controllable by the user. The priority-driven ray tracing, on the other hand, provides the user with a mechanism to steer rendering and deliver intermediate images amid processing. This paper describes the ilumination model of the non- recursive priority-driven ray tracer and evaluates its memory and time requirements. We show that although worst-case memory requirements can be overwhelming, in practice, our method is both useful and feasible.

TR34Helly-Type Theorems and Geometric Transversals by Rephael Wenger

A geometric transversal is an affine subspace of R^d, such as a point, line, plane or hyperplane, which intersects every member of a family of convex sets. Eduard Helly's celebrated theorem gives conditions for the members of a family of convex sets to have a point in common, i.e., a point transversal. In Section 1 we highlight some of the more notable theorems related to Helly's Theorem and point transversals. Section 2 is devoted to geometric transversal theory.

TR33Necessary and Sufficient Conditions on Information for Causal Message Ordering and Their Optimal Implementation by Ajay D. Kshemkalyani and Mukesh Singhal

This paper formulates invariants that represent necessary and sufficient conditions on the information required for enforcing causal ordering. The paper then presents an optimal algorithm for enforcing causal message ordering. The algorithm works with non-FIFO channels and allows a process to multicast to arbitrary and dynamically changing process groups. We show that the algorithm satisfies the invariants and is optimal both in space complexity of message overheads and in space complexity of message logs. In general, the space complexity of causal message ordering is shown to be O(n^2) where n is the number of nodes in the system. The algorithm achieves optimality by transmitting the bare minimum causal dependency information specified by the necessity condition, and using an encoding scheme to represent and transmit this information.

Key Words: Causal message ordering, distributed systems, optimal synchronization, concurrency.

TR32Predicting and Estimating Job Execution Times in Computing Systems Using Survival Analysis by M.G. Sriram and Mukesh Singhal

Accurate and reliable estimation of job execution times is an important requirement for the efficient performance of advanced operating systems. In this paper we show how estimation techniques from Survival Analysis, a subfield of biostatistics, can be used to obtain accurate, detailed knowledge of program execution times from empirical observations. A framework for the implementation of survival analytic techniques in computing systems is described. Two possible architectures for accomplishing this task are presented and their merits discussed. One important advantage of the proposed approach is that many survival analytic estimators of job execution times are optimal according to statistical criteria. Another benefit is that the responsibility of estimating program execution times is removed from programmers.

TR31A Stabilized Matrix Sign Function Algorithm for Solving Algebraic Riccati Equations by Judith D. Gardiner Because of its suitability for parallel computation, the matrix sign function is a popular algorithm for solving the algebraic Riccati equation. The algorithm can be numerically unstable, however, even when combined with iterative refinement. This paper presents an enhanced algorithm which is shown to be backward stable under most circumstances. The method uses a combination of iterative refinement, scaling, and shifting, together with carefully chosen stopping criteria. An error analysis supports the algorithm modifications. Computational examples demonstrate the effectiveness of these techniques.

Key Words: Computational methods for control, parallel algorithms, algebraic Riccati equations, matrix sign function.

TR30Incremental Learning of Complex Temporal Patterns by DeLiang Wang and Budi Yuwono

A neural model for temporal pattern generation is used and analyzed for training with multiple complex sequences in a sequential manner. The network exhibits some degree of interference when new sequences are acquired. It is proven that the model is capable of incrementally learning a finite number of complex sequences. The model is then evaluated with a large set of highly correlated sequences. While the number of intact sequences increases linearly with the number of previously acquired sequences, the amount of retraining due to interference appears to be independent of the size of existing memory. The model is extended to include a chunking network which detects repeated subsequences between and within sequences. The chunking mechanism substantially reduces the amount of retraining in sequential training. Thus, the network investigated here constitutes an effective sequential memory. Various aspects of such a memory are discussed at the end of the paper.

TR29Constructing Piecewise Linear Homeomorphisms of Simple Polygons by Himanshu Gupta and Rephael Wenger

Let P and Q be simple polygons with vertex sets {p1, ... , pn} and {q1, ... , qn}, respectively. We present an algorithm to construct a piecewise linear homeomorphism between P and Q ....... (the remainder of the abstract cannot be reproduced here).

TR28Visibility Computation Reconfigurable Meshes by Kikuo Fujimura

Visibility problems are investigated using reconfigurable meshes. A set of algorithms are proposed on the architecture for visibility computation in two and three dimensions. We show that visibility of a total of n disjoint edges in the plane can be computed in O(1) time on an n x n mesh. The result is optimal in the word model of VLSI. For the case that the edges are not disjoint, the problem is shown to be solvable in O(1) time by using a mesh of slightly larger size or in slightly more time on an n x n mesh. We also present hidden line and surface elimination algorithms that run on an n x n x n mesh for a set of disjoint triangles in 3-space containing a total of n vertices in O(1) time and O(k) time, respectively, where 0 is less than or equal to k which is less than n is an input-dependent constant.

Keywords: visibility, hidden line and surface elmination, reconfigurable meshes, parallel algorithms.

TR27Shape Transformation in Space-Time by Jinghuan Lu and Kikuo Fujimura

A method is presented for shape transformation in the plane. The primary strength of the proposed method over previous approaches for shape transformation is that our method can achieve shape transformation in a domain that can contain obstacles which are possibly in motion. A smooth transformation between two given polygonal shapes is to be attained while ensuring that any part of the shape is free of collision with the obstacles in the domain during the transformation process. The problem is formulated as minimization of a cost function associated with each transformation path that controls length, smoothness, and collision-freedom of the path. The problem is solved in three steps. First, a population of tentative transformation paths is generated by combining path planning for an anchor point in the shape and a randomization process. Second, collision detection and shape deformation techniques are applied to keyframes along the paths to minimize the occurrence of collision. Lastly, a genetic algorithm is used to refine the population of transformation paths. Experimental results are presented to demonstrate the feasibility of our approach.

Keywords: shape transformation, deformation, animation.

TR26A Timing-based Schema for Stabilizing Information Exchange by Anish Arora and David M. Poduska

The paradigm of information exchange provides a basis for nodes in a network to stay uptodate with the recent information in the network. In this paradigm, nodes cooperate to share their current information with each other. We present a simple and uniform schema for building information exchange protocols that are stabilizing, in the following strong sense. Starting from arbitrary state, the protocols reach within bounded real-time a state wherefrom all nodes remain uptodate with the recent information in the network.

The ability to stabilize in bounded time is achieved by using timing-based actions. The timing constraints on these actions can be systematically adapted to suit a variety of network loads, delay requirements, and scheduling restrictions and to tolerate out-of-phase and drift-prone node clocks. Our schema also tolerates any number of topological changes in the network. Moreover, it accommodates information that is time-varying as well as it does information that is fixed. It is thus well-suited to dynamic high speed networks.

Keywords: stabilization, timing-based protocols, fault-tolerance, adaptivity, high speed networks, formal methods, verification

TR25Issues in Designing Efficient and Practical Algorithms for Collective Communication on Wormhole-Routed Systems by Dhabaleswar K. Panda

The significance of collective communication operations for scalable parallel systems has been well emphasized by the recent Message Passing Interface (MPI) standard [20]. A large number of algorithms have been proposed in recent past to support these operations on wormhole-routed systems. However, some algorithms lack in specifying their exact scope. Similarly, some algorithms do not fully take into account the architectural characteristics of underlying wormhole-routed architecture in performance evaluation. This makes it difficult to compare different algorithms and select the best ones for practical implementation. In this paper, we emphasize on such issues. An overview of collective communication operations is provided. For basic communication types are identified to be essential to implement these operations efficiently on any system. A representaative set of algorithms for frequently-used collective communication operations are analyzed to reflect on how to make these algorithms practical from application, algorithm, and system perspective.

Keywords: Collective Communication, Wormhole Routing, Broadcasting, Multicasting, Barrier Synchronization, Reduction, All-to-All Exchange, Meshes, k-ary n-cubes, and Path-based Routing.

TR24Time and Money: A Case Study in Systematic Development of Constraint Logic Programs by Spiro Michaylov and Iva'n Ordo'n~ez

The utility of Constraint Logic Programming (CLP) for developing complex and flexible software has been well established. However, on realizing the full power of the paradigm, programmers can find themselves coding some remarkably complicated models, producing programs that are compact, powerful, but difficult to understand and modify. The Skeletons and Techniques discipline for developing Prolog programs has been seen to be most helpful in such situations. This discipline has already been shown to be usable for CLP.

Here we present the first case study in developing a substantial CLP program, a financial modeling package in \CLPR{}, using an extended method based on Skeletons and Techniques. We develop a core set of simple programs that identify the essence of the required computations, and combine these at the source level to develop a complex and powerful program that is far easier to understand in terms of its component skeletons, techniques and compositions than by examining the resulting program source.

TR23Efficient Extraction of Imperative Computation in Constraint Logic Programs by Christopher J. Bailey-Kellogg and Spiro Michaylov

In most Constraint Logic Programming (CLP) languages, procedures can be transformed to improve the efficiency of constraint solving for a particular set of calling patterns. In particular, it is often possible to take advantage of groundness information to replace a considerable amount of constraint solving with ground imperative computation. We present an efficient algorithm for identifying the specializations of a procedure that allow such optimization. This algorithm generates efficiently a concise representation of information flow in a procedure. This representation can be used to produce a set of calling patterns for which specialization is likely to be fruitful, together with the suitably transformed procedure for each specialization.

TR22Spatial Aggregate: Theory and Application to Qualitative Physics by Kenneth Yip and Feng Zhao

Effective reasoning about a physical system requires an appropriate mapping from the system characteristics to abstractions that match the impedance of the task at hand. In Qualitative Physics, three ontological abstractions are widely used: device, process, and constraint. We present a framework and a new ontological abstraction, the field ontology, to unify many reasoning tasks involving image-like analogue representations such as the velocity field for fluid motion, phase space for dynamical systems, and configuration space for mechanism analysis.

A field is defined as a mapping from one continuum (say R^m) to another (say R^n). A field is typically information-rich. To transform the information- rich representation into symbolic descriptions well suited for explaining the structure and behavior of the system, we propose a theory of multi-layer spatial abstractions. Abstraction in each layer is represented by a neighbor- hood graph whose nodes are objects and edges are adjacency relations. The multi-layer theory has two advantages: (1) A nonlocal property of a lower layer can be redescribed as a local property of a higher layer, and (2) On each layer the neighborhood graph provides a common interface to support identical modular computations.

To illustrate our theory, we examine the computational structure of three implemented programs - KAM, MAPS, and J&S - that integrate symbolic, numerical and visual reasoning. We show a small set of generic operators that construct, transform, filter, classify, and search neighborhood graphs capture the commonalities of these programs. We develop a language, a way of organizing programs around neighborhood graphs, to make programs written in this style clear.

TR20Real-Time Communications in Wireless FDDI Networks Yibin Yang, Ten-Hwang Lai, and Ming-Tsan Liu

In this paper, we propose a new architecture for wireless FDDI networks and address the issue of providing a real-time communications service in this kind of wireless networks. A wide range of problems concerning synchronous bandwidth management and quality of service guarantee are identified. To solve these problems, we present a dynamic bandwidth management scheme, a source handoff protocol and two approaches to handling destination handoffs. They are able to make handoffs transparent to mobile users so that no quality of service degradation will be observed during handoffs. In the meantime, they also keep the overhead incurred by handoffs small. These solutions are fully compatible with the FDDI standards so that the installation of the wireless base stations implementing them does not require any modification to the existing stations on an FDDI network.

Key Words. FDDI Networks, Handoff Protocol, Real-Time Communications, Synchronous Bandwidth Management, Wireless/Mobile Networks

TR19Distributed Dynamic Channel Allocation for Mobile Computing: Lessons from Load Sharing in Distributed Systems Ravi Prakash and Mukesh Singhal

Mobile computers use wireless channels to communicate with other computers. Efficient channel allocation is at the heart of the design of an efficient mobile computing system. The finite number of channels should be efficiently allocated to maximize throughput and avoid co-channel interference. Temporal vaxiations in channel demand require channel allocation to adapt dynamically to the changing demand. In this paper we present a probabilistic analysis of the temporal imbalance in channel demand. The results provide guidelines for the design of dynamic distributed channel allocation strategies. The analysis indicates that the presence of load imbalance does not necessarily imply the possibility for channel transfer. Also, the effectiveness of channel transfer depends on the duration for which the imbalance persists. This duration is influenced in part by the velocity of mobile units. The wide variance in velocities of the mobile units can be handled by using a hierarchical cellular layout. Bulk channel transfers can be employed to reduce the overheads of channel transfer and to enable the system to quickly adjust to spatial variations in channel demand.

TR18A LA-COMA Implmentation of Parallel Volume Rendering Asish Law and Roni Yagel

Object dataflow is a popular approach often usedfor parallel rendering. The scene is statically distributed among processors and objects are fetched and cached only on demand. Most previous object dataflow methods were implemented on shared memory architectures, and exploit imagelobject space coherency to reduce cache misses. In this paper, we propose an efficient object dataflow incremental rotation system on distributed memory machines. It uses a distributed-directory scheme to trace the location of objects at other nodes. The objects migrate and replicate at different processors as in Cache-Only Memory Architectures (COMA). During the animation process, the processors predict and prefetch the data that will be needed for subsequent frame generations, thus employing a look-ahead (LA) data acquisition to hide the latency of communication. Load balancing, minimizing network congestion, and optimal algorithm embedding are some of the other issues considered in the design process. The results on the Cray T3D show good load balancing and significant speedup.

TR17Spatial Domain Characterization and Control of Reconstruction Errors Raghu Machiraju, Edward Swan, and Roni Yagel

Reconstruction is imperative whenever an image or a volume needs to be resampled as a result of an affine or perspective transformation, texture mapping, or volume rendering. We present a new method for the characteriza- tion and measurement of reconstruction error. Our method, based on spatial domain error analysis, uses approximation theory to develop error bounds. We provide, for the first time, an efficient way to guarantee an error bound at every point by varying the filter size. We go further to support position- adaptive and data-adaptive reconstruction which adjust filter size to the location of reconstruction and the data in its vicinity. We demonstrate the effectiveness of our methods with suitable 2D and 3D examples.

TR16Accuracy Control of Reconstruction Errors in Volume Slicing Raghu Machiraju and Roni Yagel

Slicing operation is a most powerful and useful tool for the exploration of volume data. In the core of the slicing algorithm is a resampling process that produces a set of new samples constituting the new slice. Reconstruction is imperative whenever an image or a volume needs to be resampled as a result of an affine or perspective transformation, texture mapping, or volume slicing. Traditionally, this reconstruction is performed by using trilinear or cubic interpolation. However these methods suffer from significant blurring in the resulting images and have no way to guarantee some error level at every pixel. We employ a new method for the characterization and measurement of reconstruction error that is based on spatial domain error analysis. We provide an efficient way to guarantee an error bound at every point by varying the filter size. Our approach supports also position- adaptive and data-adaptive reconstruction which adjust filter size to the location of reconstruction and the data in its vicinity.

TR15 Performance Study of Buffering within Switches in Local Area Networks by Amr Elsaadany, Mukesh Singhal, and Ming T. Liu PAPER COPY ONLY

Today's demanding applications, such as multimedia, require higher network transfer rates. Traditional Local Area Networks (LANs) will not be able to provide the throughput required by these applications. The use of switches in LANs is an effective technique to increase the throughput of the network. These switches have finite buffers at thier input and output ports. The size of these buffers affect the packet loss rate. It also affects the delay at the switch as packets may have to wait for the ouptut buffer to become available. In this paper we study the effect of buffer sizes within these switches. We show how the buffer size is related to the performance of the switch as well as overall performance of hte LAN.

Key words: local area network, switching hub, LAN protocol, LAN performance, Ethernet, multimedia.

TR14-DIR A Formal Model of Software Subsystems by Stephen Edwards (Ph.D. Dissertation)

People form internal mental models of the things they interact with in order to understand those interactions. This psychological insight has been used by the human-computer interaction (HCI) community to build computer systems that are more intuitive for end users, but it has not yet been applied to the problems of software designers, programmers, and maintainers. Infact, conventional programming languages do little to help client programmers develop good mental models of software subsystems.

The main problem is that the conventional wisdom that software modules are merely a syntactic mechanism for organizing declarations and controlling visibility is wrong. This skewed view of the nature of software modules limits their utility as effective building-blocks for creating large, complex software systems that are comprehensible to human readers. To constitute effective building-blocks, modules must present simple mental models to the software professionals involved in assembling them into larger module, and ultimately into complete software systems. Because a module has no real semantic denotation of its own, there is no way for one to imagine such building-blocks as contributing to the understandability of the software comprising them.

This dissertation presents the basic concepts of the ACTI model (Abstract and Concrete Templates and Instances), a theoretical model of software structure and meaning that addresses these concerns formally. ACTI supports the assignment of useful semantic denotations to software subsystems (building-blocks). Further, these assigned semantics can present models of each subsystem that are much simpler to understand than a straightforward composition of the lower-level parts from which the subsystem is construed.

In addition, ACTI is a mathematically formal, language-independent model of software components that captures a unifying conceptual view of software architecture embedded in modern module-structured languages. Here, the term "software component" refers to a reusable software subsystem, where a subsystem can vary in grain size from a single module up to a large scale generic architecture. It also unifies the module-structured and object- oriented views of what a osftware component is, and usccessfully integrates the key features of inheritance with parameterized programming.

TR13Logical Time: A Way to Capture Causality in Distributed Systems by Michel Raynal and Mukesh Singhal

The concept of causality between events is fundamental to the design and analysis of parallel and distributed computing and operating systems. Usually causality is tracked using physical time, but in distributed systems setting, there is no built-in physical time and it is only possible to realize an approximation of it. As asynchronous distributed computations make progress in spurts, it turns out that the logical time, which advances in jumps, is sufficient to capture the fundamental monotonicity property associated with causality in distributed systems. This paper reviews three ways to define logical time (e.g., scalar time, vector, time and matrix time) that have been proposed to capture causality between events of a distributed computation.

Key words: Distributed systems, causality, logical time, happens before, scalar time, vector time, matrix time.

TR12Efficient and Accurate Implementation of the Algebraic Reconstruction Technique (ART) by Klaus Mueller, Roni Yagel and J. Fredrick Cornhill

The Algebraic Reconstruction Technique (ART) is used to iteratively reconstruct an object from its projections. Reconstructions obtained with ART often suffer from salt-and-pepper noise. Existing methods to reduce this noise are insufficient and increase computation time. We demonstrate that the noise is mostly due to aliasing introduced in the reconstruction procedure and consequently outline and implement appropriate measures to reduce this aliasing. Since anti-aliasing is computationally expensive within the current ART framework, we devise an alternative, more efficient implementation of ART. This approach significiantly lowers the overall computational effort of ART since most of the procedure simplifies to incremental table lookups. Reconstructions obtained with our method exhibit significantly improved quality and compute considerably faster than reconstructions obtained with previous ART implementations. Our approach also offers an efficient way to accurately implement the coefficients in the linear equation system at the core of ART. Current implementations generally have only used approximations, due to computational constraints. Finally, anti-aliased ART allows the number of projection images required for reconstruction to be reduced to 40 or less while still maintaining satisfactory reconstruction image quality.

Key Words: Algebraic Reconstruction Technique, ART, 3D-Reconstruction, Computed Tomography, Anti-aliasing.

TR11CellFlow: A Parallel Rendering Scheme for Distributed Memory Architectures by Asish Law and Roni Yagel

We present the CellFlow method for object space subdivision that exploits frame coherency to implement a look-ahead scheme of object dataflow. The implementation of this scheme exploits the communication features of modern scalable multicomputers to achieve near linear speedup by means of latency hiding. We demonstrate the performance of our approach in the field of of volume rendering by implementing incremental rotation of the volumetric object about its center. The simplicity of the algorithm, its optimal embedding in popular network topologies, and minimal congestion-free communication among processors are its main advantages. Results are shown for implementation on the Cray T3D, a distributed memory 3D torus architecture. Computation and communication load balancing issues among processors are also addressed.

TR07An Asymptotically Optimal Minimum Degree Ordering of Regular Grids by B. Kumar, P. Sadayappan, and C.-H. Huang

It has previously been shown that there exists a minimum degree ordering for regular grids that is considerably worse than nested dissection in terms of fill-in and operations for factorization [1]. This paper proves the existence of a minimum degree ordering for regular grids that has the same optimal asymptotic order complexity for fill-in and operation count as nested dissection. The analysis is verified by showing exact match between analytical prediction and experimental measurement. The analysis motivates a peripheral preordering strategy for use with the popular multiple minimum degree (MMD) algorithm, and is shown to consistently reduce fill-in and operation count for regular grids.

Keywords: Sparse Matrices, Finite Element Grids, Minimum Degree Ordering, Computational Complexity.

TR06Characterizing and Evaluating Performance Tradeoffs in Causal Multicasting in ATM Networks by Frank Adelstein and Mukesh Singhal

Because of the high-speed and QOS guarantees, ATM entworks are getting popular in multimedia applications. There are multimedia applications that require messages to be delivered in an order that preserves cause and effect relations in multicasts (called causal order multicast). Causal order multicast algorithms introduce an overhead because they append control information to messages to enforce the causal order. In this paper, we present a framework for evaluating causal multicast algorithms. We present a causal multicast algorithm called the partition method that supports multicasting and has a low overhead. We compare and analyze its cost with the cost of two existing algorithms on ATM networks, called the causal order multicast and delta causal order multicast methods. This analysis is carried out for four different sample network topologies to represent different configurations.

Keywords: ATM, multimedia, causal message order, multicast.

TR05Temporal Analysis of Load Imbalance in Distributed Computing Systems by M.G. Sriram and Mukesh Singhal.

Distributed computing systems consist of computers interconnected by communications links. In such systems, statistical fluctuations in job arrival and service patterns cause episodes of load imbalance during which some computers are lightly loaded while others are simultaneously overloaded. Load sharing is the process of transferring jobs from overloaded to under- loaded computers to improve overall system performance. Load sharing is implemented by means of load sharing algorithms. However, since little has been known about the temporal characteristics of load imbalance, various pitfalls arise in the operation of load sharing algorithms which prevent full realization of the potential benefits of load sharing.

In this paper, we present a stochastic analysis of time durations of load imbalance. The notions of Transfer Pair, and Load Sharing Window are intro- duced. Using first passage times, a general expression for the probability distribution function of the Load Sharing Window is derived. A class of rules, called quantile rules, is introduced. Their role in avoiding unproductive job redistribution as well as to make bulk job transfers, is explained. The general technique is applied to the specific case of a distributed computing system consisting of M/M/1 queues. For this case, an expression for the mean of the Load Sharing Window is derived. Numerical computations, in the form of graphs, are presented, and their significance discussed.

Key Words: Queues, Load Imbalance, Load Sharing Window, Transfer Pair, Quantiles, Bulk Job Transfer.

TR04Distributed Dynamic Channel Allocation for Mobile Computing by Ravi Prakash, Niranjan G. Shivaratri, & Mukesh Singhal

Mobile computing has found increased applications and gained importance in recent years. Mobile computing makes use of cellular/wireless communication networks to provide communication among stationary and mobile hosts. In such environments, efficient allocation of wireless channels for communication sessions is of vital importance as the bandwidth alloted for cellular communication is limited and small.

Cellular communication networks divide the geographical area they serve into smaller regions, called cells. Each cell has a base station, also referred to as the mobile service station (MSS). The mobile service stations are connected to each other by a fixed wire network. To establish a communication session/place a call, a mobile host (MH) has to send a request to the MSS of the cell in which it is present. The call can be supported if a wireless channel can be allocated for communication between the mobile host and the mobile service station. If a particular wireless channel is used concurrently by more than one call originating in a cell, or in neighboring cells, the calls will interfere with each other. Such an interference is called co-channel interference. However, the same wireless channel can be used to support calls in geographically separated cells such that their signals do not interfere with each other. This is known as frequency reuse.

FOR MORE, see 1995/TR04.ps.gz

TR03Error-Bounded and Adaptive Image Reconstruction by Raghu Machiraju, Edward Swan, and Roni Yagel Reconstruction is imperative whenever an image needs to be resampled as a result of transformation such as an affine or perspective transform, or texture mapping. We present a new method for the characterization and measurement of reconstruction error. Our method, based on spatial domain error analysis, uses approximation theory to develop error bounds. We provide, for the first time, an efficient way to guarantee an error bound at every point by varying filter size. We go further to support position- adaptive and data-adaptive reconstruction which adjust filter size to the location of reconstruction and the data in its vicinity. We demonstrate the effectiveness of our methods with 1D and 2D examples. TR02Space Deformation using Ray Deflectors by Yair Kurzion and Roni Yagel In this paper we introduce a new approach to the deformation of surface and raster models in two and three dimensions. Rathern than deforming the objects in the model, we deform the rays used to render the scene. The mechanism to specify the deformation, which we call a deflector, is a vector of gravity positioned in space. This gravity vector bends any ray that travels through its field of gravity. Images generated by these curved rays give the impression of a deformed space. Unlike previous methods that deform all the objects in the scene, our approach deforms only those parts of the model that contribute to the final image. In addition, using deflectors, our approach can deform any object type that can be rendered by a ray casting algorithm, providing a unified solution to space deformation.

TR01A Consensus-Based Approach to Implementing Semaphores in a Distributed Environment by Mahendra Ramachandran and Mukesh Singhal

Semaphores have been used for synchronization in both uniprocessor and shared memory multi-processor systems. However, in distributed systems, semaphores have not received much attention. This has been partly due to lack of shared memory in such systems. It is nonetheless desirable to support such a mechanism in distributed systems. Semaphores are a general purpose mechanism with which a variety of synchronization problems can be addressed. Lately, many research projects have focused on providing a shared memory programming model on distributed systems (DSM systems.) This makes it more appealing to provide efficient solutions for supporting the semaphore mechanism in a distributed system.

In this paper we present a distributed semaphore mechanism. The nodes in the system maintain their own copy of a semaphore. A P operation succeeds when a node performing it succeeds in performing it by consensus on all the nodes in the system. The P operation requests are time-stamped in order to break ties among concurrent requests from different nodes. We prove the correctness of the algorithm, propose some alternatives to improve the performance of the base algorithm and demonstrate the effectiveness of this approach through simulation.

Key phrases: Operating Systems, Process Synchronization, Semaphores, Distributed Memory Architectures.

Last Updated January 30 1995 by FNA (frank@cis.ohio-state.edu)

TECHNICAL REPORT SERIES

Computer and Information Science Research Center Department of Computer and Information Science

The Ohio State University 2015 Neil Avenue, Columbus, OH 43210-1277

1995

Computer and Information Science Research Center
Department of Computer and Information Science

The Ohio State University
2015 Neil Avenue,
Columbus, OH 43210-1277