TECHNICAL REPORT SERIES

Computer and Information Science Research Center
Department of Computer and Information Science

The Ohio State University
2036 Neil Avenue,
Columbus, OH 43210-1277


**********************NOTE*********************

Abstracts for 1992 NOTE: some titles are available ONLY via paper (This is noted after the title/author section.)

If you have trouble accessing a file, please send email to tfletch@cis.ohio-state.edu.


1992


TR36 Emergent Synchrony in Locally Coupled Neural Oscillators by DeLiang Wang. PAPER COPY ONLY

A fundamental aspect of perception is to bind spatially separate sensory features, essential for object identification, segmentation of different objects, and figure/ground segregation. Theoretical considerations and neurophysiological findings point to the temporal correlation of feature detectors as a binding mechanism (von der Malsburg 1981; von der Malsburg and Schneider 1986; Gray et al. 1989; Eckhorn et al. 1988). In particular, it has been demonstrated that the cat visual cortex exhibits 40-60 Hz stimulus-dependent oscillations, and synchronization exists in spatially remote columns (up to 7 mm) which reflects global stimulus properties (Gray et al. 1989; Eckhorn et al. 1988). What neural mechanisms underlie this global synchrony? Many neural models thus proposed end up relying on global connections, leading to the question whether lateral connections alone can produce remote synchronization. With a formulation different from the frequently used phase model, we find that locally coupled neural oscillators can indeed yield global synchrony. The model employs a previously suggested mechanism that the efficacy of the connections is allowed to change on a fast time scale. Based on the known connectivity of the visual cortex, the model outputs closely resemble the experimental findings. This model lays a computational foundation for Gestalt perceptual grouping.


TR35 Low-congestion Embedding of Multiple Graphs in a Hypercube by Yu-Chee Tseng, Ten-Hwang Lai, & Li-Fen Wu.

The purpose of this paper is to demonstrate the use of matrix as a representation of graph embedding in a hypercube. We denote the image of an embedding (which is a subgraph of the hypercube) was a matrix. Through such representation, we are able to simplify, unify, generalize, or improve existing results regarding multigraph embedding on a hypercube.


TR34 Detecting Termination by Weight-throwing in a Faulty Distributed System by Yu-Chee Tseng.

This paper presents a fault-tolerant termination detection algorithm for a distributed system in which processes tend to fail. Allowing arbitrary number of processes to have fail-stop behavior, the algorithm can detect termination efficiently with O(M + kn + n) control messages and O(k) detection delays, where M is the number of basic messages issued, n is the number of processes, and k is the actual number of processes that fail. This algorithm has less detection delays than existing algorithms in the literature and comparable performance in terms of message complexity.

Key Words: distributed algorithm, distributed system, termination detection, fault-tolerance, fail-stop processor, weight-throwing.


TR33 Detecting Termination of a Distributed Computation using Global Flush Messages by Ashwani Gahlot and Mukesh Singhal

The Global Flush communication primitive allows the sender to order receipt of a message with respect to receipt of other messages. Use of this primitive provides an elegant way to reason about message orderings (as it allows a process to deduce the receipt even orderings at other processes), simplifies algorithm development, and permits higher commuhnication level concurrency than some communication primitives proposed in the literature.

In this paper, we propose an algorithm to detect termination in a distributed computation. This algorithm is based on Global Flush communication primitives and does not assume any specific communication topology or FIFO channels. As opposed to other marker-based algorithms which potentially use unbounded number of control messages in the worst case (due to unsuccessful termination detection attempts), the proposed algorithm uses a bounded number of control messages. The control messages carry only a boolean variable. Thus the overhead due to the size of a control message is very low. The algorithm does not require counting of the number of messages exchanged by processes in the underlying distributed computation.

Index Terms: Distributed computation, Termination Detection, Global Flush communication, consistent cuts.


TR32 On Bounded-Probability Operators and C_P1 by Sanjay Gupta

Keywords: Computational complexity, Probabilistic classes, Polynomial-time hierarchy. (No real abstract)


TR31 Toward a Universal Framework for Data Translation by J.A. Gawkowski & S.A. Mamrak

Electronic representations of data objects vary widely. The same information can typically be represented in numerous ways. These differences can be realized at various levels of abstraction.

Such variations can be the result of 1) multiple views of data; 2) differences between application data models; 3) differing proprietary formats of applications. This plethora of underlying representations of electronic data makes it extremely difficult for information to be transported between applications. This plethora of underlying representations of electronic data makes it extremely difficult for information to be transported between applications.

The most general response to the data translation problem is to provide a mechanism whereby, given the description of two data representations, code is automatically generated to translate between them.

In an effort to approach this ultimate goal, we propose a universal framework for data translation. This proposed framework captures certain generalities that are present in all translation tasks. In addition, we are developing an architecture that is modeled after this framework. The goal of the architectgure is to validate the ideas proposed in the framework, showing that a large part of the data translation problem can be generalized.

Keywords: code generation, data translation, intermediate form, object oriented data model.

TR30 Bounded Fairness by D.N. Jayasimha and Nachum Dershowitz

We study bounded fairness, a stronger notion than the usual fairness based on eventuality. Bounded fairness can be used, for example, to relate the frequency of shared resource access of a particular process with regard to other processes which access the resource with mutual exclusion. We formalize bounded fairness by introducing a new binary operator {Note: see printed version for this symbol} into temporal logic. One main difference between this logic and explicit time logics, one which we consider to be an advantage in many cases, is that time does not appear explicitly as a parameter.

The syntax and semantics for this new logic, kTL, are given. This logic is shown to be more powerful than temporal logic with the eventuality operator and as powerful as the logic with the until operator. Though kTL has the same power as the temporal logic with the until operator, we argue that the former can be used to specify bounded fairness requirements in a more natural manner than is possible with the latter. We show that there are certain properties which can be expressed more succinctly in kTL than the temporal logic with the until operator. We also give a procedure for testing the satisfiability of kTL formulas. As applications of the bounded fairness concept, we specify requirements for some standard concurrent programming problems, and show, for example, that Dekker's mutual exclusion algorithm is fair in the conventional sense, but not bounded fair. We also give examples of bounded fair algorithms.

TR29-1992 Parallel Algorithms for Volume Rendering, Raghu Machiraju, Loren Schwiebert, Roni Yagel. THIS HAS BEEN REPLACED BY TR14-1993 WHICH IS AVAIL. BY E-MAIL OR PAPER)

TR28 High Quality Template-Based Volume Viewing by Roni Yagel

In this paper we present a three-phase template-based algorithm for parallel viewing of volumetric datasets. It starts by determining the major-plane, the image extent, and a ray-template. The major-plane is that one of the three object space planes onto which the image is projected to the largest area. From this plane, and within the projected region of the volume, parallel rays are cast into the volume by repeating the sequence of steps specified by the ray-template. We have described an efficient way to accurately compute those voxels that the ray traverses when it enters and exits the volume, by means of discrete arithmetic. In addition, we described the way to place the template in order to assure a homogeneous and complete tessellation of the volume by rays, that is, a uniform sampling of the volume by discrete rays.

Once the basic algorithm is understood, we present three ways to extend it. The first extension aims at achieving greater image quality by supporting screen supersampling based on rays emanating from sub-pixel origins. The second extension also addresses the issue of image quality by providing multiple samples in each voxel along the ray. The third one describes various paradigms for traversing the volume adaptively, that is, the type of ray used is changed according to some characteristics of the space traversed. We show that the template based approach is extremely attractive for many adaptive traversal schemes. We conclude this paper with a short description of the software implementation of our algorithm accompanied by a demonstration of its superior performance statistics.


TR27 On the Closure of Certain Function Classes under Integer Division by Polynomially-Bounded Functions by Sanjay Gupta

We show that Gap-P is closed under integer division by s(x) if and only if SPP = MODsP (Note: the s in MOD P is a subscript!).

Keywords: computational Complexity, Function classes, Closure properties.


TR26 Optimal Evaluation of Fortran-90 Array Expressions for Distributed Memory Machines by S.D. Kaushik, S.K.S. Gupta, C.- H. Huang, P. Sadayappan.

The owner-computes strategy has been used for evaluation of Fortran-90 array expressions on distributed memory machines. This strategy simplifies code generation but is often expensive in terms of the total communication cost and size of temporary memory required for its implementation. In this paper, we propose the relaxing of the owner computes strategy, to reduce the total communication and temporary storage cost. We develop cost metrics for measuring the communication and memory cost associated with the evaluation of Fortran-90 array expressions on distributed memory machines. The communication tree is introduced as a useful representation for array expressions involving associative commutative operators of one kind. Procedures for estimating communication and temporary memory costs for a communication tree are described. An efficient polynomial- time algorithm to determine an evaluation order which minimizes the communication cost is presented. We also present an efficient polynomial- time algorithm to determine the evaluation order which minimizes the memory cost involved in the evaluation of such expressions.

Keywords: Data distributions, Fortran-90, Array expressions, Arborescence, Binary expression trees.


TR25 An Implementation of Global Flush Primitive using F-channels by Ashwani Gahlot & Mukesh Singhal

The Global Flush communication primitive allows the sender to order receipt of other messages. Use of this primitive provides an elegant way to reason about message orderings (as it allows a process to deduce the receipt event orderings at other processes), simplifies algorithm development, and permits higher communication level concurrency than some communication primitives proposed in the literature. In this paper, we discuss an implementation of the Global Flush communication primitive that is based on F-channels.

Keywords: Asynchronous distributed systems, Communication architecture, Group communication, Distributed algorithms.


TR24 Design and Specification of Iterators Using the Swapping Paradigm by Bruce W. Weide, Stephen H. Edwards, Douglas E. Harms, David A. Lamb.

How should iterators be abstracted and encapsulated in modern imperative languages, e.g., Ada and C++? We consider the combined impact of several factors on this question: the need for a "common interface model" for user-defined iterator abstractions; the importance of formal methods in specifying such a model; and problems involved in modular correctness proofs of iterator implementations and clients. A series of iterator designs illustrates the advantages of the "swapping paradigm" over designs based on the traditional "copying paradigm." Specifically, swapping-based designs admit more efficient implementations, while offering straightforward formal specifications and the potential for modular reasoning about program behavior. The final proposed design schema defines a common interface model for iterators for a wide variety of generic collection abstractgions in languages such as Ada and C++.

Index Terms - Formal specification, iterator, program verification, proof of correctness, swapping.

Copyright (c) 1992 by the authors. All rights reserved.


TR23 Experiments with Virtual Threads by Wolfgang W. Kuechlin & Jeffrey A. Ward PAPER COPY ONLY

The goal of a virtual threads system is to provide to a user much the same convenience in dealing with threads that a virtual memory system provides in dealing with memory. Most importantly, the number of available threads is virtually unlimited, so that the parallelization of a application software can follow the opportunities in the program rather than the restrictions of the hardware. The convenience of programming is also complemented by efficiency in execution, with a significant reduction in the number of kernel context switches through lazy evaluation of virtual threads.

These ideas have been investigated in the past by Vandevoorde and Roberts for WorkCrews under Modula2+, and by Mohr, Kranz, and Halstead, for futures in a parallel Scheme (Multilisp). Our main contribution is the adaptation of these concepts for the specific case of the widely used C Threads environment, the discussion of scheduling alternatives, and detailed performance measurements.

Our prototype implementation extends our "S-threads" environment for parallel symbolic computation in C. While in S-threads each thread fork maps to a C thread fork, many virtual S-threads now execute on the same C thread. S-threads supports the parallel Computer Algebra system PARSAC-2. We compare the performance of PARSAC-2 algorithms for list sorting and polynomial g.c.d. computation on S-threads with that on virtual S-threads. We exhibit a case where the virtual threads concept allows us to ignore grain-size issues in the code, because the un-optimized code performs nearly as well on virtual threads as a carefully optimized version which avoids below grainsize forks in S-threads.

We discuss the issue of virtual threads inducing deadlock in the presence of signals and waits and propose various solutions to this problem. We present performance measurements on a program using signal and waits.


TR22 Distributed Rule Monitoring in Active Databases and Its Performance Analysis by Ing-Miin Hsu, Mukesh Singhal, and Ming T. Liu.

Monitoring rules in a distributed active database involves three design issues: how to decompose rules, how to distribute rules to sites, and how to evaluate distributed rules correctly. In this paper, we study these three issues for complicated rules, which are complex and time-consuming to evaluate. We propose a new relational operator, AND, and the associated algebraic manipulations of this operator to find independent parts of rule query, which can be distributed among sites. Due to problems introduced by geographical distribution of sites in a distributed system, such as communication delay and lack of a global clock, correct evalutaion of distributed rules is not an easy task. We propose a distributed evaluation algorithm which guarantees the correctness of the evaluation result of the distributed rule by collecting consistent local results from sites to form the global view. The message count and the response time of this distributed evaluation algorithm are analyzed for triggering updates.


TR21 An (N-1)-Resilient Algorithm for Distributed Termination Detection by Ten-Hwang Lai and Li-Fen Wu.

A termination detector is a distributed algorithm that, when superimposed on a distributed system of n processes, is able to determine whether the computation of the underlying system has terminated. Fault tolerance is one of the most desirable properties of distributed systems. While some problems are notoriously known to be unsolvable in the presence of faulty processes, many distributed algorithms are able to tolerate failures of processes. When a termination detector is applied to a fault-tolerant underlying system, it is desirable that the detector itself be fault-tolerant. In most existing distributed systems, processes certainly may fail, but only occasionally or even rarely. Thus, a desirable fault-tolerant algorithm is one that can tolerate any number of faults, that runs as efficiently as the best available non-fault-tolerant algorithm if no process fails during the computation, and that incurs only a reasonable amount of extra cost for each process failure that actually occurs. In this paper, we present a fault-tolerant termination detection algorithm with these nice properties. The algorithm is much more efficient that existing (fault-tolerant) ones.

Index Terms: Distributed algorithm, fault tolerance, message complexity, termination detection.


TR20 Bounding Logical Clocks by Ashwani Gahlot, Mohan Ahuja and Tim Carlson. PAPER COPY ONLY.

A distributed system is characterized by the lack of a global shared time. Many models of logical time have been proposed in the past to solve various problems. Clocks have been developed for these models to deduce causality and potential causality between events. These clocks are unbounded.

In this paper we define the notion of bounded clocks and present algorithms to bound vector clocks in a system that supports the global flush communication primitive. We briefly discuss some applications of bounded clocks.

Keywords: Bounded clocks, Causality, F-channels, Global flush primitive, Global Snapshot, Debugging.


TR19 Motion Planning amidst Transient Obstacles by Kikuo Fujimura PAPER COPY ONLY

Motion planning in the presence of time-dependent obstacles is studied. Much of the prior approaches to motion planning problems assume that the obstacles always exist in the environment, whether they are stationary or in motion. In this paper, we consider the case that the environment contains obstacles whose existing periods are dependent on time, i.e., they can appear and disappear in the environment. This formulation allows us to model a variety of time-varying situations that can arise in application domains. A concept similar to that of visibility is used to solve the problem. An algorithm is presented to generate a motion in such a dynamic domain, its time-minimality is proved, and computation time is analyzed.


TR18 Programmer's Manual to a C++ Implementation of Hierarchical F-Channels by Keith Shafer & M. Ahuja. PAPER COPY ONLY

INTRODUCTION:

For brevity, we assume that the reader is familiar with message overtaking, channel grammars and tools, reachable sets, deliverable sets, F-channels, hierarchical channels, Hierarchical F-Channels, the process-to-channelagent-to-process (PCP) model of distributed communication, the process-to-channelagent-to-channelagent-to-process (PCCP) model of distributed communication, and the PCCP-based Hierarchical F-Channel implementation derived in [9,10]. The Hierarchical F-Channel implementation found in references [9,10] has been successfully coded in C++. The user's manual for this C++ Hierarchical F-Channel implementation is presented within. This code allows processes to establish a Hierarchical F-Channel or its variants with simple calls like those currently used for sockets or InterViews, a free user interface library accessible on the Internet which contains some generic communication code. In Section 2, we relate the C++ implementation derived from the implementation found in [9, 10]. In Section 3, we give traditional Unix-like manual pages for using the code presented within. In Section 4, we describe interpretive test programs that may be used to get started. In Section 5, we conclude with future directions for further work on this implementation. Finally, in Appendix A we present most of the C++ code.


TR17 Global Flush Primitive for Sending a Message to a Group of Processes by Ashwani Gahlot, Mohan Ahuja, and Timothy Carlson. PAPER COPY ONLY

We propose a global flush communication primitive which allows the sender to order receipt of a message with respect to receipt of other messages. Use of this primitive provides a convenient way to reason about messages (as it allows a process to deduce the receipt event orderings at other processes), simplifies algorithm development, and permits higher communication level concurrency than some communication primitives proposed in the literature. We illustrate its applications and present a counter based implementation.

Keywords: Asynchronous distributed systems, Communication architecture, Group communication, Distributed algorithm, F-Channels, Termination Detection.


TR16 Hierarchy of Communication Speeds: definition, uses in designing concurrent systems, and implementation by Mohan Ahuja and Ravi Prakash. PAPER COPY ONLY.

Little work has been done to understand the concept of relative speed of communication and hierarchical channels. We present examples of applications where hierarchical channels are useful and an implementation of such a channel using selective flooding. Finally, we compare the hierarchical channels with F-channels.


TR15 An Implementation of Flush Channels for Possibly Cyclic Networks by Xuefeng Dong & Mohan Ahuja. PAPER COPY ONLY

In this paper, we propose an implementation of Flush, channels using selective flooding without assuming acyclicity of the underlying network. To prevent the occurrence of deadlocks that may occur due to cycles in the underlying network, a timeout technique is applied at each switch node. This however may lead to illegal message overtaking which are appropriately handled at the sink node.


TR14-1992 Mapping Parallel Computations to Multiprocessor Architectures Considering the Effects of Communication by Loren Schwiebert & D.N. Jayasimha. PAPER COPY ONLY

The problem of mapping parallel algorithms to parallel architectures has been investigated using numerous approaches based on various assumptions. We briefly review some of the previous methods suggested to solve the mapping problem and point out the drawbacks inherent in the past work. We then present the modifications made to correct these shortcomings. Our approach differs from previous approaches primarily by considering the effects of the communication pattern of the parallel algorithm and the communication bandwidth of the parallel architecture. We develop an iterative heuristic algorithm to implement our technique and show that our more realistic method of evaluating the performance of a mapping allows a better mapping to be chosen. Our process uses information about the current mapping to generate an improved mapping. We illustrate the advantage of incorporating this continual refinement. Experimental results are presented to demonstrate the effectiveness of our approach.

Key Words: mapping problem, parallel architectures, directed acyclic task graph, critical edges, communication pattern and communication bandwidth.


TR13-1992 On the Generation of Efficient Data Communication for Distributed-Memory Machines by S.K.S. Gupta, S.D. Kaushik, S. Mufti, S. Sharma, C.-H. Huang & P. Sadayappan PAPER COPY ONLY

A procedure for performing efficient data communication for a fixed set of data distributions - block, cyclic, and block-cyclic is presented. Closed form expressions for determining the send and receive processor sets and data sets for block and cyclic source and target distributions are derived. These closed forms are then used to generate efficient communication code for the most general distributions considered. The inter-processor communication required to implement Fortran 90 array assignment statements is characterized.

Keywords: Compiler optimizations, Data distributions, Data alignment, Vectorized data communication, Fortran 90.


TR12-1992 Toward Fundamentally Sound Definitions for the Data Flow Testing Criteria by Allen S. Parrish and Stuart H. Zweben PAPER COPY ONLY

Previous work has shown that the original definitions of the data flow testing criteria were fundamentally unsound, in that they violated the applicability property (which requires the criterion to be satisfiable when applied to any program from a general, decidable class). Such work proposed alternative definitions of these criteria to resolve this problem. However, for the data flow criteria all-du-paths and all-uses, changing the definitions in this fashion caused another fundamental property (that of statement coverage) to be violated. In addition, certain desirable, intuitive relationships involving the relative strengths (based on the idea of subsumption) of the criteria were violated by the new definitions. It is known that the problems introduced by changing the definitions can be partially resolved by placing restrictions on the programs to which the criteria apply, thereby making the criteria satisfy applicability relative to the restricted class; however, the restrictions heretofore considered are undecidable and cannot be implemented in practice.

In this paper, we show that through straightforward changes to either the definitions of the criteria or the processing environment, all-du-paths and all-uses can satisfy both the applicability and statement coverage properties; no undecidable restrictions on the programs to which the criteria apply are necessary. Moreover, the subsumption relationships among the criteria are restored to their original intuitive form.

Index Terms: Adequacy criteria, Data flow, Software testing.


TR11-1992 Generating Parallel Programs from Tensor Product Formulas: A Case Study of Strassen's Matrix Multiplication Algorithm by C.-H. Huang, J.R. Johnson, & R.W. Johnson. PAPER COPY ONLY

We present program generation of Strassen's matrix multiplication algorithm from an iterative tensor product formulation. We directly translate several mathematically equivalent tensor product formulas to obtain sequential and parallel programs with different performance characteristics for a shared-memory parallel architecture, the Encore Multimax System. The performance and the numerical accuracy of these programs are reported.

Keywords: Strassen's algorithm, tensor product, block recursive algorithm, shared-memory parallel architecture, parallel processing, performance evaluation, numerical accuracy.


TR10-1992 From Service Specification to Protocol Converter A Synchronization Transition Set Approach by Hou-Wa J. Jeng and Ming T. Liu PAPER COPY ONLY

With the proliferation of different network architectures, it has been recognized that protocl conversion is needed for achieving the interoperability between computer networks that implement different protocols [1,2]. Since 1986, many protocol converter construction algorithms based on formal models have been proposed. In the meantime, it has been observed by many researchers that algorithms based on service specification have in general less complexity at the design stage. In this paper, we propose a five-step algorithm to formally derive a protocol converter from the service specification: (1) Generate the Service System Graph from the given service specifications; (2) Derive the Conversion Service Specification from the service system graph; (3) Generate the Synchronization Sets from the conversion service specification; (4) Compose the converter using the synchronization sets; and (5) Remove the remaining service transitions to obtain the final Protocol Converter Specification. The converter so constructed has the three most desirable properties of a correct protocol converter; namely, conformity property (also known as maximum safety property), liveness property, and transparency property. Moreover, unlike many other protocol conversion algorithms, our proposed algorithm does not need a validation phase at the end to ensure the correctness of the converter as long as the protocols being converted and the given conversion service requirement and service specifications are by themselves correct; e.g. free from deadlock and livelock. The reason is that our proposed algorithm achieves the goal of preserving all the properties and functionalities of the original protocols during the derivation of the converter. In addition, our proposed algorithm has the capability to handle the conversion between sequences of transitions of different protocols. This capability is crucial when dealing with complex protocols in which the semantics of sequences of transitions/messages is different from that of a simple concatenation of the semantics of each original individual transition/message and the conversion can only be done in a unit of sequence of transitions/messages. For ease in specification and discussion, our method uses the formal CFSM (Communication Finite State Machine) model.


TR09 UP and the Low and High Hierarchies: A Relativized Separation by Ming-Jye Sheu and Timothy J. Long

The low and high hierarchies within NP were introduced by Schoening in order to classify sets in NP. It is not know whether the low and high hierarchies include all sets in NP. IN this paper, using the circuit lower bound techniques of Hastad and Ko, we construct an oracle set relative to which UP contains a language that is not in any level of the low hierarchy and such that no language in UP is in any level of the high hierarchy. Thus, in order to prove that UP contains languages that are in the high hierarchy or that UP is contained in the low hierarchyk one needs to use non-relativized proof techniques. Since it is known that UP (superscript A) is low for PP (superscript A) for all sets A, our result also shows that the interaction between UP and PP is crucial for the lowness of UP for PP; changing the base class to any level of the polynomial-time hierarchy destroys the lowness of UP.


TR8-1992 Generalized Dynamic group Barrier Synchronization by S.K.S. Gupta and D.N. Jayasimha PAPER COPY ONLY

This paper presents a process based dynamic barrier scheme which can be implemented on large scale shared memory architectures. It improves upon the dynamic barrier synchronization scheme based on group locks proposed by Dimitrovsky [5]. The set of processes which meet at a barrier is determined from the semantics of the program. The main idea behind the scheme is to identify the dynamic set of processes (the interested processes) which need to barrier synchronize at various control points in the execution of the program. This identification, which is begun at compile time is completed at run time aided by the information generated at compile time. Compared to the other schemes presented in the literature, this scheme does not delay the execution of processes which need not synchronize at a barrier (the uninterested processes). Since a minimal set of processes synchronize at a barrier, the synchronization overhead, which grows superlinearly with the number of processes, are less compared to the other schemes. This scheme also exploits the use of concurrent barriers at a finer level of granularity than all other schemes.

Keywords: shared memory multiprocessing, process based barrier synchronization, dynamic group synchronization, parallelizing compilers, control flow analysis.


TR07 Performance of a Tree-Structured Hierarchical Memory Multiprocessor by J.D. Martens and D.N. Jayasimha

Latency and synchronization overheads have been identified as fundamental problems in large-scale shared memory multiprocessing. In an earlier paper, one of the authors introduced a hierarchical shared memory multiprocessor architecture [9]. This architecture, based on hierarchical shared memory, exploits the notion of partial sharing of variables to reduce latency and synchronization costs. Using a stochastic model for generation of memory requests by each processor, a simulation-based study of the performance of the Tree-structured Hierarchical Memory Multiprocessor architecture (THMM) is compared to that of a conventional shared memory multiprocessor architecture (CMM). It is found that under a variety of stochastically generated workloads, the THMM outperforms a CMM with the same number of processors.

Key words: hierarchical shared memory multiprocessing, latency, contention, simulation.


TR6-1992 An Evaluation of the Use of the Integrated Chameleon Architecture in the Building of Bibliographic Translators by Richard R. Morris PAPER COPY ONLY

The effort involved in data translation is of concern to anyone working with multiple text-formatting products. The Chameleon Research Project has created a toolset which automates the process. This thesis evaluates the effectiveness of the Integrated Chameleon Architecture in solving the problem of data translation within a particular domain.

In this assessment, the ICA was used to build translators between two bibliographic formatting schemes. This thesis outlines the develoopment process, as well as important aspects of data translation and the functionality of the ICA tools. Our experiment's resulting ratio of user effort to code generated by the ICA is evaluated at 1:9.5. This savings leads to the conclusion that the ICA has a great deal of practicality and potential.


TR5-1992 An Algebraic Theory for Modeling Direct Interconnection Networks S.D. Kaushik, S. Sharma, C.-H. Huang, J.R. Johnson, R.W. Johnson, and P. Sadayappan PAPER COPY ONLY

We present an algebraic theory based on tensor products for modeling direct interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory and/or vector multi-processors. We will use this theory for mapping algorithms onto distributed-memory architectures. In this paper, we focus on direct interconnection networks. Topologies of rings, n-dimensional meshes, and hypercubes are represented in tensor product form. We demonstrate the use of this theory by mapping matrix transposition and matrix multiplication onto different networks. This theory also provides a formal method for specifying and verifying network embedding.

Keywords: Tensor Product, Block Recursive Algorithm, Direct Interconnection Network, Network Embedding, Algorithm Mapping.


TR4-1992 An Algebraic Theory for Modeling Multistage Interconnection Networks by S.D. Kaushik, S. Sharma, C.-H. Huang, J.R. Johnson, R.W. Johnson, and P. Sadayappan PAPER COPY ONLY

An algebraic theory based on tensor products for modeling multistage interconnection networks is presented. This algebraic theory is suitable for mapping numerical algorithms on various network-based architectures. The tensor product representations of the baseline network and the omega network are described. Properties of multistage interconnection networks, such as network partitioning, topological equivalence, and functional equivalence are specified and verified. A mapping of the matrix transposition algorithm on a specific network using the tensor product notation is presented.

Keywords: Tensor Product, Multistage Interconnection Network, Partitionability, Topological Equivalence, Functional Equivalence, Algorithm Mapping.


TR3-1992 Technical Documentation for The Integrated Chameleon Architecture by S.A. Mamrak, C.S. O'Connell, and J. Barnes.

THIS IS NO LONGER AVAILABLE. IT has been superseded by a book which is available from Prentice-Hall. "The Integrated Chameleon Architecture: Translating Electronic Documents with Style." Sandra Mamrak, Conleth S. O'Connell, and Julie Barnes. ISBN 0-13-056418-4


TR02 A More Efficient Message-Optimal Algorithm for Distributed Termination Detection by Ten-Hwang Lai, Yu-Chee Tseng, and Xuefeng Dong.

Termination detection is a fundamental problem in distributed computing. Many algorithms have been proposed, but only the Chandrasekaran and and Venkatesan (CV) algorithm (1) is known to be optimal in worst-case message complexity. This optimal algorithm, however, has several undesirable properties. First, it always requires M' + 2*|E| + n - 1 control messages, whether it is worst case or best case, where M' is the number of basic messages issued by the underlying computation after the algorithm starts, |E| is the number of channels in the system, and n is the number of processes. Second, its worst-case detection delay is O(M'). In a message-intensive computation, that might not be tolerable. Third, the maximum amount of space needed by each process is O(M'), a quantity not known at compile time, making it necessary to use the more expensive dynamic memory allocation. Last, it works only for FIFO channels.

The purpose of this paper is to remedy these drawbacks, while keeping its strength. We propose an algorithm that requries M' + 2(n-1) control messages in the worst case, but much fewer on the average, and in the best case, it uses only 2(n - 1) control messages, no matter how large M' is. The worst-case detection delay is O(n), as compared to the CV-algorithm's O(M'); and the space complexity is Theta(n) for each process, a compile-time known quantity. The algorithm works equally well for both FIFO and Non-FIFO channels.

Index terms: Distributed systems, termination detection, algorithms, message complexity.


TR1-1992 Passive Space and Time View of Computing by M. Ahuja, T. Carlson, and A. Gahlot. PAPER COPY ONLY

We point out two problems with viewing a process as a totally ordered set of events: First the loss of all information about potential intra-process parallelism. Second, the need to view a distributed computation as a partially ordered set of events, resulting in a major shift in the reasoning framework which makes distributed computing seem very different from and more difficult than sequential computing. We argue that it is more natural to view both a distributed and a sequential computation as a partially ordered set of events. Doing so leads to a view, called passive-space and time, which we propose in this paper. In the proposed view, a point in space is a passive entity which does not order events that occur at it, and the order is determined by the interaction among the events.

We define a relation, "Affects", between pairs of events, which captures the partial order on events in the passive-space and time view. We define a mechanism for timestamping events such that the "Affects" relation, and so potential concurrency, between events can be inferred from their timestamps. We also give a timestamping mechanism such that potential concurrency can be inferred partially; the extent of such inferring depends on the costs associated with the mechanism. Inferring potential concurrency can be used for debugging. We compare the proposed view with other views by comparing the timestamping mechanisms.

Key Words:logical clocks, timestamps, event ordering, event causality, debugging and inferring causality, space time view, interleaving view.