Documented Scientific Discoveries and Technical Innovations

Welcome to the High Performance Computing and Software Laboratory Technical Report Browser

This document lists the titles of selected technical reports (published or to be published) of the High Performance Computing and Software Laboratory (since 1994), with links to the corresponding abstracts. The heading of each abstract includes a link to download the actual technical report.

Papers sorted by Topics


Table of Contents

I/O Management and Data-intensive Computing
Processor Caches and DRAM Memory Systems

Multimedia Streaming and Other Applications in Internet Systems
Performance and Reliability of Internet Systems
P2P and Overlay Networks
Wireless and Sensor Networks

Cluster Systems and Computing
Parallel Systems and Computing


I/O Management and Data-intensive Computing

``Accelerating pathology image data cross-comparison on CPU-GPU hybrid systems" , Proceedings of 38th ACM International Conference on Very Large Databases (VLDB 2012), Istanbul, Turkey, August 27-31, 2012.

``hStorage-DB: heterogeneity-aware data management to exploit full capacity of hybrid storage systems", Proceedings of 38th ACM International Conference on Very Large Databases (VLDB 2012), Istanbul, Turkey, August 27-31, 2012.

``DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems", Proceedings of 2nd ACM Symposium on Cloud Computing (SOCC 2011), Cascais, Portugal, October 27-28, 2011.

``YSmart: Yet another SQL-to-MapReduce Translator", Proceedings of 31st International Conference on Distributed Computing Systems (ICDCS 2011), Minneapolis, Minnesota, June 20-24, 2011. Best Paper Award.

YSmart is used both as independent open-source software and as a component in large production systems.

``Hystor: Making the Best Use of Solid State Drives in High Performance Storage Systems", Proceedings of 25th ACM International Conference on Supercomputing (ICS 2011), Tucson, Arizona, May 31 - June 4, 2011. Best Paper Award.

``CAFTL: a content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives", Proceedings of 9th USENIX Conference on File and Storage Technologies (FAST'11), San Jose, California, February 15-17, 2011.

``Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing", Proceedings of 17th International Symposium on High Performance Computer Architecture (HPCA-17), San Antonio, Texas, February 12-16, 2011.

``RCFile: a fast and space-efficient data placement structure in MapReduce-based Warehouse systems", Proceedings of International Conference on Data Engineering (ICDE 2011), Hannover, Germany, April 11-16, 2011.

RCFile has been widely used for big data analytics in distributed systems.

``PS-BC: power-saving considerations in design of buffer caches serving heterogeneous storage devices", Proceedings of 16th ACM International Symposium on Low Power Electronics and Design (ISLPED 2010), Austin, Texas, August 18-20, 2010.

``Understanding intrinsic characteristics and system implications of flash memory based solid state drives", Proceedings of 2009 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS/Performance 2009), Seattle, WA, June 15-19, 2009.

``BP-Wrapper: a system framework making any replacement algorithms (almost) lock contention free" , Proceedings of 25th International Conference on Data Engineering (ICDE'09), Shanghai, China, March 29- April 4, 2009.

``Caching for Bursts (C-Burst): let hard disks sleep well and work energetically", Proceedings of 13th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'08), Bangalore, India, August 11-13, 2008.

``Cost-aware caching algorithms for distributed storage servers", Proceedings of the 21st International Symposium on Distributed Computing (DISC'07), Lemesos, Cyprus, September 24-26, 2007.

``STEP: Sequentiality and Thrashing Detection based Prefetching to improve performance of networked storage servers", Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS'07), Toronto, Canada, June 25-29, 2007.

``DiskSeen: exploiting disk layout and access history to enhance I/O prefetch", Proceedings of 2007 USENIX Annual Technical Conference (USENIX'07), Santa Clara, California, June 17-22, 2007.

``Coordinated multilevel buffer cache management with consistent access locality quantification", IEEE Transactions on Computers, Vol. 56, No. 1, 2007.

``SmartSaver: turning flash drive into a disk energy saver for mobile computers", Proceedings of 11th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'06), Tegernsee, Germany, October 4-6, 2006.

``A locality-aware cooperative cache management protocol to improve network file system performance", Proceedings of the 26th International Conference on Distributed Computing Systems (ICDCS'06), Lisbon, Portugal, July 4-7, 2006.

``DULO: an effective buffer cache management scheme to exploit both temporal and spatial localities", Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05), San Francisco, CA, December 14-16, 2005.

``CLOCK-Pro: an effective improvement of the CLOCK replacement", Proceedings of 2005 USENIX Annual Technical Conference (USENIX'05), Anaheim, CA, April 10-15, 2005.

CLOCK-Pro has been adopted in NetBSD and patched into the Linux kernel.

``Making LRU friendly to weak locality workloads: a novel replacement algorithm to improve buffer cache performance", IEEE Transactions on Computers, Vol. 54, No. 8, 2005.

``Token-ordered LRU: an effective page replacement policy and its implementation in Linux systems", Performance Evaluation, Vol. 60, Issue 1-4, 2005.

The token algorithm has been adopted in the Linux kernel.

``ULC: A file block placement and replacement protocol to effectively exploit hierarchical locality in multi-level buffer caches", Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04), Tokyo, Japan, March 23-26, 2004.

``Efficient Distributed Disk Caching in Data Grid Management", Proceedings of IEEE International Conference on Cluster Computing (Cluster'03), December 1-4, 2003.

``LIRS: an efficient low inter-reference recency set replacement to improve buffer cache performance" , Proceedings of the 2002 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, (SIGMETRICS'02), Marina Del Rey, California, June 15-19, 2002.

The LIRS caching algorithm has been adopted in MySQL databases.

  • The LIRS caching algorithm has been adopted in MySQL, the world's most widely used database system.
  • The LIRS source code is archived in the page management section of MySQL 5.1.
  • Towards an O(1) VM, Rik van Riel. (A Linux architect's view of LIRS in virtual memory)

``TPF: a system thrashing protection facility", Software: Practice and Experience, Vol. 32, Issue 3, 2002.

``Adaptive page replacement to protect thrashing in Linux", Proceedings of the 5th USENIX Annual Linux Showcase and Conference, (ALS'01), Oakland, California, November 5-10, 2001.


Processor Caches and DRAM Memory Systems

``BWS: Balanced Work Stealing for time-sharing multicores", Proceedings of ACM EuroSys'12, Bern, Switzerland, April 10-13, 2012.

``SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Caches from Thrashing in Multicores", Proceedings of ACM EuroSys'11, Salzburg, Austria, April 10-13, 2011.

``ULCC: a user-level facility for optimizing shared cache performance on multicores", Proceedings of 16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2011), San Antonio, Texas, February 12-16, 2011.

``Enabling software management for multicore caches with a lightweight hardware support", Proceedings of 22nd ACM/IEEE Annual Conference on Supercomputing (SC09), Portland, Oregon, November 14-20, 2009.

``Soft-OLP: improving hardware cache performance through software-controlled object-level partitioning", Proceedings of 18th International Conference on Parallel Architectures and Compilation Techniques (PACT 2009), Raleigh, North Carolina, September 12-16, 2009.

``MCC-DB: minimizing cache conflicts in multi-core processors for databases", Proceedings of 35th International Conference on Very Large Data Bases (VLDB 2009), Lyon, France, August 24-28, 2009.

``Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems", Proceedings of the 14th International Symposium on High Performance Computer Architecture (HPCA'08), Salt Lake City, Utah, February 16-20, 2008.

The cache partitioning method has been adopted by Intel.

``MESA: reducing cache conflicts by integrating static and run-time methods", Proceedings of International Symposium on Performance Analysis of Systems and Software (ISPASS-2006), Austin, Texas, March 19-21, 2006.

``Look-ahead architecture adaptation to reduce processor power consumption", IEEE Micro, Vol. 25, No. 4, July/August, 2005.

``Design and optimization of large size and low overhead off-chip caches", IEEE Transactions on Computers, Vol. 53, No. 7, 2004.

``Access-mode predictions for low-power cache design", IEEE Micro, Vol. 22, No. 2, March/April, 2002.

``Fine-grain priority scheduling on multi-channel memory systems", Proceedings of the 8th International Symposium on High Performance Computer Architecture, (HPCA-8), Cambridge, Massachusetts, February 2-6, 2002.

``Breaking address mapping symmetry at multi-level of memory hierarchy to reduce DRAM row-buffer conflicts", Journal of Instruction-Level Parallelism, Vol. 3, 2001.

``Cached DRAM for ILP processor memory access latency reduction", IEEE Micro, Vol. 21, No. 4, July/August, 2001.

``Fast bit-reversals on uniprocessors and shared-memory multiprocessors", SIAM Journal on Scientific Computing, Vol. 22, No. 6, 2001.

``A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality", Proceedings of the 33rd Annual International Symposium on Microarchitecture, (Micro-33), Monterey, California, December 10-13, 2000.

The permutation technique has been adopted in the memory controller of the Sun Microsystems UltraSPARC IIIi processor.

``Improving memory performance of sorting algorithms", ACM Journal on Experimental Algorithmics, Vol. 5, 2000.

``Cacheminer: a runtime approach to exploit cache locality on SMP", IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 4, 2000.

``Cache-optimal methods for bit-reversals", Proceedings of Supercomputing'99, (SC'99), November, Portland, Oregon, 1999.

``A memory-layout oriented run-time technique for locality optimization", Proceedings of 1998 International Conference on Parallel Processing, (ICPP'98), August 1998.

Exploiting Cache Locality on Symmetric Multiprocessors: A Run-Time Approach, Ph.D. Dissertation, College of William and Mary, May 1998.

``Two fast and high-associativity cache schemes", IEEE Micro, October, 1997.


Multimedia Streaming and Other Applications in Internet Systems

``Spam behavior analysis and detection in user generated content on social networks", Proceedings of 32nd International Conference on Distributed Computing Systems (ICDCS 2012), Macau, China, June 18-21, 2012.

``CUBS: coordinated upload bandwidth sharing in residential networks", Proceedings of 17th International Conference on Network Protocols (ICNP 2009), Princeton, NJ, October 13-16, 2009.

``Analyzing patterns of user content generation in online social networks", Proceedings of 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-09), Paris, France, June 28- July 1st, 2009.

``The stretched exponential distribution of Internet media access patterns" , Proceedings of 27th ACM Symposium on Principles of Distributed Computing (PODC 2008), Toronto, Canada, August 18-21, 2008.

``SProxy: a caching infrastructure to support Internet streaming", IEEE Transactions on Multimedia, Vol. 9, No. 5, 2007.

``Does Internet media traffic really follow Zipf-like distribution?", (an extended abstract), Proceedings of 2007 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'07), San Diego, California, June 12-16, 2007.

``Delving into Internet streaming media delivery: a quality and resource utilization perspective", Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC'06), Rio de Janeiro, Brazil, October 25-27, 2006.

``Segment-based streaming media proxy: modeling and optimization", IEEE Transactions on Multimedia, Vol. 8, No. 2, 2006.

``Design and evaluation of a scalable and reliable P2P assisted proxy for on-demand streaming media delivery", IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 5, 2006.

``Fast proxy delivery of multiple streaming sessions in shared running buffers", IEEE Transactions on Multimedia, Vol. 7, No. 6, December 2005.

``Segment-based proxy caching for Internet streaming media delivery", IEEE Multimedia, Vol. 12, No. 3, July-September, 2005.

``Analysis of multimedia workloads with implications for internet streaming" , Proceedings of the 14th International World Wide Web Conference, (WWW'2005), Chiba, Japan, May 10-14, 2005.

``DISC: Dynamic Interleaved Segment Caching for interactive streaming accesses", Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS'2005), Columbus, Ohio, June 6-9, 2005.

``PROP: a scalable and reliable P2P assisted streaming proxy system", Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04), Tokyo, Japan, March 23-26, 2004.

``SRB: Shared Running Buffers in proxy to exploit memory locality of multiple streaming media sessions", Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04), Tokyo, Japan, March 23-26, 2004.

``Designs of high quality streaming proxy systems" , Proceedings of IEEE INFOCOM'04, Hong Kong, March 7-11, 2004.

``Investigating performance insights of segment-based proxy caching of streaming media strategies", Proceedings of ACM International Conference on Multimedia Computing and Networking (MMCN'04), January 21-22, 2004.

``Adaptive and lazy segmentation based proxy caching for streaming media delivery", Proceedings of 13th ACM International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'03), Monterey, California, USA, June 1-3, 2003.


Performance and Reliability of Internet Systems

``Splitter: a proxy-based approach for post-migration testing of Web applications", Proceedings of ACM EuroSys 2010, Paris, France, April 13-16, 2010.

``Maintaining strong cache consistency for the Domain Name System", IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 8, 2007.

``DNScup: a strong cache consistency protocol for DNS", Proceedings of the 26th International Conference on Distributed Computing Systems (ICDCS'06), Lisbon, Portugal, July 4-7, 2006.

``Coordinated data prefetching for Web contents", Computer Communications, Vol. 28, Issue 17, October 2005.

``Enforcing direct communications between clients and Web servers to improve proxy performance and security", Software: Practice and Experience, Vol. 34, Issue 12, October 2004.

``Strong cache consistency support for domain name system", a poster presentation in SIGCOMM'04, Portland, Oregon, August 31 - September 3, 2004.

``Accurately modeling workload interactions for deploying prefetching in Web servers", Proceedings of 2003 International Conference on Parallel Processing (ICPP'03), Kaohsiung, Taiwan, China, October 6-9, 2003.

``On scalable and locality aware Web file sharing", Journal of Parallel and Distributed Computing, Vol. 63, No. 10, 2003.

``A popularity-based prediction model for Web prefetching", IEEE Computer, Vol. 36, No. 3, March, 2003.

``Detective browsers: a software technique to improve Web access performance and security", Proceedings of the 7th International Workshop on Web Content Caching and Distribution (WCW'02), Boulder, Colorado, August 14-16, 2002.

``Coordinated data prefetching by utilizing reference information at both proxy and Web servers", Proceedings of the ACM Workshop on Performance and Architecture of Web Servers, (PAWS-2001), Boston, Massachusetts, June 16-17, 2001.

``Exploiting neglected data locality in browsers", Proceedings of the 10th International World Wide Web Conference, (WWW10), Hong Kong, May 1-5, 2001, (an extended abstract).


P2P and Overlay Networks

``TopBT: a topology-aware and infrastructure-independent BitTorrent client", Proceedings of INFOCOM'10, San Diego, California, March 15-19, 2010. TopBT is open source software.

``A Performance Study of BitTorrent-like Peer-to-Peer Systems", IEEE Journal on Selected Areas in Communications, Vol. 25, No. 1, 2007.

``ASAP: an AS-Aware Peer-relay protocol for high quality VoIP", Proceedings of the 26th International Conference on Distributed Computing Systems (ICDCS'06), Lisbon, Portugal, July 4-7, 2006.

``Measurement, analysis, and modeling of BitTorrent-like systems", Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC'05), New Orleans, LA, October 19-21, 2005.

``Fast and low-cost search schemes by exploiting localities in P2P networks", Journal of Parallel and Distributed Computing, Vol. 65, Issue 6, 2005.

``Locality awareness in unstructured peer-to-peer systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 16, No. 2, February 2005.

``SCOPE: scalable consistency maintenance in structured P2P systems", Proceedings of IEEE INFOCOM 2005 Conference, Miami, Florida, March 13-17, 2005.

``Exploiting content localities for efficient search in P2P systems", Proceedings of the 18th International Symposium on Distributed Computing (DISC 2004), Amsterdam, Netherlands, October 4 - 8, 2004.

``Building a large and efficient hybrid peer-to-peer Internet caching system" , IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 6, 2004.

``SAT-Match: a self-adaptive topology matching method to achieve low lookup latency in structured P2P overlay networks" , Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04), Santa Fe, New Mexico, April 26-30, 2004.

``Locality-aware topology matching in P2P systems" , Proceedings of IEEE INFOCOM'04, Hong Kong, March 7-11, 2004.

``Low cost and reliable mutual anonymity protocols in peer-to-peer networks", IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 9, 2003.

``LightFlood: an efficient flooding scheme for file search in unstructured peer-to-peer systems", Proceedings of 2003 International Conference on Parallel Processing (ICPP'03), Kaohsiung, Taiwan, China, October 6-9, 2003.

``Mutual anonymity protocols for hybrid peer-to-peer systems" , Proceedings of 23rd International Conference on Distributed Computing Systems, (ICDCS'03), Providence, Rhode Island, May 19-22, 2003.

``On reliable and scalable peer-to-peer web document sharing", Proceedings of 2002 International Parallel and Distributed Processing Symposium, (IPDPS'02), Fort Lauderdale, Florida, April 15-19, 2002.


Wireless and Sensor Networks

``PSM-Throttling: minimizing energy consumption for bulk data communications in WLANs", Proceedings of the 15th International Conference on Network Protocols (ICNP'07), Beijing, China, October 16-19, 2007.

``SCAP: Smart Caching in wireless Access Points to improve P2P streaming", Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS'07), Toronto, Canada, June 25-29, 2007.

``Design and Analysis of Sensing Scheduling Algorithms under Partial Coverage for Object Detection in Sensor Networks", IEEE Transactions on Parallel and Distributed Systems, Vol. 18, No. 3, 2007.

``Cooperative Relay Service in a Wireless LAN", IEEE Journal on Selected Areas in Communications, Vol. 25, No. 2, 2007.

``Exploiting idle communication power to improve wireless network performance and energy efficiency", Proceedings of INFOCOM'06, Barcelona, Spain, April 23-29, 2006.

``Design and analysis of wave sensing scheduling protocols for object-tracking applications", Proceedings of the First International Conference on Distributed Computing in Sensor Systems (DCOSS '05), Marina del Rey, California, June 30 - July 1, 2005.

``Analyzing object detection quality under probabilistic coverage in sensor networks", Proceedings of the 13th International Workshop on Quality of Service, (IWQoS'05), Passau, Germany, June 21 - 23, 2005.

``A study on object tracking quality under probabilistic coverage in sensor networks", a poster presentation in MobiCom'04, Philadelphia, Pennsylvania, September 26 to October 1, 2004; an extended abstract published in ACM Mobile Computing and Communication Review (MC2R), Vol. 9, No. 1, pp 73-76, January 2005.


Cluster Systems and Computing

``Automatic software fault diagnosis by exploiting application signatures", Proceedings of 22nd USENIX Conference on Large Installation System Administration (LISA'08), San Diego, California, November 9-14, 2008. (Best Paper Award).

``Adaptive memory allocations in clusters to handle unexpectedly large data-intensive jobs" , IEEE Transactions on Parallel and Distributed Systems, Vol. 15, No. 7, 2004.

``Auto-CFD: efficiently parallelizing CFD applications on clusters", Proceedings of IEEE International Conference on Cluster Computing (Cluster'03), December 1-4, 2003.

``Adaptive and virtual reconfigurations for dynamic job scheduling in clusters" , Proceedings of 22nd International Conference on Distributed Computing Systems, (ICDCS'02), Vienna, Austria, July 2-5, 2002.

``Dynamic cluster resource allocations for jobs with known and unknown memory demands", IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, 2002.

``Dynamic load sharing with unknown memory demands in clusters", Proceedings of the 21st International Conference on Distributed Computing Systems, (ICDCS'2001), Phoenix, Arizona, April 16-19, 2001.

``Architectural effects of symmetric multiprocessors on TPC-C commercial workload", Journal of Parallel and Distributed Computing, Vol. 61, 2001.

``Memory hierarchy considerations for cost-effective cluster computing", IEEE Transactions on Computers, Vol. 49, No. 9, 2000.

``Incorporating job migration and network RAM to share memory resources", Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing, (HPDC-9), Pittsburgh, Pennsylvania, August 1-4, 2000.

``Effective Load Sharing on Heterogeneous Networks of Workstations", Proceedings of the 2000 International Parallel and Distributed Processing Symposium, (IPDPS'2000), Cancun, Mexico, May 1-5, 2000.

``Improving distributed workload performance by sharing both CPU and memory resources", Proceedings of the 20th International Conference on Distributed Computing Systems, (ICDCS'2000), Taipei, Taiwan, April 10-13, 2000.

``Analysis of commercial workload on SMP multiprocessors", Proceedings of Performance'99, August, 1999.

``Profit-effective parallel computing", IEEE Concurrency, Vol. 7, No. 2, 1999.

``The impact of memory hierarchies on cluster computing", Proceedings of 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing, (Second Merged Symposium IPPS/SPDP'99), April, 1999.

``Engineering workstations", Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Publishers, February, 1999.

``Performance models and simulation", Chapter 6, High Performance Cluster Computing, Volume 1, edited by R. Buyya, Prentice Hall, New Jersey, 1999.

``Comparative evaluation and case studies of shared-memory and data-parallel execution patterns", Scientific Programming, Vol. 7, No. 1, 1999.

``Characterizing and scheduling communication tasks of parallel and sequential jobs on networks of workstations", Computer Communications, Vol. 21, Issue 5, 1998.

``An Integrated Approach of Performance Prediction on Networks of Workstations", Chapter 4, Advanced Computer System Design, K. Bagchi, J. Walrand and G. Zobrist, Eds., Gordon and Breach Publishers, 1998.

``A comparative evaluation of hierarchical network architecture of the HP-Convex Exemplar", Proceedings of ICCD'97.

``Coordinating parallel processes on networks of workstations", Journal of Parallel and Distributed Computing, Vol. 46, No. 2, 1997.

``Effectively scheduling parallel tasks and communications on networks of workstations", Proceedings of Euro-Par'97.

``Simulation of heterogeneous networks of workstations" , Proceedings of MASCOTS'96, IEEE Computer Society Press, February, 1996.

``Software support for asynchronous computing across networks" , Proceedings of the 19th Annual International Computer Software and Application Conference , IEEE Computer Society Press, August, 1995.


Parallel Systems and Computing

``Lock Bypassing: an efficient algorithm for concurrently accessing priority heaps", ACM Journal on Experimental Algorithmics, Vol. 3, No. 3, 1998.

``Nova visualization for optimization of data-parallel programs", Proceedings of Euro-Par'97.

``Distributed edge detection: issues and implementations", IEEE Computational Science and Engineering, Spring Issue, 1997.

``Software support for multiprocessor latency measurement and evaluation", IEEE Transactions on Software Engineering , Vol. 23, No. 1, 1997.

``Adaptively scheduling parallel loops on distributed shared-memory systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 1, 1997.

``Semi-empirical multiprocessor performance predictions" , Journal of Parallel and Distributed Computing, Vol. 39, No. 1, 1996.

``An effective and practical performance prediction model for parallel computing on non-dedicated heterogeneous NOW" , Journal of Parallel and Distributed Computing, Vol. 38, No. 1, 1996.

``An adaptive loop scheduling algorithm on shared-memory systems" , Proceedings of the 8th Symposium on Parallel and Distributed Processing, IEEE Computer Society Press, October, 1996.

``Evaluating and designing software mutual exclusion algorithms on shared-memory multiprocessors" , IEEE Parallel & Distributed Technology, Spring Issue, 1996.

``A fast token-chasing mutual exclusion algorithm in arbitrary network topologies" , Journal of Parallel and Distributed Computing, Vol. 35, No. 2, 1996.

``Parallelizing FDTD Methods for Solving Electromagnetic Scattering Problems" , Applications on Advanced Architecture Computers, G. Astfalk Eds., SIAM Press, 1996.

``Comparative modeling and evaluation of CC-NUMA and COMA on hierarchical ring architectures", IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 12, 1995.

``Modeling and characterizing parallel computing performance on heterogeneous networks of workstations" , Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing, IEEE Computer Society Press, October, 1995.

``*Graph: a tool for visualizing communication and optimizing layout in data-parallel programs" , Proceedings of the 1995 International Conference on Parallel Processing, CRC Press, Vol. 2, August, 1995.

``Multiprocessor scalability predictions through detailed program execution analysis", Proceedings of the 9th ACM International Conference on Supercomputing, ACM Press, July, 1995. (Best Paper Award).

``Comparative performance analysis and evaluation of hot spots on network-based shared-memory architectures", IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 8, 1995.

``Parallelizing an oil refining simulation: numerical methods, implementations and experience", Parallel Computing, Vol. 21, No. 4, 1995.

``Distributed image edge detection methods and performance", Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, IEEE Computer Society Press, October, 1994.

``Performance predictions on implicit communication systems", Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, IEEE Computer Society Press, October, 1994.

``Distributed computation of electromagnetic scattering problems using finite-difference time-domain decompositions", Proceedings of the Third IEEE International Symposium on High-Performance Distributed Computing , IEEE Computer Society Press, August, 1994.

``Latency metric: an experimental method for measuring and evaluating parallel program and architecture scalability", Journal of Parallel and Distributed Computing , Vol. 22, No. 3, 1994.

``Measuring and analyzing parallel computing scalability", Proceedings of the 1994 International Conference on Parallel Processing, CRC Press, Vol. II, August, 1994.

``Comparative performance evaluation of spin-lock synchronization on MIN-based and HR-based multiprocessors", IEEE Parallel and Distributed Technology, Spring Issue, 1994.

``Computation and communication patterns of large-scale image convolutions on parallel architectures", Proceedings of the 8th International Parallel Processing Symposium, IEEE Computer Society Press, April, 1994.

``Evaluation and measurement of multiprocessor latency patterns", Proceedings of the 8th International Parallel Processing Symposium , IEEE Computer Society Press, April, 1994.

Tutorial on Multiprocessor Performance Measurement and Evaluation , IEEE Computer Society Press, 1994.

``Triangular decomposition methods for solving reducible nonlinear systems of equations", SIAM Journal on Optimization, Vol. 5, No. 2, 1994.

