High-Performance Computing

This is collaborative work with Prof. Sadayappan and his group at the University of Utah. We are interested in compile-time and run-time analysis for performance optimization. We are also considering programming abstractions and software-productivity tools for programmers who build high-performance computing applications. Some of this work was funded by NSF grants CCF-2216903 (a 5-year project started in July 2022, with $5M total funding across all participating institutions), CCF-2118737 (planning for CCF-2216903), CCF-0926127, CCF-0811781, CNS-0509467, and ACI-1404995, as well as by DARPA and the Department of Energy.

Current areas of interest are:

Efficient, scalable, and performance-portable tensor applications (CCF-2216903)

Computations on tensors are fundamental to many large-scale parallel software applications in scientific computing and machine learning. This project brings together researchers with expertise spanning the algorithm/software/hardware stack, aiming at the following impacts: (1) improved performance and energy efficiency of hardware architectures through algorithm-architecture co-design for tensor computations; (2) increased developer productivity for software applications using tensors, together with higher performance achieved on a variety of target platforms; (3) advances in scalable machine-learning and scientific computing applications.
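As a concrete illustration of what "computations on tensors" means here, the C++ sketch below contracts a three-index tensor A(i,k,l) with a tensor B(l,k,j) into a matrix C(i,j) using a naive loop nest. The tensor names, shapes, storage layout, and loop order are illustrative assumptions, not code from the project; optimizing exactly this kind of loop nest is what the contributions below address.

```cpp
#include <cstddef>
#include <vector>

// Illustrative 3-index tensor contraction: C(i,j) = sum over k,l of
// A(i,k,l) * B(l,k,j). Tensors are flat row-major arrays; I, J, K, L
// are the index extents. (Hypothetical example, not project code.)
void contract(const std::vector<double>& A, const std::vector<double>& B,
              std::vector<double>& C,
              std::size_t I, std::size_t J, std::size_t K, std::size_t L) {
    for (std::size_t i = 0; i < I; ++i)
        for (std::size_t j = 0; j < J; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < K; ++k)
                for (std::size_t l = 0; l < L; ++l)
                    acc += A[(i * K + k) * L + l] * B[(l * K + k) * J + j];
            C[i * J + j] = acc;
        }
}
```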

Specific research contributions are along multiple directions:

(1) Compiler optimization: a powerful, unified methodology for automated optimization of dense tensor computations, based on non-linear cost models for multi-level hyper-rectangular tiled execution on a range of target computing platforms (see the tiling sketch after this list).

(2) Scalability with sparsity: a multi-level blocking methodology that enhances the scalability of sparse-tensor computations, based on analysis of the intrinsic sparsity patterns of the data and the corresponding data-reuse patterns.

(3) Algorithm-architecture co-design: leveraging the new cost models to develop powerful and general approaches for hardware-software co-design of accelerators for dense- and sparse-tensor computations.

(4) Correctness and accuracy: techniques to ensure correctness and floating-point accuracy under compiler transformations and during compiler/hardware design-space exploration.

(5) Applications: use of the developed methodology and tools to advance cutting-edge applications in machine learning and scientific computing, including PDE solvers, quantum many-body simulation, tensor networks in machine learning, and large-scale image analysis.
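To make direction (1) concrete, here is a minimal, hand-written sketch of two-level hyper-rectangular tiling, applied to matrix multiplication as the simplest dense tensor contraction. The fixed tile sizes T1 and T0 are placeholders; in the approach described above they would instead be selected by a non-linear cost model for the target platform.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Placeholder tile sizes; a cost model would choose these per platform.
constexpr std::size_t T1 = 128;  // outer tile (e.g., for a large cache level)
constexpr std::size_t T0 = 32;   // inner tile (e.g., for a small cache level)

// Two-level hyper-rectangular tiling of C += A * B (N x N, row-major).
void matmul_tiled(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t N) {
    for (std::size_t ii = 0; ii < N; ii += T1)
    for (std::size_t jj = 0; jj < N; jj += T1)
    for (std::size_t kk = 0; kk < N; kk += T1)
        // Second tiling level within each outer tile.
        for (std::size_t i1 = ii; i1 < std::min(ii + T1, N); i1 += T0)
        for (std::size_t j1 = jj; j1 < std::min(jj + T1, N); j1 += T0)
        for (std::size_t k1 = kk; k1 < std::min(kk + T1, N); k1 += T0)
            // Point loops over one inner tile (i-k-j order for
            // stride-1 access to the rows of B and C).
            for (std::size_t i = i1; i < std::min(i1 + T0, N); ++i)
            for (std::size_t k = k1; k < std::min(k1 + T0, N); ++k)
            for (std::size_t j = j1; j < std::min(j1 + T0, N); ++j)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}
```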

Publications from this project (CCF-2216903 and its precursor CCF-2118737): SC23, CC22, CGO22

Parallelization and code generation

We are investigating novel techniques for automatic parallelization and code generation, motivated by modern parallel architectures [SPAA21, PLDI21, ASPLOS21, SC19, CGO19, IPDPS19, ProcIEEE18, SC18, PPoPP18, PACT16, PPoPP15, WOLFHPC15, TOPC14, TACO13, SC12, PLDI12, HiPC11, ICSM10, PACT09, PPoPP09, ICS08, CC08-2, PPoPP08, SC08, PLDI07, SC06, ICCS06]. The ubiquity of multi-core processors and GPUs has brought parallel computing squarely into the mainstream. Whereas the development of parallel programs was once primarily a task undertaken by a small cadre of expert programmers, it is now essential to develop parallel implementations of a large number of existing sequential programs.
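As a small, hand-written illustration of the kind of transformation an automatic parallelizer performs, the sketch below shows a sequential Jacobi-style stencil loop nest annotated with an OpenMP pragma, which a tool could emit after verifying that the outer-loop iterations carry no dependences. The kernel and its structure are assumptions for illustration, not output of our tools.

```cpp
#include <cstddef>
#include <vector>

// One Jacobi relaxation step on an n x n grid (n >= 2), row-major.
// Each output cell depends only on the previous iterate `in`, so the
// outer loop is dependence-free and can run in parallel; the pragma
// below is what an automatic parallelizer might insert.
void jacobi_step(const std::vector<double>& in, std::vector<double>& out,
                 std::size_t n) {
    #pragma omp parallel for
    for (std::size_t i = 1; i < n - 1; ++i)
        for (std::size_t j = 1; j < n - 1; ++j)
            out[i * n + j] = 0.25 * (in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                   + in[i * n + (j - 1)] + in[i * n + (j + 1)]);
}
```

Compiled without OpenMP support, the code simply runs sequentially, preserving the original semantics.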

