PGAS and Hybrid MPI+PGAS Programming Models on Modern HPC Clusters

A Tutorial to be presented at The 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP-2015)
by
Dhabaleswar K. (DK) Panda and Khaled Hamidouche (The Ohio State University)


When: February 08, 2015 (8:30am-12:00noon)
Where: Marriott San Francisco Airport, California, USA

Abstract

Multi-core processors, accelerators (GPGPUs), coprocessors (Xeon Phis) and high-performance interconnects (InfiniBand, 10 GigE/iWARP and RoCE) with RDMA support are shaping the architectures for next-generation clusters. Efficient programming models to design applications on these clusters, as well as on future exascale systems, are still evolving. Partitioned Global Address Space (PGAS) models provide an attractive alternative to the traditional Message Passing Interface (MPI) model owing to their easy-to-use global shared memory abstractions and light-weight one-sided communication. Hybrid MPI+PGAS programming models are gaining attention as a possible solution for programming exascale systems. These hybrid models ease the transition of codes designed using MPI, allowing them to take advantage of PGAS models without paying the prohibitive cost of re-designing complete applications. They also enable hierarchical design of applications, using the different models to suit modern architectures. In this tutorial, we provide an overview of the research and development taking place along these directions and discuss the associated opportunities and challenges as we head toward exascale. We start with an in-depth overview of modern system architectures with multi-core processors, GPU accelerators, Xeon Phi coprocessors and high-performance interconnects. We present an overview of language-based and library-based PGAS models, with a focus on two popular models: UPC and OpenSHMEM. We introduce MPI+PGAS hybrid programming models and highlight the advantages and challenges of designing a unified runtime to support them. We examine the challenges in designing high-performance UPC, OpenSHMEM and unified MPI+UPC/OpenSHMEM runtimes. We present case studies using application kernels to demonstrate how one can exploit hybrid MPI+PGAS programming models to achieve better performance without rewriting the complete code. Finally, we present the new challenges and designs to support MPI+PGAS on GPU- and MIC-based systems. Using the publicly available MVAPICH2-X software package, we provide concrete case studies and in-depth evaluation of runtime- and application-level designs that are targeted for modern system architectures with multi-core processors, GPUs, Xeon Phis and high-performance interconnects.
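To make the hybrid MPI+PGAS idea concrete, the following minimal sketch (not taken from the tutorial materials) combines OpenSHMEM one-sided puts into symmetric memory with an MPI collective inside the same program. It assumes a unified runtime such as MVAPICH2-X in which both models can coexist, and uses OpenSHMEM 1.2-style calls (shmem_init/shmem_finalize, shmem_malloc); the compiler wrappers, initialization order, and the mapping between MPI ranks and OpenSHMEM PEs may vary by installation.

/* Hypothetical hybrid MPI+OpenSHMEM sketch; assumes a unified runtime
 * (e.g., MVAPICH2-X) where both models run in one executable. */
#include <stdio.h>
#include <mpi.h>
#include <shmem.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    shmem_init();                       /* OpenSHMEM 1.2-style initialization */

    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric heap allocation: every PE gets a buffer at the same
     * symmetric address, enabling one-sided access with no matching
     * receive on the target. */
    long *slots = (long *) shmem_malloc(npes * sizeof(long));
    for (int i = 0; i < npes; i++)
        slots[i] = -1;
    shmem_barrier_all();

    /* PGAS-style one-sided put: each PE writes its ID into its slot on PE 0 */
    shmem_long_p(&slots[me], (long) me, 0);
    shmem_barrier_all();

    /* MPI-style collective over the same set of processes */
    int rank, sum = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (me == 0) {
        printf("PE 0 slots:");
        for (int i = 0; i < npes; i++)
            printf(" %ld", slots[i]);
        printf("  (MPI sum of ranks = %d)\n", sum);
    }

    shmem_free(slots);
    shmem_finalize();
    MPI_Finalize();
    return 0;
}

The point of the sketch is that neither model has to be abandoned: existing MPI collectives stay in place, while fine-grained, irregular communication can be expressed with light-weight one-sided PGAS operations, with a single unified runtime managing both.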

Targeted Audience and Scope

This tutorial is aimed at people working in the areas of PGAS and MPI programming models, high-performance communication and I/O, networking, middleware, exascale computing, and applications. The content level will be approximately 30% beginner, 40% intermediate, and 30% advanced. There is no fixed prerequisite: attendees with a general knowledge of high-performance computing, networking, programming models, parallel applications, and related issues will be able to understand and appreciate the material. The tutorial is designed so that attendees are exposed to the topics in a smooth and progressive manner.

Outline of the Tutorial

Brief Biography of Speakers

Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at The Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high-performance computing, communication protocols, file systems, network-based computing, and Quality of Service. He has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, HSE and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with National Laboratories and leading InfiniBand and 10GigE/iWARP companies on designing various subsystems of next-generation high-end systems. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM and UPC)) software packages, developed by his research group, are currently being used by more than 2,300 organizations worldwide (in 75 countries). This software has enabled several InfiniBand clusters (including the 7th-ranked one) to get into the latest TOP500 ranking. These software packages are also available with the OpenFabrics stack for network vendors (InfiniBand and iWARP), server vendors and Linux distributors. Dr. Panda's research is supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM. More details about Dr. Panda, including a comprehensive CV and publications, are available here.

Khaled Hamidouche is a Senior Research Associate in the Department of Computer Science and Engineering at The Ohio State University. He is a member of the Network-Based Computing Laboratory led by Dr. D. K. Panda. His research interests include high-performance interconnects, parallel programming models, accelerator computing and high-end computing applications. His current focus is on designing high-performance unified MPI, PGAS and hybrid MPI+PGAS runtimes for InfiniBand clusters and their support for accelerators. Khaled is involved in the design and development of the popular MVAPICH2 library and its derivatives MVAPICH2-MIC, MVAPICH2-GDR and MVAPICH2-X.


Last Updated: February 8, 2015