PGAS and Hybrid MPI+PGAS Programming Models on Modern HPC Clusters
When: February 08, 2015 (8:30am-12:00noon)
Where: Marriott San Francisco Airport, California, USA
Abstract
Multi-core processors, accelerators (GPGPUs), coprocessors (Xeon Phis)
and high-performance interconnects (InfiniBand, 10 GigE/iWARP and
RoCE) with RDMA support are shaping the architectures for next
generation clusters. Efficient programming models to design
applications on these clusters as well as on future exascale systems
are still evolving. Partitioned Global Address Space (PGAS) models
provide an attractive alternative to the traditional Message Passing
Interface (MPI) model owing to their easy-to-use global shared memory
abstractions and light-weight one-sided communication. Hybrid MPI+PGAS
programming models are gaining attention as a possible solution to
programming exascale systems. These hybrid models help the transition
of codes designed using MPI to take advantage of PGAS models without
paying the prohibitive cost of re-designing complete
applications. They also enable hierarchical design of applications
using the different models to suit modern architectures. In this
tutorial, we provide an overview of the research and development
taking place along these directions and discuss associated
opportunities and challenges as we head toward exascale. We start with
an in-depth overview of modern system architectures with multi-core
processors, GPU accelerators, Xeon Phi coprocessors and
high-performance interconnects. We present an overview of
language-based and library-based PGAS models with a focus on two popular models -
UPC and OpenSHMEM. We introduce MPI+PGAS hybrid programming models and
highlight the advantages and challenges of designing a unified runtime
to support them. We examine the challenges in designing
high-performance UPC, OpenSHMEM and unified MPI+UPC/OpenSHMEM
runtimes. We present case studies using application kernels to
demonstrate how one can exploit hybrid MPI+PGAS programming models to
achieve better performance without rewriting the complete
code. Finally, we present the new challenges and designs to support
MPI+PGAS on GPU- and MIC-based systems. Using the publicly available
MVAPICH2-X software package, we provide
concrete case studies and in-depth evaluations of runtime and
application-level designs that are targeted for modern system
architectures with multi-core processors, GPUs, Xeon Phis and
high-performance interconnects.
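To make the hybrid MPI+PGAS idea more concrete, the short sketch below mixes standard MPI calls with OpenSHMEM one-sided operations in a single program: the existing MPI phase is kept as-is, while an irregular-update phase is expressed as a one-sided atomic instead of matched send/receive pairs. This is an illustrative toy example, not material from the tutorial. The MPI and OpenSHMEM routine names are standard (shmem_init/shmem_finalize follow the newer OpenSHMEM specifications; older implementations use start_pes), but whether the two initialization calls may coexist in one executable, and in which order, is runtime-specific; a unified runtime such as MVAPICH2-X is designed to support exactly this combination.

    /* Hypothetical hybrid MPI+OpenSHMEM sketch (illustration only) */
    #include <stdio.h>
    #include <mpi.h>
    #include <shmem.h>

    int main(int argc, char **argv)
    {
        static long counter = 0;      /* statically allocated, hence symmetric:
                                         remotely accessible on every PE */

        MPI_Init(&argc, &argv);
        shmem_init();                 /* coexistence/ordering with MPI_Init is
                                         runtime-specific */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Existing MPI phase: a conventional two-sided collective, unchanged */
        long local = rank, sum = 0;
        MPI_Allreduce(&local, &sum, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

        /* New PGAS phase: one-sided atomic update on a neighboring PE,
           with no matching receive on the target side */
        shmem_long_add(&counter, 1, (rank + 1) % size);
        shmem_barrier_all();

        if (rank == 0)
            printf("allreduce sum = %ld, counter on PE 0 = %ld\n", sum, counter);

        shmem_finalize();
        MPI_Finalize();
        return 0;
    }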
Targeted Audience and Scope
This tutorial is targeted at various categories of people working in
the areas of PGAS and MPI programming models, high performance
communication and I/O, networking, middleware, exascale computing and
applications. Specific audiences this tutorial is aimed at include:
- Designers, developers and users of parallel programming models (MPI and PGAS)
- Scientists, engineers, researchers and students engaged in designing next-generation HPC systems and applications
- Newcomers to the field of HPC and exascale computing who are interested in familiarizing themselves with programming models, accelerators, networking, and RDMA
- Managers and administrators responsible for setting up next-generation HPC environments and high-end systems/facilities in their organizations/laboratories
The content level will be as follows: 30% beginner, 40% intermediate,
and 30% advanced. There is no fixed prerequisite. As long as the
attendee has general knowledge of high performance computing,
networking, programming models, parallel applications, and related
issues, he/she will be able to understand and appreciate the
tutorial. The tutorial is designed so that attendees are exposed to
the topics in a smooth and progressive manner.
Outline of the Tutorial
- Overview of the Modern HPC System Architectures
- Multi-core Processors
- High Performance Interconnects (InfiniBand, 10GigE/iWARP and
RDMA over Converged Enhanced Ethernet (RoCE))
- Heterogeneity with Accelerators (GPUs) and Coprocessors (Xeon Phis)
- Introduction to Partitioned Global Address Space Models
- Language-based Models: Case Study with UPC (see the short sketch after this outline)
- Library-based Models: Case Study with OpenSHMEM
- Overview of MPI+PGAS Hybrid Programming Models and Benefits
- Designing Scalable and High Performance Support for PGAS and Hybrid MPI+PGAS Models on Modern Clusters
- Application-level Case Studies for using Hybrid MPI+PGAS Models
- PGAS Models and Runtimes for Clusters with Accelerators
- Conclusion and Q&A
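As a brief taste of the UPC case study listed in the outline above, the following is a minimal, generic UPC fragment (not taken from the tutorial slides) showing the global address space abstraction: a shared array is physically distributed across UPC threads, yet every element can be read or written from any thread with ordinary array syntax.

    /* Minimal UPC sketch (illustration only) */
    #include <upc.h>
    #include <stdio.h>

    /* one element per UPC thread; elements are distributed round-robin */
    shared int data[THREADS];

    int main(void)
    {
        data[MYTHREAD] = MYTHREAD * MYTHREAD;   /* each thread writes its own element */
        upc_barrier;                            /* wait until all writes are visible */

        if (MYTHREAD == 0) {
            /* thread 0 reads every element, including the remote ones,
               through the global address space */
            for (int i = 0; i < THREADS; i++)
                printf("data[%d] = %d\n", i, data[i]);
        }
        return 0;
    }

With Berkeley UPC, for example, such a fragment would typically be compiled with upcc and launched with upcrun; compiler and launcher names vary by UPC implementation.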
Brief Biography of Speakers
Dr. Dhabaleswar K. (DK)
Panda is a Professor of Computer Science at the Ohio State
University. He obtained his Ph.D. in computer engineering from the
University of Southern California. His research interests include
parallel computer architecture, high performance computing,
communication protocols, file systems, network-based computing, and
Quality of Service. He has published over 300 papers in major journals
and international conferences related to these research
areas. Dr. Panda and his research group members have been doing
extensive research on modern networking technologies including
InfiniBand, HSE and RDMA over Converged Enhanced Ethernet (RoCE). His
research group is currently collaborating with National Laboratories
and leading InfiniBand and 10GigE/iWARP companies on designing various
subsystems of next generation high-end systems. The MVAPICH2 (High Performance
MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X (Hybrid MPI
and PGAS (OpenSHMEM and UPC)) software packages, developed by his
research group, are currently being used by more than 2,300
organizations worldwide (in 75 countries). This software has enabled
several InfiniBand clusters (including the 7th-ranked one) to get into the
latest TOP500 ranking. These software packages are also available with
the OpenFabrics stack for network vendors (InfiniBand and iWARP),
server vendors and Linux distributors. Dr. Panda's research is
supported by funding from the US National Science Foundation, the US
Department of Energy, and several industry partners including Intel, Cisco,
SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a
member of ACM. More details about Dr. Panda, including a
comprehensive CV and publications, are available
here.
Khaled
Hamidouche is a Senior Research Associate in the
Department of Computer Science and Engineering at The Ohio
State University. He is a member of the Network-Based
Computing Laboratory led by Dr. D. K. Panda. His research
interests include high-performance interconnects, parallel
programming models, accelerator computing and high-end
computing applications. His current focus is on designing high
performance unified MPI, PGAS and hybrid MPI+PGAS runtimes for
InfiniBand clusters and their support for accelerators. Khaled
is involved in the design and development of the popular
MVAPICH2 library and its derivatives MVAPICH2-MIC,
MVAPICH2-GDR and MVAPICH2-X.
Last Updated: February 8, 2015