Accelerating Big Data Processing and Deep Learning

Accelerating Big Data Processing and Associated Deep Learning on Datacenters and HPC Clouds with Modern Architectures

A Tutorial to be presented at the 24th IEEE International Symposium on High Performance Computer Architecture (HPCA-2018)
by
Dhabaleswar K. (DK) Panda and Xiaoyi Lu (The Ohio State University)

When: February 25, 2018
Where: Vosendorf, Austria

Abstract

The convergence of HPC, Big Data, and Deep Learning is becoming the next game-changing business opportunity. Apache Hadoop, Spark, gRPC/TensorFlow, and Memcached are becoming standard building blocks for Big Data processing and mining. Over the last decade, modern HPC bare-metal systems and Cloud Computing platforms have been fueled by advances in multi-/many-core architectures, RDMA-enabled networking, NVRAMs, and NVMe-SSDs. However, Big Data and Deep Learning middleware (such as Hadoop, Spark, Flink, and gRPC) has not fully embraced these technologies. Recent studies have shown that the default designs of these components cannot efficiently leverage the features of modern HPC clusters, such as Remote Direct Memory Access (RDMA) enabled high-performance interconnects, high-throughput parallel storage systems (e.g., Lustre), and Non-Volatile Memory (NVM). In this tutorial, we will provide an in-depth overview of the architecture of Hadoop, Spark, gRPC/TensorFlow, and Memcached. We will examine the challenges in re-designing the networking and I/O components of this middleware with modern interconnects, protocols (such as InfiniBand and RoCE), and storage architectures. Using the publicly available software packages in the High-Performance Big Data project (HiBD, http://hibd.cse.ohio-state.edu), we will provide case studies of the new designs for several Hadoop/Spark/gRPC/TensorFlow/Memcached components and their associated benefits. Through these, we will also examine the interplay between high-performance interconnects, storage (HDD, NVM, and SSD), and multi-core platforms (e.g., Xeon x86, OpenPOWER) to achieve the best solutions for these components and applications on modern HPC clusters and clouds. We will also present in-depth case studies of modern Deep Learning tools (e.g., Caffe, TensorFlow, DL4J, BigDL) with RDMA-enabled Hadoop, Spark, and gRPC.

Targeted Audience and Scope

This tutorial is targeted at people working in the areas of Big Data processing, Deep Learning, Cloud Computing, and HPC on modern datacenters and HPC Clouds with high-performance networking and storage architectures. The specific audiences this tutorial is aimed at include:

Scientists, engineers, researchers, and students engaged in designing next-generation Big Data and Deep Learning systems and applications over high-performance networking and storage architectures
Designers and developers of Big Data, Deep Learning, Cloud Computing, Hadoop, Spark, Memcached, gRPC, and TensorFlow middleware
Newcomers to the field of Big Data processing and Deep Learning on modern datacenters and HPC Clouds who are interested in familiarizing themselves with Hadoop, Spark, Memcached, gRPC, TensorFlow, RDMA, SR-IOV, Virtualization, high-performance networking and storage
Managers and administrators responsible for setting up next-generation Big Data and Deep Learning environments and modern high-end systems/facilities in their organizations/laboratories

The content level will be as follows: 30% beginner, 40% intermediate, and 30% advanced. There is no fixed prerequisite. Attendees with a general knowledge of Big Data, Deep Learning, Hadoop, Spark, Memcached, gRPC, TensorFlow, high performance computing, Cloud Computing, and networking and storage architectures will be able to understand and appreciate the material. The tutorial is designed so that attendees are exposed to the topics in a smooth and progressive manner, and it is organized as a single coherent talk covering multiple topics.

Outline of the Tutorial

Introduction to Big Data and Associated Deep Learning Applications and Technologies
Overview of MapReduce and Resilient Distributed Datasets (RDD) Programming Models
Architecture Overview of Apache Hadoop, Spark, gRPC, TensorFlow, and Memcached

- MapReduce (V1 and YARN), HDFS, Spark, HBase
- gRPC, TensorFlow
- Memcached

Overview of High-Performance Interconnects, Protocols, and Storage Architectures for Modern Datacenters

- InfiniBand and RDMA
- 10/40 GigE, iWARP and RoCE technologies
- SSD/NVM-based storage and Lustre parallel filesystem

Challenges in Accelerating Hadoop, Spark, gRPC/TensorFlow, and Memcached on Modern Networking and Storage Architectures
Overview of Benchmarks and Applications using Hadoop, Spark, gRPC/TensorFlow, and Memcached
RDMA-based Acceleration Case Studies and In-Depth Performance Evaluation
- Hadoop (HDFS, MapReduce, HBase) and Spark over InfiniBand with RDMA and Heterogeneous Storage (RAMDisk, SSD, HDD, and Lustre)
- gRPC and TensorFlow over InfiniBand with RDMA and Heterogeneous Storage (RAMDisk, SSD, HDD, and Lustre)
- Memcached over InfiniBand with RDMA and Heterogeneous Storage (RAMDisk, SSD, and Lustre)
The High-Performance Big Data (HiBD) Project and Associated Releases
Ongoing and Other Activities for Accelerating Big Data Applications
Advanced Acceleration Case Studies
- Hadoop (HDFS and MapReduce) over InfiniBand with RDMA and NVM
- Online Erasure Coding for HDFS and Memcached
- MR-Advisor for Performance Tuning
- Deep Learning Tools (such as Caffe, TensorFlow, DL4J, BigDL) over RDMA-Hadoop and RDMA-Spark
- Big Data Processing over HPC Cloud
- Big Data Processing over OpenPOWER Architecture
Conclusion and Q&A

Brief Biography of Speakers

Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high performance computing, communication protocols, file systems, network-based computing, and Quality of Service. He has published over 400 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, HSE, and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with National Laboratories and leading InfiniBand and 10GigE/iWARP companies on designing various subsystems of next-generation high-end systems. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,800 organizations worldwide (in 85 countries). This software has enabled several InfiniBand clusters (including the 1st one) to get into the latest TOP500 ranking. It is also available with the Open Fabrics stack for network vendors (InfiniBand and iWARP), server vendors, and Linux distributors. The new RDMA-enabled Apache Hadoop and Memcached packages, consisting of acceleration for HDFS, MapReduce, RPC, and Memcached, are publicly available from http://hibd.cse.ohio-state.edu. Dr. Panda's research is supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA, and NetApp. He is an IEEE Fellow and a member of ACM. More details about Dr. Panda, including a comprehensive CV and publications, are available here.

Dr. Xiaoyi Lu is a Research Scientist in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, Big Data, the Hadoop/Spark/Memcached ecosystem, parallel computing models (MPI/PGAS), Virtualization, Cloud Computing, and Deep Learning. He has published over 80 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities (PC Co-Chair, PC Member, Reviewer, Session Chair) in academic journals and conferences. Dr. Lu currently leads the research and development of RDMA-based accelerations for Apache Hadoop, Spark, HBase, and Memcached, and of the OSU HiBD micro-benchmarks, which are publicly available from http://hibd.cse.ohio-state.edu. These libraries are currently being used by more than 245 organizations from 31 countries. More than 23,150 downloads of these libraries have taken place from the project site. He is a core member of the MVAPICH2 (High-Performance MPI over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE) project and leads the research and development of MVAPICH2-Virt (high-performance and scalable MPI for hypervisor and container based HPC cloud). He is a member of IEEE and ACM. More details about Dr. Lu are available here.