Accelerating Big Data Processing and Associated Deep Learning on Datacenters and HPC Clouds with Modern Architectures
When: February 25, 2018
Where: Vosendorf, Austria
Abstract
The convergence of HPC, Big Data, and Deep Learning is becoming the next
game-changing business opportunity. Apache Hadoop, Spark, gRPC/TensorFlow, and
Memcached are becoming standard building blocks in handling Big Data oriented
processing and mining. Modern HPC bare-metal systems and Cloud Computing
platforms have been fueled by advances in multi-/many-core architectures,
RDMA-enabled networking, NVRAMs, and NVMe-SSDs during the last decade.
However, Big Data and Deep Learning middleware (such as Hadoop, Spark, Flink,
and gRPC) have not embraced such technologies fully. Recent studies have shown
that default designs of these components cannot efficiently leverage the
features of modern HPC clusters, such as Remote Direct Memory Access (RDMA)
enabled high-performance interconnects, high-throughput parallel storage
systems (e.g., Lustre), and Non-Volatile Memory (NVM). In this tutorial, we will
provide an in-depth overview of the architecture of Hadoop, Spark,
gRPC/TensorFlow, and Memcached. We will examine the challenges in re-designing
the networking and I/O components of this middleware with modern interconnects,
protocols (such as InfiniBand and RoCE), and storage architectures. Using the
publicly available software packages in the High-Performance Big Data project
(HiBD, http://hibd.cse.ohio-state.edu), we will provide case studies of the new
designs for several Hadoop/Spark/gRPC/TensorFlow/Memcached components and their
associated benefits. Through these, we will also examine the interplay between
high-performance interconnects, storage (HDD, NVM, and SSD), and multi-core
platforms (e.g., Xeon x86, OpenPOWER) to achieve the best solutions for these
components and applications on modern HPC clusters and clouds. We will also present
in-depth case studies of modern Deep Learning tools (e.g., Caffe, TensorFlow,
DL4J, BigDL) running with RDMA-enabled Hadoop, Spark, and gRPC.
Targeted Audience and Scope
This tutorial is targeted at various categories of people working in
the areas of Big Data processing, Deep Learning, Cloud Computing, and HPC
on modern datacenters and HPC Clouds with high-performance networking and storage architectures.
The specific audiences this tutorial is aimed at include:
- Scientists, engineers, researchers, and students engaged in designing
next-generation Big Data and Deep Learning systems and applications over high-performance networking and storage architectures
- Designers and developers of Big Data, Deep Learning, Cloud Computing, Hadoop, Spark,
Memcached, gRPC, and TensorFlow middleware
- Newcomers to the field of Big Data processing and Deep Learning on modern datacenters and HPC Clouds who are interested in familiarizing themselves with Hadoop, Spark, Memcached, gRPC, TensorFlow, RDMA, SR-IOV, Virtualization, high-performance networking and storage
- Managers and administrators responsible for setting up
next-generation Big Data and Deep Learning environments and modern high-end systems/facilities
in their organizations/laboratories
The content level will be as follows: 30% beginner, 40% intermediate, and 30%
advanced. There is no fixed prerequisite. Attendees with a
general knowledge of Big Data, Deep Learning, Hadoop, Spark, Memcached, gRPC,
TensorFlow, high performance computing, Cloud Computing, networking, and
storage architectures will be able to understand and appreciate the material. The
tutorial is designed in such a way that an attendee gets exposed to the topics
in a smooth and progressive manner. This tutorial is organized as a coherent
talk to cover multiple topics.
Outline of the Tutorial
- Introduction to Big Data and Associated Deep Learning Applications and Technologies
- Overview of MapReduce and Resilient Distributed Datasets (RDD) Programming Models
- Architecture Overview of Apache Hadoop, Spark, gRPC, TensorFlow, and Memcached
  - MapReduce (V1 and YARN), HDFS, Spark, HBase
  - gRPC, TensorFlow
  - Memcached
- Overview of High-Performance Interconnects, Protocols, and Storage Architectures for Modern Datacenters
  - InfiniBand and RDMA
  - 10/40 GigE, iWARP and RoCE technologies
  - SSD/NVM-based storage and Lustre parallel filesystem
- Challenges in Accelerating Hadoop, Spark, gRPC/TensorFlow, and Memcached on Modern Networking and Storage Architectures
- Overview of Benchmarks and Applications using Hadoop, Spark, gRPC/TensorFlow, and Memcached
- RDMA-based Acceleration Case Studies and In-Depth Performance Evaluation
  - Hadoop (HDFS, MapReduce, HBase) and Spark over InfiniBand with RDMA and Heterogeneous Storage (RAMDisk, SSD, HDD, and Lustre)
  - gRPC and TensorFlow over InfiniBand with RDMA and Heterogeneous Storage (RAMDisk, SSD, HDD, and Lustre)
  - Memcached over InfiniBand with RDMA and Heterogeneous Storage (RAMDisk, SSD, and Lustre)
- The High-Performance Big Data (HiBD) Project and Associated Releases
- Ongoing and Other Activities for Accelerating Big Data Applications
- Advanced Acceleration Case Studies
  - Hadoop (HDFS and MapReduce) over InfiniBand with RDMA and NVM
  - Online Erasure Coding for HDFS and Memcached
  - MR-Advisor for Performance Tuning
  - Deep Learning Tools (such as Caffe, TensorFlow, DL4J, BigDL) over RDMA-Hadoop and RDMA-Spark
  - Big Data Processing over HPC Cloud
  - Big Data Processing over OpenPOWER Architecture
- Conclusion and Q&A
Brief Biography of Speakers
Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio State
University. He obtained his Ph.D. in computer engineering from the University
of Southern California. His research interests include parallel computer
architecture, high performance computing, communication protocols, file
systems, network-based computing, and Quality of Service. He has published over
400 papers in major journals and international conferences related to these
research areas. Dr. Panda and his research group members have been doing
extensive research on modern networking technologies including InfiniBand, HSE
and RDMA over Converged Enhanced Ethernet (RoCE). His research group is
currently collaborating with National Laboratories and leading InfiniBand and
10GigE/iWARP companies on designing various subsystems of next generation
high-end systems. The MVAPICH2
(High Performance MPI over InfiniBand, iWARP and RoCE) open-source software
package, developed by his research group, is currently being used by more than
2,800 organizations worldwide (in 85 countries). This software has enabled
several InfiniBand clusters (including the 1st one) to get into the latest
TOP500 ranking. These software packages are also available with the Open
Fabrics stack for network vendors (InfiniBand and iWARP), server vendors and
Linux distributors. The new RDMA-enabled Apache Hadoop and Memcached packages, consisting of
acceleration for HDFS, MapReduce, RPC and Memcached, are publicly available from
http://hibd.cse.ohio-state.edu. Dr. Panda's research is supported
by funding from the US National Science Foundation, the US Department of Energy, and
several industry partners including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA and
NetApp. He is an IEEE Fellow and a member of ACM. More details about Dr.
Panda, including a comprehensive CV and publications, are available here.
Dr. Xiaoyi Lu
Dr. Xiaoyi Lu is a Research Scientist in the Department of Computer Science and
Engineering at the Ohio State University, USA. His current research interests
include high performance interconnects and protocols, Big Data,
Hadoop/Spark/Memcached Ecosystem, Parallel Computing Models (MPI/PGAS),
Virtualization, Cloud Computing, and Deep Learning. He has published over 80
papers in international journals and conferences related to these research
areas. He has been actively involved in various professional activities (PC
Co-Chair, PC Member, Reviewer, Session Chair) in academic journals and
conferences. Dr. Lu currently leads the research and development of
RDMA-based accelerations for Apache Hadoop, Spark, HBase, and Memcached, and
OSU HiBD micro-benchmarks, which are publicly available from
http://hibd.cse.ohio-state.edu.
These libraries are currently being used by
more than 245 organizations from 31 countries. More than 23,150 downloads of
these libraries have taken place from the project site. He is a core member of
the MVAPICH2 (High-Performance MPI over InfiniBand, Omni-Path, Ethernet/iWARP,
and RoCE) project and he is leading the research and development of
MVAPICH2-Virt (high-performance and scalable MPI for hypervisor and container
based HPC cloud). He is a member of IEEE and ACM. More details about Dr. Lu are
available here.
Last Updated: Sep. 11, 2017