Accelerating Big Data Processing with Hadoop and Memcached on Datacenters with Modern Networking and Storage Architecture

A Tutorial to be presented at the 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-2014)
Dhabaleswar K. (DK) Panda and Xiaoyi Lu (The Ohio State University)

When: February 16, 2014 (1:30-5:00pm)
Where: Orlando, Florida, USA


Apache Hadoop is gaining prominence in handling Big Data and analytics. Similarly, Memcached is becoming important for large-scale query processing in Web 2.0 environments. These middleware systems are traditionally written with sockets and do not deliver the best performance on datacenters with modern high-performance networks. In this tutorial, we will provide an in-depth overview of the architecture of Hadoop components (HDFS, MapReduce, RPC, HBase, etc.) and Memcached. We will examine the challenges in re-designing the networking and I/O components of these middleware systems for modern interconnects and protocols with RDMA (such as InfiniBand, iWARP, RoCE, and RSocket) and for modern storage architectures. Using the publicly available Hadoop-RDMA software package, we will provide case studies of the new designs for several Hadoop components and their associated benefits. Through these case studies, we will also examine the interplay between high-performance interconnects, storage systems (HDD and SSD), and multi-core platforms to achieve the best solutions for these components.

Targeted Audience and Scope

The tutorial content is planned for half a day. This tutorial is targeted at various categories of people working in the areas of Big Data, including high-performance Hadoop, high-performance communication and I/O architecture, storage, networking, middleware, cloud computing, and applications. Specific audiences this tutorial is aimed at include:

The content level will be as follows: 30% beginner, 40% intermediate, and 30% advanced. There is no fixed prerequisite. As long as the attendee has general knowledge of Big Data, Hadoop, high-performance computing, networking and storage architecture, and related issues, he/she will be able to understand and appreciate it. The tutorial is designed so that an attendee is exposed to the topics in a smooth and progressive manner.

Outline of the Tutorial

Brief Biography of Speakers

Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high-performance computing, communication protocols, file systems, network-based computing, and Quality of Service. He has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, HSE, and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with National Laboratories and leading InfiniBand and 10GigE/iWARP companies on designing various subsystems of next-generation high-end systems. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,100 organizations worldwide (in 71 countries). This software has enabled several InfiniBand clusters (including the 7th-ranked system) to get into the latest TOP500 ranking. These software packages are also available with the OpenFabrics stack for network vendors (InfiniBand and iWARP), server vendors, and Linux distributors. The new RDMA-enabled Apache Hadoop package, consisting of acceleration for HDFS, MapReduce, and RPC, is publicly available. Dr. Panda's research is supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA, and NetApp. He is an IEEE Fellow and a member of ACM. More details about Dr. Panda, including a comprehensive CV and publications, are available here.

Dr. Xiaoyi Lu is a postdoctoral researcher in the Department of Computer Science and Engineering at the Ohio State University, USA. He received the Ph.D. degree in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. His current research interests include high-performance interconnects and protocols, Big Data, the Hadoop ecosystem, and Parallel Computing Models (MPI/PGAS). He has published over 20 papers in major journals and international conferences related to these research areas. He has been actively involved in various professional activities for academic journals and conferences. Currently, Dr. Lu is working on the design and development of the high-performance Hadoop-RDMA software package. He is a member of IEEE. More details about Dr. Lu are available here.
Last Updated: January 09, 2014