A PDF copy of the advance program is available here.

8:50 - 9:00

Opening Remarks   Slides

9:00 - 10:00

Keynote I: Fusing HPC and Big Data - Experiences with Design, Deployment and Usage of The Wrangler System at the Texas Advanced Computing Center   Slides

Speaker: Dan Stanzione, Texas Advanced Computing Center

Abstract: Traditional cluster computing has always targeted the problems of data that are very large, as some of the largest datasets ever produced have been generated by clustered supercomputers and their associated parallel filesystems. But recent developments in technology and research practice known as "Big Data" address a class of problems that don’t just have very large datasets, but have I/O requirements that traditional clusters are ill-suited to address. The Wrangler project at the Texas Advanced Computing Center is an ongoing attempt to rethink cluster computing around the needs of Big Data problems. Wrangler features a unique NAND flash-based storage system from DSSD, providing up to 1 TB/s bandwidth and a very high transaction rate for random accesses as a shared resource across the cluster. Rather than achieving these performance targets through massive scale, Wrangler reaches them with fewer than 100 nodes. Flexibility is as important as performance: data on Wrangler can be stored in a traditional file system, object store, relational store, or even a Hadoop file system. Wrangler is now operational, and this new architecture is setting new performance benchmarks for applications that never quite “fit” on traditional HPC systems. This talk will provide an overview of the Wrangler project, the evolution of big data at TACC that led us to the Wrangler architecture, and the experiences of early users through the first few months of operations. Wrangler is supported by a grant from the US National Science Foundation (NSF).

10:00 - 10:30

Coffee Break

10:30 - 11:30

Session I: High-Performance Big Data Systems

Session Chair: Xiaoyi Lu, The Ohio State University

A Scalable Distributed Private Stream Search System   Slides

Peng Zhang (Institute of Information Engineering, Chinese Academy of Sciences; National Engineering Laboratory for Information Security Technologies), Yan Li (National Computer Network Emergency Response Technical Team Beijing, China), Qingyun Liu (Institute of Information Engineering, Chinese Academy of Sciences; National Engineering Laboratory for Information Security Technologies), and Hailun Lin (Institute of Information Engineering, Chinese Academy of Sciences)

KTV-TREE: Interactive Top-K Aggregation on Dynamic Large Dataset in Cloud   Slides

Yuzhe Tang (Syracuse University), Ling Liu (Georgia Tech), Junichi Tatemura (NEC Labs America), and Hakan Hacigumus (NEC Labs America)

11:30 - 12:15

Invited Talk I: Understanding Big Data Workloads on Modern Processors using BigDataBench   Slides

Speaker: Jianfeng Zhan, Institute of Computing Technology, Chinese Academy of Sciences, China

Abstract: BigDataBench is an open-source big data benchmark suite, and the current version 3.1 includes diverse data sets and workloads from five application domains: search engine, social networks, e-commerce, multimedia analytics, and bioinformatics. This talk presents the workload characterization of BigDataBench on modern processors. We found that big data workloads fall into several subclasses that exhibit disparate behaviors, e.g., in IPC and pipeline front-end stalls. Our correlation analysis indicated that even though some big data analytics workloads exhibit notable pipeline front-end stalls, the main factors affecting CPI are long-latency data accesses rather than high front-end stalls. Also, our evaluation shows that wimpy-core processors do not suit big data analytics workloads in most situations.

12:15 - 13:15

Lunch Break

13:15 - 14:15

Keynote II: Efficiency + Scalability = High-Performance Big Data Computing  Slides

Speaker: Zhiwei Xu, Institute of Computing Technology, Chinese Academy of Sciences, China

Abstract: We are entering the zettabyte (ZB) data era, which demands scalable and efficient capabilities for sensing, communicating, and processing big data. In the past decade, great strides were made in scalability, where map-reduce based systems offer good examples. However, efficiency is still low for big data systems, at less than 1 Giga operation per Joule. In this talk, we first argue that the research community should set a bold efficiency goal of 1 Tera operation per Joule. We then present some encouraging initial results towards this objective from three directions of research: functional sensing, elastic processing, and high-performance data computing.

14:15 - 15:00

Invited Talk II: Benchmarking Big Data Systems   Slides

Speaker: Raghunath Nambiar, Cisco

Abstract: Benchmarking standards matter to end users, vendors, and researchers: they enable fair comparisons of technologies and products and drive innovation. This session will cover some of the defining characteristics and recent developments in the area of performance evaluation and benchmarking of Big Data Systems.

15:00 - 15:30

Coffee Break

15:30 - 16:30

Session II: Performance Studies of Big Data Systems and Applications

Session Chair: Xiaoyi Lu, The Ohio State University

A Tiny GPU Cluster for Big Spatial Data: A Preliminary Performance Evaluation   Slides

Jianting Zhang (Dept. of Computer Science, The City College of New York), Simin You (Dept. of Computer Science, CUNY Graduate Center), and Le Gruenwald (Dept. of Computer Science, The University of Oklahoma, Norman)

Optimising Bootstrapping Algorithms using R and Hadoop   Slides

Shicai Wang (Data Science Institute, Imperial College London, UK), Mihaela A. Mares (Data Science Institute, Imperial College London, UK), and Yike Guo (Data Science Institute, Imperial College London, UK; School of Computer Science, Shanghai University, China)

16:30 - 18:00

Panel: Wide Adoption of HPC Techniques in Big Data: Hype or Reality?

Panel Moderator: Jianfeng Zhan, Institute of Computing Technology, Chinese Academy of Sciences, China

Panel Members:

The panel will discuss the following three important questions that the Big Data and HPC communities are facing today:

  • What are the preconditions for wide adoption of HPC technologies in Big Data?
  • Do you have any predictions about the merging of HPC and Big Data technologies? When and how will this merger happen?
  • What are the differences between Big Data in HPC and Big Data in other domains?

18:00 - 18:10

Closing Remarks