Program

Room: Regency D at Gold Level

8:50 - 9:00

Opening Remarks   

9:00 - 10:00

Keynote Talk

Title: High Performance Computing and Which Big Data?  Slides

Speaker: Chaitanya Baru, Distinguished Scientist, San Diego Supercomputer Center (SDSC)

Abstract: Big Data is an all-encompassing term. The initial attributes ascribed to Big Data—volume and velocity—do imply high performance, whether for query processing or machine learning. A part of the community has always considered variety as the key challenge, both the variety of data and the variety of workloads on the same data (leading to notions of "late binding" of data to schema). The definition of Big Data grew further, to include value, veracity, and other "v" words.
The traditional HPC community is also encountering large-scale data issues, as witnessed by the quest for Big Data approaches in Extreme Scale computing as part of the exascale initiative. This is driven by the need to process large amounts of observational and simulation data, perform data mining and machine learning operations on these data, and process instrument data streams. The data in these applications are relatively more structured and more uniform in quality.
For data and performance, the most well-known benchmarks are from the database world, viz., the Transaction Processing Performance Council (TPC). In high-performance computing, the well-known benchmark is the Top500. One difference between the two benchmark specifications is that TPC includes price/performance as part of the metric. A third well-known set of benchmarks is from the Standard Performance Evaluation Corporation (SPEC).
In this talk, we will take a quick tour of key aspects of Big Data benchmarking efforts that have been underway in the community, and recent advances in the area. High Performance Computing and Big Data is a rich topic area with many possible approaches. Can we create some coherence among the activities in this area, for example, via organizations like the SPEC Research Groups and others?

10:00 - 10:30

Coffee Break

10:30 - 12:00

Session I: High-Performance Big Data Applications and Systems

Session Chair: Xiaoyi Lu, The Ohio State University

Evaluation of SMP Shared Memory Machines for Use With In-Memory and OpenMP Big Data Applications   Slides

Andrew J. Younge (Indiana University), Christopher Reidy (University of Arizona), Robert Henschel (Indiana University), Geoffrey C. Fox (Indiana University)

Hadoop on HPC: Integrating Hadoop and Pilot-based Dynamic Resource Management   Slides

Andre Luckow (Clemson University), Ioannis Paraskevakos (Rutgers University), George Chantzialexiou (Rutgers University), Shantenu Jha (Rutgers University)

PACM: A Prediction-based Auto-adaptive Compression Model for HDFS   Slides

Ruijian Wang (University of the Chinese Academy of Sciences), Chao Wang (University of the Chinese Academy of Sciences), Li Zha (Institute of Computing Technology, Chinese Academy of Sciences)

12:00 - 13:30

Lunch Break

13:30 - 15:00

Session II: High-Performance Streaming Systems

Session Chair: Li Zha, Institute of Computing Technology, Chinese Academy of Sciences

SamzaSQL: Scalable Fast Data Management with Streaming SQL   Slides

Milinda Pathirage (School of Informatics and Computing, Indiana University), Julian Hyde (Hortonworks), Yi Pan (Yahoo!), Beth Plale (Indiana University)

Towards High Performance Processing of Streaming Data in Large Data Centers   Slides

Supun Kamburugamuve (Indiana University), Saliya Ekanayake (Indiana University), Milinda Pathirage (School of Informatics and Computing, Indiana University), Geoffrey Fox (Indiana University)

Extracting Log Patterns from System Logs in LARGE   Slides

Yining Zhao (Computer Network Information Center, Chinese Academy of Sciences), Haili Xiao (Computer Network Information Center, Chinese Academy of Sciences)

15:00 - 15:30

Coffee Break

15:30 - 16:10

Session III (Short Papers): Performance Studies of Big Data Systems and Applications

Session Chair: Xiaoyi Lu, The Ohio State University

Exploring the Performance of Spark for a Scientific Use Case   Slides

Saba Sehrish (Fermi National Accelerator Laboratory), Jim Kowalkowski (Fermi National Accelerator Laboratory), Marc Paterno (Fermi National Accelerator Laboratory)

Big Data for Medical Image Analysis: A Performance Study   

Rui Zhang (IBM Research - Almaden), Hongzhi Wang (IBM Research - Almaden), Renu Tewari (IBM Research - Almaden), Gero Schmidt (IBM Research - Almaden), Deepika Kakrania (IBM Research - Almaden)

16:10 - 17:40

Panel: Merge or Split: Mutual Influence between Big Data and HPC Techniques   Slides

Panel Moderator: Jianfeng Zhan, Institute of Computing Technology, Chinese Academy of Sciences, China

Panel Members:

The panel will discuss the following three important questions facing the Big Data and HPC communities today:

  • What is the impact of Big Data techniques on HPC?
  • What is the impact of HPC techniques on Big Data?
  • What is the future mutual influence between HPC and Big Data techniques?

17:40 - 18:00

Closing Remarks