Program

8:50 - 9:00

Opening Remarks   

9:00 - 10:00

Keynote Talk

Title: Twister2: A High-Performance Big Data Programming Environment  

Speaker: Geoffrey Fox, Professor, Indiana University; Interim Associate Dean for Intelligent Systems Engineering

Abstract: We analyse the components needed in programming environments for Big Data Analysis Systems that combine scalable HPC performance with the functionality of ABDS, the Apache Big Data Software Stack. One highlight is Harp-DAAL, a machine learning library that exploits the Intel node library DAAL and HPC communication collectives within the Hadoop ecosystem. Another highlight is Twister2, which consists of a set of middleware components that support the batch and streaming data capabilities familiar from Apache Hadoop, Spark, Heron and Flink, but with high performance. Twister2 covers bulk synchronous and dataflow communication; task management as in Mesos, YARN and Kubernetes; dataflow graph execution models; launching of the Harp-DAAL library; streaming and repository data access interfaces; in-memory databases; and fault tolerance at dataflow nodes. Similar capabilities are available in current Apache systems, but only as integrated packages that do not allow the customization needed for different application scenarios. We discuss the synergy between cloud management (DevOps) and cloud execution systems.

Bio: Geoffrey Charles Fox received a Ph.D. in Theoretical Physics from Cambridge University, where he was Senior Wrangler. He is now a distinguished professor of Engineering, Computing, and Physics at Indiana University, where he is director of the Digital Science Center, and both Department Chair and Interim Associate Dean for Intelligent Systems Engineering at the School of Informatics, Computing, and Engineering. He previously held positions at Caltech, Syracuse University, and Florida State University after being a postdoc at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory, and Peterhouse College Cambridge. He has supervised the Ph.D. theses of 70 students and published around 1300 papers (over 470 with at least ten citations) in physics and computing, with an h-index of 78 and over 33,000 citations. He is a Fellow of APS (Physics) and ACM (Computing) and works on the interdisciplinary interface between computing and applications.

10:00 - 10:30

Coffee Break

10:30 - 12:00

Regular Paper Session I: High-Performance Data Processing Systems

Session Chair: Li Zha, Institute of Computing Technology, Chinese Academy of Sciences

Improving I/O Performance through Colocating Interrelated Input Data and Near-Optimal Load Balancing    (Best Paper Award Winner!)

Felix Seibert, Mathias Peters, and Florian Schintke

How Well Do CPU, GPU and Hybrid Graph Processing Frameworks Perform?   

Tanuj Kr Aasawat, Tahsin Reza, and Matei Ripeanu

EASIS: An Optimized Information Service for High Performance Computing Environment   

Can Wu, Xiaoning Wang, Haili Xiao, Rongqiang Cao, Yining Zhao and Xuebin Chi

12:00 - 13:30

Lunch Break

13:30 - 15:00

Regular Paper Session II: High-Performance Data Processing Applications

Session Chair: D. K. Panda, The Ohio State University

GPU Accelerated Self-join for the Distance Similarity Metric   

Michael Gowanlock and Ben Karsin

Implementing a Parallel Graph Clustering Algorithm with Sparse Matrix Computation   

Jun Chen and Peigang Zou

atSNP Infrastructure, A Case Study for Searching 37 Billion Records While Providing Significant Cost Savings over Cloud Providers

Christopher Harrison, Sunduz Keles, Rebecca Hudson, Sunyoung Shin, and Ines Dutra

15:00 - 15:30

Coffee Break

15:30 - 16:00

Short Paper Session I: Data Processing on HPC and Cloud Environments (15 mins each)

Session Chair: Jianfeng Zhan, Institute of Computing Technology, Chinese Academy of Sciences

Improvement of the Log Pattern Extracting Algorithm Using Text Similarity   

Yining Zhao, Xiaodong Wang, Haili Xiao, and Xuebin Chi

The Performance Analysis of Cache Architecture based on Alluxio over Virtualized Infrastructure   

Xu Chang and Li Zha

16:00 - 17:30

Panel: Which Framework is the Best for High-Performance Deep Learning: Big Data Framework or HPC Framework?   

Panel Moderator: Jianfeng Zhan, Institute of Computing Technology, Chinese Academy of Sciences, China   

Panel Members:

The panel will discuss the following three important questions that the Big Data, Deep Learning, HPC, and Cloud Computing communities face today:

  • How do deep learning workloads differ from other big data workloads?
  • What are the unique challenges of deep learning in terms of system requirements?
  • Do we need new components in addition to TPUs?

17:30 - 17:45

Closing Remarks