Second International Workshop on Communication Architectures at Extreme Scale
In conjunction with International Supercomputing Conference (ISC 2016)
At Messe Frankfurt, Frankfurt, Germany, Thursday, June 23rd, 2016
Extreme scale computing is marked by multiple levels of hierarchy and heterogeneity, ranging from the compute units to storage devices to the network interconnects. Owing to the plethora of heterogeneous communication paths with different cost models expected in extreme scale systems, data movement is seen as central to many of the challenges of exascale computing. On the other hand, advances in networking technologies such as NoCs (like NVLink and Storm Lake), RDMA-enabled networks and the like are constantly pushing the envelope of research in novel communication and computing architectures for extreme scale computing. The goal of this workshop is to bring together researchers and software/hardware designers from academia, industry and national laboratories who are involved in creating network-based computing solutions for extreme scale architectures, to share their experiences and to learn about the opportunities and challenges in designing next-generation HPC systems and applications.
ExaComm 2016 will be held in conjunction with the International Supercomputing Conference (ISC 2016), Frankfurt, Germany, on Thursday, June 23rd, 2016.
ExaComm 2016 welcomes original submissions in a range of areas, including but not limited to:
Abstract: Lawrence Livermore National Laboratory (LLNL) has a long history of leadership in large-scale computing. Our current platform, Sequoia, is a 96-rack BlueGene/Q system that is currently number three on the TOP500 list. Our next platform, Sierra, will be a heterogeneous system delivered by a partnership between IBM, NVIDIA and Mellanox. In this talk, we will explore optimizations of applications that run on these platforms, with a focus on their networks and the software that enables their efficient use.
Abstract: Topology-aware communication is important for efficient data movement across a large-scale decentralized direct network. To encourage topology-aware optimization of communication, the system should provide topology-aware scheduling. In this talk, topology-awareness features of the Tofu and Tofu2 interconnects and the associated software stacks are presented.
Abstract: Up to the era of tens of PFLOPS of peak performance, driven by accelerators such as GPUs or MIC, all inter-node communication between these accelerating devices has depended on a common high-performance interconnect such as InfiniBand. When the performance gap, including latency, between these communication channels and the absolute performance of the accelerating devices becomes much larger than it is today, we will need a brand-new solution to exploit their potential performance on real-world problems. We have been developing a more direct solution to this problem, introducing FPGA technology as the glue between accelerating devices and the communication channel in order to apply real co-design to the system design. In this talk, I will present this work so far and what we should do in the next decade.
Abstract: High performance computing has begun scaling beyond Petaflop performance towards the Exaflop mark. One of the major concerns throughout the development toward such performance capability is scalability: at the component level, system level, middleware and the application level. A Co-Design approach between the development of the software libraries and the underlying hardware can help to overcome those scalability issues and to enable a more efficient design approach towards the Exascale goal.
Abstract: Designing scalable applications requires exposing parallelism and minimizing parallel overheads. In this talk we focus on the latter and present GPUDirect technologies that maximize inter-GPU bandwidth and minimize latencies to improve scaling on GPU clusters. This includes the latest addition, GPUDirect Async, and benchmarking results with NVLink and Tesla P100.
Abstract: This talk is about the management of Exascale systems; we elaborate on both the various challenges that our community is facing and possible solutions. First, we discuss how to achieve scalability with InfiniBand (IB), considering OFED's Scalable SA approach and recent research contributions, including the use of IB routers. Another important aspect of achieving Exascale scalability is the ability to carry out effective reconfigurations that are topology- and routing-aware, in order to provide reliability and maintain performance when the traffic pattern changes. Here we discuss recent research results, such as SlimUpdate, generalized metabase-aided routing, and "hierarchical" reconfiguration. Finally, the talk covers ideas and challenges related to introducing self-adaptive management of multi-tenant HPC systems, paving the way for a merger of HPC and cloud computing.
The workshop does not have a separate registration site. All attendees need to use the registration system provided by ISC'16. Please remember to select the workshop option when registering. Details about registration can be found on the main conference website.