ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

High Performance Cloud Infrastructure

(Supported by: NSF CSR SMALL , Amazon Equipment Grant, Nationwide Children's Hostpital )
(Collaborators: Nationwide Children's Hostpital, VMWare )


Research Challenge:
Over 90% of the world's digital data was produced in the last 2 years. The next generation of Internet services will access a lot of data for each user request, but neither processor speeds nor expectations for fast response times change.  We are building cloud infrastructure, e.g., key-value stores and data-parallel platforms, that can efficiently and scalably access large ammounts of data in seconds.  The increasing density of main memory serves as a building block for such services.  However, main memory systems are sensitive to performance anomalies at scale, i.e., 100 to 1000 threads per user request.  In such systems, response times can depend on the slowest main memory acceess times---not the average access times.  Unfortunately, access times in practice are not normally distributed.  The slowest accesses can 100X larger than the average, causing large delays.  It is also challenging to keep main memory systems fully utilized.  Small slowdowns caused by background jobs, compliance software, or garbage collection can ripple throughout the processing pipeline, leading to inefficient execution and slow response times.


Recommended Reading:
Entomomodel..., MASCOTS 2010 (best paper)
Balanced and Predictable Networked Storage, DCPerf 2013
Zoolander..., ICAC 2013
Managing Tiny Tasks for Data-Parallel, Subsampling Workloads, IC2E 2014

Impact:
Working with researchers at Nationwide Children's Hostpital, we have parallized EAGLET, a popular tool used to study disease causing genes by statistical geneticists. The EAGLET package comprises C, C++, Perl, and Bash code. It was written to maximize single-thread performance and usability. Rather than rewite the whole package, we integrated EAGLET into a data-parallel framework that could handle such legacy code: BashReduce. We have significantly extended BashReduce in this effort. Our parallel version will replace the existing EAGLET distribution on the software's next release. We also plan to release our BashReduce improvements as open source.

We also developed Zoolander, a key value store that support strict service level objectives (e.g., 99.99% of gets and puts complete in less than 5ms). Zoolander uses commodity hardware and can extend any open source key value store (e.g., Redis or Zookeeper). We have scaled Zoolander up to 120 nodes on Amazon EC2, achieving requests rates at the scale of TripAdvisor while meeting strict SLOs.


Media and Press Release:
We expect a production release of EAGLET with our parallel infrastructure soon. Stay tuned.