Research Challenge: Over 90% of the
world's digital data was produced in the last 2 years. The next
generation of Internet services will access a lot of data for each user
request, but neither processor speeds nor expectations for fast
response times change. We are building cloud infrastructure,
e.g., key-value stores and data-parallel platforms, that can
efficiently and scalably access large ammounts of data in
seconds. The increasing density of main memory serves as a
building block for such services. However, main memory systems
are sensitive to performance anomalies at scale, i.e., 100 to 1000
threads per user request. In such systems, response times can
depend on the slowest main memory acceess times---not the average
access times. Unfortunately, access times in practice are not
normally distributed. The slowest accesses can 100X larger than
the average, causing large delays. It is also challenging to keep
main memory systems fully utilized. Small slowdowns caused by
background jobs, compliance software, or garbage collection can ripple
throughout the processing pipeline, leading to inefficient execution
and slow response times.
Impact:
Working with researchers at Nationwide Children's Hostpital, we have parallized
EAGLET, a popular tool used to study disease causing genes by statistical
geneticists. The EAGLET package comprises C, C++, Perl, and Bash code. It was
written to maximize single-thread performance and usability. Rather than rewite
the whole package, we integrated EAGLET into a data-parallel framework that
could handle such legacy code: BashReduce. We have significantly extended
BashReduce in this effort. Our parallel version will replace the existing
EAGLET distribution on the software's next release. We also plan to release our
BashReduce improvements as open source.
We also developed Zoolander, a key value store that support strict service level
objectives (e.g., 99.99% of gets and puts complete in less than 5ms). Zoolander
uses commodity hardware and can extend any open source key value store (e.g.,
Redis or Zookeeper). We have scaled Zoolander up to 120 nodes on Amazon EC2,
achieving requests rates at the scale of TripAdvisor while meeting strict SLOs.
Media and Press Release:
We expect a production release of EAGLET with our parallel infrastructure
soon. Stay tuned.