TR-18-1

"SQLoop: high performance iterative processing in data management"

Sofoklis Floratos, Yanfeng Zhang, Yuan Yuan, Rubao Lee, and  
Xiaodong Zhang

Proceedings of the 38th International Conference on Distributed Computing
Systems (ICDCS'18), Vienna, Austria, July 2-5, 2018.


Abstract

An increasing number of iterative and recursive query
tasks, such as graph-structured data analytics, are processed in
data management systems and demand fast response times. However,
existing CTE-based recursive SQL and its implementations
handle such intensive query processing poorly because of
two major drawbacks. First, its iteration execution model is
based on implicit set-oriented terminating conditions that cannot
express aggregation-based tasks, such as PageRank. Second,
its synchronous execution model cannot exploit asynchronous
computation to further accelerate parallel execution. To address
these two issues, we have designed and implemented SQLoop, a
framework that extends the semantics of the current SQL standard
to accommodate iterative SQL queries. SQLoop mediates
between users and different database engines through two powerful
components. First, it provides a uniform SQL expression for
users to access any database engine, so that they do not need to
write database-dependent SQL or move datasets out of a target
engine to process them at their own sites. Second, SQLoop automatically
parallelizes iterative queries that contain certain aggregate
functions, in both synchronous and asynchronous modes. More
specifically, SQLoop exploits intermediate results generated
across iterations and prioritizes the execution of partitions
that accelerate query processing.
We have evaluated SQLoop on three popular database engines
with real-world datasets and queries, and the results show
its effectiveness and high performance.
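The first drawback, that CTE recursion ends on an implicit set-oriented condition, can be made concrete with a small example. This is a minimal sketch. It uses Python's built-in `sqlite3` module and a hypothetical four-node graph. The `edges`, `ranks`, `new_ranks`, and `outdeg` tables and all function names are illustrative assumptions, not SQLoop's syntax or API.

Reachability fits a recursive CTE, because the recursion stops once an iteration derives no new rows. PageRank works differently. It rewrites every rank on each pass and should stop when an aggregate (here, the largest rank change) falls below a threshold. The sketch therefore drives PageRank from a host-language loop.

```python
import sqlite3

def build_graph(conn):
    # Hypothetical toy graph: 1->2, 2->3, 3->1, 3->4, 4->3 (every node has out-edges).
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)",
                     [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)])

def reachable_from(conn, start):
    # Recursive CTE: UNION drops duplicate rows, so recursion stops implicitly
    # once an iteration derives no new tuples (set-oriented termination).
    rows = conn.execute("""
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
        )
        SELECT node FROM reach ORDER BY node""", (start,)).fetchall()
    return [node for (node,) in rows]

def pagerank(conn, damping=0.85, eps=1e-6, max_iters=200):
    # Host-driven loop: each pass recomputes every rank with SUM(), and the
    # loop stops on an aggregate test (MAX change < eps), not on "no new rows".
    nodes_sql = "SELECT src AS node FROM edges UNION SELECT dst FROM edges"
    n = conn.execute(f"SELECT COUNT(*) FROM ({nodes_sql})").fetchone()[0]
    conn.execute("CREATE TABLE ranks (node INTEGER PRIMARY KEY, pr REAL)")
    conn.execute("CREATE TABLE new_ranks (node INTEGER PRIMARY KEY, pr REAL)")
    conn.execute(f"INSERT INTO ranks SELECT node, 1.0 / ? FROM ({nodes_sql})", (n,))
    conn.execute("CREATE TABLE outdeg AS "
                 "SELECT src AS node, COUNT(*) AS deg FROM edges GROUP BY src")
    for i in range(max_iters):
        conn.execute("DELETE FROM new_ranks")
        conn.execute("""
            INSERT INTO new_ranks
            SELECT r.node, (1 - ?) / ? + ? * COALESCE((
                SELECT SUM(src_r.pr / o.deg)
                FROM edges e
                JOIN ranks src_r ON src_r.node = e.src
                JOIN outdeg o ON o.node = e.src
                WHERE e.dst = r.node), 0)
            FROM ranks r""", (damping, n, damping))
        change = conn.execute("""
            SELECT MAX(ABS(nr.pr - r.pr))
            FROM new_ranks nr JOIN ranks r ON nr.node = r.node""").fetchone()[0]
        # Synchronous step: all old ranks are replaced at once.
        conn.execute("DELETE FROM ranks")
        conn.execute("INSERT INTO ranks SELECT node, pr FROM new_ranks")
        if change < eps:
            return i + 1
    return max_iters

conn = sqlite3.connect(":memory:")
build_graph(conn)
reach = reachable_from(conn, 1)
iterations = pagerank(conn)
ranks = dict(conn.execute("SELECT node, pr FROM ranks"))
print(reach, iterations)
```

A hand-written driver loop like this one is engine-specific glue. SQLoop instead offers a uniform SQL expression for such iterative queries. The sketch illustrates only the termination condition, not SQLoop's own mechanism.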
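The contrast between synchronous execution and prioritized asynchronous execution over partitions can be sketched in plain Python. This is an illustration under stated assumptions, not SQLoop's scheduler.

It swaps in delta-based accumulative PageRank. In that scheme, each node adds pending rank increments ("deltas") to its value and forwards a damped share of them to its out-neighbors. The two-partition split and the priority rule (process the partition with the largest pending delta first) are hypothetical choices.

The two versions differ in when updates become visible:
- The synchronous version waits for a full round before any node sees new deltas.
- The asynchronous version applies each update immediately, so intermediate results flow to nodes processed later.

```python
from collections import defaultdict

DAMPING = 0.85
# Hypothetical toy graph; every node has at least one out-edge.
EDGES = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]

def adjacency(edges):
    out, nodes = defaultdict(list), set()
    for s, t in edges:
        out[s].append(t)
        nodes.update((s, t))
    return out, sorted(nodes)

def sync_pagerank(edges, eps=1e-9):
    """Synchronous delta-based PageRank: a global barrier after every round."""
    out, nodes = adjacency(edges)
    value = {v: 0.0 for v in nodes}
    delta = {v: (1 - DAMPING) / len(nodes) for v in nodes}
    updates = 0
    while sum(delta.values()) > eps:
        nxt = {v: 0.0 for v in nodes}  # deltas produced during this round
        for v in nodes:
            value[v] += delta[v]
            for w in out[v]:
                nxt[w] += DAMPING * delta[v] / len(out[v])
            updates += 1
        delta = nxt  # neighbors see the new deltas only after the barrier
    return value, updates

def async_prioritized_pagerank(edges, partitions, eps=1e-9):
    """Asynchronous variant: no barrier, busiest partition processed first."""
    out, nodes = adjacency(edges)
    value = {v: 0.0 for v in nodes}
    delta = {v: (1 - DAMPING) / len(nodes) for v in nodes}
    updates = 0
    while sum(delta.values()) > eps:
        # Priority rule (assumed): pick the partition holding the most pending delta.
        part = max(partitions, key=lambda p: sum(delta[v] for v in p))
        for v in part:
            pending, delta[v] = delta[v], 0.0
            value[v] += pending
            for w in out[v]:
                # Visible immediately, even to nodes later in this partition.
                delta[w] += DAMPING * pending / len(out[v])
            updates += 1
    return value, updates

sync_pr, sync_updates = sync_pagerank(EDGES)
async_pr, async_updates = async_prioritized_pagerank(EDGES, [[1, 2], [3, 4]])
print(sync_updates, async_updates)
```

Both functions converge to the same ranks and report how many node updates they performed. Whether prioritization reduces that count depends on the graph and the partitioning. This toy example says nothing about SQLoop's measured performance.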