TR-21-2

"DB Spinner: making a case for iterative processing in databases"  

Sofoklis Floratos, Ahmed Ghazal, Jason Sun, Jianjun Chen and Xiaodong Zhang 

Proceedings of 37th International Conference on Data Engineering (ICDE 2021), 
Chania, Crete, Greece, April 19-22, 2021.


Abstract

Relational database management systems (RDBMSs) have limited support for iterative processing. Recursive queries were added to ANSI SQL; however, their semantics do not allow aggregation functions, which rules out their use for several applications, such as PageRank and shortest-path computations. Recently, another SQL extension, iterative Common Table Expressions (CTEs), has been proposed to enable users to perform general iterative computations on RDBMSs. In this work, we demonstrate how iterative CTEs can be efficiently incorporated into a production RDBMS without major intrusion into the system. We have prototyped our approach on Futurewei's MPPDB, a shared-nothing relational parallel database engine. The implementation is based on a functional rewrite that translates iterative CTEs into existing SQL operators. Thus, query plans of iterative CTEs can be optimized and executed by the engine with minimal modification to the code base. We have also applied several optimizations specific to iterative CTEs to i) minimize data movement, ii) reuse results that remain constant, and iii) push down predicates to avoid unnecessary data processing. We verified our implementation through an extensive experimental evaluation using real-world datasets and queries. The results show the feasibility of the rewrite approach and the effectiveness of the optimizations, which improve performance by an order of magnitude in some cases.
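To make the contrast between recursive and iterative evaluation concrete, here is a minimal Python sketch. It assumes iterative-CTE semantics in which each iteration's output *replaces* the working table, rather than being unioned into it as in a standard recursive CTE (whose recursive term may not use aggregates). Under that assumption, an iteration step can freely join and aggregate, which is exactly what PageRank needs. The names `evaluate_iterative_cte` and `pagerank` are hypothetical illustrations. They are not MPPDB's interface or the paper's rewrite. Computing out-degrees once, outside the loop, only loosely mirrors the idea of reusing results that remain constant.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A "relation" is modeled as a dict from a key column to a value column.
Relation = Dict[int, float]


def evaluate_iterative_cte(
    base: Relation,
    step: Callable[[Relation], Relation],
    max_iterations: int,
    converged: Callable[[Relation, Relation], bool],
) -> Relation:
    """Hypothetical evaluator: the non-iterative part (base) is evaluated once,
    then the working table is REPLACED by each step's output (no union), so the
    step may use aggregation. Stops on convergence or after max_iterations."""
    current = base
    for _ in range(max_iterations):
        nxt = step(current)
        if converged(current, nxt):
            return nxt
        current = nxt
    return current


def pagerank(edges: List[Tuple[int, int]], damping: float = 0.85,
             max_iterations: int = 200, tol: float = 1e-9) -> Relation:
    nodes = sorted({v for edge in edges for v in edge})
    n = len(nodes)

    # Loop-invariant result: out-degree per source node does not change across
    # iterations, so it is computed once and reused by every step.
    out_degree: Dict[int, int] = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1

    def step(ranks: Relation) -> Relation:
        # Equivalent of: JOIN edges with ranks ON src, then GROUP BY dst, SUM(...)
        contrib: Dict[int, float] = defaultdict(float)
        for src, dst in edges:
            contrib[dst] += ranks[src] / out_degree[src]
        return {v: (1 - damping) / n + damping * contrib[v] for v in nodes}

    def converged(old: Relation, new: Relation) -> bool:
        return max(abs(old[v] - new[v]) for v in nodes) < tol

    base = {v: 1.0 / n for v in nodes}
    return evaluate_iterative_cte(base, step, max_iterations, converged)


if __name__ == "__main__":
    print(pagerank([(1, 2), (2, 1), (1, 3), (3, 1)]))
```

On the four-edge example, node 1 converges to 0.9/1.85 (about 0.4865) and nodes 2 and 3 share the remaining rank equally. A recursive CTE could not express the `SUM` inside `step`, because its recursive term only accumulates new rows.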