TR-20-1.pdf

"Automating incremental and asynchronous evaluation for recursive aggregate data processing"  

Qiange Wang, Yanfeng Zhang, Hao Wang, Liang Ge, Rubao Lee, Xiaodong Zhang, and Ge Yu

Proceedings of the 2020 ACM SIGMOD Conference on Management of Data (SIGMOD 2020),
Portland, Oregon, USA, June 14-19, 2020.


Abstract
In database systems and large-scale data analytics, recursive aggregate 
processing plays an important role. It is generally implemented under an 
incremental computing framework and executed synchronously, asynchronously, 
or both. 
We identify three barriers in existing recursive aggregate data processing. 
First, the processing scope is largely limited to monotonic programs. Second, 
checking the conditions for monotonicity and for the correctness of async 
processing is complicated and done manually. Third, execution engines may be 
suboptimal because sync and async execution are handled separately. In this 
paper, we lay an analytical foundation for the conditions under which a 
recursive aggregate program, whether monotonic or even non-monotonic, can be 
executed incrementally and asynchronously while still producing the correct 
result. We design and implement a condition 
verification tool that can automatically check if a given program satisfies 
the conditions. We further propose a unified sync-async engine to execute 
these programs with high performance. To integrate these methods, we have 
developed a distributed Datalog system called PowerLog. 
Our evaluation shows that PowerLog can outperform three representative 
Datalog systems on both monotonic and non-monotonic recursive programs.
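
As a rough illustration of the terms used in the abstract, the sketch below 
evaluates one monotonic recursive aggregate (single-source shortest paths, 
aggregated with min) in two ways: synchronously, in rounds, and asynchronously, 
with in-place updates. Both propagate only changed values, which is the 
incremental part. The graph, the function names, and the choice of shortest 
paths are assumptions made for illustration; this is not PowerLog's engine or 
its condition checker, and it does not cover the non-monotonic case.

```python
from collections import deque

# Hypothetical weighted graph: EDGES[u] is a list of (v, weight) pairs.
EDGES = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 6.0)],
    "c": [("d", 1.0)],
    "d": [],
}

INF = float("inf")


def sssp_sync(edges, source):
    """Round-based evaluation: each round reads only the previous round's values."""
    dist = {v: INF for v in edges}
    dist[source] = 0.0
    changed = {source}  # vertices whose value changed in the last round
    while changed:
        new_dist = dict(dist)
        next_changed = set()
        for u in changed:
            for v, w in edges[u]:
                cand = dist[u] + w  # uses the value from the previous round
                if cand < new_dist[v]:  # min aggregate
                    new_dist[v] = cand
                    next_changed.add(v)
        dist, changed = new_dist, next_changed
    return dist


def sssp_async(edges, source):
    """Asynchronous evaluation: each update is applied in place and seen immediately."""
    dist = {v: INF for v in edges}
    dist[source] = 0.0
    work = deque([source])  # vertices with a pending change to propagate
    while work:
        u = work.popleft()
        for v, w in edges[u]:
            cand = dist[u] + w  # uses the freshest value of dist[u]
            if cand < dist[v]:  # min aggregate
                dist[v] = cand
                work.append(v)
    return dist


sync_result = sssp_sync(EDGES, "a")
async_result = sssp_async(EDGES, "a")
```

On this small graph the two strategies reach the same fixpoint. For min over 
non-negative edge weights that agreement is expected, because the aggregate is 
monotonic; deciding when it also holds for other programs is the question the 
abstract says the paper addresses.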