Reading list: UDF compilation

Posted on Jan 21, 2024 | 253 words | ~2mins

Results on UDF compilation discussed in the “Server-side Logic Execution” lecture of CMU 15-721 were interesting.

The first result in this area was: K. Ramachandra, et al., Froid: Optimization of Imperative Programs in a Relational Database (VLDB, 2017) https://arxiv.org/abs/1712.00498
- A subset of UDF can be compiled to SQL. Then standard query optimization routines apply.
- Decorrelation of user defined function invocations in queries (2014, link) is what the Froid paper is based on.
C. Duta, et al., Compiling PL/SQL Away (CIDR, 2020, https://arxiv.org/abs/1909.03291)
- Instead of compiling UDFs to SQL, target “common table expressions”.
- Supports more UDF constructs (loops!).
- Also, the code is set up as a middleware between the application and the database. UDFs are converted to SQL before they reach the database.
- “More PL-ey”.
Umbra’s optimizer is better at “de-correlating” than Froid. (source: lecture & links later in this post)
- Orthogonal optimization of subqueries and aggregation (SIGMOD, 2001, link) is what Froid uses to decorrelate subqueries.
- Unnesting Arbitrary Queries (BTW 2015, link) is what Umbra uses. The lecturer says “they can systematically decorrelate any subquery”. What does “any subquery” here mean?

Looking up the lecturer, I found this paper on batching: Dear User-Defined Functions, Inlining isn’t working out so great for us. Let’s try batching to make our relationship work. Sincerely, SQL (CIDR 2024, slides)

The gist of it seems to be that if the database system supports Neumann-style (the BTW 2015 paper above), then inlining is preferred. But there are cases where batching is better (e.g. certain query shapes against MS SQL).