Results on UDF compilation discussed in the “Server-side Logic Execution” lecture of CMU 15-721 were interesting.
-
The first result in this area was: K. Ramachandra, et al., Froid: Optimization of Imperative Programs in a Relational Database (VLDB, 2017) https://arxiv.org/abs/1712.00498
- A subset of UDF can be compiled to SQL. Then standard query optimization routines apply.
- Decorrelation of user defined function invocations in queries (2014, link) is what the Froid paper is based on.
-
C. Duta, et al., Compiling PL/SQL Away (CIDR, 2020, https://arxiv.org/abs/1909.03291)
- Instead of compiling UDFs to SQL, target “common table expressions”.
- Supports more UDF constructs (loops!).
- Also, the code is set up as a middleware between the application and the database. UDFs are converted to SQL before they reach the database.
- “More PL-ey”.
-
Umbra’s optimizer is better at “de-correlating” than Froid. (source: lecture & links later in this post)
Looking up the lecturer, I found this paper on batching: Dear User-Defined Functions, Inlining isn’t working out so great for us. Let’s try batching to make our relationship work. Sincerely, SQL (CIDR 2024, slides)
- The gist of it seems to be that if the database system supports Neumann-style (the BTW 2015 paper above), then inlining is preferred. But there are cases where batching is better (e.g. certain query shapes against MS SQL).