Reading list: UDF compilation

Posted on | 253 words | ~2mins

Results on UDF compilation discussed in the “Server-side Logic Execution” lecture of CMU 15-721 were interesting.

  • The first result in this area was: K. Ramachandra, et al., Froid: Optimization of Imperative Programs in a Relational Database (VLDB, 2017) https://arxiv.org/abs/1712.00498

    • A subset of UDF can be compiled to SQL. Then standard query optimization routines apply.
    • Decorrelation of user defined function invocations in queries (2014, link) is what the Froid paper is based on.
  • C. Duta, et al., Compiling PL/SQL Away (CIDR, 2020, https://arxiv.org/abs/1909.03291)

    • Instead of compiling UDFs to SQL, target “common table expressions”.
    • Supports more UDF constructs (loops!).
    • Also, the code is set up as a middleware between the application and the database. UDFs are converted to SQL before they reach the database.
    • “More PL-ey”.
  • Umbra’s optimizer is better at “de-correlating” than Froid. (source: lecture & links later in this post)

    • Orthogonal optimization of subqueries and aggregation (SIGMOD, 2001, link) is what Froid uses to decorrelate subqueries.
    • Unnesting Arbitrary Queries (BTW 2015, link) is what Umbra uses. The lecturer says “they can systematically decorrelate any subquery”. What does “any subquery” here mean?

Looking up the lecturer, I found this paper on batching: Dear User-Defined Functions, Inlining isn’t working out so great for us. Let’s try batching to make our relationship work. Sincerely, SQL (CIDR 2024, slides)

  • The gist of it seems to be that if the database system supports Neumann-style (the BTW 2015 paper above), then inlining is preferred. But there are cases where batching is better (e.g. certain query shapes against MS SQL).