Posts
Motivating Example
Suppose that you have a MILP:
$$\begin{align} P_0: \text{maximize} \quad & c_1x_1 + c_2x_2 \\ \text{subject to} \quad & a_1x_1 \leq L_1 \tag{1L, R_1 limit} \\ \quad & a_2x_2 \leq L_2 \tag{2L, R_2 limit} \\ \quad & b_{1,1}x_1 + b_{2,1}x_2 \geq D_1 \tag{1D} \\ \quad & b_{1,2}x_1 + b_{2,2}x_2 \geq D_2 \tag{2D} \\ \quad & x_1 \geq 0, \quad x_2 \geq 0 \\ \quad & x_1, x_2 \in \mathbb{N} \end{align}$$
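The post leaves the coefficients of \( P_0 \) symbolic; as a sketch, here is one way to solve an instance of it with `scipy.optimize.milp`. All numeric values below are made up for illustration.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical coefficients for P_0 (the post leaves them symbolic).
c = np.array([3.0, 2.0])       # objective coefficients c1, c2 (maximize)
a1, a2 = 2.0, 1.0              # resource usage per unit of x1, x2
L1, L2 = 10.0, 8.0             # resource limits (constraints 1L, 2L)
B = np.array([[1.0, 1.0],      # demand coefficients b_{i,j}
              [1.0, 2.0]])
D = np.array([4.0, 6.0])       # demands (constraints 1D, 2D)

constraints = [
    LinearConstraint([[a1, 0.0]], ub=L1),  # a1*x1 <= L1
    LinearConstraint([[0.0, a2]], ub=L2),  # a2*x2 <= L2
    LinearConstraint(B, lb=D),             # B @ x >= D
]

# milp minimizes, so negate c to maximize; integrality=1 marks
# each variable as integer, and Bounds enforces x >= 0.
res = milp(c=-c, constraints=constraints,
           integrality=np.ones(2), bounds=Bounds(lb=0, ub=np.inf))
print(res.x, -res.fun)
```

With these particular numbers the optimum saturates both resource limits: \( x_1 = 5, x_2 = 8 \) with objective value 31.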
Read more
Reading list: UDF compilation
Results on UDF compilation discussed in the “Server-side Logic Execution” lecture of CMU 15-721 were interesting.
The first result in this area was: K. Ramachandra, et al., Froid: Optimization of Imperative Programs in a Relational Database (VLDB, 2017) https://arxiv.org/abs/1712.00498
A subset of UDFs can be compiled to SQL, after which standard query optimization routines apply. The Froid paper builds on Decorrelation of User Defined Function Invocations in Queries (2014, link).
Read more
How to compute IIS
Gurobi says their algorithm is based on the paper “Identifying Minimally Infeasible Subsystems of Inequalities” from 1990.
This thread from Gurobi has some discussion on how Gurobi does it.
An article by Chinneck. These slides from Chinneck. “Identifying Minimally Infeasible Subsystems of Inequalities” by John Gleeson and Jennifer Ryan, 1990.
Link
Notes:
\( y_i := 1 \) iff the \( i \)-th constraint is deleted. \( S_j := \) the set of constraint indices in the \( j \)-th IIS.
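These definitions set up (this is my reading of the Gleeson–Ryan / Chinneck approach) a set-covering problem: find a minimum number of constraints to delete so that every IIS loses at least one member.

$$\begin{align} \text{minimize} \quad & \sum_i y_i \\ \text{subject to} \quad & \sum_{i \in S_j} y_i \geq 1 \quad \text{for each IIS } j \\ \quad & y_i \in \{0, 1\} \end{align}$$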
Read more
Simple properties of IIS
IIS stands for “Irreducible Infeasible Subsystem”. There is no Wikipedia entry for it. I’ve been using the definition from a Gurobi article.
Let \( I_0 \) be a subset of constraints. \( I_0 \) is an IIS iff:
\( I_0 \) is infeasible. Removing any single constraint (or bound) from \( I_0 \) makes it feasible. (I’ve never thought about what the “or bound” part is doing.)
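As a sanity check on the definition, here is a minimal sketch that verifies both conditions with scipy. The one-variable system \( x \leq 1, x \geq 2 \) is my own toy example of an IIS.

```python
from scipy.optimize import linprog

def feasible(A_ub, b_ub):
    # Feasibility check: minimize the zero objective over the system.
    res = linprog(c=[0.0], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)])
    return res.status == 0  # 0 = optimal, i.e. a feasible point exists

# Toy system in "A x <= b" form: x <= 1 and -x <= -2 (i.e. x >= 2).
A = [[1.0], [-1.0]]
b = [1.0, -2.0]

# Condition 1: the system is infeasible.
assert not feasible(A, b)

# Condition 2: dropping any single constraint restores feasibility.
for i in range(len(A)):
    assert feasible(A[:i] + A[i + 1:], b[:i] + b[i + 1:])
```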
Property 1: an IIS does not consist of disjoint infeasible systems
Example: the following constraints do not comprise an IIS.
Read more
Notes: Data Diffs
I came across Data diffs: Algorithms for explaining what changed in a dataset.
The two papers discussed are:
Scorpion, by Eugene Wu and Sam Madden. Finds common properties of outlier points.
DIFF. A SQL implementation of Scorpion and similar systems; the computation can be distributed.
An author of the DIFF paper, Peter Bailis, founded sisudata.com. The comments on HN list some related work:
Dolt is a company whose product is a version-controlled SQL database.
Read more
Notes: Compressed Sensing
Lecture video, and what Terence Tao has to say about the topic.
The video is associated with this textbook. I came across the Coursera version of this course around 2013. The data analysis section of this course seems to be motivated by signal processing. The course has some overlap with Stanford’s EE263 (linear methods).
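A minimal sketch of the computation at the heart of compressed sensing: basis pursuit, i.e. recovering a sparse signal from few random linear measurements by \( \ell_1 \) minimization, written as an LP. The dimensions and seed below are arbitrary choices of mine.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 100, 40, 3  # signal length, number of measurements, sparsity

# A k-sparse signal observed through random Gaussian measurements b = A x.
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
b = A @ x_true

# Basis pursuit: min ||x||_1 s.t. A x = b. As an LP: split x = u - v with
# u, v >= 0; then ||x||_1 = sum(u) + sum(v) at the optimum.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b)  # default variable bounds are (0, None)
x_hat = res.x[:n] - res.x[n:]
print(np.max(np.abs(x_hat - x_true)))  # should be tiny: exact recovery regime
```

With \( m = 40 \) measurements and only \( k = 3 \) nonzeros, this is well inside the regime where \( \ell_1 \) minimization recovers the signal exactly with high probability.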
Learning Optimization, Stats, and ML
Optimization
I’m interested in learning mathematical optimization (LP, ILP, convex opt), and stochastic problems of a similar flavor (stochastic control?). Probably for historical reasons, convex optimization is taught in EE, and integer programming is in OR.
Stanford EE263 Linear Dynamical Systems
An applied linear algebra course. Interesting content, good problems. “Linear methods can solve lots of things” is a statement that I often hear, but usually without examples. This course makes the statement concrete.
Read more
ML theory papers
Mert Pilanci
Neural Networks are Convex Regularizers
Mert Pilanci. Any finite two-layer neural network with ReLU has an equivalent convex problem. Talk at Stanford. An audience member said that you want SGD to converge to a local minimum (for generalization?). That doesn’t sound right? A previous result showed that an infinite-width network is convex. This paper, on the other hand, gives a convex problem that can be implemented and solved.
All Local Minima are Global for Two-Layer ReLU Neural Networks
Read more
Learning CS Systems
Online resources that I’ve found useful for learning computer science, especially systems. Last updated on 2019-11-23.
Systems
Beyond CMU 15-213.
CMU 15-440 (Fall version)
The textbook (Van Steen & Tanenbaum) is readable. The slides are informative as supplements to corresponding textbook chapters and referenced articles. Lab assignments are done in Go. Tons of debugging concurrency issues.
0. Go warm-up.
1. A simple key-value server.
2. Implement a subset of TCP starting with UDP.
Read more
Notes: Curse of Dimensionality
Geometry
In \( \mathbb{R}^n \) where \( n \) is large,
The volume of the unit ball goes to 0 as the dimension goes to infinity. Most of the volume of an \( n \)-ball is concentrated near its boundary. Pick two vectors on the surface of the unit ball independently; with high probability they are nearly orthogonal. Johnson–Lindenstrauss lemma: a set of points can be embedded almost isometrically w.r.t. \( L_2 \) into a space of a much lower dimension.
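The near-orthogonality claim is easy to see numerically. A minimal sketch (dimension and seed are my own arbitrary choices): draw two independent uniform points on the unit sphere by normalizing Gaussians, and look at their inner product.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # a "large" dimension

# Two independent uniform points on the unit sphere: normalized Gaussians.
u = rng.standard_normal(n)
v = rng.standard_normal(n)
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)

# The inner product concentrates around 0 at scale ~ 1/sqrt(n),
# so the two directions are nearly orthogonal with high probability.
print(abs(u @ v))
```

For \( n = 10{,}000 \) the inner product is on the order of \( 1/\sqrt{n} = 0.01 \), i.e. an angle within a fraction of a degree of 90°.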
Read more