Posts
Model Serving
FLIP-23. The document also discusses implementing model training as well as model serving. Two linked documents, “Flink ML Roadmap” and “Flink-MS”, are also worth reading.
Boris Lublinsky’s book “Serving Machine Learning Models”, and talk.
Distributed Systems
Please Stop Calling Database Systems AP or CP.
Kate Matsudaira on distributed systems.
Distributed Systems for Fun and Profit.
Martin Kleppmann’s book and interview.
Use of Formal Methods at Amazon Web Services.
A tale of two clusters: Mesos and YARN.
A visual explanation of Raft.
Streaming Systems
The Dataflow Model, Apache Beam, and Cloud Dataflow.
Tyler Akidau’s article on the Dataflow model.
Freely available online resources that I’ve found useful for learning Statistics, Machine Learning, Distributed Computing, Database Systems, and other CS and SWE topics.
Statistics and Machine Learning
ML
Expositions of machine learning that are freely accessible (like the ones on Coursera) tend to sweep theoretical foundations and mathematical rigor under the carpet; these are resources that don’t skimp on the hard stuff.
Stanford CS229 Machine Learning
One-line summary: the course teaches you how to set up a cost function from a model and data, and how to optimize it.
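As a small illustration of that recipe (my own sketch, not taken from the course materials), take linear regression: the model is \( y \approx x^\top \theta \), the cost is the sum of squared residuals \( J(\theta) = \tfrac12 \sum_i (x_i^\top \theta - y_i)^2 \), and batch gradient descent minimizes it using \( \nabla J = X^\top (X\theta - y) \). The learning rate, step count, and toy data are arbitrary choices for this example.

```python
import numpy as np

def cost(theta, X, y):
    """Least-squares cost J(theta) = 1/2 * sum_i (x_i . theta - y_i)^2."""
    r = X @ theta - y
    return 0.5 * float(r @ r)

def gradient_descent(X, y, lr=0.01, steps=5000):
    """Minimize J by batch gradient descent, using grad J = X^T (X theta - y)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * X.T @ (X @ theta - y)
    return theta

# Usage: fit y = 1 + 2x on noiseless data; the first column of X is the intercept.
x = np.linspace(0.0, 1.0, 11)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
theta = gradient_descent(X, y)
```

For this data the step size is well below the stability limit \( 2/\lambda_{\max}(X^\top X) \), so the iterates converge to the exact coefficients.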
I should come up with a solution that uses optimization instead of doing regression twice.
Introduction
The model we consider is \(Y_i = \alpha + \beta x_i + \epsilon_i\), where the \( \epsilon_i \) are uncorrelated with mean zero, and \( \mathbb{V}(\epsilon_i) \) depends on \( i \). We discuss two approaches to estimating \( \alpha \) and \( \beta \). Weighted least squares regression, with weights inversely proportional to the variances \( \mathbb{V}(\epsilon_i) \), gives the best linear unbiased estimators (BLUE). With stronger assumptions on the \( \epsilon_i \), maximum likelihood estimators (MLE) can also be found.
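A minimal sketch of the weighted least squares estimator, assuming the variances \( \mathbb{V}(\epsilon_i) \) are known (the function name `wls_fit` and the simulated variance pattern below are hypothetical, for illustration only). With weights \( w_i = 1/\mathbb{V}(\epsilon_i) \), minimizing \( \sum_i w_i (Y_i - \alpha - \beta x_i)^2 \) gives \( \hat\beta = \sum_i w_i (x_i - \bar x_w)(Y_i - \bar Y_w) / \sum_i w_i (x_i - \bar x_w)^2 \) and \( \hat\alpha = \bar Y_w - \hat\beta \bar x_w \), where \( \bar x_w \) and \( \bar Y_w \) are the \( w \)-weighted means. If the \( \epsilon_i \) are additionally assumed Gaussian with these known variances, the log-likelihood is, up to constants, minus half this weighted sum of squares, so the MLE coincides with the same estimates.

```python
import numpy as np

def wls_fit(x, y, var):
    """Weighted least squares for Y_i = alpha + beta * x_i + eps_i.

    Assumes var[i] = V(eps_i) is known (a common scale factor does not
    matter); observation i gets weight w_i = 1 / var[i].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(var, dtype=float)
    xbar = np.sum(w * x) / np.sum(w)  # weighted mean of x
    ybar = np.sum(w * y) / np.sum(w)  # weighted mean of Y
    beta = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)
    alpha = ybar - beta * xbar
    return alpha, beta

# Usage on simulated data; the variance pattern is a hypothetical choice
# in which the noise grows with x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
var = 0.5 * (1.0 + x) ** 2
y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(var))
alpha_hat, beta_hat = wls_fit(x, y, var)

# Cross-check: WLS equals ordinary least squares after scaling each row by sqrt(w_i).
sw = np.sqrt(1.0 / var)
A = np.column_stack([sw, sw * x])
coef, *_ = np.linalg.lstsq(A, sw * y, rcond=None)
```

The cross-check is a useful way to see what the weights do: each observation is rescaled so that its error has unit variance, after which plain least squares applies.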