Geometry
In \( \mathbb{R}^n \) where \( n \) is large,
- The volume of the unit ball goes to 0 as the dimension goes to infinity (see the first sketch after this list).
- Most of the volume of an n-ball is concentrated in a thin shell near its boundary.
- Pick two vectors independently and uniformly at random on the surface of the unit ball. With high probability they are nearly orthogonal: their inner product concentrates around 0 at rate roughly \( 1/\sqrt{n} \) (second sketch below).
- Johnson-Lindenstrauss lemma: a set of \( m \) points can be embedded almost isometrically w.r.t. \( L_2 \) into a space of dimension \( O(\varepsilon^{-2} \log m) \), preserving all pairwise distances up to a factor \( 1 \pm \varepsilon \). The \( L_1 \) analogue does not hold. Look up the fast JL transform (a plain random-projection version is sketched below).
- Distances between random points concentrate: for points drawn i.i.d. from a fixed distribution, \( \frac{ \max_{x, y} d(x,y) - \min_{x, y} d(x,y) } { \min_{x, y} d(x,y) } \to 0 \) as \( n \to \infty \), where \( d \) is the \( L_p \) distance with \( p \geq 1 \) (last sketch below).
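A quick numerical check of the first two bullets (a minimal sketch; it uses the closed form \( V_n = \pi^{n/2} / \Gamma(n/2 + 1) \) for the unit-ball volume and the fact that volume scales as \( r^n \)):

```python
import math

def unit_ball_volume(n):
    # Closed form: V_n = pi^(n/2) / Gamma(n/2 + 1).
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in [1, 2, 5, 10, 20, 50, 100]:
    volume = unit_ball_volume(n)
    # Volume scales as r^n, so the fraction of the ball outside radius 0.99
    # (a 1%-thick shell at the boundary) is 1 - 0.99^n, which tends to 1.
    shell_fraction = 1 - 0.99 ** n
    print(f"n={n:3d}  volume={volume:.3e}  outer-1%-shell fraction={shell_fraction:.3f}")
```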
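A sketch of the near-orthogonality claim, assuming points on the sphere are sampled by normalizing i.i.d. Gaussian vectors (a standard construction); the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vectors(dim, count):
    # Normalizing i.i.d. Gaussian vectors gives points uniform on the unit sphere.
    x = rng.standard_normal((count, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

for dim in [3, 30, 300, 3000]:
    u = random_unit_vectors(dim, 1000)
    v = random_unit_vectors(dim, 1000)
    dots = np.abs(np.sum(u * v, axis=1))   # |<u_i, v_i>| for 1000 independent pairs
    # The inner product concentrates around 0 at rate roughly 1/sqrt(dim).
    print(f"dim={dim:5d}  mean |<u, v>| = {dots.mean():.3f}")
```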
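A sketch of the JL idea with a plain dense Gaussian projection (not the fast JL transform, which uses structured matrices); the point count, ambient dimension, and target dimension below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

m, d, k = 200, 10_000, 500        # m points in R^d, projected down to R^k
X = rng.standard_normal((m, d))

# Dense Gaussian JL map: project onto k random directions, rescaled by 1/sqrt(k)
# so that squared L2 distances are preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

def pairwise_sq_dists(Z):
    sq = np.sum(Z ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * Z @ Z.T

mask = ~np.eye(m, dtype=bool)
ratio = pairwise_sq_dists(Y)[mask] / pairwise_sq_dists(X)[mask]
print(f"squared-distance distortion: min={ratio.min():.3f}, max={ratio.max():.3f}")
```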
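A sketch of distance concentration for points drawn uniformly from \( [0,1]^n \) (sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_dists(X):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(d2, 0))      # clip tiny negatives from rounding

for dim in [2, 20, 200, 2000]:
    X = rng.random((500, dim))             # 500 points drawn uniformly from [0,1]^dim
    d = pairwise_dists(X)[~np.eye(500, dtype=bool)]
    print(f"dim={dim:5d}  (max - min) / min = {(d.max() - d.min()) / d.min():.3f}")
```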
ML
Suppose the data has \( p \) features,
- Suppose that the features are binary categorical. The feature space then has \( 2^p \) points, so its cardinality grows exponentially with \( p \). If the size of the training set is fixed, only a vanishing fraction of the feature space is ever observed when \( p \) is large, so the model is highly likely to overfit (first sketch after this list).
- Suppose that the features are continuous and uniformly distributed in \( [0,1] \). The ball inscribed in the cube \( [0,1]^p \) occupies a vanishing fraction of the cube's volume as \( p \) grows, so most points are located outside it (second sketch after this list).
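A sketch of the coverage argument for binary features, with a hypothetical fixed training-set size and features drawn uniformly just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_train = 1_000                                # hypothetical fixed training-set size
for p in [5, 10, 20, 40]:
    X = rng.integers(0, 2, size=(n_train, p))  # binary feature vectors, drawn uniformly
    distinct = len({tuple(row) for row in X})
    coverage = distinct / 2.0 ** p             # fraction of the 2^p feature space observed
    print(f"p={p:2d}  |feature space|=2^{p}  fraction observed={coverage:.2e}")
```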
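A Monte Carlo check of the inscribed-ball claim (sample sizes are arbitrary; the exact ratio, the volume of a ball of radius 0.5 over the unit cube's volume, is shown for comparison):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

for p in [2, 5, 10, 20]:
    X = rng.random((100_000, p))                   # uniform in [0,1]^p
    # Inscribed ball: center (0.5, ..., 0.5), radius 0.5.
    inside = np.linalg.norm(X - 0.5, axis=1) <= 0.5
    exact = math.pi ** (p / 2) / math.gamma(p / 2 + 1) * 0.5 ** p
    print(f"p={p:2d}  P(inside inscribed ball): sample={inside.mean():.4f}  exact={exact:.2e}")
```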
References
- Princeton COS 521
- Why is Euclidean distance not a good metric in high dimensions?
- The Curse of Dimensionality in classification
- Domingos, “A Few Useful Things to Know about Machine Learning,” Communications of the ACM, 2012