Unlocking Dimensionality Reduction: The Johnson-Lindenstrauss Lemma
Chapter 1: Understanding Dimensionality Reduction
This article delves into a remarkable result in dimensionality reduction: the Johnson-Lindenstrauss Lemma, one of the most compelling and satisfying results about dimensionality reduction in Euclidean spaces. It is named after the mathematicians William B. Johnson (1944–) and Joram Lindenstrauss (1936–2012), who proved it in 1984.
Essentially, the lemma asserts that any set of N points in N-dimensional Euclidean space can be mapped into a Euclidean space of much smaller dimension (in fact, of dimension logarithmic in N) while preserving all pairwise Euclidean distances up to a multiplicative factor of 1 ± ε.
In simpler terms, if you have N points with N coordinates each, you can always derive N new points with roughly log N coordinates each, such that all pairwise distances are preserved up to a factor of 1 ± ε.
For those in the field of Computer Science, the implications are immediately apparent. The explicit representation of the original N points requires O(N²) space, assuming that the space for each coordinate is constant. Similarly, storing all pairwise distances demands O(N²) space. However, by utilizing the reduced space of logarithmic dimensions suggested by the lemma, the total space to store all points can be reduced to O(N log N). This represents a significant enhancement!
To illustrate further, there are numerous scenarios where pairwise distances among points need to be accessed repeatedly. Ideally, we would precompute and store these distances in an N²-sized table, allowing for O(1) time retrieval. While this would be optimal, N might reach tens of millions or even billions, making such a vast table impractical. Moreover, even the original point descriptions come at a high cost, necessitating an N² table just to hold the coordinates.
In summary, without the Johnson-Lindenstrauss lemma, we face two options:
- (i) Maintain both the table of points and the table of pairwise distances throughout the application's runtime, at a cost of O(N²) space.
- (ii) Keep only the table of points and compute distances on the fly, requiring O(N) time per distance.
The Johnson-Lindenstrauss lemma offers a third option, and it is the most appealing one. We can compute a new set of N points and store their coordinates in a table of size O(N log N). Whenever needed, distances are computed on the spot, now in only O(log N) time. This gives nearly linear space usage instead of quadratic, and each distance computation takes logarithmic time, a substantial improvement over the linear time needed in option (ii). The trade-off is a small margin of error in the results, with distances potentially deviating by, say, 1%. This encapsulates the power of the Johnson-Lindenstrauss Lemma!
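To make this concrete, here is a minimal sketch in Python with NumPy; the values of N, ε, and the constant 4 in the choice of K are illustrative, not dictated by the lemma. It projects N random points down to K = O(log N / ε²) dimensions with a Gaussian random matrix and measures the distortion on a sample of pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 2000, 2000     # N points in D = N dimensions, as in the discussion above
eps = 0.25            # allowed relative distortion
K = int(np.ceil(4 * np.log(N) / eps**2))  # K = O(log N / eps^2); the constant 4 is illustrative

points = rng.standard_normal((N, D))

# Random linear map f(x) = Ax / sqrt(K), where A has i.i.d. N(0, 1) entries.
A = rng.standard_normal((K, D))
projected = points @ A.T / np.sqrt(K)

# Measure the relative distortion of pairwise distances on a random sample of pairs.
idx = rng.integers(0, N, size=(1000, 2))
idx = idx[idx[:, 0] != idx[:, 1]]  # drop degenerate pairs (i, i)
orig = np.linalg.norm(points[idx[:, 0]] - points[idx[:, 1]], axis=1)
proj = np.linalg.norm(projected[idx[:, 0]] - projected[idx[:, 1]], axis=1)
print(f"K = {K}, max relative distortion = {np.max(np.abs(proj / orig - 1)):.3f}")
```

On typical runs the maximum observed distortion comes out well below ε, even though K is only a few hundred while D is 2000.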
Before we dive into the proof of this lemma, let's consider an insightful answer from cstheory.stackexchange.com on when to apply the JL lemma as opposed to Singular Value Decomposition (SVD), for those with questions on the topic.
When to Choose Johnson-Lindenstrauss over SVD?
This response by Suresh Venkatasubramanian provides clarity on the use cases for the JL lemma.
Chapter 2: Proving the Johnson-Lindenstrauss Lemma
We will now touch upon the proof of the Johnson-Lindenstrauss Lemma, though we won't delve into a formal proof since various online resources offer comprehensive explanations. Instead, we will highlight the key lemma essential for deriving the result.
First, here is the formal statement of the Johnson-Lindenstrauss Lemma. It is worth noting that although this is a full-fledged theorem, it retains the name "lemma" for historical reasons.
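In one standard formulation (the exact constant varies across sources):

Theorem (Johnson-Lindenstrauss). For every $0 < \varepsilon < 1$ and every set $P$ of $N$ points in $\mathbb{R}^D$, there exists a linear map $f \colon \mathbb{R}^D \to \mathbb{R}^K$ with

$$K \;=\; \left\lceil \frac{C \ln N}{\varepsilon^2} \right\rceil$$

for an absolute constant $C > 0$, such that for all $p, q \in P$,

$$(1 - \varepsilon)\,\lVert p - q \rVert^2 \;\le\; \lVert f(p) - f(q) \rVert^2 \;\le\; (1 + \varepsilon)\,\lVert p - q \rVert^2.$$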
We emphasize that the dependence of K on N and ε in the JL Lemma is tight, as proved by Kasper Green Larsen and Jelani Nelson (2017).
As with many significant results, randomness plays a key role. The original proof of the lemma essentially suggests selecting a random K-dimensional linear subspace and then projecting the points orthogonally onto it. This method allows us to demonstrate that with a positive probability, all pairwise distances are preserved within the 1 + ε factor. Thus, a linear transformation that meets the desired criteria must exist.
To further illustrate, one convenient way to construct such a random linear projection is with Gaussian random variables. Formally, one can state the following lemma.
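In a standard formulation (following, e.g., Matoušek's notes; $c > 0$ denotes an absolute constant):

Lemma. Let $A$ be a $K \times D$ matrix with independent $N(0, 1)$ entries, and define $f(x) = \frac{1}{\sqrt{K}}\, A x$. Then for every $x \in \mathbb{R}^D$ and every $0 < \varepsilon < 1/2$,

$$\Pr\Big[(1 - \varepsilon)\,\lVert x \rVert^2 \;\le\; \lVert f(x) \rVert^2 \;\le\; (1 + \varepsilon)\,\lVert x \rVert^2\Big] \;\ge\; 1 - 2e^{-c\,\varepsilon^2 K}.$$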
The above lemma says that this random linear map is an "almost isometry": for any fixed vector x, it nearly preserves the length of x with high probability. The proof relies on the concentration properties of the Gaussian distribution; an excellent exposition can be found in Jiří Matoušek's "Lecture Notes on Metric Embeddings."
Once we have established the above lemma, completing the proof is straightforward. We set the parameter K in the JL Lemma to
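$$K \;=\; \left\lceil \frac{3 \ln N}{c\,\varepsilon^2} \right\rceil,$$

where $c$ is the absolute constant from the lemma above; the factor 3 is one convenient choice, and it only affects the constant $C$ in the JL statement.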
With this choice of K, the lemma states that for any vector x, the probability that its length is not "almost preserved" is at most
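$$2e^{-c\,\varepsilon^2 K} \;=\; \frac{2}{e^{c\,\varepsilon^2 K}}.$$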
Next, we focus on the exponent in the denominator.
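$$c\,\varepsilon^2 K \;\ge\; c\,\varepsilon^2 \cdot \frac{3 \ln N}{c\,\varepsilon^2} \;=\; 3 \ln N, \qquad \text{hence} \qquad e^{c\,\varepsilon^2 K} \;\ge\; e^{3 \ln N} \;=\; N^3.$$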
By substituting this value back into our previous inequality, we arrive at
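$$\Pr\big[\,\text{the length of } x \text{ is not almost preserved}\,\big] \;\le\; \frac{2}{N^3} \;\le\; \frac{1}{N^2} \qquad \text{for } N \ge 2.$$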
Now we analyze the implications of the above lemma concerning the vectors.
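Writing the input points as $p_1, \dots, p_N$, we apply the lemma to each difference vector

$$x \;=\; p_i - p_j, \qquad 1 \le i < j \le N.$$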
This lemma implies that for any given i and j,
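$$\Pr\Big[\,\lVert f(p_i) - f(p_j) \rVert^2 \notin \big[(1-\varepsilon)\,\lVert p_i - p_j \rVert^2,\; (1+\varepsilon)\,\lVert p_i - p_j \rVert^2\big]\,\Big] \;\le\; \frac{1}{N^2},$$

where we used that, by linearity, $f(p_i) - f(p_j) = f(p_i - p_j)$.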
By applying the Union Bound, we find that the probability of at least one distance not being "almost preserved" is at most
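$$\binom{N}{2} \cdot \frac{1}{N^2} \;=\; \frac{N(N-1)}{2N^2} \;<\; \frac{1}{2}.$$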
Ultimately, we conclude that with a probability of at least 1/2, all pairwise distances are nearly preserved. Since the mapping f is linear, we can simply consider the N points
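$$f(p_1),\; f(p_2),\; \dots,\; f(p_N) \;\in\; \mathbb{R}^K.$$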
Note: To accurately determine the constant C in the JL Lemma statement, one can transition from natural logarithm to common logarithm while adjusting for the resulting constants.
Conclusion
In closing, a few remarks are in order:
- The random map employed is linear, which simplifies analysis.
- The result is tight, and the lower bound holds for non-linear maps as well; thus, the parameters of the JL Lemma are optimal.
- The probability of success can be boosted by repeating the random "experiment" several times; the proof is therefore constructive and yields an efficient randomized algorithm that computes such a map with high probability.
- Other proofs of the JL Lemma exist, including a particularly fascinating one by Dimitris Achlioptas, who showed that a sparse random projection matrix with entries in {-1, 0, +1} achieves the same result; even biased coin flips can be useful! (See the sketch after this list.)
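As a rough illustration, here is a minimal sketch of one of Achlioptas's constructions (the entry probabilities 1/6, 2/3, 1/6 and the sqrt(3/K) scaling follow the standard presentation; the function name and parameters are illustrative):

```python
import numpy as np

def achlioptas_projection(points: np.ndarray, K: int, seed: int = 0) -> np.ndarray:
    """Project the rows of `points` to K dimensions with a sparse {-1, 0, +1} matrix.

    Each entry of the projection matrix is +1 or -1 with probability 1/6 each,
    and 0 with probability 2/3; the sqrt(3 / K) factor makes the map preserve
    squared lengths in expectation.
    """
    rng = np.random.default_rng(seed)
    D = points.shape[1]
    A = rng.choice([-1.0, 0.0, 1.0], size=(K, D), p=[1 / 6, 2 / 3, 1 / 6])
    return points @ A.T * np.sqrt(3.0 / K)
```

Because two-thirds of the entries are zero, this projection requires roughly a third of the arithmetic of a dense Gaussian matrix, and its entries can be generated with coin flips instead of Gaussian samples.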
Chapter 3: Video Resources for Further Understanding
To deepen your understanding of the Johnson-Lindenstrauss lemma, consider watching the following videos:
The first video, "Class 8, Video 1: Johnson-Lindenstrauss Lemma," provides an insightful overview of the lemma's concepts and applications.
The second video, "MIT 6.854 Spring 2016 Lecture 5: Johnson Lindenstrauss Lemma and Extensions," explores the lemma in greater depth, discussing its extensions and implications in various fields.