Visualizing High-Dimensional Data

Why is ML Hard?

This demo shows data points in a 20-dimensional space, projected down to 2D. The original data has 3 clusters that are easily separable in high-dimensional space (a sketch of one way to generate such data follows the list below), but:

  • Information is lost when projecting from high dimensions to 2D
  • Different projection methods preserve different aspects of the data
  • The "curse of dimensionality" makes distance measures less meaningful
  • Visualizing high-dimensional relationships becomes extremely difficult
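
As a minimal sketch of the setup described above, assuming scikit-learn's make_blobs and illustrative parameters (300 points, cluster spread of 2.0); the demo's actual generator may differ:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Assumed parameters for illustration: 300 points, 20 features, 3 clusters.
# A larger cluster_std makes the clusters harder to separate after projection.
X, labels = make_blobs(
    n_samples=300,
    n_features=20,
    centers=3,
    cluster_std=2.0,
    random_state=0,
)

print(X.shape)              # (300, 20) -- 300 points in 20-dimensional space
print(np.bincount(labels))  # roughly 100 points per cluster
```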

PCA finds the directions of maximum variance in the data and projects it onto a lower-dimensional subspace.
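A minimal PCA sketch using scikit-learn, assuming data like the X generated above; the resulting 2D coordinates are what a scatter plot in this demo would show:

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Recreate the assumed 20-dimensional, 3-cluster data from the sketch above.
X, labels = make_blobs(n_samples=300, n_features=20, centers=3,
                       cluster_std=2.0, random_state=0)

# Fit PCA and keep the two directions of maximum variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (300, 2)
print(pca.explained_variance_ratio_)  # fraction of variance each axis retains
```

The explained-variance ratios give a rough sense of how much of the original structure survives the projection.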

What This Demonstrates:

  1. Dimensionality Reduction Trade-offs: Each method (PCA, t-SNE, UMAP) makes different compromises when reducing dimensions
  2. Information Loss: Notice how clusters that are separate in high-dimensional space might overlap in 2D (a rough way to quantify this is sketched after this list)
  3. Feature Importance: In real ML problems, determining which dimensions (features) matter most is challenging
  4. Visualization Limits: Humans can only visualize 2D/3D, but ML models work in much higher dimensions
  5. The Reality: Real-world ML often deals with thousands or millions of dimensions, making this problem far more complex
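
One way to make the information-loss point concrete is to compare cluster separation before and after projection. The following is a sketch under the same assumed data as above, using scikit-learn's PCA and t-SNE and the silhouette score as a simple separation measure; all parameters are illustrative:

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Assumed data: 300 points, 20 dimensions, 3 clusters (illustrative parameters).
X, labels = make_blobs(n_samples=300, n_features=20, centers=3,
                       cluster_std=2.0, random_state=0)

# Two different 2D embeddings of the same data.
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# The silhouette score measures how well the known clusters are separated:
# higher means cleaner separation. Comparing the original space with each
# projection gives a rough, quantitative picture of what is lost in 2D.
for name, data in [("20-D original", X), ("PCA 2-D", X_pca), ("t-SNE 2-D", X_tsne)]:
    print(f"{name:14s} silhouette = {silhouette_score(data, labels):.3f}")
```

PCA and t-SNE typically rank differently here because they preserve different things: PCA keeps global variance, while t-SNE emphasizes local neighborhoods.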