Why is ML Hard?
This demo shows data points in a 20-dimensional space, projected down to 2D.
The original data has 3 clusters that are easily separable in the high-dimensional space, but:
- Information is lost when projecting from high dimensions to 2D
- Different projection methods preserve different aspects of the data
- The "curse of dimensionality" makes distance measures less meaningful: as dimensions are added, the nearest and farthest points end up at nearly the same distance
- Visualizing high-dimensional relationships becomes extremely difficult
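The point about distances in the list above can be checked with a small experiment. This is only a sketch under assumptions that are not part of the demo: points are drawn uniformly from the unit cube, and the helper name `distance_contrast` and its parameters are made up for illustration. It measures how far apart the nearest and farthest points are, relative to the nearest one.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min over distances from one query point
    to n_points random points in the unit cube of dimension `dim`.

    Hypothetical helper: uniform random data is an assumption,
    not the data used in the demo.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


# A smaller contrast means nearest and farthest neighbors look more alike.
for dim in (2, 20, 200):
    print(dim, round(distance_contrast(dim), 2))
```

The printed contrast should shrink as the dimension grows, which is why "nearest neighbor" carries less information in 20 dimensions than in 2.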
PCA finds the directions of maximum variance in the data and projects the data onto the lower-dimensional subspace spanned by those directions.
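As a rough illustration of that description, here is a minimal PCA sketch using NumPy's SVD. It is not the code behind the demo: the function name `pca_project`, the synthetic 3-cluster data, and the cluster spread are all assumptions chosen to mirror the 20-dimensional setup described above.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Returns the projected data and the fraction of total variance kept.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return X_centered @ components.T, explained


# Synthetic stand-in for the demo data (assumed): 3 clusters in 20 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 20))
labels = np.repeat(np.arange(3), 100)
X = centers[labels] + rng.normal(size=(300, 20))

X_2d, explained = pca_project(X)
print(X_2d.shape, round(float(explained), 3))
```

The `explained` value shows how much of the total variance survives in the 2D view; whatever is left over is the information lost in the projection.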