umap  by lmcinnes

Dimension reduction technique for visualization and general non-linear reduction

created 8 years ago
7,883 stars

Top 6.7% on sourcepulse

Project Summary

UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimension reduction technique used for visualization and as a general-purpose reduction step. It is aimed at users exploring high-dimensional data and is a faster alternative to t-SNE that often preserves more global structure, with added capabilities such as density preservation (densMAP) and supervised learning.

How It Works

UMAP models the data as a fuzzy topological structure and searches for a low-dimensional projection whose fuzzy topological structure matches it as closely as possible. The construction draws on Riemannian geometry and fuzzy simplicial sets. The implementation is accelerated with numba and uses pynndescent for nearest-neighbor search. The resulting embedding preserves local neighborhoods while retaining much of the global structure.
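As a rough illustration of the fuzzy structure described above, the sketch below computes membership strengths for one point's nearest-neighbor distances and combines two directed strengths with a fuzzy union. It follows the formulation in the UMAP paper (weights of the form exp(-(d - rho) / sigma), with sigma chosen so that the weights sum to log2(k)). The function names are hypothetical and this is not the library's internal code.

```python
import numpy as np


def membership_strengths(knn_dists, n_iter=64, tol=1e-6):
    """Fuzzy membership strengths for one point's k nearest-neighbor distances.

    Hypothetical helper for illustration; assumes k >= 3 so a solution exists.
    """
    knn_dists = np.asarray(knn_dists, dtype=float)
    target = np.log2(len(knn_dists))

    # rho: distance to the closest neighbor, so that neighbor gets strength 1.
    rho = knn_dists.min()
    shifted = np.maximum(knn_dists - rho, 0.0)

    # Binary search for sigma: the sum of weights grows as sigma grows.
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):
        total = np.exp(-shifted / sigma).sum()
        if abs(total - target) < tol:
            break
        if total > target:
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            lo = sigma
            sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0

    return np.exp(-shifted / sigma)


def fuzzy_union(a, b):
    """Combine the two directed strengths of an edge: a + b - a*b."""
    return a + b - a * b
```

Calling `membership_strengths([0.5, 0.7, 1.0, 1.5, 2.0])` gives the nearest neighbor a strength of 1 and progressively smaller strengths to farther neighbors. `fuzzy_union` then makes the relation symmetric when point i lists j as a neighbor with a different strength than j lists i.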

Quick Start & Requirements

  • Install: pip install umap-learn or conda install -c conda-forge umap-learn.
  • Requirements: Python 3.6+, numpy, scipy, scikit-learn, numba, tqdm, pynndescent. Optional: matplotlib, datashader, holoviews for plotting; tensorflow for Parametric UMAP.
  • Setup: Installation is straightforward via conda or pip. Performance is enhanced with pynndescent.
  • Docs: https://umap-learn.readthedocs.io/
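After installation, a first embedding might look like the minimal sketch below. It assumes umap-learn and numpy are installed; the `embed` wrapper is a hypothetical helper, while `umap.UMAP` and `fit_transform` are the library's scikit-learn-style entry points.

```python
import numpy as np


def embed(data, n_components=2, random_state=42):
    """Reduce `data` of shape (n_samples, n_features) to `n_components` dimensions."""
    # Imported lazily: importing umap pulls in numba and can take a while.
    import umap

    reducer = umap.UMAP(n_components=n_components, random_state=random_state)
    return reducer.fit_transform(data)
```

For example, `embed(np.random.default_rng(0).normal(size=(300, 20)))` returns an array with 300 rows and 2 columns, ready to be passed to a scatter plot. Fixing `random_state` makes the layout repeatable across runs.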

Highlighted Details

  • Significantly faster than t-SNE, scaling well with high-dimensional and large datasets.
  • Supports various distance metrics, including cosine and correlation.
  • Offers densMAP for preserving local density and Parametric UMAP for neural network-based transformations.
  • Can be used as a scikit-learn transformer, supporting transform for new data and supervised/semi-supervised learning.
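The transformer-style usage and the metric option listed above can be combined roughly as follows. This is a sketch under assumptions: `fit_and_project` is a hypothetical helper and the parameter choices are illustrative rather than recommended settings.

```python
import numpy as np


def fit_and_project(train, train_labels, new_points, metric="cosine"):
    """Fit a supervised UMAP on labeled data, then embed unseen points."""
    # Imported lazily: importing umap pulls in numba and can take a while.
    import umap

    mapper = umap.UMAP(metric=metric, random_state=42)
    # Passing y makes the fit supervised; labels guide the layout.
    mapper.fit(train, y=np.asarray(train_labels))
    # transform places new points into the already-learned embedding.
    return mapper.embedding_, mapper.transform(new_points)
```

For semi-supervised use, the UMAP documentation describes marking unlabeled samples with the label -1, so a partially labeled `train_labels` array can be passed in the same way.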

Maintenance & Community

The project is actively maintained by Leland McInnes and contributors. Community support is available via GitHub Issues.

Licensing & Compatibility

  • License: 3-clause BSD.
  • Compatibility: Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

UMAP itself is robust, but the densMAP variant works best with a larger n_neighbors value (e.g., 30) so that local density can be estimated reliably. Parametric UMAP is described as experimental.
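The densMAP advice above translates into a configuration along these lines. It is a minimal sketch: `densmap_embed` is a hypothetical wrapper, and it assumes a umap-learn version that accepts the `densmap` argument.

```python
import numpy as np


def densmap_embed(data, n_neighbors=30, random_state=42):
    """Embed `data` with densMAP, using a larger neighborhood for density estimates."""
    # Imported lazily: importing umap pulls in numba and can take a while.
    import umap

    reducer = umap.UMAP(
        densmap=True,             # enable the density-preserving variant
        n_neighbors=n_neighbors,  # larger value, per the caveat above
        random_state=random_state,
    )
    return reducer.fit_transform(np.asarray(data))
```

The input needs comfortably more than `n_neighbors` samples for the neighborhood size to be meaningful.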

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star history: 152 stars in the last 90 days
