Scientific tool for latent space investigation
Latent Scope provides a comprehensive workflow and interactive web interface for exploring latent spaces derived from unstructured data. It targets data scientists and researchers needing to visualize, cluster, and annotate high-dimensional embeddings, offering an intuitive way to gain insights from complex datasets.
How It Works
The tool orchestrates a multi-step process: embedding unstructured data into high-dimensional vectors using models like BAAI/bge-small-en-v1.5, reducing dimensionality with UMAP, clustering the resulting points with HDBSCAN, and labeling clusters using LLMs (e.g., Zephyr-7b-beta, GPT-3.5-turbo). This pipeline is accessible via both a Python API and a suite of command-line scripts, with all intermediate and final outputs stored as flat files for easy portability and inspection.
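The staged, flat-file shape of this pipeline can be illustrated with a small standard-library sketch. It is not Latent Scope's code or API: every function name and file name below is hypothetical, and each stage is a deliberately simple stand-in (hashed bag-of-words instead of an embedding model, a random linear projection instead of UMAP, distance-threshold merging instead of HDBSCAN, and most-common-token labels instead of an LLM). What it does show is each stage writing its output to a file that the next stage reads back.

```python
import csv
import hashlib
import json
import math
import random
import tempfile
from collections import Counter
from pathlib import Path

DIM = 32  # toy embedding size; real embedding models use hundreds of dimensions


def embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def reduce_2d(vectors, seed=0):
    """Stand-in for UMAP: a fixed random linear projection down to 2-D."""
    rng = random.Random(seed)
    axes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(2)]
    return [[sum(a * v for a, v in zip(axis, vec)) for axis in axes]
            for vec in vectors]


def cluster(points, eps=0.3):
    """Stand-in for HDBSCAN: merge points closer than eps using union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(j)] = find(i)
    ids = {}  # map each root to a compact cluster id, in order of appearance
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def label_clusters(texts, cluster_ids):
    """Stand-in for an LLM labeler: the most common token in each cluster."""
    tokens = {}
    for text, cid in zip(texts, cluster_ids):
        tokens.setdefault(cid, Counter()).update(text.lower().split())
    return {cid: (counts.most_common(1)[0][0] if counts else "")
            for cid, counts in tokens.items()}


def run_pipeline(texts, out_dir):
    """Run all stages, persisting every intermediate result as a flat file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Stage 1: embed and write to disk (hypothetical file name).
    (out / "embeddings.json").write_text(json.dumps([embed(t) for t in texts]))

    # Stage 2: read stage 1's file, project to 2-D, write again.
    vectors = json.loads((out / "embeddings.json").read_text())
    (out / "projection.json").write_text(json.dumps(reduce_2d(vectors)))

    # Stages 3 and 4: read stage 2's file, cluster, label, write a CSV.
    points = json.loads((out / "projection.json").read_text())
    cluster_ids = cluster(points)
    labels = label_clusters(texts, cluster_ids)
    with open(out / "clusters.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "x", "y", "cluster", "label"])
        for text, (x, y), cid in zip(texts, points, cluster_ids):
            writer.writerow([text, x, y, cid, labels[cid]])
    return cluster_ids


if __name__ == "__main__":
    docs = ["cats purr softly", "cats purr softly", "stock markets fell today"]
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(docs, tmp))
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Because every stage reads its input from disk, any single stage could be rerun or swapped out without recomputing the earlier ones, which is the practical benefit of storing intermediate outputs as flat files.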
Quick Start & Requirements
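Install the package, initialize a data directory (the API key options are optional), then start the server:
pip install latentscope
ls-init <data_dir> [--openai_key=XXX] [--mistral_key=YYY]
ls-serve
Once the server is running, open http://localhost:5001 in a browser.
Highlighted Details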
Maintenance & Community
The project is actively maintained by the author, enjalot. Further details on contributing and the roadmap are available in CONTRIBUTION.md and DEVELOPMENT.md.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration with closed-source projects.
Limitations & Caveats
The README does not specify a license, which may hinder commercial adoption. The tool supports a range of embedding and chat models, but integrating a model outside that range might require code modifications.