latent-scope by enjalot

Scientific tool for latent space investigation

created 2 years ago
717 stars

Top 48.9% on sourcepulse

Project Summary

Latent Scope combines a step-by-step processing workflow with an interactive web interface for exploring latent spaces derived from unstructured data. It targets data scientists and researchers who need to visualize, cluster, and annotate high-dimensional embeddings.

How It Works

The tool orchestrates a multi-step process: embedding unstructured data into high-dimensional vectors using models like BAAI/bge-small-en-v1.5, reducing dimensionality with UMAP, clustering the resulting points with HDBSCAN, and labeling clusters using LLMs (e.g., Zephyr-7b-beta, GPT-3.5-turbo). This pipeline is accessible via both a Python API and a suite of command-line scripts, with all intermediate and final outputs stored as flat files for easy portability and inspection.
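The shape of that pipeline can be sketched with the standard library alone. This is a minimal illustration, not the latentscope API: `run_pipeline` and every `toy_*` function are hypothetical names, and the toy functions are trivial stand-ins for the embedding model, UMAP, HDBSCAN, and the LLM labeler. The output file names are also assumptions. What the sketch does show is the pattern described above: stages run in order, and each stage writes its result to a flat file that can be inspected on its own.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(texts, steps, out_dir):
    """Run named steps in order, saving each step's output as a JSON flat file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data = texts
    written = []
    for name, fn in steps:
        data = fn(data)
        path = out_dir / f"{name}.json"
        path.write_text(json.dumps(data))
        written.append(path.name)
    return data, written


# Toy stand-ins below: NOT the real models and NOT latentscope functions.
def toy_embed(texts):
    # Stand-in for an embedding model: [length, vowel count, space count].
    return [[len(t), sum(c in "aeiou" for c in t.lower()), t.count(" ")] for t in texts]


def toy_reduce(vectors):
    # Stand-in for UMAP: keep only the first two dimensions.
    return [v[:2] for v in vectors]


def toy_cluster(points):
    # Stand-in for HDBSCAN: split points around the mean of the first coordinate.
    mean_x = sum(p[0] for p in points) / len(points)
    return [0 if p[0] < mean_x else 1 for p in points]


def toy_label(cluster_ids):
    # Stand-in for LLM labeling: give each cluster id a placeholder name.
    return {str(c): f"cluster-{c}" for c in sorted(set(cluster_ids))}


steps = [
    ("embed", toy_embed),
    ("umap", toy_reduce),
    ("cluster", toy_cluster),
    ("labels", toy_label),
]

with tempfile.TemporaryDirectory() as tmp:
    labels, files = run_pipeline(["short", "a much longer sentence here"], steps, tmp)
    print(files)
    print(labels)
```

Because every stage leaves a file behind, a later stage can be rerun with different settings without recomputing the earlier ones, which is the main benefit of the flat-file design.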

Quick Start & Requirements

  • Install: pip install latentscope
  • Prerequisites: Python 3.12 recommended. Optional API keys for OpenAI or Mistral.
  • Setup: Run ls-init <data_dir> [--openai_key=XXX] [--mistral_key=YYY] followed by ls-serve. Access via http://localhost:5001.
  • Docs: https://github.com/enjalot/latent-scope#getting-started
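For scripted setups, the `ls-init` invocation above can be assembled programmatically. The helper below is a hypothetical convenience, not part of latentscope; it uses only the command name and the two optional flags listed in the setup step, and it builds the argument list without executing anything.

```python
import shlex


def build_init_command(data_dir, openai_key=None, mistral_key=None):
    """Assemble the ls-init argument list; both API keys are optional."""
    cmd = ["ls-init", data_dir]
    if openai_key:
        cmd.append(f"--openai_key={openai_key}")
    if mistral_key:
        cmd.append(f"--mistral_key={mistral_key}")
    return cmd


# "./data" and "XXX" are placeholder values for illustration.
cmd = build_init_command("./data", openai_key="XXX")
print(shlex.join(cmd))  # prints: ls-init ./data --openai_key=XXX
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the initialization, after which `ls-serve` starts the web interface.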

Highlighted Details

  • Supports local execution of open-source embedding models and proprietary API services.
  • All data and process metadata are stored as flat files (Parquet, JSON, PNG) for portability.
  • Interactive web UI allows seamless switching between different processing configurations (scopes).
  • Command-line scripts enable reproducible, step-by-step data processing and exploration.
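Since outputs are plain Parquet, JSON, and PNG files, a data directory can be inspected with ordinary file tools. The sketch below counts files by extension; the `inventory` helper, the directory layout, and the file names are all assumptions made for illustration and are not taken from the project.

```python
import tempfile
from collections import Counter
from pathlib import Path


def inventory(data_dir):
    """Count the files under data_dir, grouped by lowercase file extension."""
    counts = Counter(
        p.suffix.lower() for p in Path(data_dir).rglob("*") if p.is_file()
    )
    return dict(counts)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one dataset folder holding a few empty output files.
    dataset = Path(tmp) / "my-dataset"
    dataset.mkdir()
    for name in ["input.parquet", "embedding.json", "umap.json", "umap.png"]:
        (dataset / name).write_bytes(b"")
    summary = inventory(tmp)
    print(summary)
```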

Maintenance & Community

The project is actively maintained by the author, enjalot. Further details on contributing and the roadmap are available in CONTRIBUTION.md and DEVELOPMENT.md.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration with closed-source projects.

Limitations & Caveats

Although the tool supports several embedding and chat models, integrating a new one may require code changes.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 21 stars in the last 90 days
