Project_Golem by CyberMagician

Interactive 3D RAG memory visualizer

Created 1 month ago
255 stars

Top 98.7% on SourcePulse

Project Summary

Project Golem provides a 3D interface for visualizing Retrieval-Augmented Generation (RAG) memory structures in real time. It targets AI researchers and developers who need to understand and debug the semantic relationships within their RAG systems. By showing how concepts are associated and which ones a query retrieves, it makes the system easier to interpret.

How It Works

The project takes high-dimensional embeddings (768d) and projects them into an interactive 3D space using UMAP. It uses Google's embedding-gemma-300m for vectorization, and LanceDB or local NumPy for vector storage and fast retrieval. The frontend, built with Three.js and WebGL, renders this "cortex" and dynamically highlights the "neural pathways" related to a user query, which enables visual debugging of concept association.
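
The retrieval step that decides which points to highlight can be sketched in a few lines of NumPy. This is a minimal illustration of cosine-similarity top-k search, not the project's actual code: the function name `top_k_cosine` is hypothetical, and the random vectors stand in for real 768-dimensional embeddings.

```python
import numpy as np


def top_k_cosine(query, vectors, k=5):
    """Return (indices, scores) of the k stored vectors most similar to query.

    Hypothetical helper for illustration; not taken from the project.
    """
    # Normalize both sides so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Stand-in data: 1000 random "embeddings" of dimension 768 (assumption).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 768)).astype(np.float32)

# A query that is a slightly perturbed copy of stored vector 42.
query = vectors[42] + 0.01 * rng.normal(size=768).astype(np.float32)

indices, scores = top_k_cosine(query, vectors, k=5)
print(indices, scores)
```

In a visualizer of this kind, the returned indices would identify which 3D points to highlight for the query. For large collections, a vector store such as LanceDB would likely replace the brute-force scan.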

Quick Start & Requirements

  • Primary install: pip install -r requirements.txt
  • Non-default prerequisites: GPU (MPS on Mac recommended for speed), Python.
  • Setup:
    1. Run python ingest.py to scrape Wikipedia, vectorize data, and generate golem_cortex.json and golem_vectors.npy. This step requires a GPU for reasonable performance.
    2. Run python GolemServer.py to start the backend server.
    3. Access the visualization at http://localhost:8000.
  • Customization: Edit TARGETS in ingest.py to point to custom datasets (PDFs, Obsidian vaults). Integration with external vector DBs like Qdrant/Pinecone is possible by fetching vectors, applying UMAP, and modifying server.py.
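
The external-database path described in the last bullet (fetch vectors, apply UMAP, feed the result to the server) might look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the record schema (`id`, `label`, `pos`), and the output filename are hypothetical and are not the project's real format. Only `umap.UMAP(...).fit_transform(...)` from the umap-learn package is a real library call.

```python
import json


def project_to_3d(vectors, seed=42):
    """Reduce an (N, D) array of embeddings to (N, 3) coordinates with UMAP."""
    # Imported lazily so the rest of the sketch works without umap-learn installed.
    import umap

    reducer = umap.UMAP(n_components=3, metric="cosine", random_state=seed)
    return reducer.fit_transform(vectors)


def build_records(labels, coords):
    """Pair each label with its 3D position (hypothetical schema)."""
    if len(labels) != len(coords):
        raise ValueError("labels and coords must have the same length")
    return [
        {"id": i, "label": label, "pos": [round(float(c), 4) for c in xyz]}
        for i, (label, xyz) in enumerate(zip(labels, coords))
    ]


def export_cortex(labels, coords, path):
    """Write the records to a JSON file that a server could load."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_records(labels, coords), f)


# Intended flow (assumes you have already fetched `labels` and an
# (N, 768) `vectors` array from your own vector database):
#   coords = project_to_3d(vectors)
#   export_cortex(labels, coords, "custom_cortex.json")
```

The server would then need to be adapted to read whatever schema you choose, so check the project's own JSON output before settling on field names.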

Highlighted Details

  • Utilizes Google embedding-gemma-300m via sentence-transformers.
  • Employs UMAP for dimensionality reduction to 3D.
  • Supports LanceDB for storage and local NumPy for fast cosine similarity.
  • Frontend built with Three.js and WebGL for interactive 3D visualization.
  • Allows customization of knowledge sources and integration with external vector databases.

Maintenance & Community

No specific details on maintainers, community channels (like Discord/Slack), sponsorships, or roadmap were found in the provided README.

Licensing & Compatibility

The README does not specify a software license. This lack of information requires clarification for adoption, particularly concerning commercial use or integration into closed-source projects.

Limitations & Caveats

Project Golem is described as an "experiment." The ingestion process requires a GPU for acceptable speed. The absence of a specified license is a significant caveat for potential adopters.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 30 days

Explore Similar Projects

Starred by Dominik Moritz (Research Scientist at Apple; Professor at CMU) and Casey Caruso (Managing Partner of Topology Ventures).

latent-scope by enjalot

0%
750
Scientific tool for latent space investigation
Created 2 years ago
Updated 3 months ago