lennyhub-rag  by traversaal-ai

Production RAG system for podcast knowledge exploration

Created 1 month ago
502 stars

Top 62.1% on SourcePulse

Summary

LennyHub RAG is a Retrieval-Augmented Generation (RAG) system, described as production-ready, built on 297 podcast transcripts featuring industry leaders. It gives structured access to expert insights on product management, growth, and leadership. The system offers an automated setup, an interactive web interface, and knowledge graph-based retrieval, and is aimed at researchers and professionals who want curated expert knowledge.

How It Works

The system uses the RAG-Anything framework with LightRAG. GPT-4o-mini performs entity and relationship extraction, and OpenAI's text-embedding-3-small produces the embeddings. Data is stored locally in Qdrant. Queries use a hybrid search strategy that combines three retrieval paths: local (entity-focused), global (relationship-focused), and pure vector similarity. GPT-4o-mini then synthesizes the retrieved results into an answer.
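
The hybrid strategy above yields three separate result lists that have to be merged before synthesis. The summary does not say how LightRAG merges them, so the sketch below swaps in reciprocal rank fusion (RRF), a common generic technique, purely as an illustration. The chunk IDs and the three hit lists are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each appearance of a chunk at 1-based rank r contributes 1 / (k + r)
    to that chunk's fused score. k=60 is a conventional default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the three retrieval paths (illustrative only).
local_hits = ["c3", "c1", "c7"]    # entity-focused
global_hits = ["c1", "c9", "c3"]   # relationship-focused
vector_hits = ["c1", "c3", "c4"]   # pure vector similarity

fused = reciprocal_rank_fusion([local_hits, global_hits, vector_hits])
print(fused)
```

Chunks that appear near the top of several lists (here c1 and c3) rise to the front, which is the intuition behind combining entity, relationship, and vector retrieval.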

Quick Start & Requirements

To install, clone the repository and install the Python dependencies (pip install -r requirements.txt). An OpenAI API key is required. The automated setup script, setup_rag.py, handles Qdrant installation and data indexing. Approximate indexing times: a quick test with 10 transcripts takes about 5 minutes; 50 transcripts in parallel take 6-8 minutes; all 297 transcripts in parallel take 25-35 minutes. At least 4GB of RAM is recommended for full indexing.
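
Because indexing can run for many minutes, it may be worth checking the prerequisites before starting. The following is a minimal sketch, not part of the project: the preflight helper is hypothetical, and it assumes the key is read from the OPENAI_API_KEY environment variable (the OpenAI SDK's usual convention, not confirmed for this repository) and that requirements.txt and setup_rag.py sit in the repository root.

```python
import os
from pathlib import Path


def preflight(env=os.environ, root=Path(".")):
    """Return a list of problems that would block setup; empty means ready.

    Hypothetical helper: the variable name and file locations are assumptions.
    """
    problems = []
    # Assumption: the project picks up the key from this environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Both files are named in the summary; their location is assumed.
    for name in ("requirements.txt", "setup_rag.py"):
        if not (root / name).is_file():
            problems.append(f"{name} not found in {root}")
    return problems


for problem in preflight():
    print("blocked:", problem)
```

Run from the cloned repository, it prints nothing when all three checks pass.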

Highlighted Details

  • One-Command Setup: setup_rag.py automates installation, Qdrant setup, and data indexing.
  • Visual Web Interface: A Streamlit application provides an interactive querying experience with status monitoring and transcript browsing.
  • Interactive Knowledge Graph: Visualizes connections between 544 individuals mentioned across transcripts, featuring a clickable network visualization.
  • Local Qdrant: Utilizes Qdrant for local, production-grade vector storage without Docker.
  • Advanced Retrieval: Employs LightRAG for entity and relationship extraction, enabling sophisticated RAG capabilities.
  • Parallel Processing: Offers 5-10x faster indexing compared to sequential methods.
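
The parallel-processing speedup listed above is plausible if per-transcript work is dominated by waiting on API calls, which is an assumption here rather than something the summary states. A minimal sketch of that pattern with a thread pool follows; index_transcript is a hypothetical stand-in, and the project's actual indexing code may be structured differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def index_transcript(path):
    """Hypothetical stand-in for per-transcript work (extraction + embedding)."""
    return f"indexed:{path}"


def index_all(paths, max_workers=8):
    """Index transcripts concurrently; return (results, failures) keyed by path."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(index_transcript, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # Record the failure and keep going so one bad file
                # does not abort the whole run.
                failures[path] = exc
    return results, failures


# Illustrative file names only.
paths = [f"transcript_{i:03d}.txt" for i in range(10)]
results, failures = index_all(paths)
print(len(results), "indexed;", len(failures), "failed")
```

Collecting failures separately lets a long run finish and report which transcripts need a retry.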

Maintenance & Community

Contributions are welcome, and the project provides a clear structure and comprehensive documentation. The README does not name active maintainers, community channels (such as Discord or Slack), or sponsorship.

Licensing & Compatibility

The project's license is specified in a separate LICENSE file. Compatibility with commercial use is not explicitly stated and also depends on the terms of OpenAI's API and Qdrant's licensing.

Limitations & Caveats

The system requires an active OpenAI API key and incurs per-query costs, though caching significantly reduces them. Performance and storage requirements scale with the number of transcripts processed, so adequate RAM and disk space are needed.
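
To illustrate why caching lowers per-query cost, here is a minimal sketch of an in-memory answer cache keyed by a hash of the normalized question and query mode. The QueryCache class and fake_llm function are hypothetical, and this is not the project's caching layer, whose details the summary does not give.

```python
import hashlib


class QueryCache:
    """Cache answers so repeated questions skip the paid API call."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn  # callable(question, mode) -> answer text
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(question, mode):
        # Normalize case and whitespace so trivially different phrasings match.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(f"{mode}\n{normalized}".encode("utf-8")).hexdigest()

    def ask(self, question, mode="hybrid"):
        key = self._key(question, mode)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._answer_fn(question, mode)
        self._store[key] = answer
        return answer


calls = []


def fake_llm(question, mode):
    """Hypothetical stand-in for the paid model call."""
    calls.append(question)
    return f"answer to: {question}"


cache = QueryCache(fake_llm)
first = cache.ask("What is product-market fit?")
second = cache.ask("what is   product-market fit?")
print(cache.hits, cache.misses, len(calls))
```

The second question differs only in case and spacing, so it is served from the cache and the stand-in model is called once.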

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 42 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

RAG-Anything by HKUDS

1.4%
14k
All-in-one multimodal RAG system
Created 8 months ago
Updated 1 day ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

LightRAG by HKUDS

0.8%
29k
RAG framework for fast, simple retrieval-augmented generation
Created 1 year ago
Updated 2 days ago