LightRAG by HKUDS

RAG framework for fast, simple retrieval-augmented generation

created 10 months ago
18,945 stars

Top 2.4% on sourcepulse

Project Summary

LightRAG is a Python library for efficient and straightforward Retrieval-Augmented Generation (RAG). It aims to simplify the RAG pipeline for developers and researchers by offering flexible storage options, multiple retrieval modes, and easy integration with various LLM and embedding models. It also supports knowledge graph integration, custom prompts, and conversation history, which allow context-aware, multi-turn applications.

How It Works

LightRAG employs a modular architecture that separates data indexing, retrieval, and generation. It supports hybrid search strategies combining vector similarity with knowledge graph traversal for richer context. The system allows users to inject custom LLM and embedding functions, offering compatibility with OpenAI-like APIs, Hugging Face models, and Ollama. Data can be stored in various backends, including simple JSON key-value stores, PostgreSQL (with pgvector and AGE), Neo4j, and Faiss, providing flexibility for different deployment needs.
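
To make the hybrid idea concrete, here is a minimal sketch of combining vector similarity with one hop of graph traversal, using an injectable embedding function. It is a toy illustration under stated assumptions, not LightRAG's implementation: `bag_of_words`, `cosine`, `hybrid_retrieve`, and the sample data are all hypothetical names invented for this example, and word counts stand in for a real embedding model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(
    query: str,
    docs: Dict[str, str],
    graph: Dict[str, Set[str]],
    embed: Callable[[str], Vector] = bag_of_words,
    top_k: int = 1,
) -> List[str]:
    """Pick the top_k documents by similarity, then add their graph neighbors."""
    query_vec = embed(query)
    ranked = sorted(
        docs, key=lambda doc_id: cosine(query_vec, embed(docs[doc_id])), reverse=True
    )
    results = list(ranked[:top_k])
    for seed in ranked[:top_k]:
        # One-hop expansion: neighbors add related context the vectors missed.
        for neighbor in sorted(graph.get(seed, set())):
            if neighbor not in results:
                results.append(neighbor)
    return results


# Hypothetical sample data: document ids, their text, and graph edges.
DOCS = {
    "rag": "retrieval augmented generation combines search with a language model",
    "kg": "a knowledge graph stores entities and relations",
    "faiss": "faiss is a library for vector similarity search",
}
GRAPH = {"rag": {"kg"}, "kg": {"rag"}, "faiss": set()}
```

Calling `hybrid_retrieve("how does retrieval augmented generation work", DOCS, GRAPH)` selects the "rag" document by similarity and then pulls in "kg" through the graph edge, even though "kg" shares no words with the query. Passing a different `embed` callable swaps the embedding model without touching the retrieval logic, which is the kind of separation the modular design describes.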

Quick Start & Requirements

  • Installation: pip install "lightrag-hku[api]" from PyPI, or pip install -e . from a source checkout.
  • Prerequisites: Python. An OpenAI API key is needed only when using the default OpenAI models.
  • Demo: A quick start demo using OpenAI is available, requiring an OPENAI_API_KEY environment variable. The demo downloads a document and runs a query. See examples/lightrag_openai_demo.py.
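
Because the demo depends on the OPENAI_API_KEY environment variable, a small pre-flight check can catch a missing key before anything is downloaded. The sketch below uses only the Python standard library; `missing_demo_settings` is a hypothetical helper written for this summary and is not part of LightRAG.

```python
import os
from typing import List, Mapping, Optional


def missing_demo_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required environment variables that are unset or blank.

    Hypothetical helper: the only requirement taken from the summary above is
    OPENAI_API_KEY for the OpenAI-based demo.
    """
    env = os.environ if env is None else env
    required = ["OPENAI_API_KEY"]
    return [name for name in required if not env.get(name, "").strip()]
```

A launcher script could call `missing_demo_settings()` and exit with a clear message if the returned list is non-empty, before running examples/lightrag_openai_demo.py.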

Highlighted Details

  • Supports multiple storage backends: JSON KV, PostgreSQL, Neo4j, Faiss, Chroma, Milvus, Qdrant, Redis, MongoDB.
  • Offers diverse retrieval modes: "local", "global", "hybrid", "naive", "mix".
  • Integrates knowledge graph capabilities for structured data.
  • Enables custom prompts and conversation history for multi-turn dialogues.
  • Provides tools for token usage tracking and data export.
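
As an illustration of how a retrieval mode and conversation history might travel together with a query, the following sketch validates the mode against the five names listed above and accumulates dialogue turns. `QueryOptions`, `RETRIEVAL_MODES`, and `with_turn` are hypothetical names for this example; the sketch does not reproduce the library's own query parameters, and the role/content message shape is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Mode names taken from the list above; their exact semantics are not modeled here.
RETRIEVAL_MODES = ("local", "global", "hybrid", "naive", "mix")


@dataclass
class QueryOptions:
    """Hypothetical container for a retrieval mode plus prior dialogue turns."""

    mode: str
    history: List[Dict[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Fail early on a typo instead of silently falling back to some default.
        if self.mode not in RETRIEVAL_MODES:
            raise ValueError(
                f"unknown retrieval mode {self.mode!r}; expected one of {RETRIEVAL_MODES}"
            )

    def with_turn(self, role: str, content: str) -> "QueryOptions":
        """Return a new QueryOptions with one more turn; the original is unchanged."""
        turn = {"role": role, "content": content}
        return QueryOptions(self.mode, self.history + [turn])
```

For example, `QueryOptions("mix").with_turn("user", "What is LightRAG?")` yields an object whose history holds one turn, while `QueryOptions("fuzzy")` raises a `ValueError`. Returning a new object from `with_turn` keeps earlier conversation states reusable across multi-turn dialogues.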

Maintenance & Community

  • Active development with frequent updates and new feature releases (e.g., citation functionality, PostgreSQL support, GUI).
  • Community support is available via a Discord channel.
  • Links to documentation and introductory videos are provided.

Licensing & Compatibility

  • The README does not state a license explicitly, although the project is open source. Check the license terms in the repository before any commercial use.

Limitations & Caveats

  • Some sample files are community contributions and may not be fully tested or optimized.
  • Some storage backends have known issues; Apache AGE, for example, may need to be compiled from source.
  • Performance tuning, especially for low-RAM GPUs, may require careful configuration of context window sizes and model choices.

Health Check

  • Last commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 84
  • Issues (30d): 310
  • Star history: 3,191 stars in the last 90 days
