LightRAG by HKUDS

RAG framework for fast, simple retrieval-augmented generation

Created 1 year ago

27,191 stars

Top 1.4% on SourcePulse

4 Experts Love This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Elie Bursztein

Cybersecurity Lead at Google DeepMind

Jason Knight

Director AI Compilers at NVIDIA; Cofounder of OctoML

Project Summary

LightRAG is a Python library designed for efficient and straightforward Retrieval-Augmented Generation (RAG). It aims to simplify the RAG pipeline for developers and researchers by offering flexible storage options, multiple retrieval modes, and easy integration with various LLM and embedding models. The library supports advanced features like knowledge graph integration, custom prompt engineering, and conversation history, enabling more sophisticated and context-aware AI applications.

How It Works

LightRAG employs a modular architecture that separates data indexing, retrieval, and generation. It supports hybrid search strategies combining vector similarity with knowledge graph traversal for richer context. The system allows users to inject custom LLM and embedding functions, offering compatibility with OpenAI-like APIs, Hugging Face models, and Ollama. Data can be stored in various backends, including simple JSON key-value stores, PostgreSQL (with pgvector and AGE), Neo4j, and Faiss, providing flexibility for different deployment needs.
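To make the idea of combining vector similarity with graph traversal concrete, here is a minimal toy sketch. It does not use LightRAG and is not its implementation. The keyword-count embedding, the chunk-level graph, and all names (toy_embed, hybrid_retrieve, VOCAB) are hypothetical and chosen only for illustration. The embedding function is passed in as an argument, which loosely mirrors the idea of injecting a custom embedding function.

```python
import math
from typing import Callable, Dict, List, Set

# Hypothetical toy vocabulary; a real setup would inject a model-backed
# embedding function instead of counting keywords.
VOCAB = ["graph", "vector", "storage", "query"]


def toy_embed(text: str) -> List[float]:
    """Embed text as counts of the VOCAB terms (illustration only)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(
    query: str,
    chunks: Dict[str, str],
    graph: Dict[str, Set[str]],
    embed: Callable[[str], List[float]],
    top_k: int = 1,
) -> List[str]:
    """Pick the top_k chunks by similarity, then add their graph neighbours."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda cid: cosine(q, embed(chunks[cid])), reverse=True)
    seeds = ranked[:top_k]
    result = list(seeds)
    for cid in seeds:
        # One-hop expansion; neighbours are sorted for a deterministic order.
        for neighbour in sorted(graph.get(cid, set())):
            if neighbour not in result:
                result.append(neighbour)
    return result


chunks = {
    "c1": "vector storage keeps embeddings for each chunk",
    "c2": "the graph links entities and relations",
    "c3": "a query can combine both sources",
}
# Assumed adjacency between chunks; a real knowledge graph would link
# entities and relations rather than chunk ids.
graph = {"c1": {"c2"}, "c2": {"c1", "c3"}, "c3": {"c2"}}

context_ids = hybrid_retrieve("vector storage", chunks, graph, toy_embed)
print(context_ids)
```

In this toy setup the vector step alone would return only c1; the one-hop graph expansion pulls in c2 as additional context, which is the "richer context" effect in miniature.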

Quick Start & Requirements

Installation: pip install "lightrag-hku[api]" from PyPI, or pip install -e . from a source checkout.
Prerequisites: Python, and optionally an OpenAI API key for default models.
Demo: A quick start demo using OpenAI is available, requiring an OPENAI_API_KEY environment variable. The demo downloads a document and runs a query. See examples/lightrag_openai_demo.py.

Highlighted Details

Supports multiple storage backends: JSON KV, PostgreSQL, Neo4j, Faiss, Chroma, Milvus, Qdrant, Redis, MongoDB.
Offers diverse retrieval modes: "local", "global", "hybrid", "naive", "mix".
Integrates knowledge graph capabilities for structured data.
Enables custom prompts and conversation history for multi-turn dialogues.
Provides tools for token usage tracking and data export.
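The five retrieval mode names listed above can be treated as a small closed set that an application validates before issuing a query. The sketch below is a hypothetical illustration, not LightRAG code. The mapping from each mode to the sources it consults is an assumption about what the modes mean and should be checked against the project documentation; the source labels (chunk_vectors, graph_entities, graph_relations) are invented for this example.

```python
from typing import Dict, FrozenSet

# Assumed meaning of each mode (verify against the project docs):
#   naive  -> plain vector search over text chunks
#   local  -> entity-focused graph context
#   global -> relation-focused graph context
#   hybrid -> local and global together
#   mix    -> graph context plus chunk vector search
MODE_SOURCES: Dict[str, FrozenSet[str]] = {
    "naive": frozenset({"chunk_vectors"}),
    "local": frozenset({"graph_entities"}),
    "global": frozenset({"graph_relations"}),
    "hybrid": frozenset({"graph_entities", "graph_relations"}),
    "mix": frozenset({"graph_entities", "graph_relations", "chunk_vectors"}),
}


def sources_for(mode: str) -> FrozenSet[str]:
    """Return the (assumed) sources for a mode, rejecting unknown names."""
    try:
        return MODE_SOURCES[mode]
    except KeyError:
        raise ValueError(
            f"unknown retrieval mode {mode!r}; expected one of {sorted(MODE_SOURCES)}"
        ) from None


for mode in MODE_SOURCES:
    print(mode, sorted(sources_for(mode)))
```

Validating the mode string up front means a typo such as "hybird" fails immediately with a clear message instead of surfacing later in the pipeline.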

Maintenance & Community

Active development with frequent updates and new feature releases (e.g., citation functionality, PostgreSQL support, GUI).
Community support is available via a Discord channel.
The README links to documentation and introductory videos.

Licensing & Compatibility

The README does not explicitly state a license, although the project is open source. Verify the license terms in the repository before any commercial use.

Limitations & Caveats

Some sample files are community contributions and may not be fully tested or optimized.
Some storage backends have known issues; Apache AGE, for example, may need to be compiled from source.
Performance tuning, especially for low-RAM GPUs, may require careful configuration of context window sizes and model choices.

Health Check

Last Commit: 3 days ago

Responsiveness: 1 day

Star History

1,390 stars in the last 30 days