RAG-Anything by HKUDS

All-in-one multimodal RAG system

created 1 month ago
1,952 stars

Top 22.9% on sourcepulse

View on GitHub
Project Summary

RAG-Anything is an all-in-one Retrieval-Augmented Generation (RAG) system designed to process and query complex documents containing diverse multimodal content, including text, images, tables, and equations. It targets researchers, technical writers, and knowledge management professionals who need to extract insights from rich, mixed-content documents without relying on multiple specialized tools. The system offers a unified interface for seamless multimodal document analysis and querying.

How It Works

RAG-Anything employs a multi-stage multimodal pipeline that extends traditional RAG architectures. It begins with document parsing using MinerU for high-fidelity extraction and adaptive content decomposition across various formats (PDFs, Office docs, images). Next, a concurrent multi-pipeline architecture processes textual and multimodal content separately and in parallel. Specialized analyzers handle visual content, structured data, and mathematical expressions, with an extensible handler for custom types. The system constructs a multimodal knowledge graph by extracting entities and cross-modal relationships, followed by modality-aware retrieval that fuses vector similarity search with graph traversal for contextually integrated information delivery.
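The concurrent multi-pipeline stage described above can be illustrated with a minimal sketch. This is not RAG-Anything's actual code: the handler functions, block schema, and dispatch table here are hypothetical stand-ins for the specialized analyzers, used only to show how parsed blocks of different modalities can be routed and processed in parallel.

```python
import asyncio

# Toy handlers standing in for the specialized analyzers
# (visual content, structured data, mathematical expressions).
def handle_text(block):
    return f"text-chunk:{block['data']}"

def handle_image(block):
    return f"image-caption:{block['data']}"

def handle_table(block):
    return f"table-summary:{block['data']}"

def handle_equation(block):
    return f"latex-parse:{block['data']}"

HANDLERS = {
    "text": handle_text,
    "image": handle_image,
    "table": handle_table,
    "equation": handle_equation,
}

async def process_block(block):
    # Route each parsed block to its modality-specific handler.
    handler = HANDLERS[block["modality"]]
    return await asyncio.to_thread(handler, block)

async def process_document(blocks):
    # Textual and multimodal content run concurrently, as in the
    # multi-pipeline architecture described above.
    return await asyncio.gather(*(process_block(b) for b in blocks))

blocks = [
    {"modality": "text", "data": "RAG systems retrieve context before generation."},
    {"modality": "image", "data": "figure1.png"},
    {"modality": "equation", "data": r"E = mc^2"},
]
results = asyncio.run(process_document(blocks))
```

An extensible system would let users register new entries in the dispatch table for custom content types, which is the role the summary attributes to the extensible handler.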

Quick Start & Requirements

  • Installation: pip install raganything, or pip install 'raganything[all]' for all optional dependencies. To install from source, clone the repository and run pip install -e . in its root.
  • Prerequisites: Office document processing requires LibreOffice installation. GPU acceleration is supported via MinerU configuration.
  • API Keys: OpenAI API keys are required for full RAG processing with LLM integration.
  • Documentation: Usage examples are available in the examples/ directory.
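Since Office document parsing fails without LibreOffice, it can be worth verifying the prerequisite before processing. A minimal check, assuming LibreOffice's standard soffice command-line entry point is on the PATH:

```python
import shutil

def libreoffice_available() -> bool:
    # LibreOffice ships a CLI binary named "soffice"; if it is not on
    # the PATH, .docx/.pptx/.xlsx parsing cannot work.
    return shutil.which("soffice") is not None

if not libreoffice_available():
    print("LibreOffice not found: Office document parsing will fail")
```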

Highlighted Details

  • End-to-end multimodal pipeline from ingestion to query answering.
  • Universal document support including PDFs, Office documents, images, and text files.
  • Specialized processors for images, tables, and mathematical equations with LaTeX support.
  • Multimodal knowledge graph construction for enhanced understanding and retrieval.
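To make the knowledge-graph bullet concrete, here is a toy sketch of what a multimodal knowledge graph with cross-modal relationships might look like. The entities, relation names, and traversal logic are illustrative assumptions, not RAG-Anything's internal representation; the point is how graph traversal can pull in context from other modalities than the query entity's own.

```python
from collections import defaultdict

# Nodes are (name, modality) pairs; edges carry a relation label.
graph = defaultdict(list)

def add_relation(src, rel, dst):
    graph[src].append((rel, dst))
    graph[dst].append((rel, src))  # bidirectional for retrieval purposes

# Hypothetical entities extracted from the text, an image,
# and a table of the same document.
add_relation(("transformer", "text"), "illustrated_by", ("fig1_architecture", "image"))
add_relation(("transformer", "text"), "evaluated_in", ("table2_results", "table"))
add_relation(("attention", "equation"), "defined_for", ("transformer", "text"))

def neighbors(entity, hops=1):
    # Breadth-first traversal: gather connected entities across modalities.
    seen, frontier = {entity}, [entity]
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for _, dst in graph[node]:
                if dst not in seen:
                    seen.add(dst)
                    nxt.append(dst)
        frontier = nxt
    return seen - {entity}

# A query about "transformer" surfaces the related figure, table, and equation.
context = neighbors(("transformer", "text"))
```

In a real system the traversal results would be fused with vector-similarity hits, as the "modality-aware retrieval" stage describes.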

Maintenance & Community

The project has passed 1,900 stars on GitHub within roughly a month of release. Links to related HKUDS projects, including LightRAG, VideoRAG, and MiniRAG, are provided, and contributions are welcomed.

Licensing & Compatibility

The project is released under the Apache 2.0 license, which permits commercial use and linking with closed-source applications.

Limitations & Caveats

Full multimodal processing and LLM integration require API keys for services like OpenAI. Office document processing depends on a separate LibreOffice installation. The system's performance and accuracy will be influenced by the quality of the underlying MinerU parsing and the chosen LLM models.

Health Check
Last commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
14
Issues (30d)
24
Star History
1,998 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Elie Bursztein (cybersecurity lead at Google DeepMind).

LightRAG by HKUDS

1.0%
19k
RAG framework for fast, simple retrieval-augmented generation
created 10 months ago
updated 18 hours ago