RAG-Anything by HKUDS

All-in-one multimodal RAG system

Created 3 months ago
5,038 stars

Top 9.9% on SourcePulse

Project Summary

RAG-Anything is an all-in-one Retrieval-Augmented Generation (RAG) system designed to process and query complex documents containing diverse multimodal content, including text, images, tables, and equations. It targets researchers, technical writers, and knowledge management professionals who need to extract insights from rich, mixed-content documents without relying on multiple specialized tools. The system offers a unified interface for seamless multimodal document analysis and querying.

How It Works

RAG-Anything employs a multi-stage multimodal pipeline that extends traditional RAG architectures. It begins with document parsing using MinerU for high-fidelity extraction and adaptive content decomposition across various formats (PDFs, Office docs, images). Next, a concurrent multi-pipeline architecture processes textual and multimodal content separately and in parallel. Specialized analyzers handle visual content, structured data, and mathematical expressions, with an extensible handler for custom types. The system constructs a multimodal knowledge graph by extracting entities and cross-modal relationships, followed by modality-aware retrieval that fuses vector similarity search with graph traversal for contextually integrated information delivery.
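
The stages above can be pictured as a short chain of transformations. The sketch below is purely conceptual: the data structures and function names are made up rather than RAG-Anything's actual API, and naive keyword matching stands in for the real vector similarity search.

```python
# Conceptual sketch of the pipeline stages described above.
# All names are illustrative; this is NOT RAG-Anything's real API.
from dataclasses import dataclass, field


@dataclass
class Chunk:
    modality: str  # "text", "image", "table", or "equation"
    content: str


@dataclass
class KnowledgeGraph:
    entities: set = field(default_factory=set)
    relations: list = field(default_factory=list)


def parse_document(path: str) -> list[Chunk]:
    # Stage 1: high-fidelity parsing (MinerU in the real system) returns
    # modality-tagged chunks rather than one flat text blob.
    return [
        Chunk("text", "The model improves accuracy on the main benchmark."),
        Chunk("table", "| metric | value | accuracy | 0.91 |"),
        Chunk("equation", r"\mathcal{L} = -\sum_i y_i \log \hat{y}_i"),
    ]


def analyze(chunk: Chunk) -> dict:
    # Stage 2: per-modality analyzers (vision, table, equation) run in
    # parallel pipelines; each emits a description plus candidate entities.
    return {
        "description": f"[{chunk.modality}] {chunk.content}",
        "entities": {f"{chunk.modality}-entity"},
        "modality": chunk.modality,
    }


def build_graph(analyzed: list[dict]) -> KnowledgeGraph:
    # Stage 3: merge entities into one multimodal graph and record
    # cross-modal relations between chunks from the same document.
    kg = KnowledgeGraph()
    for item in analyzed:
        kg.entities |= item["entities"]
    for a in analyzed:
        for b in analyzed:
            if a["modality"] != b["modality"]:
                kg.relations.append((a["description"], "co-occurs-with", b["description"]))
    return kg


def retrieve(query: str, analyzed: list[dict], kg: KnowledgeGraph) -> list[str]:
    # Stage 4: modality-aware retrieval; keyword overlap stands in for vector
    # similarity here, and graph relations supply cross-modal context.
    hits = [a["description"] for a in analyzed
            if any(word in a["description"].lower() for word in query.lower().split())]
    neighbors = [rel[2] for rel in kg.relations if rel[0] in hits]
    return hits + neighbors


chunks = parse_document("paper.pdf")
analyzed = [analyze(c) for c in chunks]
graph = build_graph(analyzed)
print(retrieve("accuracy table", analyzed, graph))
```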

Quick Start & Requirements

  • Installation: pip install raganything, or pip install 'raganything[all]' for all optional dependencies. To install from source, git clone the repository and run pip install -e . in its root (a minimal usage sketch follows this list).
  • Prerequisites: Office document processing requires a LibreOffice installation. GPU acceleration can be enabled through MinerU's configuration.
  • API Keys: OpenAI API keys are required for full RAG processing with LLM integration.
  • Documentation: Usage examples are available in the examples/ directory.
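
For orientation, the following is a minimal usage sketch. It assumes the package exposes a RAGAnything class with an async document-processing method and an async query method, and that an OpenAI key is available in the environment; treat the class name, method names, and arguments as assumptions and defer to the examples/ directory for the current API.

```python
import asyncio
import os

# Assumption: the package exposes a top-level RAGAnything class; the import
# path, constructor arguments, and method names may differ between versions.
from raganything import RAGAnything


async def main():
    # Full RAG processing needs an LLM provider key, e.g. OPENAI_API_KEY.
    assert os.getenv("OPENAI_API_KEY"), "set OPENAI_API_KEY before running"

    # Real setups typically pass LLM and embedding callables here; the
    # no-argument form is a placeholder for this sketch.
    rag = RAGAnything()

    # Parse and index a mixed-content document (text, images, tables, equations).
    await rag.process_document_complete(file_path="report.pdf", output_dir="./rag_output")

    # Query across all modalities.
    answer = await rag.aquery("What does the results table say about accuracy?", mode="hybrid")
    print(answer)


asyncio.run(main())
```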

Highlighted Details

  • End-to-end multimodal pipeline from ingestion to query answering.
  • Universal document support including PDFs, Office documents, images, and text files.
  • Specialized processors for images, tables, and mathematical equations with LaTeX support (see the sketch after this list).
  • Multimodal knowledge graph construction for enhanced understanding and retrieval.
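
To make the specialized-processor idea concrete, here is a small illustration of how mixed content might be represented as modality-tagged items and routed to per-modality handlers before graph construction. The field names and handler registry are illustrative assumptions, not the library's actual schema.

```python
# Illustrative only; these field names are not the library's actual schema.
content_items = [
    {"type": "text",
     "text": "We evaluate the model on three benchmarks."},
    {"type": "image",
     "img_path": "figures/architecture.png",
     "caption": "Overview of the system architecture."},
    {"type": "table",
     "table_body": "| benchmark | accuracy |\n| MMLU | 0.71 |",
     "caption": "Main results."},
    {"type": "equation",
     "latex": r"\mathrm{accuracy} = \frac{\mathrm{correct}}{\mathrm{total}}",
     "caption": "Evaluation metric."},
]

# A handler registry in the spirit of the extensible-processor design:
# each modality gets its own describe function.
HANDLERS = {
    "text": lambda it: it["text"],
    "image": lambda it: f"Image {it['img_path']}: {it['caption']}",
    "table": lambda it: f"Table ({it['caption']}): {it['table_body']}",
    "equation": lambda it: f"Equation ({it['caption']}): {it['latex']}",
}

for item in content_items:
    print(HANDLERS[item["type"]](item))
```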

Maintenance & Community

The project has passed 5,000 stars on GitHub. Links to related projects such as LightRAG, VideoRAG, and MiniRAG are provided, and contributions are welcomed.

Licensing & Compatibility

The project is released under the Apache 2.0 license, which permits commercial use and linking with closed-source applications.

Limitations & Caveats

Full multimodal processing and LLM integration require API keys for services like OpenAI. Office document processing depends on a separate LibreOffice installation. The system's performance and accuracy will be influenced by the quality of the underlying MinerU parsing and the chosen LLM models.
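
A small pre-flight check can surface both external dependencies before processing starts. The sketch below assumes the conventional OPENAI_API_KEY environment variable and LibreOffice's soffice command-line entry point.

```python
import os
import shutil


def preflight() -> list[str]:
    """Return warnings for missing prerequisites of full multimodal processing."""
    problems = []
    # LLM integration: OpenAI-backed setups conventionally read OPENAI_API_KEY.
    if not os.getenv("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set; LLM-backed processing will fail.")
    # Office document parsing relies on LibreOffice, whose CLI entry point is soffice.
    if shutil.which("soffice") is None:
        problems.append("LibreOffice (soffice) not found on PATH; Office formats cannot be parsed.")
    return problems


if __name__ == "__main__":
    for issue in preflight():
        print("WARNING:", issue)
```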

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30 days): 9
  • Issues (30 days): 20

Star History

  • 1,406 stars in the last 30 days

Starred by John Resig (author of jQuery; Chief Software Architect at Khan Academy), Chenlin Meng (cofounder of Pika), and 9 more.

Explore Similar Projects

clip-retrieval by rom1504

  • CLIP retrieval system for semantic search
  • Top 0.2% on SourcePulse · 3k stars
  • Created 4 years ago; updated 1 month ago
  • Starred by Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 2 more.

LightRAG by HKUDS

  • RAG framework for fast, simple retrieval-augmented generation
  • Top 1.2% on SourcePulse · 21k stars
  • Created 11 months ago; updated 2 days ago