RAG-Anything by HKUDS

All-in-one multimodal RAG system

created 1 month ago
1,952 stars

Top 22.9% on sourcepulse

View on GitHub
Project Summary

RAG-Anything is an all-in-one Retrieval-Augmented Generation (RAG) system designed to process and query complex documents containing diverse multimodal content, including text, images, tables, and equations. It targets researchers, technical writers, and knowledge management professionals who need to extract insights from rich, mixed-content documents without relying on multiple specialized tools. The system offers a unified interface for seamless multimodal document analysis and querying.

How It Works

RAG-Anything employs a multi-stage multimodal pipeline that extends traditional RAG architectures. It begins with document parsing using MinerU for high-fidelity extraction and adaptive content decomposition across various formats (PDFs, Office docs, images). Next, a concurrent multi-pipeline architecture processes textual and multimodal content separately and in parallel. Specialized analyzers handle visual content, structured data, and mathematical expressions, with an extensible handler for custom types. The system constructs a multimodal knowledge graph by extracting entities and cross-modal relationships, followed by modality-aware retrieval that fuses vector similarity search with graph traversal for contextually integrated information delivery.
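The concurrent multi-pipeline stage described above can be illustrated with a minimal sketch. This is not RAG-Anything's actual code: the handler functions, block schema, and dispatch table here are hypothetical stand-ins for the specialized analyzers, used only to show how parsed blocks of different modalities can be routed and processed in parallel.

```python
import asyncio

# Toy handlers standing in for the specialized analyzers
# (visual content, structured data, mathematical expressions).
def handle_text(block):
    return f"text-chunk:{block['data']}"

def handle_image(block):
    return f"image-caption:{block['data']}"

def handle_table(block):
    return f"table-summary:{block['data']}"

def handle_equation(block):
    return f"latex-parse:{block['data']}"

HANDLERS = {
    "text": handle_text,
    "image": handle_image,
    "table": handle_table,
    "equation": handle_equation,
}

async def process_block(block):
    # Route each parsed block to its modality-specific handler.
    handler = HANDLERS[block["modality"]]
    return await asyncio.to_thread(handler, block)

async def process_document(blocks):
    # Textual and multimodal content run concurrently, as in the
    # multi-pipeline architecture described above.
    return await asyncio.gather(*(process_block(b) for b in blocks))

blocks = [
    {"modality": "text", "data": "RAG systems retrieve context before generation."},
    {"modality": "image", "data": "figure1.png"},
    {"modality": "equation", "data": r"E = mc^2"},
]
results = asyncio.run(process_document(blocks))
```

An extensible system would let users register new entries in the dispatch table for custom content types, which is the role the summary attributes to the extensible handler.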

Quick Start & Requirements

  • Installation: pip install raganything, or pip install 'raganything[all]' for all optional dependencies. To install from source, clone the repository and run pip install -e . in its root.
  • Prerequisites: Office document processing requires LibreOffice installation. GPU acceleration is supported via MinerU configuration.
  • API Keys: OpenAI API keys are required for full RAG processing with LLM integration.
  • Documentation: Usage examples are available in the examples/ directory.
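Since Office document parsing fails without LibreOffice, it can be worth verifying the prerequisite before processing. A minimal check, assuming LibreOffice's standard soffice command-line entry point is on the PATH:

```python
import shutil

def libreoffice_available() -> bool:
    # LibreOffice ships a CLI binary named "soffice"; if it is not on
    # the PATH, .docx/.pptx/.xlsx parsing cannot work.
    return shutil.which("soffice") is not None

if not libreoffice_available():
    print("LibreOffice not found: Office document parsing will fail")
```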

Highlighted Details

  • End-to-end multimodal pipeline from ingestion to query answering.
  • Universal document support including PDFs, Office documents, images, and text files.
  • Specialized processors for images, tables, and mathematical equations with LaTeX support.
  • Multimodal knowledge graph construction for enhanced understanding and retrieval.
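To make the knowledge-graph bullet concrete, here is a toy sketch of what a multimodal knowledge graph with cross-modal relationships might look like. The entities, relation names, and traversal logic are illustrative assumptions, not RAG-Anything's internal representation; the point is how graph traversal can pull in context from other modalities than the query entity's own.

```python
from collections import defaultdict

# Nodes are (name, modality) pairs; edges carry a relation label.
graph = defaultdict(list)

def add_relation(src, rel, dst):
    graph[src].append((rel, dst))
    graph[dst].append((rel, src))  # bidirectional for retrieval purposes

# Hypothetical entities extracted from the text, an image,
# and a table of the same document.
add_relation(("transformer", "text"), "illustrated_by", ("fig1_architecture", "image"))
add_relation(("transformer", "text"), "evaluated_in", ("table2_results", "table"))
add_relation(("attention", "equation"), "defined_for", ("transformer", "text"))

def neighbors(entity, hops=1):
    # Breadth-first traversal: gather connected entities across modalities.
    seen, frontier = {entity}, [entity]
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for _, dst in graph[node]:
                if dst not in seen:
                    seen.add(dst)
                    nxt.append(dst)
        frontier = nxt
    return seen - {entity}

# A query about "transformer" surfaces the related figure, table, and equation.
context = neighbors(("transformer", "text"))
```

In a real system the traversal results would be fused with vector-similarity hits, as the "modality-aware retrieval" stage describes.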

Maintenance & Community

The project has passed 1,900 stars on GitHub within roughly a month of release. Links to related HKUDS projects, including LightRAG, VideoRAG, and MiniRAG, are provided, and contributions are welcomed.

Licensing & Compatibility

The project is released under the Apache 2.0 license, which permits commercial use and linking with closed-source applications.

Limitations & Caveats

Full multimodal processing and LLM integration require API keys for services like OpenAI. Office document processing depends on a separate LibreOffice installation. The system's performance and accuracy will be influenced by the quality of the underlying MinerU parsing and the chosen LLM models.

Health Check
Last commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
14
Issues (30d)
24
Star History
1,998 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Elie Bursztein (cybersecurity lead at Google DeepMind).

LightRAG by HKUDS

1.0%
19k
RAG framework for fast, simple retrieval-augmented generation
created 10 months ago
updated 18 hours ago