ragflow  by infiniflow

Open-source RAG engine for deep document understanding

Created 1 year ago
64,611 stars

Top 0.3% on SourcePulse

Project Summary

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine designed for deep document understanding and truthful question-answering. It targets businesses and individuals seeking to extract reliable insights from complex, multi-format data, offering a streamlined RAG workflow with grounded citations and reduced hallucinations.

How It Works

RAGFlow uses deep document understanding to extract knowledge from unstructured data, including complex formats and images embedded in documents. It is designed to retrieve relevant passages from document collections of virtually unlimited token length, and it uses template-based chunking so that data segmentation is intelligent and explainable. The system prioritizes grounded citations, with visualizations for traceability, and offers configurable RAG orchestration with multiple recall and fused re-ranking strategies.
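To make "multiple recall and fused re-ranking" concrete, here is a minimal sketch of one common fusion technique, reciprocal rank fusion. It is an illustration only and is not taken from RAGFlow's code: the function name rrf_fuse, the constant k=60, the two retrievers, and the chunk ids are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk (rank starts at 1).
    Chunks are returned in descending order of their summed score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score (descending); break ties by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical recall results from two retrievers (ids are made up).
keyword_hits = ["c2", "c1", "c4"]
vector_hits = ["c1", "c3", "c2"]

fused = rrf_fuse([keyword_hits, vector_hits])
print(fused)
```

Chunks that appear near the top of both lists rise above chunks that only one retriever found, which is the general intuition behind combining several recall paths before a final re-ranking step.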

Quick Start & Requirements

  • Install/Run: Docker Compose is the primary method.
  • Prerequisites: CPU >= 4 cores, RAM >= 16 GB, Disk >= 50 GB, Docker >= 24.0.0, Docker Compose >= v2.26.1. Requires vm.max_map_count set to at least 262144.
  • Setup: Clone the repo, then run docker compose -f docker-compose.yml up -d (CPU) or docker compose -f docker-compose-gpu.yml up -d (GPU).
  • Docs: Documentation
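
The prerequisites above can be checked before starting the containers. The following is a rough sketch under stated assumptions, not an official RAGFlow tool: the helper names (parse_version, version_ok, read_map_count, host_report) are hypothetical, the disk check assumes "Disk >= 50 GB" means free space in the current directory, and the RAM check is left out because reported memory is usually slightly below the nominal size.

```python
import os
import re
import shutil
from typing import Dict, Tuple

# Minimums copied from the prerequisites list above.
MIN_CORES = 4
MIN_DISK_GB = 50
MIN_MAP_COUNT = 262144
MIN_DOCKER = "24.0.0"
MIN_COMPOSE = "v2.26.1"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v2.26.1' or '24.0.7-ce' into a comparable tuple such as (2, 26, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def version_ok(actual: str, minimum: str) -> bool:
    """Return True when the actual version is at least the minimum."""
    return parse_version(actual) >= parse_version(minimum)


def read_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read vm.max_map_count on Linux; return 0 if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return 0


def host_report() -> Dict[str, bool]:
    """Check the host-level minimums that can be read without Docker."""
    # Assumption: the 50 GB requirement refers to free space here.
    free_gb = shutil.disk_usage(".").free / 1024**3
    return {
        "cores": (os.cpu_count() or 0) >= MIN_CORES,
        "disk": free_gb >= MIN_DISK_GB,
        "vm.max_map_count": read_map_count() >= MIN_MAP_COUNT,
    }


if __name__ == "__main__":
    for name, ok in host_report().items():
        print(f"{name}: {'ok' if ok else 'below minimum'}")
```

The version strings to pass to version_ok can be obtained with docker version --format '{{.Server.Version}}' and docker compose version --short. If vm.max_map_count is too low on Linux, it can be raised for the current boot with sudo sysctl -w vm.max_map_count=262144.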

Highlighted Details

  • Supports multi-modal models for understanding images within PDFs/DOCX.
  • Integrates with Internet search (Tavily) for deep research capabilities.
  • Handles heterogeneous data sources including Word, slides, Excel, images, structured data, and web pages.
  • Offers text-to-SQL statement generation via RAG.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

  • Docker images are built for x86 platforms; ARM64 requires custom builds.
  • Switching the document engine to Infinity is not officially supported on Linux/arm64.

Health Check

  • Last commit: 12 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 252
  • Issues (30d): 329
  • Star history: 2,199 stars in the last 30 days
