opencraig by opencraig

Production-ready RAG with structure-aware reasoning

Created 3 months ago

915 stars

Top 38.9% on SourcePulse

Project Summary

Summary

ForgeRAG is a production-ready Retrieval-Augmented Generation (RAG) system that addresses the limitations of naive RAG pipelines through structure-aware reasoning, knowledge graph (KG) multi-hop traversal, and LLM tree navigation. It delivers grounded answers with pixel-precise, verifiable citations, and it targets engineers and researchers who want more advanced RAG capabilities and stronger performance than a basic retrieve-then-generate pipeline.

How It Works

ForgeRAG mimics how a domain expert reasons: BM25 and vector search retrieve candidate passages, a KG connects related concepts, and LLM tree navigation pinpoints the exact location of the relevant information. This fused approach handles multi-hop queries through KG path extraction and dual-level retrieval. ForgeRAG also injects a distilled KG knowledge layer into the LLM prompt, grounds answers in the original text, and attaches pixel-precise citations for verification, which mitigates the risk of hallucination.
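Multi-hop KG path extraction can be illustrated with a short sketch. The Python below is not ForgeRAG's code: it assumes a toy in-memory graph (each entity maps to a list of (relation, neighbor) edges), and the names find_paths, render_path, and knowledge_layer are hypothetical. It searches breadth-first for relation paths of up to max_hops edges between two entities and renders them as text lines of the kind that could be prepended to an LLM prompt as a knowledge layer.

```python
from collections import deque

# Hypothetical toy KG: entity -> list of (relation, neighbor) edges.
Graph = dict[str, list[tuple[str, str]]]
Triple = tuple[str, str, str]


def find_paths(graph: Graph, start: str, goal: str, max_hops: int = 3) -> list[list[Triple]]:
    """Breadth-first search for simple paths of at most max_hops edges.

    Each path is a list of (head, relation, tail) triples from start to goal.
    """
    paths: list[list[Triple]] = []
    queue: deque[tuple[str, list[Triple]]] = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue  # hop budget exhausted on this branch
        # Nodes already on this path; skipping them keeps paths cycle-free.
        visited = {start} | {tail for _, _, tail in path}
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return paths


def render_path(path: list[Triple]) -> str:
    """Render one path as 'A -[rel]-> B -[rel]-> C'."""
    text = path[0][0]
    for _, relation, tail in path:
        text += f" -[{relation}]-> {tail}"
    return text


def knowledge_layer(paths: list[list[Triple]]) -> str:
    """Join rendered paths into a block of text for the LLM prompt."""
    return "\n".join(render_path(p) for p in paths)


if __name__ == "__main__":
    kg: Graph = {
        "Alice": [("works_at", "Acme")],
        "Acme": [("located_in", "Berlin")],
    }
    print(knowledge_layer(find_paths(kg, "Alice", "Berlin")))
```

For a question such as "Which city does Alice work in?", neither edge answers it alone, but the two-hop path Alice -[works_at]-> Acme -[located_in]-> Berlin does. Connections like this are what single-pass similarity search tends to miss.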

Quick Start & Requirements

Prerequisites: Python 3.10+, Node.js 18+ (frontend), and an LLM API key (LiteLLM-compatible).
Recommended hardware: 4+ CPU cores and 8 GB+ RAM (16 GB+ for KG extraction).
Local setup: clone the repository, install dependencies (pip, npm), configure LLM keys with scripts/setup.py, then run main.py.
Docker: docker compose up -d.
Web UI: http://localhost:8000.
MinerU is recommended for parsing complex PDFs. Extensive documentation is available.
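The runtime and hardware recommendations above can be checked before installing. The helper below is a hypothetical preflight script, not something shipped with ForgeRAG; only the thresholds come from the stated requirements. The pure preflight function is separated from current_machine, which reads this machine's specs; RAM detection assumes a POSIX system (os.sysconf) and is skipped elsewhere.

```python
import os
import sys

# Thresholds from the stated recommendations; the helper itself is hypothetical.
MIN_PYTHON = (3, 10)
MIN_CPUS = 4
MIN_RAM_GB = 8
MIN_RAM_GB_KG = 16  # KG extraction needs more memory


def preflight(python_version, cpus: int, ram_gb: float, kg_extraction: bool = False) -> list[str]:
    """Return warnings; an empty list means the recommendations are met."""
    warnings = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        warnings.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    if cpus < MIN_CPUS:
        warnings.append(f"{MIN_CPUS}+ CPU cores recommended, found {cpus}")
    needed = MIN_RAM_GB_KG if kg_extraction else MIN_RAM_GB
    if ram_gb < needed:
        warnings.append(f"{needed}+ GB RAM recommended, found {ram_gb:.1f} GB")
    return warnings


def current_machine() -> tuple:
    """Read Python version, CPU count, and RAM (decimal GB) for this machine."""
    ram_gb = float("inf")  # unknown RAM is treated as sufficient
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        pass  # os.sysconf is unavailable on Windows
    return sys.version_info, os.cpu_count() or 1, ram_gb


if __name__ == "__main__":
    for warning in preflight(*current_machine(), kg_extraction=True):
        print("WARNING:", warning)
```

RAM is reported in decimal gigabytes so that a nominal 8 GB machine, which usually exposes slightly less than its installed memory to the OS, is less likely to trigger a false warning.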

Highlighted Details

Dual-reasoning retrieval: Fuses BM25, vector search, LLM tree navigation, and KG retrieval via RRF.
Pixel-precise citations: Every claim links to exact page/bounding box within source documents.
Full retrieval tracing: Inspect query paths, expansion decisions, and merge logic.
Multi-turn conversations: Supports context-aware follow-ups.
Multi-format ingestion: Handles PDF, DOCX, PPTX, XLSX, HTML, Markdown, TXT.
Performance: Outperforms LightRAG with a 55.48% overall win rate on the UltraDomain benchmark.
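Reciprocal Rank Fusion (RRF) merges ranked lists using only rank positions, so scores from BM25, vector similarity, tree navigation, and the KG never need to be put on a common scale. The sketch below is a generic RRF implementation, not ForgeRAG's: the retriever names and document IDs are placeholders, and k = 60 is a commonly used default rather than a value taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    score(d) = sum over retrievers of 1 / (k + rank(d)), ranks starting at 1.
    A document missing from a list gets no contribution from that retriever.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder outputs from four hypothetical retrievers.
    rankings = {
        "bm25": ["d1", "d2", "d3"],
        "vector": ["d2", "d1", "d4"],
        "tree": ["d2", "d5"],
        "kg": ["d3", "d2"],
    }
    fused = reciprocal_rank_fusion(rankings)
    print([doc_id for doc_id, _ in fused])  # ['d2', 'd1', 'd3', 'd5', 'd4']
```

Here d2 wins because it sits near the top of all four lists, even though it is first in only two of them; a larger k flattens the gap between top and lower ranks.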

Maintenance & Community

Roadmap includes expanded benchmarks, scaling to 1M+ documents, multi-language support, a Python SDK, and improved configuration diagnostics. A contributing guide is provided. Related projects like LightRAG, GraphRAG, and PageIndex are noted.

Licensing & Compatibility

Released under the MIT License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

The UltraDomain benchmark evaluates Comprehensiveness, Diversity, and Empowerment but does not measure factual accuracy, so the reported win rate says nothing direct about correctness; ForgeRAG's citations let users check claims themselves. KG extraction can be resource-intensive and requires significant RAM for large documents.


Explore Similar Projects

Diver by AQ-MedAI

LeanRAG by KnowledgeXLab

KG-LLM-MDQA by yuwvandy

RAG-QA-Generator by wangxb96

pageindex-mcp by VectifyAI

graphrag-rs by automataIA

rag-all-in-one by lehoanglong95

TrustRAG by gomate-community

paper-qa by Future-House

llm_wiki by nashsu

RAG_Techniques by NirDiamant

ragflow by infiniflow