arag by Ayanami0730

Agentic RAG for scalable, multi-hop question answering

Created 2 months ago
256 stars

Top 98.5% on SourcePulse

Project Summary

A-RAG is an advanced Retrieval-Augmented Generation (RAG) framework designed to overcome the limitations of static RAG systems by enabling LLMs to autonomously control retrieval. It targets researchers and developers building sophisticated multi-hop question-answering systems, offering improved accuracy and scalability by leveraging LLM reasoning for dynamic information retrieval.

How It Works

A-RAG operates on three core principles: autonomous strategy selection, iterative execution, and interleaved tool use within a ReAct-like loop. It exposes hierarchical retrieval interfaces—keyword search, semantic search, and chunk reading—directly to the LLM. This allows the agent to dynamically adapt its retrieval strategy across different granularities (keyword, sentence, chunk) based on task characteristics, enabling more efficient and context-aware information gathering compared to traditional Graph RAG or predefined Workflow RAG paradigms.
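The loop described above can be illustrated with a minimal sketch. The tool names (keyword_search, semantic_search, read_chunk) mirror the interfaces the summary mentions, but the function signatures, the scripted action sequence standing in for the LLM's turn-by-turn choices, and the toy scoring are assumptions for illustration, not the project's actual API:

```python
# Hypothetical sketch of a ReAct-like retrieval loop. The tool registry and
# the scripted actions are stand-ins for the LLM's autonomous choices.

def keyword_search(query, corpus):
    """Keyword granularity: return ids of chunks containing the query term."""
    return [i for i, doc in enumerate(corpus) if query.lower() in doc.lower()]

def semantic_search(query, corpus):
    """Sentence granularity: rank chunks by naive token overlap
    (a stand-in for real embedding similarity)."""
    q = set(query.lower().split())
    scored = sorted(range(len(corpus)),
                    key=lambda i: -len(q & set(corpus[i].lower().split())))
    return scored[:1]

def read_chunk(doc_id, corpus):
    """Chunk granularity: return the full text of one chunk."""
    return corpus[doc_id]

def agent_loop(actions, corpus):
    """Execute a sequence of (tool, argument) steps, as the agent would
    choose them turn by turn, collecting one observation per step."""
    tools = {"keyword_search": keyword_search,
             "semantic_search": semantic_search,
             "read_chunk": read_chunk}
    return [tools[tool](arg, corpus) for tool, arg in actions]

corpus = ["Paris is the capital of France.",
          "The Eiffel Tower is in Paris."]
# Interleaved tool use: locate a chunk by keyword, then read it in full.
obs = agent_loop([("keyword_search", "Eiffel"), ("read_chunk", 1)], corpus)
```

The point of the sketch is the control flow: the agent, not a fixed pipeline, decides which granularity to query next based on prior observations.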

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using uv sync --extra full or pip install -e ".[full]". uv is recommended.
  • Prerequisites: Python, a CUDA-enabled GPU (for indexing and inference), an embedding model (e.g., Qwen/Qwen3-Embedding-0.6B), an OpenAI API key, and a compatible OpenAI-style API endpoint.
  • Setup: Building the embedding index requires specifying chunk data, output directory, embedding model, and device. Running agents requires setting environment variables for API keys and model details.
  • Links: Paper, Website, GitHub, HuggingFace for datasets.
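The steps above can be sketched as a single setup flow. Only the two install commands are taken from the README; the repository URL and the environment-variable names follow common conventions and are assumptions, not documented specifics:

```shell
# Clone and install (repository URL assumed from the author/project names).
git clone https://github.com/Ayanami0730/arag.git
cd arag
uv sync --extra full          # or: pip install -e ".[full]"

# Running agents requires credentials for an OpenAI-compatible endpoint.
# Variable names below follow the OpenAI SDK convention; the project's
# actual variable names may differ.
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1"
```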

Highlighted Details

  • Hierarchical Retrieval: Offers keyword, sentence-level semantic, and chunk-level access for flexible information gathering.
  • True Agentic Autonomy: Implements autonomous strategy, iterative execution, and interleaved tool use for dynamic RAG.
  • Test-Time Scaling: Performance scales with increased compute resources.
  • Context Efficiency: Achieves superior accuracy with comparable or fewer retrieved tokens.
  • Benchmark Performance: Claims state-of-the-art results on multi-hop QA datasets like MuSiQue and HotpotQA, outperforming various baselines.
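The index-building and sentence-level search steps can be illustrated with a toy example. The hashing "embedder" below is a deliberately simple stand-in for a real model such as Qwen/Qwen3-Embedding-0.6B, and nothing here reflects the project's actual index format:

```python
import numpy as np

def embed(text, dim=64):
    """Toy bag-of-words hashing embedder, normalized to unit length.
    A real system would call an embedding model here."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

chunks = ["The Eiffel Tower was completed in 1889.",
          "Mount Everest is the highest mountain on Earth."]

# Build step: one vector per chunk, stacked into a dense index.
index = np.stack([embed(c) for c in chunks])

def semantic_search(query, k=1):
    """Return the top-k chunks by cosine similarity (vectors are unit-norm,
    so a dot product suffices)."""
    scores = index @ embed(query)
    return [chunks[i] for i in np.argsort(-scores)[:k]]
```

The same two-phase shape — offline index construction, then cheap similarity lookups at query time — is what makes the retrieval side scale independently of the agent loop.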

Maintenance & Community

Contributions are welcome. The project is associated with the authors of arXiv:2602.03442. Community channels and active-maintainer information are not detailed in the README.

Licensing & Compatibility

Released under the MIT License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

The roadmap indicates planned support for additional benchmarks and LLM providers (Anthropic, Gemini), suggesting current implementation is primarily focused on OpenAI-compatible APIs. Features like ablation studies and visualization tools are also listed as future work.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 27 stars in the last 30 days
