HiRAG by hhy-huang

Retrieval-Augmented Generation with Hierarchical Knowledge

Created 6 months ago
365 stars

Top 77.1% on SourcePulse

Project Summary

HiRAG improves retrieval-augmented generation (RAG) by incorporating hierarchical knowledge. It targets researchers and developers working with large language models who want more accurate and comprehensive generated text, which it pursues by supplying more structured and relevant information during retrieval. The reported benefit is higher response quality on the Comprehensiveness, Empowerment, and Diversity metrics compared with existing RAG methods.

How It Works

HiRAG implements a hierarchical retrieval mechanism that organizes knowledge into a tree-like structure. The model first retrieves broad, high-level information and then progressively drills down into more specific details. The summary likens this to how people look up information, going from general context to specifics, which it says yields more contextually relevant and accurate results. The hierarchy also helps disambiguate information and focus answers, especially for complex queries.
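The coarse-to-fine idea can be illustrated with a minimal sketch. This is not HiRAG's implementation or API: the `Node` class, the `overlap` scoring function, and `retrieve` are hypothetical names invented for the example, and word overlap stands in for the embedding similarity a real system would likely use.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a hypothetical knowledge tree (not HiRAG's own data structure)."""
    text: str
    children: list["Node"] = field(default_factory=list)


def overlap(query: str, text: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase word sets.

    Assumption: a stand-in for embedding similarity.
    """
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def retrieve(root: Node, query: str) -> list[str]:
    """Walk from the root to a leaf, following the best-scoring child at each level.

    Returns the texts along the path, broadest context first.
    """
    path = [root.text]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: overlap(query, c.text))
        path.append(node.text)
    return path


# A tiny hand-built tree: broad topics at the top, specific facts at the leaves.
tree = Node("science", [
    Node("physics of motion and heat", [
        Node("newton laws describe motion of bodies"),
        Node("thermodynamics describes heat transfer"),
    ]),
    Node("biology of living cells", [
        Node("mitochondria produce energy in cells"),
        Node("dna stores genetic information"),
    ]),
])

result = retrieve(tree, "how do cells produce energy")
print(result)
```

The returned path keeps the high-level nodes alongside the leaf, so a generator would see both the general topic and the specific fact. A greedy single-path descent is the simplest possible choice; following several top-scoring children per level would trade extra context for recall.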

Quick Start & Requirements

  • Install: pip install -e . (after cloning the repository)
  • Prerequisites: Python, plus configuration and API keys for the chosen LLM backend (e.g., DeepSeek, ChatGLM, OpenAI), as detailed in ./config.yaml.
  • Usage: The README provides Python code snippets for initializing HiRAG, inserting context, and performing queries with hierarchical retrieval. Examples for integrating with third-party retrieval APIs are also available in the ./ directory.

Highlighted Details

  • Reports higher scores on the Comprehensiveness, Empowerment, and Diversity metrics than Naive RAG, GraphRAG, LightRAG, FastGraphRAG, and KAG.
  • Reaches scores as high as 99.2% (Comprehensiveness, Mix dataset) in the comparison against FastGraphRAG.
  • Supports various retrieval modes, including hierarchical, naive, and combinations of local/global/bridge knowledge.
  • The evaluation framework allows for testing with different datasets from Hugging Face and various LLM backends.
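Percentages quoted "when compared to" another system are often pairwise win rates from an LLM judge. The summary does not confirm that this is how the figures above were computed, so the following is only a sketch under that assumption. The `win_rate` function and the verdict labels are hypothetical, and the verdict list is made-up toy data, not a HiRAG result.

```python
def win_rate(verdicts: list[str], system: str) -> float:
    """Percentage of pairwise judgments in which `system` was preferred."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    wins = sum(1 for v in verdicts if v == system)
    return 100.0 * wins / len(verdicts)


# Toy data: one judge verdict per query, naming the preferred system.
verdicts = ["system_a", "system_a", "system_b", "system_a"]
rate_a = win_rate(verdicts, "system_a")
rate_b = win_rate(verdicts, "system_b")
print(f"system_a: {rate_a:.1f}%  system_b: {rate_b:.1f}%")
```

Under this reading, a figure such as 99.2% would mean one system was preferred on nearly every query for that metric, which is a statement about relative preference and not an absolute quality score.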

Maintenance & Community

  • The project is associated with the paper "Retrieval-Augmented Generation with Hierarchical Knowledge" accepted to EMNLP 2025 Findings.
  • Acknowledgements mention the use of open-source projects like nano-graphrag and RAPTOR.
  • Citation details are provided for the associated paper.

Licensing & Compatibility

  • The README does not explicitly state a license. Further clarification on licensing and compatibility for commercial or closed-source use would be necessary.

Limitations & Caveats

  • The README lists no explicit limitations or known issues. However, the absence of a stated license could block adoption in commercial applications, and setup may require careful configuration of API keys and LLM parameters.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 144 stars in the last 30 days
