HippoRAG by OSU-NLP-Group

RAG framework for LLMs, inspired by human long-term memory

Created 2 years ago
3,545 stars

Top 13.5% on SourcePulse

Project Summary

HippoRAG is a Retrieval-Augmented Generation (RAG) framework that aims to give Large Language Models (LLMs) a form of long-term memory modeled on human memory. It continuously integrates knowledge from external documents, which supports associative reasoning and sense-making over complex contexts. The framework is designed to keep cost and latency low at query time, and it requires fewer resources for offline indexing than other graph-based RAG approaches.

How It Works

HippoRAG combines RAG with a knowledge graph and the Personalized PageRank algorithm. It processes documents to extract factual triples, assembles those triples into a knowledge graph, and then runs Personalized PageRank over the graph to improve retrieval. This lets the LLM recognize and use connections within newly added knowledge, mimicking human long-term memory for better multi-hop retrieval and context integration.
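The pipeline described above can be illustrated with a small, self-contained sketch. This is not HippoRAG's code or API: the function names (`build_graph`, `personalized_pagerank`, `rank_passages`), the toy triples, the passage-to-entity mapping, and the damping value are all assumptions made for illustration. In a real system the triples would come from an LLM extraction step and the seed entities from the query.

```python
from collections import defaultdict

def build_graph(triples):
    """Build an undirected entity graph from (subject, relation, object) triples."""
    graph = defaultdict(set)
    for subj, _relation, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph

def personalized_pagerank(graph, seeds, damping=0.5, iterations=50):
    """Power-iteration Personalized PageRank; restart mass goes only to seed nodes.

    The damping value is an arbitrary choice for this sketch.
    """
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("no seed entity appears in the graph")
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Every node has at least one neighbor because it came from a triple.
            share = damping * scores[n] / len(graph[n])
            for neighbor in graph[n]:
                nxt[neighbor] += share
        scores = nxt
    return scores

def rank_passages(passage_entities, scores):
    """Score each passage by the summed PPR mass of the entities it mentions."""
    totals = {
        pid: sum(scores.get(e, 0.0) for e in ents)
        for pid, ents in passage_entities.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

# Toy data (hypothetical): triples as an extraction step might produce them.
triples = [
    ("Thomas", "works_at", "Stanford"),
    ("Thomas", "researches", "Alzheimer's"),
    ("Stanford", "located_in", "California"),
    ("Alice", "works_at", "MIT"),
]
# Which entities each passage mentions (hypothetical mapping).
passage_entities = {
    "doc_a": {"Thomas", "Stanford"},
    "doc_b": {"Thomas", "Alzheimer's"},
    "doc_c": {"Stanford", "California"},
    "doc_d": {"Alice", "MIT"},
}

graph = build_graph(triples)
# Query: "Which Stanford professor researches Alzheimer's?" -> seed entities.
scores = personalized_pagerank(graph, seeds=["Stanford", "Alzheimer's"])
ranked = rank_passages(passage_entities, scores)
```

Because the restart mass sits on both seed entities, "Thomas" accumulates score from two directions even though the query never names him. That is the multi-hop effect in miniature: passages about the bridging entity rise, while the unconnected "Alice"/"MIT" passage receives no mass at all.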

Quick Start & Requirements

Highlighted Details

  • Achieves state-of-the-art performance in factual memory, sense-making, and associativity across multiple benchmarks.
  • Supports OpenAI models and locally deployed vLLM instances for LLM inference.
  • Offers efficient offline indexing compared to other graph-based RAG solutions.
  • Includes comprehensive testing scripts for OpenAI and local deployments.

Maintenance & Community

  • Developed by the OSU-NLP-Group, with contributions from Bernal Jiménez Gutiérrez, Yiheng Shu, and Yu Su.
  • Contact: File an issue or contact the developers directly.
  • Citation details provided for both HippoRAG and HippoRAG 2 papers.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Support for additional embedding models and embedding endpoints is listed as a TODO.
  • Integration with vector databases is also a planned feature.
  • Custom dataset formatting requires adherence to specific JSON structures for corpus and queries.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 3

Star History

  • 108 stars in the last 30 days
