HippoRAG by OSU-NLP-Group

RAG framework for LLMs, inspired by human long-term memory

created 1 year ago
2,616 stars

Top 18.4% on sourcepulse

Project Summary

HippoRAG is a Retrieval-Augmented Generation (RAG) framework that gives Large Language Models (LLMs) a form of long-term memory modeled on human memory. It continuously integrates knowledge from external documents, which improves associative reasoning and sense-making over complex contexts. Online operation is optimized for cost and latency, and offline indexing requires fewer resources than other graph-based RAG approaches.

How It Works

HippoRAG combines RAG with a knowledge graph and the Personalized PageRank algorithm. It extracts factual triples from documents, builds a knowledge graph from them, and then traverses that graph to improve retrieval. This lets the LLM recognize and use connections within newly added knowledge, mimicking human long-term memory to improve multi-hop retrieval and context integration.
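The pipeline described above can be illustrated with a small, self-contained sketch. This is not HippoRAG's actual implementation: the triples, the passage names, the helper functions `build_graph` and `personalized_pagerank`, and the damping value are all hypothetical and chosen only to show how Personalized PageRank can spread relevance from a query entity across a graph built from triples.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an undirected adjacency map from (subject, relation, object) triples."""
    graph = defaultdict(set)
    for subj, _rel, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def personalized_pagerank(graph, seeds, damping=0.5, iterations=50):
    """Power iteration for PageRank whose restart mass goes only to seed nodes."""
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("no seed node is present in the graph")
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        # Every node keeps a share of the restart mass...
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        # ...and passes the rest of its score evenly to its neighbors.
        for n in nodes:
            share = damping * scores[n] / len(graph[n])
            for neighbor in graph[n]:
                nxt[neighbor] += share
        scores = nxt
    return scores


# Hypothetical triples, as if extracted from three passages.
triples = [
    ("Alice", "works at", "Acme Lab"),
    ("Acme Lab", "located in", "Springfield"),
    ("Bob", "lives in", "Shelbyville"),
]

# Hypothetical mapping from each passage to the entities it mentions.
passages = {
    "p0": ["Alice", "Acme Lab"],
    "p1": ["Acme Lab", "Springfield"],
    "p2": ["Bob", "Shelbyville"],
}

graph = build_graph(triples)
# Assume the query "Which city does Alice work in?" was linked to the node "Alice".
scores = personalized_pagerank(graph, seeds=["Alice"])

# Score each passage by the total PageRank mass of its entities.
passage_scores = {
    pid: sum(scores.get(entity, 0.0) for entity in entities)
    for pid, entities in passages.items()
}
ranking = sorted(passage_scores, key=passage_scores.get, reverse=True)
print(ranking)
```

In this toy setup the passage that mentions Springfield receives a nonzero score even though it never mentions Alice, because relevance flows through the shared "Acme Lab" node. That is the multi-hop effect the graph is meant to provide; the unrelated passage about Bob receives no score at all.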

Quick Start & Requirements

Highlighted Details

  • Reports state-of-the-art performance on factual memory, sense-making, and associativity across multiple benchmarks.
  • Supports OpenAI models and locally deployed vLLM instances for LLM inference.
  • Offers efficient offline indexing compared to other graph-based RAG solutions.
  • Includes comprehensive testing scripts for OpenAI and local deployments.

Maintenance & Community

  • Developed by the OSU-NLP-Group, with contributions from Bernal Jiménez Gutiérrez, Yiheng Shu, and Yu Su.
  • Contact: File an issue or contact the developers directly.
  • Citation details provided for both HippoRAG and HippoRAG 2 papers.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Support for additional embedding models and embedding endpoints is listed as a TODO.
  • Integration with vector databases is also a planned feature.
  • Custom dataset formatting requires adherence to specific JSON structures for corpus and queries.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 11
  • Star history: 350 stars in the last 90 days
