raptor by parthsarthi03

Retrieval-augmented language model research paper

created 1 year ago
1,336 stars

Top 30.7% on sourcepulse

Project Summary

RAPTOR is a retrieval-augmented generation (RAG) approach that builds a recursive tree of summaries from documents, so that retrieval can be more efficient and context-aware. It is aimed at researchers and developers who work with large text corpora and want more accurate, relevant language model responses.

How It Works

RAPTOR constructs a hierarchical tree of summaries from input documents. It embeds and clusters chunks of text, summarizes each cluster, then repeats the process on those summaries, building an abstractive tree from the bottom up. Retrieval can then traverse the tree to pull information at the right level of detail, which gives the language model more precise context for its answer. The framework is extensible: users can plug in custom summarization, question-answering, and embedding models.
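The recursive idea can be illustrated with a minimal Python sketch. This is not the project's code: `Node`, `toy_summarize`, `build_tree`, and `retrieve` are hypothetical names, the summarizer is a word-truncating stand-in for a language model, nodes are grouped by adjacency instead of by any learned similarity, and relevance is scored by plain word overlap.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    text: str                      # chunk text (leaf) or summary (inner node)
    level: int                     # 0 for leaves, increasing toward the root
    children: List["Node"] = field(default_factory=list)


def toy_summarize(texts: List[str]) -> str:
    # Stand-in for a language-model summarizer (assumption):
    # keep the first 8 whitespace-separated tokens of each input.
    return " / ".join(" ".join(t.split()[:8]) for t in texts)


def build_tree(chunks: List[str],
               summarize: Callable[[List[str]], str] = toy_summarize,
               group_size: int = 2) -> Node:
    """Summarize groups of nodes layer by layer until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    layer = [Node(text=c, level=0) for c in chunks]
    level = 0
    while len(layer) > 1:
        level += 1
        parents = []
        # Adjacent grouping is a simplification made for this sketch.
        for i in range(0, len(layer), group_size):
            group = layer[i:i + group_size]
            parents.append(Node(summarize([n.text for n in group]), level, group))
        layer = parents
    return layer[0]


def overlap(query: str, text: str) -> int:
    # Toy relevance score: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(root: Node, query: str) -> str:
    """Walk from the root to a leaf, following the best-scoring child."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: overlap(query, c.text))
    return node.text


chunks = [
    "cats purr when happy",
    "dogs bark at strangers",
    "owls hunt at night",
    "bees make sweet honey",
]
root = build_tree(chunks)
print(retrieve(root, "who makes honey"))
```

A real system would replace the word-overlap score with embedding similarity and the truncating summarizer with a model call; the layer-by-layer loop is the part this sketch is meant to show.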

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires Python 3.8+
  • Requires an OpenAI API key set as an environment variable (OPENAI_API_KEY).
  • See demo.ipynb for examples with custom models.
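Before running anything, it can help to confirm the two requirements above. The following is a small preflight sketch in Python; `check_requirements` is a hypothetical helper written for this summary, not part of the project.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_requirements(env: Optional[Mapping[str, str]] = None,
                       version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return a list of unmet requirements (empty if everything looks fine)."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if version < (3, 8):
        problems.append("Python 3.8+ is required, found %d.%d" % version)
    # An empty string counts as unset here.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing requirement:", problem)
```

The optional `env` and `version` parameters exist only so the check can be exercised without touching the real environment.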

Highlighted Details

  • Implements a recursive summarization strategy to build a tree-like document index.
  • Supports integration of custom summarization, QA, and embedding models (e.g., Llama, Mistral, SBERT).
  • Allows saving and loading of the constructed document tree for persistence.
  • Published at ICLR 2024.
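Two of the points above, pluggable models and tree persistence, can be sketched in Python. Everything here is illustrative: `EmbeddingModel`, `CharCountEmbedding`, `save_tree`, `load_tree`, and `round_trip` are hypothetical names, the project's actual base classes and save format may differ, and `pickle` is an assumption chosen because it ships with the standard library.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod
from typing import Any, List


class EmbeddingModel(ABC):
    """Hypothetical interface that a custom embedding model would implement."""

    @abstractmethod
    def embed(self, text: str) -> List[float]:
        ...


class CharCountEmbedding(EmbeddingModel):
    """Toy model: a 2-d vector of text length and space count."""

    def embed(self, text: str) -> List[float]:
        return [float(len(text)), float(text.count(" "))]


def save_tree(tree: Any, path: str) -> None:
    # Serialize the tree object to disk (pickle is an assumption).
    with open(path, "wb") as f:
        pickle.dump(tree, f)


def load_tree(path: str) -> Any:
    # Only unpickle files you created yourself; pickle can execute code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip(tree: Any) -> Any:
    """Save the tree to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tree.pkl")
        save_tree(tree, path)
        return load_tree(path)
```

Keeping the model behind a small abstract interface means a sentence-embedding model or a locally hosted LLM could be swapped in without touching the tree-building code.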

Maintenance & Community

The project is the official implementation of the RAPTOR paper, co-authored by Christopher D. Manning. Further examples and configuration guides are planned.

Licensing & Compatibility

Released under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The project is marked as "Work in Progress" (WIP) with forthcoming documentation and advanced features. Initial setup requires an OpenAI API key, and custom model integration details are still being developed.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

142 stars in the last 90 days
