raptor by parthsarthi03

Retrieval-augmented language model research paper

Created 1 year ago

1,525 stars

Top 26.9% on SourcePulse


1 Expert Loves This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Project Summary

RAPTOR offers a novel retrieval-augmented generation (RAG) approach by building a recursive tree structure from documents, enabling more efficient and context-aware information retrieval. It is designed for researchers and developers working with large text corpora who need to improve the accuracy and relevance of language model responses.

How It Works

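RAPTOR constructs a hierarchical tree of summaries from input documents. It splits the text into chunks, embeds and clusters those chunks, and summarizes each cluster. It then repeats the process on the resulting summaries, building the tree from the bottom up so that each layer is more abstract than the one below it. At query time, relevant information can be retrieved by traversing the tree or by searching all of its layers at once, so answers can draw on both fine-grained passages and higher-level summaries. This leads to more precise answers from language models. The framework is extensible, allowing users to integrate custom summarization, question-answering, and embedding models.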
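The construction and retrieval steps above can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not RAPTOR's implementation. `embed` is a hypothetical bag-of-words stand-in for a real embedding model, `summarize` is a hypothetical stand-in for an LLM summarizer, and the greedy cosine-threshold `cluster` function replaces the Gaussian-mixture clustering described in the RAPTOR paper. `retrieve` scores nodes from every layer together (the "collapsed tree" strategy from the paper) instead of walking down the tree layer by layer.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    layer: int
    children: list[Node] = field(default_factory=list)


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(texts: list[str]) -> str:
    """Hypothetical stand-in summarizer: keeps the first sentence of each text.
    A real system would call a language model here."""
    return " ".join(t.split(".")[0].strip() + "." for t in texts)


def cluster(nodes: list[Node], threshold: float) -> list[list[Node]]:
    """Greedy grouping by cosine similarity to each group's first member
    (a simple substitute for the paper's Gaussian-mixture clustering)."""
    groups: list[list[Node]] = []
    for node in nodes:
        vec = embed(node.text)
        for group in groups:
            if cosine(vec, embed(group[0].text)) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


def build_tree(chunks: list[str], threshold: float = 0.2, max_layers: int = 3) -> list[list[Node]]:
    """Return the tree as a list of layers; layer 0 holds the original chunks.
    At most `max_layers` summary layers are added above the leaves."""
    layers = [[Node(text=c, layer=0) for c in chunks]]
    while len(layers[-1]) > 1 and len(layers) <= max_layers:
        groups = cluster(layers[-1], threshold)
        if all(len(g) == 1 for g in groups):
            break  # nothing is similar enough to merge at this threshold
        parents = [
            Node(text=summarize([n.text for n in g]), layer=len(layers), children=g)
            for g in groups
        ]
        layers.append(parents)
    return layers


def retrieve(layers: list[list[Node]], query: str, k: int = 2) -> list[Node]:
    """Collapsed-tree retrieval: rank nodes from all layers together, keep the top k."""
    q = embed(query)
    all_nodes = [n for layer in layers for n in layer]
    return sorted(all_nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]


chunks = [
    "Cinderella lived with her stepmother. She did chores all day.",
    "Her stepmother and stepsisters treated Cinderella cruelly.",
    "The prince held a royal ball. Every maiden was invited.",
    "At the ball the prince danced with a mysterious maiden.",
]
tree = build_tree(chunks, threshold=0.2)
hits = retrieve(tree, "Who did the prince dance with at the ball?")
for node in hits:
    print(node.layer, node.text)
```

With this toy input the two Cinderella chunks and the two ball chunks form separate clusters, so the tree gains one summary layer. A question about the ball then matches both an original chunk and the summary node above it, which is the mix of detail and abstraction the tree is meant to provide.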

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt
Requires Python 3.8+
Requires an OpenAI API key set as an environment variable (OPENAI_API_KEY).
See demo.ipynb for examples with custom models.
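A quick pre-flight check for the two stated requirements can save a confusing failure partway through tree construction. The sketch below is a hypothetical helper, not part of the RAPTOR package: `check_environment` only verifies that the interpreter is Python 3.8 or newer and that `OPENAI_API_KEY` is present and non-empty. It does not check that the key is valid.

```python
import os
import sys


def check_environment(min_python=(3, 8)):
    """Hypothetical pre-flight check mirroring the stated requirements:
    Python 3.8+ and an OPENAI_API_KEY environment variable."""
    if sys.version_info < min_python:
        raise RuntimeError(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running RAPTOR.")
    return key
```

Calling `check_environment()` at the top of a script fails fast with a readable message. The key itself is normally set in the shell (for example `export OPENAI_API_KEY=...`) so that it never appears in source files.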

Highlighted Details

Implements a recursive summarization strategy to build a tree-like document index.
Supports integration of custom summarization, QA, and embedding models (e.g., Llama, Mistral, SBERT).
Allows saving and loading of the constructed document tree for persistence.
Published at ICLR 2024.
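Custom models and tree persistence can be illustrated with a small sketch. Every class name here (`Summarizer`, `Embedder`, `FirstWordsSummarizer`, `CharStatsEmbedder`, `TreeIndex`) is hypothetical and does not reproduce the classes RAPTOR actually exports; demo.ipynb shows the project's own interfaces. The pattern shown is dependency injection. The index depends only on abstract summarizer and embedder interfaces, so a backend such as Llama, Mistral, or SBERT could be swapped in. Only the tree data is pickled to disk. For brevity, nodes are grouped in fixed-size runs instead of being clustered by embedding.

```python
from __future__ import annotations

import pickle
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Summarizer(ABC):
    """Hypothetical plug-in point; a real one might wrap Llama or Mistral."""
    @abstractmethod
    def summarize(self, texts: list[str]) -> str: ...


class Embedder(ABC):
    """Hypothetical plug-in point; a real one might wrap SBERT."""
    @abstractmethod
    def embed(self, text: str) -> list[float]: ...


class FirstWordsSummarizer(Summarizer):
    """Toy summarizer: keeps the first `n` words of the joined texts."""
    def __init__(self, n: int = 8) -> None:
        self.n = n

    def summarize(self, texts: list[str]) -> str:
        return " ".join(" ".join(texts).split()[: self.n])


class CharStatsEmbedder(Embedder):
    """Toy embedder: [length, vowel count]; only demonstrates the interface."""
    def embed(self, text: str) -> list[float]:
        return [float(len(text)), float(sum(c in "aeiou" for c in text.lower()))]


class TreeIndex:
    """Holds tree layers as (text, embedding) pairs and can be pickled to disk."""
    def __init__(self, summarizer: Summarizer, embedder: Embedder) -> None:
        self.summarizer = summarizer
        self.embedder = embedder
        self.layers: list[list[tuple[str, list[float]]]] = []

    def build(self, chunks: list[str], group_size: int = 2) -> None:
        if group_size < 2:
            raise ValueError("group_size must be at least 2 so each layer shrinks")
        texts = chunks
        self.layers = []
        while True:
            # Store each node's text with its embedding for later retrieval.
            self.layers.append([(t, self.embedder.embed(t)) for t in texts])
            if len(texts) <= 1:
                break
            texts = [self.summarizer.summarize(texts[i : i + group_size])
                     for i in range(0, len(texts), group_size)]

    def save(self, path: Path) -> None:
        # Persist only the tree data; models are re-supplied when loading.
        with open(path, "wb") as f:
            pickle.dump(self.layers, f)

    @classmethod
    def load(cls, path: Path, summarizer: Summarizer, embedder: Embedder) -> TreeIndex:
        index = cls(summarizer, embedder)
        with open(path, "rb") as f:
            index.layers = pickle.load(f)
        return index


chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta", "iota kappa"]
index = TreeIndex(FirstWordsSummarizer(n=3), CharStatsEmbedder())
index.build(chunks)
path = Path(tempfile.gettempdir()) / "raptor_tree_sketch.pkl"
index.save(path)
restored = TreeIndex.load(path, FirstWordsSummarizer(n=3), CharStatsEmbedder())
```

Saving the layers rather than the model objects keeps the file independent of API clients or model weights; the caller passes fresh models to `load`. Since `pickle` can execute code on load, a saved tree should only be loaded from a trusted source.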

Maintenance & Community

The project is the official implementation of the RAPTOR paper, co-authored by Christopher D. Manning. Further examples and configuration guides are planned.

Licensing & Compatibility

Released under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The project is marked as "Work in Progress" (WIP), with documentation and advanced features still forthcoming. The default setup requires an OpenAI API key, and details on integrating custom models are still being developed.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

36 stars in the last 30 days