self-rag by AkariAsai

Self-RAG implementation for learning retrieval, generation, and critique via self-reflection

created 1 year ago
2,154 stars

Top 21.4% on sourcepulse

View on GitHub
Project Summary

Self-RAG is a framework for training large language models (LLMs) to retrieve, generate, and critique text, improving factual accuracy and generation quality. It targets researchers and developers who want to improve LLM factuality without sacrificing versatility, offering on-demand retrieval and self-critique.

How It Works

Self-RAG makes retrieval and critique integral parts of the generation process. The model learns to predict "reflection tokens" that assess retrieved passages and its own outputs along multiple fine-grained aspects. A segment-wise beam search then selects the candidates that best satisfy user-defined preferences, letting the model retrieve on demand or skip retrieval entirely depending on the query.
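To make the selection step concrete, here is a minimal Python sketch of segment-level scoring. The reflection-token names (ISREL, ISSUP, ISUSE) follow the paper's terminology, but the scoring function and default weights below are simplified assumptions, not the repository's actual code:

```python
import math

def segment_score(log_p_segment, p_isrel, p_issup, p_isuse,
                  w_rel=1.0, w_sup=1.0, w_use=0.5):
    """Rank one candidate segment: its log-likelihood plus weighted
    log-probabilities of the desirable reflection tokens
    (ISREL = passage is relevant, ISSUP = output is supported,
    ISUSE = output is useful). The weights express user preferences."""
    return (log_p_segment
            + w_rel * math.log(p_isrel)
            + w_sup * math.log(p_issup)
            + w_use * math.log(p_isuse))

# Segment-wise beam search keeps the top-k continuations by this score.
candidates = {
    "with passage A": (-1.2, 0.9, 0.8, 0.7),
    "with passage B": (-0.9, 0.4, 0.3, 0.6),
}
best = max(candidates, key=lambda k: segment_score(*candidates[k]))
print(best)  # "with passage A": higher critique scores win despite a lower LM score
```

Because the critique terms enter the score directly, raising a weight such as w_sup steers decoding toward segments the model itself judges as better supported by the retrieved evidence.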

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Recommended inference engine: the vllm library (use a recent version that supports the skip_special_tokens sampling parameter); a usage sketch follows this list.
  • Models available on HuggingFace Hub (e.g., selfrag/selfrag_llama2_7b).
  • Retrieval setup requires downloading a corpus and embeddings (a 9 GB demo dataset is provided).
  • Official website: https://selfrag.github.io/
  • Models: 7B Model, 13B Model
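A minimal inference sketch with vllm, loading the hosted selfrag/selfrag_llama2_7b checkpoint. The prompt template is illustrative of the instruction format the models expect (not confirmed verbatim by this summary), and skip_special_tokens=False keeps the reflection tokens visible in the output:

```python
from vllm import LLM, SamplingParams

# Load the 7B Self-RAG model from the HuggingFace Hub.
model = LLM("selfrag/selfrag_llama2_7b", dtype="half")

# Greedy decoding; keep special tokens so reflection tokens appear in the text
# (requires a vllm version whose SamplingParams supports skip_special_tokens).
sampling_params = SamplingParams(
    temperature=0.0, top_p=1.0, max_tokens=100, skip_special_tokens=False
)

def format_prompt(instruction, paragraph=None):
    # Instruction-style prompt; an optional retrieved passage is wrapped in
    # <paragraph> tags after a [Retrieval] marker (illustrative template).
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    if paragraph is not None:
        prompt += f"[Retrieval]<paragraph>{paragraph}</paragraph>"
    return prompt

query = "Can you tell me the difference between llamas and alpacas?"
preds = model.generate([format_prompt(query)], sampling_params)
# The output may contain reflection tokens such as [Relevant] or [Utility:5].
print(preds[0].outputs[0].text)
```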

Highlighted Details

  • Accepted as an oral presentation at ICLR 2024 (top 1% of submissions).
  • Supports adaptive-retrieval, no-retrieval, and always-retrieve modes (see the sketch after this list).
  • Segment-wise beam search for preference optimization.
  • Training data (150K instances) and critic/generator models are available.
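A small sketch of how the three retrieval modes could gate per-segment retrieval. The mode names and the 0.2 default threshold are illustrative assumptions mirroring the behavior described above, not the repository's exact flags:

```python
from enum import Enum

class RetrievalMode(Enum):
    ADAPTIVE = "adaptive"   # retrieve only when the model asks for it
    NEVER = "no_retrieval"  # plain generation, retrieval disabled
    ALWAYS = "always"       # fetch passages before every segment

def decide_retrieval(mode: RetrievalMode,
                     p_retrieve_token: float,
                     threshold: float = 0.2) -> bool:
    """Gate retrieval for one segment. In adaptive mode the decision hinges
    on the model's probability of emitting the [Retrieve] reflection token."""
    if mode is RetrievalMode.ALWAYS:
        return True
    if mode is RetrievalMode.NEVER:
        return False
    return p_retrieve_token > threshold  # ADAPTIVE
```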

Maintenance & Community

  • Initial release in October 2023.
  • Contact: open a GitHub issue mentioning @AkariAsai, or email akari[at]cs.washington.edu.
  • Community-trained Mistral-7B version available: SciPhi-Self-RAG-Mistral-7B-32k.

Licensing & Compatibility

  • License details are not explicitly stated in the README; suitability for commercial use or closed-source linking is therefore unspecified.

Limitations & Caveats

  • Full Wikipedia retrieval requires significant RAM (100GB+) and multiple GPUs.
  • Current implementation is optimized for specific evaluation datasets; a more user-friendly interface is planned.
  • Long-form generation inference code is still under development for speed and memory optimization.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 101 stars in the last 90 days
