self-rag by AkariAsai

Self-RAG implementation for learning retrieval, generation, and critique via self-reflection

created 1 year ago
2,154 stars

Top 21.4% on sourcepulse

View on GitHub
Project Summary

Self-RAG is a framework for training large language models (LLMs) to retrieve, generate, and critique text, improving factual accuracy and generation quality. It targets researchers and developers who want to improve LLM factuality without sacrificing versatility, offering on-demand retrieval and self-critique.

How It Works

Self-RAG makes retrieval and critique integral parts of the generation process. The model learns to predict "reflection tokens" that assess retrieved passages and its own outputs along multiple fine-grained aspects. A segment-wise beam search then selects the candidates that best satisfy user-defined preferences, letting the model retrieve on demand or skip retrieval entirely depending on the query.
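To make the selection step concrete, here is a minimal Python sketch of segment-level scoring. The reflection-token names (ISREL, ISSUP, ISUSE) follow the paper's terminology, but the scoring function and default weights below are simplified assumptions, not the repository's actual code:

```python
import math

def segment_score(log_p_segment, p_isrel, p_issup, p_isuse,
                  w_rel=1.0, w_sup=1.0, w_use=0.5):
    """Rank one candidate segment: its log-likelihood plus weighted
    log-probabilities of the desirable reflection tokens
    (ISREL = passage is relevant, ISSUP = output is supported,
    ISUSE = output is useful). The weights express user preferences."""
    return (log_p_segment
            + w_rel * math.log(p_isrel)
            + w_sup * math.log(p_issup)
            + w_use * math.log(p_isuse))

# Segment-wise beam search keeps the top-k continuations by this score.
candidates = {
    "with passage A": (-1.2, 0.9, 0.8, 0.7),
    "with passage B": (-0.9, 0.4, 0.3, 0.6),
}
best = max(candidates, key=lambda k: segment_score(*candidates[k]))
print(best)  # "with passage A": higher critique scores win despite a lower LM score
```

Because the critique terms enter the score directly, raising a weight such as w_sup steers decoding toward segments the model itself judges as better supported by the retrieved evidence.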

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Recommended inference engine: the vllm library (use a recent version that supports the skip_special_tokens sampling parameter); a usage sketch follows this list.
  • Models available on HuggingFace Hub (e.g., selfrag/selfrag_llama2_7b).
  • Retrieval setup requires downloading a corpus and embeddings (a 9 GB demo dataset is provided).
  • Official website: https://selfrag.github.io/
  • Models: 7B Model, 13B Model
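A minimal inference sketch with vllm, loading the hosted selfrag/selfrag_llama2_7b checkpoint. The prompt template is illustrative of the instruction format the models expect (not confirmed verbatim by this summary), and skip_special_tokens=False keeps the reflection tokens visible in the output:

```python
from vllm import LLM, SamplingParams

# Load the 7B Self-RAG model from the HuggingFace Hub.
model = LLM("selfrag/selfrag_llama2_7b", dtype="half")

# Greedy decoding; keep special tokens so reflection tokens appear in the text
# (requires a vllm version whose SamplingParams supports skip_special_tokens).
sampling_params = SamplingParams(
    temperature=0.0, top_p=1.0, max_tokens=100, skip_special_tokens=False
)

def format_prompt(instruction, paragraph=None):
    # Instruction-style prompt; an optional retrieved passage is wrapped in
    # <paragraph> tags after a [Retrieval] marker (illustrative template).
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    if paragraph is not None:
        prompt += f"[Retrieval]<paragraph>{paragraph}</paragraph>"
    return prompt

query = "Can you tell me the difference between llamas and alpacas?"
preds = model.generate([format_prompt(query)], sampling_params)
# The output may contain reflection tokens such as [Relevant] or [Utility:5].
print(preds[0].outputs[0].text)
```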

Highlighted Details

  • Accepted as an oral presentation at ICLR 2024 (top 1% of submissions).
  • Supports adaptive-retrieval, no-retrieval, and always-retrieve modes (see the sketch after this list).
  • Segment-wise beam search for preference optimization.
  • Training data (150K instances) and critic/generator models are available.
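A small sketch of how the three retrieval modes could gate per-segment retrieval. The mode names and the 0.2 default threshold are illustrative assumptions mirroring the behavior described above, not the repository's exact flags:

```python
from enum import Enum

class RetrievalMode(Enum):
    ADAPTIVE = "adaptive"   # retrieve only when the model asks for it
    NEVER = "no_retrieval"  # plain generation, retrieval disabled
    ALWAYS = "always"       # fetch passages before every segment

def decide_retrieval(mode: RetrievalMode,
                     p_retrieve_token: float,
                     threshold: float = 0.2) -> bool:
    """Gate retrieval for one segment. In adaptive mode the decision hinges
    on the model's probability of emitting the [Retrieve] reflection token."""
    if mode is RetrievalMode.ALWAYS:
        return True
    if mode is RetrievalMode.NEVER:
        return False
    return p_retrieve_token > threshold  # ADAPTIVE
```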

Maintenance & Community

  • Initial release in October 2023.
  • Contact: open a GitHub issue mentioning @AkariAsai, or email akari[at]cs.washington.edu.
  • Community-trained Mistral-7B version available: SciPhi-Self-RAG-Mistral-7B-32k.

Licensing & Compatibility

  • License details are not explicitly stated in the README; suitability for commercial use or closed-source linking is therefore unspecified.

Limitations & Caveats

  • Full Wikipedia retrieval requires significant RAM (100GB+) and multiple GPUs.
  • Current implementation is optimized for specific evaluation datasets; a more user-friendly interface is planned.
  • Long-form generation inference code is still under development for speed and memory optimization.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 101 stars in the last 90 days
