self-rag by AkariAsai

Self-RAG implementation for learning retrieval, generation, and critique via self-reflection

Created 1 year ago
2,199 stars

Top 20.6% on SourcePulse

Project Summary

Self-RAG is a framework for training Large Language Models (LLMs) to retrieve, generate, and critique text, enhancing factual accuracy and quality. It targets researchers and developers aiming to improve LLM factuality without sacrificing versatility, offering on-demand retrieval and self-critique capabilities.

How It Works

Self-RAG makes retrieval and critique integral parts of the generation process. The model learns to emit "reflection tokens" that assess retrieved passages and its own outputs along several fine-grained aspects (relevance of a passage, whether the output is supported by it, and overall utility). At inference time, a segment-wise beam search selects candidates that maximize user-defined preferences over these tokens, and retrieval can be triggered or skipped dynamically depending on the query.
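As a rough illustration of the scoring step, the sketch below (assumed names and signature, not the repository's code) combines a segment's generation log-probability with a weighted sum of normalized reflection-token probabilities, following the scheme the paper describes:

```python
# Minimal sketch of segment scoring with reflection tokens. Group and token
# names follow the paper; the function signature and weights are illustrative.

REFLECTION_GROUPS = {
    "IsRel": ["[Relevant]", "[Irrelevant]"],
    "IsSup": ["[Fully supported]", "[Partially supported]", "[No support]"],
    "IsUse": ["[Utility:5]", "[Utility:4]", "[Utility:3]", "[Utility:2]", "[Utility:1]"],
}

def segment_score(gen_logprob, token_probs, weights):
    """Score one beam candidate: generation log-probability plus, per group,
    the weighted probability mass on the most desirable token (listed first
    in REFLECTION_GROUPS), normalized within that group."""
    score = gen_logprob
    for group, tokens in REFLECTION_GROUPS.items():
        total = sum(token_probs.get(t, 0.0) for t in tokens)
        if total > 0.0:
            score += weights.get(group, 1.0) * token_probs.get(tokens[0], 0.0) / total
    return score

# Example: a user preference that weights factual support above utility.
weights = {"IsRel": 1.0, "IsSup": 1.5, "IsUse": 0.5}
candidate = {"[Relevant]": 0.8, "[Irrelevant]": 0.2,
             "[Fully supported]": 0.6, "[No support]": 0.4,
             "[Utility:5]": 0.5, "[Utility:3]": 0.5}
print(segment_score(-1.2, candidate, weights))
```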

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Recommended inference: the vllm library (use a recent version so SamplingParams accepts skip_special_tokens); a minimal sketch follows this list.
  • Models available on HuggingFace Hub (e.g., selfrag/selfrag_llama2_7b).
  • Retrieval setup requires downloading a passage corpus and embeddings (9GB demo data provided).
  • Official website: https://selfrag.github.io/
  • Models: 7B and 13B checkpoints available.
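
A minimal inference sketch in the spirit of the repository's quick start (the query is illustrative; check the README for the exact prompt format):

```python
# skip_special_tokens=False keeps the reflection tokens
# (e.g. [Retrieval], [Relevant], [Utility:5]) visible in the output.
from vllm import LLM, SamplingParams

model = LLM("selfrag/selfrag_llama2_7b", dtype="half")
sampling_params = SamplingParams(
    temperature=0.0, top_p=1.0, max_tokens=100, skip_special_tokens=False
)

def format_prompt(query, paragraph=None):
    # Instruction-style prompt; an optional retrieved passage is wrapped in
    # <paragraph> tags following a [Retrieval] token.
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(query)
    if paragraph is not None:
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

queries = ["Can you tell me the difference between llamas and alpacas?"]
preds = model.generate([format_prompt(q) for q in queries], sampling_params)
for pred in preds:
    print(pred.outputs[0].text)  # generated text interleaved with reflection tokens
```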

Highlighted Details

  • Accepted as an ICLR 2024 Oral presentation (top 1% of submissions).
  • Supports adaptive retrieval, no-retrieval, and always-retrieve modes (sketched after this list).
  • Segment-wise beam search for preference optimization.
  • Training data (150K instances) and critic/generator models are available.
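
The three retrieval modes can be pictured as follows; this is a schematic sketch, not the repository's API (the function name, parameters, and default threshold are illustrative). In adaptive mode, retrieval fires when the model's normalized probability of the retrieval token exceeds a user-set threshold:

```python
# Schematic decision rule for the retrieval modes (illustrative names/values).

def should_retrieve(p_retrieve, p_no_retrieve, mode="adaptive", threshold=0.2):
    if mode == "always_retrieve":
        return True
    if mode == "no_retrieval":
        return False
    # Adaptive: normalize over the two retrieval tokens and compare.
    return p_retrieve / (p_retrieve + p_no_retrieve) > threshold

# Example: a query where the model leans toward consulting the corpus.
print(should_retrieve(p_retrieve=0.7, p_no_retrieve=0.3))  # True
```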

Maintenance & Community

  • Initial release in October 2023.
  • Contact: Open GitHub issues mentioning @AkariAsai or email akari[at]cs.washington.edu.
  • Community-trained Mistral-7B version available: SciPhi-Self-RAG-Mistral-7B-32k.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Full Wikipedia retrieval requires significant RAM (100GB+) and multiple GPUs.
  • Current implementation is optimized for specific evaluation datasets; a more user-friendly interface is planned.
  • Long-form generation inference code is still under development for speed and memory optimization.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

  • 33 stars in the last 30 days

Explore Similar Projects

  • Hands-On-Large-Language-Models by HandsOnLLM: code examples for the "Hands-On Large Language Models" book. 16k stars; top 1.4% on SourcePulse; created 1 year ago, updated 1 month ago. Starred by Peter Norvig (author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 2 more.