apple/CLaRa: Bridging retrieval and generation for efficient RAG
CLaRa addresses a core limitation of Retrieval-Augmented Generation (RAG), namely that retrieval and generation are typically optimized separately, by unifying both objectives through continuous latent reasoning and efficient document compression. It targets researchers and engineers seeking to improve RAG efficiency and semantic preservation, offering 32x-64x document compression while maintaining strong performance on question-answering tasks.
How It Works
CLaRa employs a novel three-stage training approach to overcome disjoint optimization and semantic bias in compressed representations. Stage 1 (Compression Pretraining) uses a Salient Compressor Pretraining (SCP) framework with QA-based supervision to retain key semantics. Stage 2 (Compression Instruction Tuning) fine-tunes the compressor on instruction-following tasks. Stage 3 (End-to-End Fine-tuning) jointly trains a reranker and generator in a shared continuous space using a differentiable top-k estimator, unifying retrieval and generation optimization to avoid redundant encoding.
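A differentiable top-k selection is commonly implemented as a straight-through estimator: an exact 0/1 top-k mask in the forward pass, with gradients routed through a softmax relaxation in the backward pass. The PyTorch sketch below illustrates this idea; the function name, temperature, and specific relaxation are illustrative assumptions, not CLaRa's actual estimator.

```python
import torch

def straight_through_topk(scores: torch.Tensor, k: int, tau: float = 1.0) -> torch.Tensor:
    """Differentiable top-k mask: exact top-k selection in the forward pass,
    softmax gradients in the backward pass (straight-through estimator)."""
    soft = torch.softmax(scores / tau, dim=-1)             # relaxed selection weights
    hard = torch.zeros_like(soft)
    hard.scatter_(-1, torch.topk(scores, k).indices, 1.0)  # exact 0/1 top-k mask
    return hard + soft - soft.detach()                     # value: hard; gradient: soft

# Hypothetical usage: pick k compressed documents for the generator.
scores = torch.randn(10, requires_grad=True)   # reranker scores for 10 candidates
doc_latents = torch.randn(10, 64)              # compressed document embeddings
mask = straight_through_topk(scores, k=3)
selected = mask.unsqueeze(-1) * doc_latents    # zeros out non-selected documents
selected.sum().backward()                      # gradients reach the reranker scores
```

Because the hard mask is used in the forward pass, the generator only ever sees the selected documents, while the soft gradients still let the reranker learn which documents to select, which is what makes joint retrieval-generation training possible.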
Quick Start & Requirements
Setup involves cloning the repository, creating a conda environment (python=3.10), and installing dependencies (pip install -r requirements.txt). Key requirements include PyTorch >= 2.0, Transformers >= 4.20, DeepSpeed >= 0.18, Flash Attention 2, and Accelerate. Data must be prepared in JSONL format for each stage. Training is initiated via shell scripts (scripts/train_pretraining.sh, scripts/train_instruction_tuning.sh, scripts/train_stage_end_to_end.sh). A video instruction guide is available: https://youtu.be/al2VoAKn8GU.
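A minimal setup sketch assembling the commands named above; the repository URL and conda environment name are assumptions, while the script names come from the README.

```bash
# Repository URL and environment name are assumptions.
git clone https://github.com/apple/CLaRa.git
cd CLaRa
conda create -n clara python=3.10 -y
conda activate clara
pip install -r requirements.txt

# Stage-wise training scripts named in the README:
bash scripts/train_pretraining.sh          # Stage 1: compression pretraining
bash scripts/train_instruction_tuning.sh   # Stage 2: compression instruction tuning
bash scripts/train_stage_end_to_end.sh     # Stage 3: end-to-end fine-tuning
```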
Highlighted Details
Maintenance & Community
The implementation is built on the OpenRLHF framework. Models are available on Hugging Face (link not provided). No explicit community channels (Discord/Slack) or roadmap are detailed.
Licensing & Compatibility
License type is not specified in the README, posing a potential blocker for commercial adoption or integration.
Limitations & Caveats
This is a research-oriented project; production readiness is not explicitly stated. The lack of clear licensing information is a significant adoption hurdle. Multi-stage training and specific dependencies (e.g., Flash Attention 2) may complicate setup.