gritlm by ContextualAI

Research paper and models for generative representational instruction tuning

created 1 year ago
662 stars

Top 51.6% on sourcepulse

Project Summary

This repository provides the code and resources for Generative Representational Instruction Tuning (GRIT), a novel method for training large language models to excel at both text generation and embedding tasks. It targets researchers and developers seeking to unify these capabilities in a single model, offering significant speedups for applications like Retrieval-Augmented Generation (RAG).

How It Works

GRIT trains a single model to handle both generative and embedding tasks, using instructions to signal which behavior is wanted. Custom modeling files enable bidirectional attention for embeddings and causal attention for generation, so one unified model reaches state-of-the-art performance on the MTEB embedding benchmark and competitive results on generative tasks, without losing performance relative to specialized single-task models.
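The snippet below sketches how one checkpoint serves both tasks, following the usage pattern documented in the README. It assumes the gritlm package's GritLM wrapper with its encode() and generate() methods; the prompts and instruction text are illustrative only.

    from gritlm import GritLM

    # Load one checkpoint that serves both embedding and generation.
    model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

    # Embedding: the <|embed|> formatting signals the representational path,
    # which uses bidirectional attention and returns a vector instead of text.
    def gritlm_instruction(instruction):
        return "<|user|>\n" + instruction + "\n<|embed|>\n" if instruction else "<|embed|>\n"

    queries = ["What is generative representational instruction tuning?"]
    documents = ["GRIT trains one model for both embedding and generation."]
    q_rep = model.encode(queries, instruction=gritlm_instruction("Retrieve relevant passages"))
    d_rep = model.encode(documents, instruction=gritlm_instruction(""))

    # Generation: a plain chat prompt routes the same weights through causal attention.
    messages = [{"role": "user", "content": "Explain GRIT in one sentence."}]
    encoded = model.tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(encoded, max_new_tokens=64, do_sample=False)
    print(model.tokenizer.batch_decode(out)[0])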

Quick Start & Requirements

  • Install via pip: pip install gritlm
  • Requires PyTorch (tested with 2.2.0) and CUDA (tested with 12.2).
  • For multi-GPU loading of larger models (e.g., the 8x7B), device_map="auto" is recommended (see the loading sketch after this list).
  • Official documentation and examples are available in the README.
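A minimal loading sketch, assuming (as the README indicates) that keyword arguments such as torch_dtype and device_map are forwarded to transformers' from_pretrained:

    from gritlm import GritLM

    # Single GPU: the 7B model fits on one device.
    model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

    # Multi-GPU: shard the 8x7B checkpoint across all visible GPUs.
    big_model = GritLM("GritLM/GritLM-8x7B", torch_dtype="auto", device_map="auto")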

Highlighted Details

  • GritLM-7B achieves 66.8 on MTEB and 55.5 on generative tasks.
  • GritLM-8x7B achieves 65.7 on MTEB and 65.7 on generative tasks, outperforming other open models.
  • GRIT speeds up RAG by over 60% for long documents by eliminating the need for separate retrieval and generation models.
  • Supports caching the key-value states from the embedding pass for both documents and queries (doc and query caching), so generation does not re-process them; a conceptual sketch follows this list.
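The mechanics can be illustrated with a stock causal LM from Hugging Face transformers. This is not the repository's API (GRIT implements caching inside its custom modeling files); it is only a sketch of how reusing a document's key-value cache avoids re-encoding the document at generation time.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Illustration only: a small stock model stands in for GritLM.
    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    doc = "GRIT unifies embedding and generation in a single model."
    query = " Question: What does GRIT unify? Answer:"
    doc_ids = tok(doc, return_tensors="pt").input_ids
    query_ids = tok(query, return_tensors="pt").input_ids

    with torch.no_grad():
        # Encode the document once and keep its key-value cache.
        doc_out = lm(doc_ids, use_cache=True)

        # Without caching: document + query are processed together every time.
        full = lm(torch.cat([doc_ids, query_ids], dim=-1))

        # With caching: only the query tokens are processed; they attend to the
        # cached document states, so the document is never re-encoded.
        cached = lm(query_ids, past_key_values=doc_out.past_key_values)

    # The query logits match the tail of the full pass (up to numerical noise).
    print(torch.allclose(full.logits[:, -query_ids.shape[1]:], cached.logits, atol=1e-4))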

Maintenance & Community

  • Developed by ContextualAI, with contributions welcomed.
  • Model weights and training logs are available on Hugging Face and Weights & Biases, respectively.
  • Links to relevant papers, videos, and slides are provided.

Licensing & Compatibility

  • The repository and models are released under a permissive license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Training with DeepSpeed in --mode unified is not supported when gradient_accumulation_steps > 1.
  • torch.compile fails in unified mode.
  • QLoRA/LoRA integration is not well-tested.
  • Multi-node training with FSDP may hit checkpoint-saving timeouts and requires careful configuration of accelerate and transformers.
Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 38 stars in the last 90 days