gritlm by ContextualAI

Research paper and models for generative representational instruction tuning

Created 1 year ago
672 stars

Top 50.2% on SourcePulse

Project Summary

This repository provides the code and resources for Generative Representational Instruction Tuning (GRIT), a novel method for training large language models to excel at both text generation and embedding tasks. It targets researchers and developers seeking to unify these capabilities in a single model, offering significant speedups for applications like Retrieval-Augmented Generation (RAG).

How It Works

GRIT trains a single model to handle both generative and embedding tasks, using task-specific instructions to distinguish between them. Custom modeling files enable bidirectional attention for embedding and causal attention for generation within the same weights. The resulting unified model achieves state-of-the-art performance on the Massive Text Embedding Benchmark (MTEB) and competitive results on generative tasks, with no loss of performance relative to models specialized for either task.
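
A minimal usage sketch of this dual behavior, based on the GritLM class exposed by the gritlm package (the instruction format and method names follow the repository README; treat the exact details as illustrative rather than a guaranteed API):

```python
from gritlm import GritLM

# One set of weights serves both modes.
model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

# Embedding mode: the <|embed|> marker switches the model to bidirectional
# attention with mean pooling over the input tokens.
def embed_instruction(task):
    return f"<|user|>\n{task}\n<|embed|>\n" if task else "<|embed|>\n"

docs = ["GRIT unifies text embedding and text generation in one model."]
doc_vecs = model.encode(docs, instruction=embed_instruction(""))

# Generation mode: a chat-formatted prompt keeps ordinary causal attention.
messages = [{"role": "user", "content": "Explain GRIT in one sentence."}]
inputs = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(model.tokenizer.decode(out[0]))
```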

Quick Start & Requirements

  • Install via pip: pip install gritlm
  • Requires PyTorch and CUDA; tested with PyTorch 2.2.0 and CUDA 12.2.
  • For multi-GPU loading of larger models (e.g., the 8x7B variant), pass device_map="auto" (see the loading sketch after this list).
  • Official documentation and examples are available in the README.
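
A hedged loading sketch for the larger mixture-of-experts checkpoint, assuming the GritLM constructor forwards keyword arguments such as torch_dtype and device_map to Hugging Face's from_pretrained, as the README examples suggest:

```python
from gritlm import GritLM

# Shard the 8x7B weights across all visible GPUs; device_map="auto"
# requires the `accelerate` package. torch_dtype="auto" keeps the dtype
# stored in the checkpoint.
model = GritLM(
    "GritLM/GritLM-8x7B",
    torch_dtype="auto",
    device_map="auto",
)
```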

Highlighted Details

  • GritLM-7B achieves 66.8 on MTEB and 55.5 on generative tasks.
  • GritLM-8x7B achieves 65.7 on MTEB and 65.7 on generative tasks, outperforming other open models.
  • GRIT speeds up RAG by over 60% for long documents by removing the need for separate retrieval and generation models.
  • Supports caching of both document and query representations to cut inference cost (a simplified RAG sketch follows this list).
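
To illustrate why one model simplifies RAG, the sketch below retrieves with the embedding mode and answers with the generative mode of the same GritLM instance. The reported speedup comes from reusing cached document and query key-value states during generation, which this simplified loop does not reproduce; the cosine-similarity retrieval and prompt format here are illustrative assumptions.

```python
import numpy as np
from gritlm import GritLM

model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

def embed_instruction(task):
    return f"<|user|>\n{task}\n<|embed|>\n" if task else "<|embed|>\n"

documents = [
    "GRIT trains one model for both text embedding and text generation.",
    "MTEB is a benchmark suite for text embedding models.",
]
query = "Which method unifies embedding and generation?"

# 1) Retrieval: embed documents and the query with the same model.
doc_vecs = np.asarray(model.encode(documents, instruction=embed_instruction("")))
query_vec = np.asarray(
    model.encode([query], instruction=embed_instruction("Retrieve the relevant passage."))
)[0]
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
best_doc = documents[int(scores.argmax())]

# 2) Generation: answer with the retrieved passage in the prompt, reusing the same weights.
messages = [{"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {query}"}]
inputs = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
answer = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(model.tokenizer.decode(answer[0]))
```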

Maintenance & Community

  • Developed by ContextualAI, with contributions welcomed.
  • Model weights and training logs are available on Hugging Face and Weights & Biases, respectively.
  • Links to relevant papers, videos, and slides are provided.

Licensing & Compatibility

  • The repository and models are released under a permissive license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Training in --mode unified with DeepSpeed is not supported when gradient_accumulation_steps > 1.
  • torch.compile fails in unified mode.
  • QLoRA/LoRA integration is not well-tested.
  • Multi-node training with FSDP may encounter checkpoint saving timeouts; requires careful configuration of accelerate and transformers.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days
