gritlm by ContextualAI

Research paper and models for generative representational instruction tuning

Created 1 year ago
672 stars

Top 50.2% on SourcePulse

Project Summary

This repository provides the code and resources for Generative Representational Instruction Tuning (GRIT), a novel method for training large language models to excel at both text generation and embedding tasks. It targets researchers and developers seeking to unify these capabilities in a single model, offering significant speedups for applications like Retrieval-Augmented Generation (RAG).

How It Works

GRIT trains a single model to handle both generative and embedding tasks, using task-specific instructions to distinguish between them. Custom modeling files enable bidirectional attention for embedding and causal attention for generation within the same weights. The resulting unified model achieves state-of-the-art performance on the Massive Text Embedding Benchmark (MTEB) and competitive results on generative tasks, with no loss of performance relative to models specialized for either task.
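
A minimal usage sketch of this dual behavior, based on the GritLM class exposed by the gritlm package (the instruction format and method names follow the repository README; treat the exact details as illustrative rather than a guaranteed API):

```python
from gritlm import GritLM

# One set of weights serves both modes.
model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

# Embedding mode: the <|embed|> marker switches the model to bidirectional
# attention with mean pooling over the input tokens.
def embed_instruction(task):
    return f"<|user|>\n{task}\n<|embed|>\n" if task else "<|embed|>\n"

docs = ["GRIT unifies text embedding and text generation in one model."]
doc_vecs = model.encode(docs, instruction=embed_instruction(""))

# Generation mode: a chat-formatted prompt keeps ordinary causal attention.
messages = [{"role": "user", "content": "Explain GRIT in one sentence."}]
inputs = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(model.tokenizer.decode(out[0]))
```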

Quick Start & Requirements

  • Install via pip: pip install gritlm
  • Requires PyTorch and CUDA; tested with PyTorch 2.2.0 and CUDA 12.2.
  • For multi-GPU loading of larger models (e.g., the 8x7B variant), pass device_map="auto" (see the loading sketch after this list).
  • Official documentation and examples are available in the README.
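
A hedged loading sketch for the larger mixture-of-experts checkpoint, assuming the GritLM constructor forwards keyword arguments such as torch_dtype and device_map to Hugging Face's from_pretrained, as the README examples suggest:

```python
from gritlm import GritLM

# Shard the 8x7B weights across all visible GPUs; device_map="auto"
# requires the `accelerate` package. torch_dtype="auto" keeps the dtype
# stored in the checkpoint.
model = GritLM(
    "GritLM/GritLM-8x7B",
    torch_dtype="auto",
    device_map="auto",
)
```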

Highlighted Details

  • GritLM-7B achieves 66.8 on MTEB and 55.5 on generative tasks.
  • GritLM-8x7B achieves 65.7 on MTEB and 65.7 on generative tasks, outperforming other open models.
  • GRIT speeds up RAG by over 60% for long documents by removing the need for separate retrieval and generation models.
  • Supports caching of both document and query representations to cut inference cost (a simplified RAG sketch follows this list).
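
To illustrate why one model simplifies RAG, the sketch below retrieves with the embedding mode and answers with the generative mode of the same GritLM instance. The reported speedup comes from reusing cached document and query key-value states during generation, which this simplified loop does not reproduce; the cosine-similarity retrieval and prompt format here are illustrative assumptions.

```python
import numpy as np
from gritlm import GritLM

model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

def embed_instruction(task):
    return f"<|user|>\n{task}\n<|embed|>\n" if task else "<|embed|>\n"

documents = [
    "GRIT trains one model for both text embedding and text generation.",
    "MTEB is a benchmark suite for text embedding models.",
]
query = "Which method unifies embedding and generation?"

# 1) Retrieval: embed documents and the query with the same model.
doc_vecs = np.asarray(model.encode(documents, instruction=embed_instruction("")))
query_vec = np.asarray(
    model.encode([query], instruction=embed_instruction("Retrieve the relevant passage."))
)[0]
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
best_doc = documents[int(scores.argmax())]

# 2) Generation: answer with the retrieved passage in the prompt, reusing the same weights.
messages = [{"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {query}"}]
inputs = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
answer = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(model.tokenizer.decode(answer[0]))
```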

Maintenance & Community

  • Developed by ContextualAI, with contributions welcomed.
  • Model weights and training logs are available on Hugging Face and Weights & Biases, respectively.
  • Links to relevant papers, videos, and slides are provided.

Licensing & Compatibility

  • The repository and models are released under a permissive license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Training in --mode unified with DeepSpeed is not supported when gradient_accumulation_steps > 1.
  • torch.compile fails in unified mode.
  • QLoRA/LoRA integration is not well-tested.
  • Multi-node training with FSDP may encounter checkpoint saving timeouts; requires careful configuration of accelerate and transformers.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days
