recurrentgemma by google-deepmind

Open-weights language model based on the Griffin architecture

Created 1 year ago
651 stars

Top 51.2% on SourcePulse

Project Summary

RecurrentGemma provides open-weights language models based on Google DeepMind's Griffin architecture, designed for efficient long-sequence generation through a hybrid attention-recurrence mechanism. It targets researchers and developers needing high-performance LLMs for tasks involving extended text, offering optimized Flax and reference PyTorch implementations.

How It Works

The Griffin architecture replaces global attention with a combination of local attention and linear recurrences. Because the recurrent state is a fixed-size vector and local attention only spans a bounded window, the per-token cost of generation does not grow with context length the way it does under full self-attention with a growing key/value cache, which makes long-sequence generation substantially faster.
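
To make this concrete, here is a minimal, hypothetical JAX sketch of a gated linear recurrence (not RecurrentGemma's actual code): the generation state is a fixed-size vector updated once per token via jax.lax.scan, so per-token cost stays flat as the context grows. The gate values and shapes are illustrative only, not Griffin's actual parameterization.

```python
# Illustrative sketch only -- not the library's implementation.
import jax
import jax.numpy as jnp


def recurrence_step(h, x, a, b):
    """One generation step: h_t = a * h_{t-1} + b * x_t (element-wise, O(d))."""
    h = a * h + b * x
    return h, h  # (new carry state, per-step output)


def run_sequence(xs, a, b):
    """Process a whole prompt with jax.lax.scan; cost is linear in length."""
    h0 = jnp.zeros(xs.shape[-1])
    step = lambda h, x: recurrence_step(h, x, a, b)
    h_final, ys = jax.lax.scan(step, h0, xs)
    return h_final, ys


key = jax.random.PRNGKey(0)
d, seq_len = 16, 1024
xs = jax.random.normal(key, (seq_len, d))
a = jnp.full((d,), 0.9)  # decay gate (illustrative constant)
b = jnp.full((d,), 0.1)  # input gate (illustrative constant)

h, ys = run_sequence(xs, a, b)
print(h.shape, ys.shape)  # (16,) (1024, 16) -- the state stays fixed-size
```

The key point is that the carry `h` never grows: each new token touches O(d) state, whereas full self-attention must attend over every previous key and value.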

Quick Start & Requirements

  • Installation: Use Poetry (poetry install -E full) or pip (pip install .[full]). Library-specific installs are available (-E jax, -E torch, -E test).
  • Model Weights: Download from Kaggle (http://kaggle.com/models/google/recurrentgemma).
  • Running Examples: python examples/sampling_jax.py --path_checkpoint=/path/to/weights --path_tokenizer=/path/to/tokenizer.model (a Python sketch of this flow follows this list).
  • Colab Notebooks: Available for sampling and fine-tuning (requires Kaggle account and license acceptance).
  • Hardware: Supports CPU, GPU, and TPU. Flax implementation is optimized for TPUs with Pallas kernels. Sampling is supported on T4, P100, V100, A100, TPUv2, and TPUv3+. Fine-tuning is supported on T4, P100, V100, A100, and TPUv3+.
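
For orientation, the sketch below mirrors what examples/sampling_jax.py roughly does in the Flax implementation. The names load_parameters, GriffinConfig.from_flax_params_or_variables, Griffin, and Sampler are assumptions based on the repository's example scripts; consult the script itself for the exact, current API.

```python
# Hypothetical sketch of the sampling flow; the recurrentgemma API names below
# are assumptions -- see examples/sampling_jax.py for the real calls.
import sentencepiece as spm
from recurrentgemma import jax as recurrentgemma

path_checkpoint = "/path/to/weights"         # downloaded from Kaggle
path_tokenizer = "/path/to/tokenizer.model"

# Restore Flax parameters and build a Griffin model from them.
params = recurrentgemma.load_parameters(path_checkpoint, "single_device")
config = recurrentgemma.GriffinConfig.from_flax_params_or_variables(params)
model = recurrentgemma.Griffin(config)

# SentencePiece tokenizer shipped alongside the weights.
vocab = spm.SentencePieceProcessor()
vocab.Load(path_tokenizer)

# Sample a short continuation.
sampler = recurrentgemma.Sampler(model=model, vocab=vocab, params=params)
output = sampler(["Why is the sky blue?"], total_generation_steps=64)
print(output.text)
```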

Highlighted Details

  • Novel Griffin architecture with local attention and linear recurrences for fast long-sequence generation.
  • Optimized Flax implementation with low-level Pallas kernels for TPU performance.
  • Reference PyTorch implementation provided.
  • Includes tutorials for sampling and fine-tuning via Colab notebooks.

Maintenance & Community

  • Bug reports and issues are welcome; pull request guidelines are in CONTRIBUTING.md.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • Fine-tuning is not supported on TPUv2.
  • Requires accepting Gemma license terms and conditions from Kaggle to use Colab notebooks.
Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

0%
827
Pretraining code for depth-recurrent language model research
Created 7 months ago
Updated 1 week ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch

0.1%
5k
LLM research codebase for training and inference
Created 11 months ago
Updated 2 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch

0.7%
4k
PyTorch platform for generative AI model training research
Created 1 year ago
Updated 21 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

gemma.cpp by google

0.1%
7k
C++ inference engine for Google's Gemma models
Created 1 year ago
Updated 1 day ago