recurrentgemma by google-deepmind

Open-weights language model based on the Griffin architecture

Created 1 year ago
651 stars

Top 51.2% on SourcePulse

Project Summary

RecurrentGemma provides open-weights language models based on Google DeepMind's Griffin architecture, designed for efficient long-sequence generation through a hybrid attention-recurrence mechanism. It targets researchers and developers needing high-performance LLMs for tasks involving extended text, offering optimized Flax and reference PyTorch implementations.

How It Works

The Griffin architecture replaces global attention with a combination of local attention and linear recurrences. Because the recurrent state is a fixed-size vector and local attention only spans a bounded window, the per-token cost of generation does not grow with context length the way it does under full self-attention with a growing key/value cache, which makes long-sequence generation substantially faster.
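
To make this concrete, here is a minimal, hypothetical JAX sketch of a gated linear recurrence (not RecurrentGemma's actual code): the generation state is a fixed-size vector updated once per token via jax.lax.scan, so per-token cost stays flat as the context grows. The gate values and shapes are illustrative only, not Griffin's actual parameterization.

```python
# Illustrative sketch only -- not the library's implementation.
import jax
import jax.numpy as jnp


def recurrence_step(h, x, a, b):
    """One generation step: h_t = a * h_{t-1} + b * x_t (element-wise, O(d))."""
    h = a * h + b * x
    return h, h  # (new carry state, per-step output)


def run_sequence(xs, a, b):
    """Process a whole prompt with jax.lax.scan; cost is linear in length."""
    h0 = jnp.zeros(xs.shape[-1])
    step = lambda h, x: recurrence_step(h, x, a, b)
    h_final, ys = jax.lax.scan(step, h0, xs)
    return h_final, ys


key = jax.random.PRNGKey(0)
d, seq_len = 16, 1024
xs = jax.random.normal(key, (seq_len, d))
a = jnp.full((d,), 0.9)  # decay gate (illustrative constant)
b = jnp.full((d,), 0.1)  # input gate (illustrative constant)

h, ys = run_sequence(xs, a, b)
print(h.shape, ys.shape)  # (16,) (1024, 16) -- the state stays fixed-size
```

The key point is that the carry `h` never grows: each new token touches O(d) state, whereas full self-attention must attend over every previous key and value.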

Quick Start & Requirements

  • Installation: Use Poetry (poetry install -E full) or pip (pip install .[full]). Library-specific installs are available (-E jax, -E torch, -E test).
  • Model Weights: Download from Kaggle (http://kaggle.com/models/google/recurrentgemma).
  • Running Examples: python examples/sampling_jax.py --path_checkpoint=/path/to/weights --path_tokenizer=/path/to/tokenizer.model (a Python sketch of this flow follows this list).
  • Colab Notebooks: Available for sampling and fine-tuning (requires Kaggle account and license acceptance).
  • Hardware: Supports CPU, GPU, and TPU. Flax implementation is optimized for TPUs with Pallas kernels. Sampling is supported on T4, P100, V100, A100, TPUv2, and TPUv3+. Fine-tuning is supported on T4, P100, V100, A100, and TPUv3+.
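
For orientation, the sketch below mirrors what examples/sampling_jax.py roughly does in the Flax implementation. The names load_parameters, GriffinConfig.from_flax_params_or_variables, Griffin, and Sampler are assumptions based on the repository's example scripts; consult the script itself for the exact, current API.

```python
# Hypothetical sketch of the sampling flow; the recurrentgemma API names below
# are assumptions -- see examples/sampling_jax.py for the real calls.
import sentencepiece as spm
from recurrentgemma import jax as recurrentgemma

path_checkpoint = "/path/to/weights"         # downloaded from Kaggle
path_tokenizer = "/path/to/tokenizer.model"

# Restore Flax parameters and build a Griffin model from them.
params = recurrentgemma.load_parameters(path_checkpoint, "single_device")
config = recurrentgemma.GriffinConfig.from_flax_params_or_variables(params)
model = recurrentgemma.Griffin(config)

# SentencePiece tokenizer shipped alongside the weights.
vocab = spm.SentencePieceProcessor()
vocab.Load(path_tokenizer)

# Sample a short continuation.
sampler = recurrentgemma.Sampler(model=model, vocab=vocab, params=params)
output = sampler(["Why is the sky blue?"], total_generation_steps=64)
print(output.text)
```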

Highlighted Details

  • Novel Griffin architecture with local attention and linear recurrences for fast long-sequence generation.
  • Optimized Flax implementation with low-level Pallas kernels for TPU performance.
  • Reference PyTorch implementation provided.
  • Includes tutorials for sampling and fine-tuning via Colab notebooks.

Maintenance & Community

  • Bug reports and issues are welcome; pull request guidelines are in CONTRIBUTING.md.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • Fine-tuning is not supported on TPUv2.
  • Requires accepting Gemma license terms and conditions from Kaggle to use Colab notebooks.
Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

0%
827
Pretraining code for depth-recurrent language model research
Created 7 months ago
Updated 1 week ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch

0.1%
5k
LLM research codebase for training and inference
Created 11 months ago
Updated 2 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch

0.7%
4k
PyTorch platform for generative AI model training research
Created 1 year ago
Updated 21 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

gemma.cpp by google

0.1%
7k
C++ inference engine for Google's Gemma models
Created 1 year ago
Updated 1 day ago