gemma_pytorch by google

PyTorch implementation for Google's Gemma models

Created 1 year ago
5,547 stars

Top 9.2% on SourcePulse

View on GitHub
Project Summary

This repository provides the official PyTorch implementation for Google's Gemma family of large language models, covering text-only and multimodal variants. It targets researchers and developers who want lightweight, state-of-the-art models derived from Google's Gemini research, with inference support across CPU, GPU, and TPU.

How It Works

The implementation uses PyTorch and PyTorch/XLA for efficient model execution. It supports Gemma model sizes from 1B to 27B parameters across versions v1.1, v2, v3, and CodeGemma, with pre-trained and instruction-tuned checkpoints available on Kaggle and Hugging Face. The project ships inference scripts with optional int8 quantization.
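
Since checkpoints are downloaded separately from the code, a typical first step is to pull one locally. A minimal sketch using the Hugging Face CLI (the repo id below is illustrative, not confirmed by this page; check Kaggle or Hugging Face for the exact checkpoint name):

    # Hypothetical checkpoint repo id; substitute the variant you need.
    huggingface-cli download google/gemma-2b-pytorch --local-dir ./gemma-ckpt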

Quick Start & Requirements

  • Installation: Docker is the primary supported way to run inference; build one of the images below (a run example follows this list).
    • Build PyTorch image: DOCKER_URI=gemma:${USER} docker build -f docker/Dockerfile ./ -t ${DOCKER_URI}
    • Build PyTorch/XLA image (CPU/TPU): DOCKER_URI=gemma_xla:${USER} docker build -f docker/xla.Dockerfile ./ -t ${DOCKER_URI}
    • Build PyTorch/XLA image (GPU): DOCKER_URI=gemma_xla_gpu:${USER} docker build -f docker/xla_gpu.Dockerfile ./ -t ${DOCKER_URI}
  • Prerequisites: Docker, model checkpoints (downloadable via huggingface-cli or Kaggle).
  • Resources: Requires significant disk space for model checkpoints. GPU/TPU recommended for performance.
  • Documentation: Gemma on Google AI, Colab Notebook
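
With an image built and a checkpoint on disk, inference follows the pattern below, adapted from the upstream README; treat the exact scripts/run.py flags as assumptions and verify them against the repo:

    # GPU inference with the plain PyTorch image; mount the checkpoint directory.
    docker run -t --rm \
        --gpus all \
        -v ${CKPT_PATH}:/tmp/ckpt \
        ${DOCKER_URI} \
        python scripts/run.py \
        --device=cuda \
        --ckpt=/tmp/ckpt \
        --variant="2b" \
        --prompt="The meaning of life is"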

Highlighted Details

  • Supports Gemma v1.1, v2, v3, and CodeGemma models.
  • Inference available for CPU, GPU, and TPU via PyTorch and PyTorch/XLA.
  • Includes multimodal model variants.
  • Offers int8 quantization for reduced memory footprint.
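
The int8 option above is exposed as a flag on the same inference script; a sketch assuming the --quant flag described in the upstream README (quantized runs generally need a matching quantized checkpoint):

    # Append --quant to run with int8-quantized weights.
    docker run -t --rm --gpus all -v ${CKPT_PATH}:/tmp/ckpt ${DOCKER_URI} \
        python scripts/run.py --device=cuda --ckpt=/tmp/ckpt \
        --variant="2b" --quant --prompt="The meaning of life is"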

Maintenance & Community

The project is actively updated with new Gemma versions. Model checkpoints are hosted on Kaggle and Hugging Face.

Licensing & Compatibility

The repository itself is not explicitly licensed in the README. Model weights are subject to the Gemma Terms of Use. Compatibility for commercial use or closed-source linking depends on the specific Gemma model license.

Limitations & Caveats

The README states this is "not an officially supported Google product." The tokenizer reserves 99 unused tokens for fine-tuning purposes.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 30 days

Starred by Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech) and Clément Renault (Cofounder of Meilisearch).

Explore Similar Projects

lm.rs by samuel-vitorino · 0% · 1k stars
Minimal LLM inference in Rust
Created 1 year ago · Updated 10 months ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel · 0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 14 hours ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin · 0.1% · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 2 weeks ago