gemma_pytorch by google

PyTorch implementation for Google's Gemma models

Created 1 year ago
5,565 stars

Top 9.1% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation for Google's Gemma family of large language models, offering text-only and multimodal variants. It targets researchers and developers seeking to leverage state-of-the-art, lightweight models derived from Google's Gemini research, with support for inference across CPU, GPU, and TPU.

How It Works

The implementation uses PyTorch and PyTorch/XLA for efficient model execution. It supports Gemma model sizes from 1B to 27B parameters and versions v1.1, v2, v3, and CodeGemma, with pretrained and instruction-tuned checkpoints available on Kaggle and Hugging Face. The project includes scripts for running inference, with optional int8 quantization.
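
As a rough illustration, inference follows the pattern of the repo's scripts/run.py and Colab example. The sketch below assumes that API (get_model_config, GemmaForCausalLM.load_weights, and generate); exact names and signatures may differ between releases, and the checkpoint path is hypothetical:

    # Minimal inference sketch, assuming the API used by the repo's
    # scripts/run.py and Colab example; check the repo for exact signatures.
    import torch
    from gemma.config import get_model_config
    from gemma.model import GemmaForCausalLM

    VARIANT = "2b"                         # model size variant
    CKPT_PATH = "/tmp/ckpt/gemma-2b.ckpt"  # hypothetical checkpoint path

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Build the model from its config and load the downloaded weights.
    model_config = get_model_config(VARIANT)
    model = GemmaForCausalLM(model_config)
    model.load_weights(CKPT_PATH)
    model = model.to(device).eval()

    # generate() takes the prompt, the target device, and an output length.
    print(model.generate("The meaning of life is", device, output_len=60))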

Quick Start & Requirements

  • Installation: Docker is the primary method for running inference.
    • Build the PyTorch image: DOCKER_URI=gemma:${USER}; docker build -f docker/Dockerfile ./ -t ${DOCKER_URI}
    • Build the PyTorch/XLA image (CPU/TPU): DOCKER_URI=gemma_xla:${USER}; docker build -f docker/xla.Dockerfile ./ -t ${DOCKER_URI}
    • Build the PyTorch/XLA image (GPU): DOCKER_URI=gemma_xla_gpu:${USER}; docker build -f docker/xla_gpu.Dockerfile ./ -t ${DOCKER_URI}
  • Prerequisites: Docker, model checkpoints (downloadable via huggingface-cli or Kaggle; see the download sketch after this list).
  • Resources: Requires significant disk space for model checkpoints. GPU/TPU recommended for performance.
  • Documentation: Gemma on Google AI, Colab Notebook
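
For example, a checkpoint can be fetched programmatically with the huggingface_hub client instead of the CLI. The repo id below is hypothetical; the real checkpoint repositories are linked from the project README:

    # Download a Gemma checkpoint from Hugging Face.
    from huggingface_hub import snapshot_download

    ckpt_dir = snapshot_download(
        repo_id="google/gemma-2b-pytorch",  # hypothetical repo id
        local_dir="/tmp/ckpt",
    )
    print("Checkpoint files in:", ckpt_dir)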

Highlighted Details

  • Supports Gemma v1.1, v2, v3, and CodeGemma models.
  • Inference available for CPU, GPU, and TPU via PyTorch and PyTorch/XLA.
  • Includes multimodal model variants.
  • Offers int8 quantization for a reduced memory footprint (see the sketch after this list).
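
In the repo's sample code, the int8 path is toggled through the model config. A minimal sketch, assuming the quant config field from the Colab example and a matching int8 checkpoint (path hypothetical):

    # Enable int8-quantized weights via the model config; the `quant`
    # field follows the repo's Colab example and may change between versions.
    from gemma.config import get_model_config
    from gemma.model import GemmaForCausalLM

    model_config = get_model_config("2b")
    model_config.quant = True  # expect int8-quantized weights

    model = GemmaForCausalLM(model_config)
    model.load_weights("/tmp/ckpt/gemma-2b-quant.ckpt")  # hypothetical path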

Maintenance & Community

The project is actively updated with new Gemma versions. Model checkpoints are hosted on Kaggle and Hugging Face.

Licensing & Compatibility

The repository itself is not explicitly licensed in the README. Model weights are subject to the Gemma Terms of Use. Compatibility for commercial use or closed-source linking depends on the specific Gemma model license.

Limitations & Caveats

The README states this is "not an officially supported Google product." The tokenizer reserves 99 unused tokens for fine-tuning purposes.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 2
  • Star History: 20 stars in the last 30 days

Explore Similar Projects

Starred by Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech) and Clément Renault (Cofounder of Meilisearch).

lm.rs by samuel-vitorino

Top 0.2% · 1k stars
Minimal LLM inference in Rust
Created 1 year ago · Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

Top 0.1% · 3k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 5 hours ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

Top 0.7% · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 2 months ago