gemma_pytorch by google

PyTorch implementation for Google's Gemma models

Created 1 year ago
5,547 stars

Top 9.2% on SourcePulse

View on GitHub
Project Summary

This repository provides the official PyTorch implementation for Google's Gemma family of large language models, covering text-only and multimodal variants. It targets researchers and developers who want lightweight, state-of-the-art models derived from Google's Gemini research, with inference support across CPU, GPU, and TPU.

How It Works

The implementation uses PyTorch and PyTorch/XLA for efficient model execution. It supports Gemma model sizes from 1B to 27B parameters across versions v1.1, v2, v3, and CodeGemma, with pre-trained and instruction-tuned checkpoints available on Kaggle and Hugging Face. The project ships inference scripts with optional int8 quantization.
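
Since checkpoints are downloaded separately from the code, a typical first step is to pull one locally. A minimal sketch using the Hugging Face CLI (the repo id below is illustrative, not confirmed by this page; check Kaggle or Hugging Face for the exact checkpoint name):

    # Hypothetical checkpoint repo id; substitute the variant you need.
    huggingface-cli download google/gemma-2b-pytorch --local-dir ./gemma-ckpt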

Quick Start & Requirements

  • Installation: Docker is the primary supported way to run inference; build one of the images below (a run example follows this list).
    • Build PyTorch image: DOCKER_URI=gemma:${USER} docker build -f docker/Dockerfile ./ -t ${DOCKER_URI}
    • Build PyTorch/XLA image (CPU/TPU): DOCKER_URI=gemma_xla:${USER} docker build -f docker/xla.Dockerfile ./ -t ${DOCKER_URI}
    • Build PyTorch/XLA image (GPU): DOCKER_URI=gemma_xla_gpu:${USER} docker build -f docker/xla_gpu.Dockerfile ./ -t ${DOCKER_URI}
  • Prerequisites: Docker, model checkpoints (downloadable via huggingface-cli or Kaggle).
  • Resources: Requires significant disk space for model checkpoints. GPU/TPU recommended for performance.
  • Documentation: Gemma on Google AI, Colab Notebook
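
With an image built and a checkpoint on disk, inference follows the pattern below, adapted from the upstream README; treat the exact scripts/run.py flags as assumptions and verify them against the repo:

    # GPU inference with the plain PyTorch image; mount the checkpoint directory.
    docker run -t --rm \
        --gpus all \
        -v ${CKPT_PATH}:/tmp/ckpt \
        ${DOCKER_URI} \
        python scripts/run.py \
        --device=cuda \
        --ckpt=/tmp/ckpt \
        --variant="2b" \
        --prompt="The meaning of life is"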

Highlighted Details

  • Supports Gemma v1.1, v2, v3, and CodeGemma models.
  • Inference available for CPU, GPU, and TPU via PyTorch and PyTorch/XLA.
  • Includes multimodal model variants.
  • Offers int8 quantization for reduced memory footprint.
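
The int8 option above is exposed as a flag on the same inference script; a sketch assuming the --quant flag described in the upstream README (quantized runs generally need a matching quantized checkpoint):

    # Append --quant to run with int8-quantized weights.
    docker run -t --rm --gpus all -v ${CKPT_PATH}:/tmp/ckpt ${DOCKER_URI} \
        python scripts/run.py --device=cuda --ckpt=/tmp/ckpt \
        --variant="2b" --quant --prompt="The meaning of life is"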

Maintenance & Community

The project is actively updated with new Gemma versions. Model checkpoints are hosted on Kaggle and Hugging Face.

Licensing & Compatibility

The repository itself is not explicitly licensed in the README. Model weights are subject to the Gemma Terms of Use. Compatibility for commercial use or closed-source linking depends on the specific Gemma model license.

Limitations & Caveats

The README states this is "not an officially supported Google product." The tokenizer reserves 99 unused tokens for fine-tuning purposes.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 30 days

Starred by Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech) and Clément Renault (Cofounder of Meilisearch).

Explore Similar Projects

lm.rs by samuel-vitorino · 0% · 1k stars
Minimal LLM inference in Rust
Created 1 year ago · Updated 10 months ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel · 0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 14 hours ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin · 0.1% · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 2 weeks ago