gemma-tuner-multimodal by mattmireles

Fine-tune multimodal Gemma models on Apple Silicon

Created 4 days ago


1,159 stars

Top 33.1% on SourcePulse

Project Summary

This repository provides a specialized tool for fine-tuning Google's Gemma models on multimodal data (text, image, and audio) directly on Apple Silicon Macs. It targets engineers and researchers who need to adapt Gemma to specific tasks without renting cloud GPUs or provisioning large local storage. It offers LoRA-based training and can stream large datasets from cloud storage.

How It Works

The project builds on Hugging Face's Gemma checkpoints and PEFT's LoRA (Low-Rank Adaptation) implementation for parameter-efficient fine-tuning. It runs on PyTorch's Metal Performance Shaders (MPS) backend for native acceleration on Apple Silicon, so no CUDA GPU is needed. It supports text-only, image+text (captioning, VQA), and audio+text fine-tuning. A key feature is streaming data directly from Google Cloud Storage (GCS) or BigQuery, which allows training on terabyte-scale datasets without first copying them to local disk.
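For readers new to LoRA, the core computation can be sketched in plain Python: the pretrained weight W stays frozen, only two small matrices A and B are trained, and their product is scaled by alpha / r before being added to the frozen output. This is an illustrative sketch of the general technique, not code from this repository or from PEFT; the function names are hypothetical, and real training operates on PyTorch tensors on the MPS device.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (r x d_in)
    and B (d_out x r) would be trained.
    """
    r = len(A)                        # LoRA rank = number of rows in A
    base = matvec(W, x)               # frozen pretrained path
    delta = matvec(B, matvec(A, x))   # low-rank update path
    scale = alpha / r                 # standard LoRA scaling factor
    return [b + scale * d for b, d in zip(base, delta)]


def param_counts(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """Trainable parameters for one linear layer: (full fine-tune, LoRA)."""
    return d_out * d_in, r * (d_in + d_out)


# Standard LoRA initialises B to zero, so the adapted layer starts out
# identical to the pretrained one.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]          # rank r = 1
B = [[0.0], [0.0]]
print(lora_forward(W, A, B, [2.0, 3.0], alpha=16.0))  # [2.0, 3.0]

# A hypothetical 2048 x 2048 projection adapted with rank 8:
print(param_counts(2048, 2048, 8))  # (4194304, 32768)
```

The parameter comparison shows why LoRA fits in laptop memory: for one 2048 x 2048 layer (an illustrative size, not a specific Gemma dimension), rank 8 trains 32,768 parameters instead of about 4.2 million.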

Quick Start & Requirements

  • Primary install: pip install -e . (within a Python 3.10+ virtual environment).
  • Prerequisites: macOS 12.3+ (required for MPS), a native arm64 Python installation, and 16 GB+ RAM (32 GB+ recommended). Hugging Face authentication (huggingface-cli login or HF_TOKEN) is required to download the gated Gemma weights. For BigQuery/GCS streaming, optionally install the extra with pip install .[gcp].
  • Links: Source: github.com/mattmireles/gemma-tuner-multimodal. Guides and specifications are available within the repository's README/guides/ and README/specifications/ directories.
  • Setup Time: Initial setup (including model download) takes minutes; subsequent training runs can start in seconds.

Highlighted Details

  • Apple Silicon Native Multimodal Training: Fine-tune Gemma models on text, images, and audio using PyTorch/MPS, bypassing NVIDIA GPU requirements.
  • Cloud Data Streaming: Train on datasets exceeding local storage capacity by streaming shards on-demand from GCS or BigQuery.
  • Real-time In-Browser Visualizer: Monitor training progress with live loss curves, attention heatmaps, gradient signals, memory usage, and token predictions without external tools like TensorBoard.
  • LoRA for Gemma: Fine-tune Gemma 3n and Gemma 4 models with LoRA, significantly reducing compute and memory overhead compared with full fine-tuning.
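The on-demand shard streaming described above follows a familiar lazy-iterator pattern. The sketch below is an assumption about the general approach, not the repository's loader: stream_examples, fake_open_shard, and the gs://example-bucket/... paths are hypothetical, and a real open_shard would read records from GCS or BigQuery (for example via the optional [gcp] dependencies) instead of generating dicts.

```python
import itertools
from typing import Callable, Iterable, Iterator, List


def stream_examples(shard_uris: Iterable[str],
                    open_shard: Callable[[str], Iterable[dict]]) -> Iterator[dict]:
    """Yield training examples shard by shard.

    A shard is opened only after the consumer has exhausted the previous one,
    so at most one shard reader is active at a time and the full dataset
    never needs to be staged on local disk.
    """
    for uri in shard_uris:
        yield from open_shard(uri)


# Stand-in for a real GCS/BigQuery reader; records which shards were touched.
opened: List[str] = []


def fake_open_shard(uri: str) -> Iterator[dict]:
    opened.append(uri)
    for i in range(3):  # pretend every shard holds three examples
        yield {"shard": uri, "idx": i}


# A hypothetical bucket with 1,000 shards.
uris = [f"gs://example-bucket/train-{i:05d}.jsonl" for i in range(1000)]
first_four = list(itertools.islice(stream_examples(uris, fake_open_shard), 4))
print(len(first_four), opened)
# 4 ['gs://example-bucket/train-00000.jsonl', 'gs://example-bucket/train-00001.jsonl']
```

Only two of the 1,000 shards are ever opened to produce four examples, which is the property that lets training start without downloading the whole dataset.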

Maintenance & Community

The project acknowledges contributions from Google's Gemma team, Hugging Face, and PyTorch MPS maintainers. Specific community links (Discord, Slack) or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Larger Gemma 4 models (e.g., 26B/31B) are not yet supported because of architectural differences. Some utility commands may not fully support Gemma 4 model IDs. In v1, text-only training still loads the audio tower weights into memory. MPS fallback behavior should be managed carefully so that unsupported operations do not silently run on the CPU.
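One way to act on the MPS fallback caveat is to inspect PyTorch's PYTORCH_ENABLE_MPS_FALLBACK environment variable before training. This is a hypothetical guard, not part of the repository; the helper name and its warning policy are assumptions, while the environment variable itself is the switch PyTorch uses to let ops without an MPS kernel run on the CPU.

```python
import os
import warnings
from typing import Mapping, Optional

FALLBACK_VAR = "PYTORCH_ENABLE_MPS_FALLBACK"


def mps_fallback_enabled(env: Optional[Mapping[str, str]] = None,
                         allow_fallback: bool = False) -> bool:
    """Report whether PyTorch's MPS-to-CPU fallback is switched on.

    With PYTORCH_ENABLE_MPS_FALLBACK=1, ops that lack an MPS kernel run on the
    CPU instead of raising an error, which can quietly slow training down.
    Unless `allow_fallback` is True, an enabled fallback triggers a warning.
    """
    env = os.environ if env is None else env
    enabled = env.get(FALLBACK_VAR, "0") == "1"
    if enabled and not allow_fallback:
        warnings.warn(
            f"{FALLBACK_VAR}=1: unsupported MPS ops will silently run on the CPU",
            RuntimeWarning,
        )
    return enabled


if __name__ == "__main__":
    # Run this check before `import torch`; the variable is commonly advised
    # to be set before PyTorch is imported.
    print("fallback enabled:", mps_fallback_enabled())
```

Leaving the variable unset is the stricter option: an unsupported op then fails immediately with an error instead of running slowly on the CPU.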

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 12
  • Star History: 1,163 stars in the last 4 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

0.2%
419 stars
Lightweight training framework for model pre-training
Created 2 years ago
Updated 7 months ago
Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.

ultravox by fixie-ai

0.2%
4k stars
Multimodal LLM for real-time voice interactions
Created 1 year ago
Updated 4 months ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Travis Fischer (Founder of Agentic), and 8 more.

corenet by apple

0.0%
7k stars
DNN toolkit for training standard and novel models
Created 2 years ago
Updated 6 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 5 more.

ai-toolkit by ostris

0.8%
10k stars
Training toolkit for finetuning diffusion models
Created 2 years ago
Updated 20 hours ago