gemma-tuner-multimodal by mattmireles

Fine-tune multimodal Gemma models on Apple Silicon

Created 3 months ago

1,482 stars

Top 27.0% on SourcePulse

Project Summary

This repository provides a tool for fine-tuning Google's Gemma models on text, image, and audio data directly on Apple Silicon Macs. It targets engineers and researchers who need to adapt Gemma to specific tasks without relying on expensive cloud GPUs or large local storage. It offers efficient LoRA-based training and can stream massive datasets from cloud storage.

How It Works

The project builds on Hugging Face's Gemma checkpoints and uses PEFT's LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning. It runs on PyTorch with the Metal Performance Shaders (MPS) backend for native acceleration on Apple Silicon, so CUDA is not required. Supported modes are text-only, image+text (captioning, VQA), and audio+text fine-tuning. A distinguishing feature is streaming data directly from Google Cloud Storage (GCS) or BigQuery, which allows training on terabyte-scale datasets without holding them on local disk.
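
To make the efficiency claim concrete, here is a minimal arithmetic sketch of why LoRA trains far fewer parameters than full fine-tuning. It assumes the standard LoRA formulation, where a frozen weight matrix of shape d_out x d_in is adapted by two small matrices of rank r. The layer shape and rank below are hypothetical values chosen for illustration; they are not taken from any Gemma configuration or from this repository.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA adapter.

    The adapter adds A (rank x d_in) and B (d_out x rank); the original
    weight matrix stays frozen.
    """
    return rank * d_in + d_out * rank


# Hypothetical layer shape and rank, for illustration only.
d_in, d_out, rank = 2048, 2048, 16

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  fraction trained: {lora / full:.4f}")
```

For this example shape the adapter trains 1/64 of the parameters of the full matrix. The real saving depends on which layers are adapted and on the rank chosen.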

Quick Start & Requirements

Primary install: pip install -e . (within a Python 3.10+ virtual environment).
Prerequisites: macOS 12.3+ (for MPS), native arm64 Python installation, 16 GB+ RAM (32 GB+ recommended). Hugging Face authentication (huggingface-cli login or HF_TOKEN) is required for gated Gemma weights. Optional: pip install .[gcp] adds BigQuery/GCS streaming support.
Links: Source: github.com/mattmireles/gemma-tuner-multimodal. Guides and specifications are available within the repository's README/guides/ and README/specifications/ directories.
Setup Time: Initial setup (including model download) takes minutes; subsequent training runs can start in seconds.
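
The prerequisites above can be checked before installing. The following is a minimal sketch using only the Python standard library; the function names preflight and parse_version are hypothetical helpers written for this example and are not part of the repository. It does not check RAM, and an unset HF_TOKEN is only a hint, since huggingface-cli login is an equally valid way to authenticate.

```python
import os
import platform
import sys


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '14.2' into (14, 2)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def preflight(python_version, machine, macos_version, env) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    if machine != "arm64":
        problems.append("Python is not running natively on arm64")
    # An empty version string (non-macOS host) parses to () and fails this check.
    if parse_version(macos_version) < (12, 3):
        problems.append("macOS 12.3+ is required for MPS")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (fine if you already ran huggingface-cli login)")
    return problems


if __name__ == "__main__":
    issues = preflight(sys.version_info, platform.machine(), platform.mac_ver()[0], os.environ)
    for issue in issues:
        print("warning:", issue)
    if not issues:
        print("all preflight checks passed")
```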

Highlighted Details

Apple Silicon Native Multimodal Training: Fine-tune Gemma models on text, images, and audio using PyTorch/MPS, bypassing NVIDIA GPU requirements.
Cloud Data Streaming: Train on datasets exceeding local storage capacity by streaming shards on-demand from GCS or BigQuery.
Real-time In-Browser Visualizer: Monitor training progress with live loss curves, attention heatmaps, gradient signals, memory usage, and token predictions without external tools like TensorBoard.
LoRA for Gemma: Efficiently fine-tune Gemma 3n and Gemma 4 models, with lower compute and memory cost than full fine-tuning.
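
The cloud streaming idea in the list above can be sketched with plain Python generators: shards are fetched one at a time, only when the training loop asks for more examples. This is an illustration of the general pattern, not the repository's implementation. The names stream_examples, batched, fake_fetch, and FAKE_BUCKET are hypothetical, and an in-memory dictionary stands in for GCS or BigQuery so the example runs without network access.

```python
from typing import Callable, Iterable, Iterator, List


def stream_examples(
    shard_names: Iterable[str],
    fetch_shard: Callable[[str], List[dict]],
) -> Iterator[dict]:
    """Yield examples shard by shard; a shard is fetched only when it is reached."""
    for name in shard_names:
        records = fetch_shard(name)  # the (simulated) download happens here, on demand
        for record in records:
            yield record


def batched(examples: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a stream of examples into lists of at most batch_size items."""
    batch = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


# In-memory stand-in for remote storage (assumption for this sketch).
FAKE_BUCKET = {
    "shard-000": [{"text": "a"}, {"text": "b"}],
    "shard-001": [{"text": "c"}],
}
fetched = []  # records which shards were actually requested


def fake_fetch(name: str) -> List[dict]:
    fetched.append(name)
    return FAKE_BUCKET[name]


batches = batched(stream_examples(sorted(FAKE_BUCKET), fake_fetch), batch_size=2)
first = next(batches)
print(first, fetched)  # only the first shard has been fetched at this point
```

Because both functions are generators, the second shard is not requested until the loop consumes past the first one, which is what keeps local storage needs bounded by shard size rather than dataset size.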

Maintenance & Community

The project acknowledges contributions from Google's Gemma team, Hugging Face, and PyTorch MPS maintainers. Specific community links (Discord, Slack) or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

License: MIT.
Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

Larger Gemma 4 models (e.g., 26B/31B) are not yet supported due to architectural differences. Some utility commands may not fully support Gemma 4 IDs. In v1, text-only training still loads the audio tower weights into memory. MPS fallback behavior needs careful management, since operations that fall back can run on the CPU without it being obvious.
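
One way to keep CPU fallback from going unnoticed is to check the fallback setting explicitly at startup. The sketch below relies on PyTorch's PYTORCH_ENABLE_MPS_FALLBACK environment variable, which lets operations unsupported on MPS run on the CPU when enabled. The helper names and the allow_cpu_fallback policy flag are hypothetical and not part of the repository. For simplicity the sketch treats only the documented value 1 as enabled, and the variable is typically set before the process starts.

```python
import os

FALLBACK_VAR = "PYTORCH_ENABLE_MPS_FALLBACK"  # PyTorch environment variable


def fallback_enabled(env) -> bool:
    """True when the environment asks PyTorch to run unsupported MPS ops on the CPU."""
    return env.get(FALLBACK_VAR, "0") == "1"


def check_fallback_policy(env, allow_cpu_fallback: bool) -> str:
    """Fail fast when fallback is on but the caller wants a pure-MPS run."""
    enabled = fallback_enabled(env)
    if enabled and not allow_cpu_fallback:
        raise RuntimeError(
            f"{FALLBACK_VAR}=1 is set; unsupported ops would run on the CPU. "
            "Unset it or pass allow_cpu_fallback=True."
        )
    if enabled:
        return "fallback on: unsupported ops will run on the CPU"
    return "fallback off: unsupported ops will raise an error instead of moving to the CPU"


# Report the current setting without enforcing a strict policy.
print(check_fallback_policy(os.environ, allow_cpu_fallback=True))
```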

Health Check

Last Commit: 2 months ago
Responsiveness: Inactive

Star History

24 stars in the last 30 days