mattmireles: Fine-tune multimodal Gemma models on Apple Silicon
Top 33.1% on SourcePulse
This repository provides a specialized tool for fine-tuning Google's Gemma language models, enabling multimodal capabilities (text, image, audio) directly on Apple Silicon Macs. It targets engineers and researchers who need to adapt Gemma for specific tasks without relying on expensive cloud GPUs or large local storage, offering efficient LoRA-based training and the ability to stream massive datasets from cloud storage.
How It Works
The project leverages Hugging Face's Gemma checkpoints and PEFT's LoRA (Low-Rank Adaptation) for efficient fine-tuning. It utilizes PyTorch with Metal Performance Shaders (MPS) for native acceleration on Apple Silicon, eliminating the need for CUDA. The system supports text-only, image+text (captioning, VQA), and audio+text fine-tuning. A key innovation is its ability to stream data directly from Google Cloud Storage (GCS) or BigQuery, allowing training on terabyte-scale datasets without requiring local disk space.
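The efficiency claim above rests on LoRA: rather than updating a full weight matrix W, training touches only two small low-rank factors A and B, with the effective weight W + (alpha / r) * (B @ A). A minimal stdlib-only sketch of why this shrinks the trainable-parameter budget (the layer size and rank below are illustrative, not taken from this repository's configs):

```python
# LoRA's core saving: for one d_out x d_in linear layer, full fine-tuning
# trains d_out * d_in weights, while LoRA trains only the factors
# B (d_out x r) and A (r x d_in), i.e. r * (d_in + d_out) weights.

def lora_param_counts(d_in: int, d_out: int, r: int) -> tuple[int, int]:
    """Return (full_finetune_params, lora_params) for one linear layer."""
    full = d_in * d_out        # every weight trainable
    lora = r * (d_in + d_out)  # only the low-rank factors trainable
    return full, lora

# Example: a 4096x4096 projection with rank r=16.
full, lora = lora_param_counts(4096, 4096, r=16)
print(full, lora, full // lora)  # 16777216 131072 128
```

At rank 16 this single layer trains 128x fewer parameters, which is what makes fine-tuning feasible in unified memory on a Mac instead of a cloud GPU.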
Quick Start & Requirements
Install with pip install -e . inside a Python 3.10+ virtual environment. Hugging Face authentication (huggingface-cli login, or the HF_TOKEN environment variable) is required to download the gated Gemma weights. The optional extra pip install .[gcp] enables BigQuery/GCS streaming.
Highlighted Details
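The headline capability is training on terabyte-scale data without local disk space. The essential pattern behind that is lazy, shard-by-shard iteration so memory stays proportional to one batch, not the dataset. A hedged stdlib sketch of the pattern, where fetch_shard is a hypothetical in-memory stand-in for a real GCS/BigQuery reader (this is not the repository's actual API):

```python
# Streaming-iteration pattern: chain remote shards lazily and slice the
# resulting iterator into batches; nothing is materialized up front.
from itertools import islice
from typing import Iterator

def fetch_shard(shard_id: int) -> Iterator[dict]:
    # Stand-in for a remote reader; a real one would page through
    # GCS objects or BigQuery rows here.
    for i in range(3):
        yield {"shard": shard_id, "text": f"example {i}"}

def stream_dataset(num_shards: int) -> Iterator[dict]:
    # Nothing is fetched until the caller iterates.
    for s in range(num_shards):
        yield from fetch_shard(s)

def batches(stream: Iterator[dict], size: int) -> Iterator[list[dict]]:
    while batch := list(islice(stream, size)):
        yield batch

# Only one batch is ever held in memory, even for 1000 shards.
first = next(batches(stream_dataset(num_shards=1000), size=4))
print(len(first))  # 4
```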
Maintenance & Community
The project acknowledges contributions from Google's Gemma team, Hugging Face, and PyTorch MPS maintainers. Specific community links (Discord, Slack) or roadmap details are not explicitly provided in the README.
Licensing & Compatibility
Limitations & Caveats
Larger Gemma 4 models (e.g., 26B/31B) are not yet supported due to architectural differences. Some utility commands may not fully support Gemma 4 IDs. Text-only training in v1 still loads audio tower weights into memory. Careful management of MPS fallback behavior is advised to prevent silent CPU usage.
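The MPS-fallback caveat refers to PyTorch's PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable: when set, ops without a Metal kernel silently execute on CPU, which can quietly slow training. A small hedged helper (illustrative, not part of this repository) that surfaces the fallback state at startup instead of letting it be discovered mid-run:

```python
# Make PyTorch's silent-CPU-fallback setting explicit before training.
# Pure environment check; does not require torch to be installed.
import os

def describe_mps_fallback(env: dict) -> str:
    """Report whether silent CPU fallback for unsupported MPS ops is enabled."""
    if env.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1":
        return "fallback ON: unsupported MPS ops will silently run on CPU"
    return "fallback OFF: unsupported MPS ops will raise an error"

print(describe_mps_fallback(os.environ))
```

Logging this once at launch makes it obvious whether a slow step time is a genuine MPS bottleneck or an op that fell back to CPU.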