Gemma model recipes for multimodal AI
This repository provides minimal, ready-to-use code examples for interacting with Hugging Face's Gemma family of models, targeting developers and researchers looking for quick integration of multimodal AI capabilities. It simplifies inference and fine-tuning across text, image, and audio modalities.
How It Works
The recipes leverage the 🤗 Transformers library for model loading, processing, and generation. They use the pipeline abstraction for straightforward inference and provide detailed examples using AutoProcessor and AutoModelForImageTextToText for more granular control. This approach supports interleaved multimodal inputs, allowing complex prompts that combine text, images, and audio.
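The granular path described above can be sketched as follows. This is a minimal illustration, not code from the repository: the checkpoint id (google/gemma-3-4b-it) and the image URL are assumptions, and the exact message schema follows the Transformers chat-template convention for multimodal content.

```python
def build_messages(image_url: str, question: str) -> list:
    """Build an interleaved multimodal chat prompt (one image plus text)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def describe_image(image_url: str, question: str) -> str:
    """Load a Gemma checkpoint, run one generation step, decode the reply.

    The model id below is an illustrative assumption; swap in whichever
    Gemma checkpoint the recipes target.
    """
    # Imported lazily so the prompt-building helper stays dependency-free.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "google/gemma-3-4b-it"  # assumed checkpoint, not from the repo
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

    inputs = processor.apply_chat_template(
        build_messages(image_url, question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=64)
    # Strip the prompt tokens so only the generated continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

The same messages structure accepts additional entries (more images, or audio items on checkpoints that support them), which is what makes interleaved prompts possible.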
Quick Start & Requirements
$ pip install -U -q transformers timm
Requires torch and a compatible CUDA-enabled GPU for optimal performance.
Highlighted Details
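After installing the dependencies, the quickest route is the pipeline abstraction. The sketch below is a hypothetical minimal example: the task name "image-text-to-text" and the default checkpoint id are assumptions based on recent Transformers releases, not taken from the repository.

```python
def caption_image(image_url: str, prompt: str,
                  model_id: str = "google/gemma-3-4b-it"):
    """One-call multimodal inference via the pipeline abstraction.

    model_id is an assumed Gemma checkpoint; replace it as needed.
    """
    # Lazy import: needs transformers and torch installed, plus model access.
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model=model_id)
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]
    return pipe(text=messages, max_new_tokens=48)
```

Note that Gemma checkpoints are gated on the Hugging Face Hub, so a first run also requires accepting Google's terms and authenticating (e.g. via huggingface-cli login).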
Maintenance & Community
This repository is maintained by Hugging Face. Community interaction and support are typically channeled through Hugging Face's official platforms.
Licensing & Compatibility
The repository itself appears to be under a permissive license, but the underlying Gemma models have specific usage terms set by Google. Users must adhere to both.
Limitations & Caveats
While designed for ease of use, advanced fine-tuning or specific multimodal integrations might require deeper understanding of the underlying libraries and model architectures. Some Colab notebooks may have resource limitations on free tiers.