smol-vision by merveenoyan

Recipes for vision and multimodal AI model shrinking, optimization, and customization

Created 1 year ago

1,840 stars

Top 23.3% on SourcePulse


2 Experts Love This Project

Lysandre Debut, Chief Open-Source Officer at Hugging Face

Omar Sanseviero, DevRel at Google DeepMind

Project Summary

Smol Vision provides practical recipes for optimizing and customizing cutting-edge vision and multimodal AI models. It targets researchers and engineers looking to reduce model size, improve inference speed, and adapt models for specific tasks, offering a collection of runnable examples.

How It Works

The project uses Hugging Face Transformers, Optimum, ONNX Runtime, and PyTorch's torch.compile to implement its optimization recipes: quantization (e.g., with Quanto), knowledge distillation, and ONNX export for faster inference. For fine-tuning, it demonstrates QLoRA, which trains small low-rank adapters on top of a quantized base model, to adapt large vision-language models (VLMs) efficiently.
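
To make the quantization idea concrete, here is a minimal sketch of symmetric int8 weight quantization in plain Python. It is an illustration of the general technique only, not the code used in the repository's recipes or the API of Quanto or Optimum; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    (Hypothetical helper for illustration; real libraries work on tensors.)
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Made-up example weights, not taken from any real model.
weights = [0.5, -1.27, 0.0, 0.8]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four (for float32), at the cost of a rounding error of at most `scale / 2` per value. Production toolkits add refinements such as per-channel scales and activation calibration.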

Quick Start & Requirements

Installation typically involves cloning the repository and installing dependencies via pip install -r requirements.txt.
Requires Python 3.8+ and PyTorch. Specific examples may require GPU acceleration and CUDA.
Links to specific notebooks and scripts are provided within the README for individual recipes.

Highlighted Details

Demonstrates quantization of state-of-the-art models like OWLv2 with Optimum ONNX Runtime.
Features fine-tuning recipes for VLMs such as PaliGemma, Florence-2, and IDEFICS3.
Includes examples for knowledge distillation in image classification and optimizing inference with torch.compile.
Showcases multimodal RAG pipelines using models like ColPali and Qwen2-VL.
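
The knowledge distillation recipe listed above trains a small student model to imitate a larger teacher. As a hedged illustration of the usual objective (the temperature-softened KL divergence from Hinton et al.), here is a plain-Python sketch. The repository's notebook may combine this with a hard-label loss or use a different formulation; the function names and logits below are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Made-up logits for a 3-class image classifier.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
poor_student = [0.5, 1.0, 4.0]   # prefers the wrong class

loss_good = distillation_loss(good_student, teacher)
loss_poor = distillation_loss(poor_student, teacher)
```

A student whose predictions track the teacher's receives a lower loss than one that disagrees, which is the signal the training loop minimizes.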

Maintenance & Community

The repository is maintained by Merve Noyan. The README does not mention community channels such as Discord or Slack.

Licensing & Compatibility

The README does not state a license. Verify the licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

Some recipes are marked "SOON" in the README, meaning they are still in development. The project covers specific optimization techniques and model architectures; it does not promise broader model support or general-purpose optimization tooling.

Health Check

Last Commit: 2 days ago

Responsiveness: 1 day

Star History

46 stars in the last 30 days