smol-vision by merveenoyan

Recipes for vision and multimodal AI model shrinking, optimization, and customization

created 1 year ago
1,540 stars

Top 27.5% on sourcepulse

Project Summary

Smol Vision provides practical recipes for optimizing and customizing cutting-edge vision and multimodal AI models. It targets researchers and engineers looking to reduce model size, improve inference speed, and adapt models for specific tasks, offering a collection of runnable examples.

How It Works

The recipes build on Hugging Face Transformers, Optimum, ONNX Runtime, and PyTorch (including torch.compile). The optimization techniques covered include quantization (e.g., with Quanto), knowledge distillation, and ONNX export for faster inference. For fine-tuning, the recipes demonstrate methods such as QLoRA, which trains small low-rank adapters on top of a quantized base model so that large vision-language models (VLMs) can be adapted efficiently.
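To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python. It is not taken from the repository, whose recipes rely on libraries such as Quanto and Optimum. The function names `quantize_int8` and `dequantize` are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization, so that w ≈ q * scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


# Illustrative weights (made up for this sketch).
weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one byte plus a single shared scale, which is where the size reduction comes from. Real libraries typically add per-channel scales and calibrated activation ranges on top of this basic scheme.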

Quick Start & Requirements

  • Installation typically involves cloning the repository and installing dependencies via pip install -r requirements.txt.
  • Requires Python 3.8+ and PyTorch. Specific examples may require GPU acceleration and CUDA.
  • Links to specific notebooks and scripts are provided within the README for individual recipes.
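Before running a recipe, it can help to confirm that the environment meets the requirements listed above. The following is a small sketch using only the standard library. The helper name `check_environment` is hypothetical, and the Python 3.8 minimum is taken from the summary above rather than verified against the repository.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Report whether Python, PyTorch, and CUDA look usable."""
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec only looks the package up; it does not import it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install is treated the same as "no CUDA".
            pass
    return report


report = check_environment()
print(report)
```

If `cuda_available` is false, recipes that need GPU acceleration will likely be slow or fail, so check the individual notebook's requirements first.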

Highlighted Details

  • Demonstrates quantizing OWLv2, a zero-shot object detection model, with Optimum's ONNX Runtime tools.
  • Features fine-tuning recipes for VLMs such as PaliGemma, Florence-2, and IDEFICS3.
  • Includes examples for knowledge distillation in image classification and optimizing inference with torch.compile.
  • Showcases multimodal RAG pipelines using models like ColPali and Qwen2-VL.
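As an illustration of the knowledge distillation mentioned in the list above, here is a minimal sketch of the commonly used temperature-softened distillation loss, written in plain Python. It is an assumption about the general technique, not the repository's code, which uses PyTorch and may weight the loss differently or combine it with a standard cross-entropy term. The names `softmax` and `distillation_loss` and all logit values are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


# Made-up logits for a three-class image classifier.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]
bad_student = [0.5, 1.0, 4.0]

print(distillation_loss(good_student, teacher))
print(distillation_loss(bad_student, teacher))
```

A student whose logits track the teacher's receives a lower loss than one that ranks the classes in the opposite order, which is the signal the smaller model is trained on.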

Maintenance & Community

The repository is maintained by Merve Noyan. Further community engagement details (e.g., Discord, Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README snippet. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some recipes are marked "SOON" in the README, meaning they are planned but not yet available. The project covers specific optimization techniques and model architectures; it is not a general-purpose optimization toolkit, and support for other models is not guaranteed.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

129 stars in the last 90 days
