efficientvit by mit-han-lab

Vision models for high-resolution generation/perception tasks

created 2 years ago
3,005 stars

Top 16.3% on sourcepulse

Project Summary

This repository provides EfficientViT, a family of lightweight vision foundation models designed for high-resolution generation and perception tasks. It offers accelerated versions of models like Segment Anything (SAM) and enables efficient high-resolution diffusion models through its Deep Compression Autoencoder (DC-AE). The target audience includes researchers and developers working on efficient computer vision, particularly for deployment on resource-constrained devices or for high-throughput applications.

How It Works

EfficientViT is built around a lightweight multi-scale linear attention module, which lets it process high-resolution images at a fraction of the cost of standard softmax attention. The Deep Compression Autoencoder (DC-AE) family offers high spatial compression ratios (up to 128x) while maintaining reconstruction quality, which shrinks the latent space and significantly accelerates latent diffusion models. EfficientViT-SAM replaces the heavy image encoder in SAM with an EfficientViT backbone, achieving substantial speedups (e.g., a 48.9x measured TensorRT speedup over SAM-ViT-H on an A100) without sacrificing accuracy.
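
To make the compression figures concrete, the short sketch below compares the latent grid a diffusion model must process with a conventional f=8 autoencoder versus DC-AE's f=128 spatial compression. It is plain-Python arithmetic under the 128x ratio stated above; the f=8 baseline is an assumption standing in for a typical latent-diffusion autoencoder, not a number taken from the repository.

    # Latent grid sizes for a conventional f=8 autoencoder vs. DC-AE's f=128
    # spatial compression (illustrative arithmetic only, no project code).
    def latent_positions(image_size: int, spatial_compression: int) -> int:
        """Number of spatial positions in the latent grid for a square image."""
        side = image_size // spatial_compression
        return side * side

    image_size = 1024                                   # e.g., a 1024x1024 input
    baseline = latent_positions(image_size, 8)          # 128 * 128 = 16384 positions
    dc_ae = latent_positions(image_size, 128)           # 8 * 8 = 64 positions

    print(f"f=8 latent grid:   {baseline} positions")
    print(f"f=128 latent grid: {dc_ae} positions")
    print(f"token reduction:   {baseline // dc_ae}x")   # 256x fewer tokens

Because attention cost in a diffusion transformer grows quadratically with the number of latent tokens, shrinking the grid this aggressively is what makes high-resolution latent diffusion fast; DC-AE's stated contribution is doing so while preserving reconstruction quality.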

Quick Start & Requirements

  • Install: conda create -n efficientvit python=3.10, then conda activate efficientvit, then pip install -U -r requirements.txt
  • Prerequisites: Python 3.10 and a Conda environment. Some models require GPU acceleration (e.g., TensorRT for the EfficientViT-SAM deployment benchmarks).
  • Resources: Pretrained models are available (a loading sketch follows this list). Training and evaluation may require significant computational resources.
  • Links: GitHub repository at https://github.com/mit-han-lab/efficientvit
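
As a companion to the install steps, here is a minimal, hedged sketch of loading an EfficientViT-SAM checkpoint and running point-prompted segmentation. The module path efficientvit.sam_model_zoo, the factory create_efficientvit_sam_model, the predictor class EfficientViTSamPredictor, and the checkpoint id "efficientvit-sam-xl1" are assumptions modeled on the repository's model-zoo layout and the original SAM predictor interface; verify the exact names against the repo README.

    # Hedged sketch: load EfficientViT-SAM and segment from a single point prompt.
    # NOTE: module paths, factory name, predictor class, and checkpoint id are
    # assumptions -- check the repository README for the current entry points.
    import numpy as np
    import torch

    from efficientvit.sam_model_zoo import create_efficientvit_sam_model        # assumed factory
    from efficientvit.models.efficientvit.sam import EfficientViTSamPredictor   # assumed predictor

    model = create_efficientvit_sam_model(name="efficientvit-sam-xl1", pretrained=True)  # assumed id
    model = model.cuda() if torch.cuda.is_available() else model
    model.eval()

    predictor = EfficientViTSamPredictor(model)
    image = np.zeros((1024, 1024, 3), dtype=np.uint8)   # stand-in for a real RGB image (H, W, 3)
    predictor.set_image(image)

    # Prompt with one foreground point at the image center (SAM-style interface).
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[512, 512]]),
        point_labels=np.array([1]),
    )
    print(masks.shape, scores)

If any of these names differ in the installed version, the repository's README and demo scripts document the current entry points; the overall flow (create the model, wrap it in a predictor, set an image, prompt) mirrors the original SAM API.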

Highlighted Details

  • DC-AE+USiT-2B achieves 1.72 FID on ImageNet 512x512, surpassing SOTA diffusion and autoregressive models.
  • EfficientViT-SAM offers a 48.9x measured TensorRT speedup over SAM-ViT-H without accuracy loss.
  • DC-AE enables efficient text-to-image generation on laptops (e.g., SANA project).
  • EfficientViT backbones are integrated into Grounding DINO 1.5 Edge and MedficientSAM (1st place in CVPR 2024 Segment Anything In Medical Images On Laptop Challenge).

Maintenance & Community

The project is associated with the MIT Han Lab. Notable integrations include NVIDIA Jetson Generative AI Lab, timm, X-AnyLabeling, and Grounding DINO 1.5 Edge. Papers have been accepted to ICLR 2025, CVPR 2024, and ICCV 2023.
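
For the timm integration specifically, the classification backbones can be pulled through timm's standard factory. The registered name "efficientvit_b0" below is an assumption (timm also ships the unrelated MSRA EfficientViT under efficientvit_m* names), so the sketch first lists the registered variants to confirm what the installed timm version provides.

    # Hedged sketch: use an EfficientViT backbone through the timm integration.
    # The model name "efficientvit_b0" is an assumption; list variants to confirm.
    import timm
    import torch

    print(timm.list_models("efficientvit*"))   # enumerate registered EfficientViT variants

    model = timm.create_model("efficientvit_b0", pretrained=True, num_classes=1000)
    model.eval()

    x = torch.randn(1, 3, 224, 224)            # dummy 224x224 RGB batch
    with torch.no_grad():
        logits = model(x)
    print(logits.shape)                        # expected: torch.Size([1, 1000])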

Licensing & Compatibility

The README does not explicitly state the license. The project is open source and has been integrated into numerous third-party projects, which suggests broad compatibility, but consult the repository's LICENSE file for definitive terms.

Limitations & Caveats

Achieving the reported peak performance (e.g., the TensorRT speedups) requires specific hardware and software configurations. The project is under active development, so models, checkpoints, and APIs may change as new research is released.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 201 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

Latent diffusion model for high-resolution image synthesis

41k stars, top 0.1% on sourcepulse
created 2 years ago, updated 1 month ago