Vision models for high-resolution generation/perception tasks
This repository provides EfficientViT, a family of lightweight vision foundation models designed for high-resolution generation and perception tasks. It offers accelerated versions of models like Segment Anything (SAM) and enables efficient high-resolution diffusion models through its Deep Compression Autoencoder (DC-AE). The target audience includes researchers and developers working on efficient computer vision, particularly for deployment on resource-constrained devices or for high-throughput applications.
How It Works
EfficientViT is built around a multi-scale linear attention mechanism, which processes high-resolution images at linear rather than quadratic cost in the number of tokens. The Deep Compression Autoencoder (DC-AE) family offers high spatial compression ratios (up to 128x) while maintaining reconstruction quality, significantly accelerating latent diffusion models. EfficientViT-SAM replaces the heavy image encoder in Segment Anything (SAM) with EfficientViT, achieving substantial speedups (e.g., a 48.9x TensorRT speedup on A100) without accuracy loss.
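To make the linear-cost claim concrete, here is a minimal single-head sketch of ReLU linear attention, the building block that multi-scale linear attention is based on. This is an illustrative NumPy sketch, not the repository's actual implementation; the real module aggregates heads at multiple kernel scales, which is omitted here.

```python
import numpy as np

def relu_linear_attention(Q, K, V, eps=1e-6):
    """Linear attention with a ReLU feature map instead of softmax.

    Q, K: (N, d), V: (N, dv). Cost is O(N * d * dv) because the
    (d, dv) key-value summary is shared across all N queries,
    avoiding the O(N^2) pairwise attention matrix.
    """
    Qf = np.maximum(Q, 0.0)           # phi(Q) = ReLU(Q)
    Kf = np.maximum(K, 0.0)           # phi(K) = ReLU(K)
    KV = Kf.T @ V                     # (d, dv) summary, computed once
    Z = Qf @ Kf.sum(axis=0) + eps     # (N,) per-query normalizer
    return (Qf @ KV) / Z[:, None]     # (N, dv)

rng = np.random.default_rng(0)
N, d = 4096, 32                       # 4096 tokens ~ a 64x64 feature map
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = relu_linear_attention(Q, K, V)
print(out.shape)                      # (4096, 32)
```

Because `KV` and `Z` are computed once and reused for every query, doubling the image resolution (4x the tokens) only quadruples the cost, instead of the 16x growth softmax attention would incur.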
Quick Start & Requirements
conda create -n efficientvit python=3.10
conda activate efficientvit
pip install -U -r requirements.txt
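As a back-of-the-envelope illustration of why DC-AE's compression ratio matters: the sketch below compares latent token counts for a 1024x1024 image at an 8x spatial compression (a common convention for standard diffusion autoencoders; this baseline figure is an assumption, not from the README) versus DC-AE's 128x.

```python
def latent_tokens(image_size: int, compression: int) -> int:
    """Latent token count for a square image at a given spatial
    compression ratio (assumes a square latent grid, 1 token per cell)."""
    side = image_size // compression
    return side * side

img = 1024
baseline = latent_tokens(img, 8)     # assumed f8 autoencoder
dc_ae = latent_tokens(img, 128)      # DC-AE at its maximum 128x ratio
print(baseline, dc_ae, baseline // dc_ae)  # 16384 64 256
```

With 256x fewer latent tokens, each diffusion step operates on a far smaller sequence, which is the source of the acceleration claimed above.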
Highlighted Details
Maintenance & Community
The project is associated with the MIT Han Lab. Notable integrations include NVIDIA Jetson Generative AI Lab, timm, X-AnyLabeling, and Grounding DINO 1.5 Edge. Papers have been accepted to ICLR 2025, CVPR 2024, and ICCV 2023.
Licensing & Compatibility
The README does not explicitly state a license, so users should check the repository's LICENSE file before commercial or production use. The project is open-source and has been integrated into various third-party projects.
Limitations & Caveats
Reported speedups (e.g., the 48.9x TensorRT figure) depend on specific hardware and software configurations and may not transfer to other setups. The project is under active development, so model checkpoints and APIs may change between releases.