Sana by NVlabs

Image synthesis research paper using a linear diffusion transformer

created 9 months ago
4,406 stars

Top 11.3% on sourcepulse

Project Summary

Sana is a text-to-image generation framework designed for efficient, high-resolution image synthesis. It targets researchers and content creators who want fast, high-quality image generation with strong text-image alignment, even on consumer hardware. Its core benefit is output quality competitive with much larger models at significantly lower computational cost and with faster inference.

How It Works

Sana combines three components: a 32x-downsampling Deep Compression Autoencoder (DC-AE), which reduces the number of latent tokens; a Linear Diffusion Transformer (Linear DiT), which replaces standard attention with linear attention for efficiency at high resolutions; and a decoder-only LLM used as the text encoder, prompted with complex human instructions to improve image-text alignment. For faster sampling, it introduces Flow-DPM-Solver, which reduces the number of inference steps.
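
To make the linear-attention idea concrete, here is a minimal pure-Python sketch. It assumes a ReLU feature map and a single attention head; the function names are hypothetical and this is not the repository's implementation. The point it illustrates is that multiplying keys by values first yields a small d x d_v summary, so the N x N attention matrix is never built.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def relu_features(m):
    """Elementwise ReLU, used here as the (assumed) attention feature map."""
    return [[max(x, 0.0) for x in row] for row in m]


def linear_attention(q, k, v, eps=1e-6):
    """Attention in O(N * d * d_v): compute K^T V once, then reuse it per query."""
    phi_q, phi_k = relu_features(q), relu_features(k)
    kv = matmul(transpose(phi_k), v)            # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*phi_k)]   # length-d vector for normalisation
    out = []
    for row in phi_q:
        denom = sum(a * b for a, b in zip(row, k_sum)) + eps
        numer = matmul([row], kv)[0]
        out.append([x / denom for x in numer])
    return out


def quadratic_reference(q, k, v, eps=1e-6):
    """Same result, but materialises the N x N score matrix (for comparison only)."""
    phi_q, phi_k = relu_features(q), relu_features(k)
    scores = matmul(phi_q, transpose(phi_k))    # N x N
    out = []
    for weights in scores:
        denom = sum(weights) + eps
        numer = matmul([weights], v)[0]
        out.append([x / denom for x in numer])
    return out


# Toy inputs: 3 tokens, feature dimension 2.
q = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]
k = [[0.5, 1.0], [-1.0, 2.0], [1.5, -0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fast = linear_attention(q, k, v)
slow = quadratic_reference(q, k, v)
```

Both functions compute the same quantity up to floating-point rounding; only the order of the matrix products differs, which is what makes the cost grow linearly rather than quadratically with the number of tokens.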

Quick Start & Requirements

  • Installation: Clone the repository and run ./environment_setup.sh sana or install components manually.
  • Prerequisites: Python >= 3.10.0, PyTorch >= 2.0.1+cu12.1.
  • Hardware: Inference needs about 9GB VRAM for the 0.6B models and 12GB VRAM for the 1.6B models. Training requires 32GB VRAM. Quantized versions can run in under 8GB VRAM.
  • Demos & Docs: Online demo available at https://nv-sana.mit.edu/. diffusers integration: SanaPipeline, SanaPAGPipeline. ComfyUI nodes: ComfyUI_ExtraModels.
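
As a rough sanity check on the hardware figures above, the sketch below estimates the memory taken by the diffusion transformer weights alone at several precisions. It assumes parameter counts of exactly 0.6e9 and 1.6e9 and a hypothetical helper name; the listed VRAM requirements are higher because they also cover the text encoder, the autoencoder, and activations, none of which are modelled here.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, precision):
    """Memory for the weights only, in GiB (no activations, no other models)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


# Assumed parameter counts for the two model sizes named in the list above.
for name, params in [("0.6B", 0.6e9), ("1.6B", 1.6e9)]:
    for precision in BYTES_PER_PARAM:
        gib = weight_memory_gib(params, precision)
        print(f"{name} @ {precision}: {gib:.2f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is the basic reason quantized variants fit on smaller GPUs.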

Highlighted Details

  • Achieves 2K and 4K resolution image generation.
  • Supports ControlNet for guided generation.
  • Enables Dreambooth and LoRA fine-tuning.
  • Offers 8-bit and 4-bit quantization for reduced VRAM usage.
  • Claims up to 100x higher throughput and a 20x smaller model for Sana-0.6B than comparable large models (e.g., Flux-12B).
  • SANA-Sprint models achieve 1-4 step generation.
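
The link between the 32x autoencoder and the 2K/4K resolutions can be seen with simple token arithmetic. The sketch below assumes square images, takes "2K" and "4K" to mean 2048 and 4096 pixels per side, and compares against a hypothetical baseline with an 8x autoencoder and 2x2 patches; the helper name and the baseline settings are illustrative assumptions, not values taken from this page.

```python
def latent_tokens(image_size, ae_downsample, patch_size=1):
    """Number of transformer tokens for a square image of the given side length."""
    factor = ae_downsample * patch_size
    if image_size % factor != 0:
        raise ValueError("image_size must be divisible by ae_downsample * patch_size")
    side = image_size // factor
    return side * side


for size in (2048, 4096):
    sana_like = latent_tokens(size, ae_downsample=32, patch_size=1)
    baseline = latent_tokens(size, ae_downsample=8, patch_size=2)
    print(f"{size}px: {sana_like} tokens at 32x vs {baseline} tokens for the assumed 8x baseline")
```

Under these assumptions the 32x setting produces a quarter as many tokens as the baseline at the same resolution, and any cost that scales with the token count shrinks accordingly.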

Maintenance & Community

The project is actively developed by NVlabs; updates in March 2025 included the SANA-Sprint release and SANA-1.5 changes. Community support and integration are evident through active diffusers and ComfyUI contributions.

Licensing & Compatibility

The codebase license was changed to Apache 2.0 on January 11, 2025. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The README notes that performance metrics may differ across GPU versions. The project is under active development, and some features, such as video generation, are still listed under "TODO".

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 2
  • Issues (30d): 1
  • Star History: 355 stars in the last 90 days
