NVlabs/DiffiT — Diffusion Vision Transformers for high-fidelity image generation
Summary
DiffiT addresses high-fidelity image generation by combining Diffusion Models with Vision Transformers (ViTs). It introduces Time-dependent Multihead Self Attention (TMSA) for precise control over the denoising process across timesteps. This approach targets researchers and engineers in generative AI, offering state-of-the-art performance on class-conditional image synthesis tasks.
How It Works
The core innovation is the integration of ViTs into the diffusion backbone, enhanced by TMSA. Unlike standard self-attention, TMSA conditions the query, key, and value projections on a timestep embedding, so the attention pattern itself adapts as denoising progresses from coarse structure to fine detail. This architectural choice gives DiffiT finer control over the denoising pipeline and better generation quality than prior transformer-based diffusion models.
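To make the TMSA idea concrete, here is a minimal PyTorch sketch of time-dependent self-attention: spatial tokens and a timestep embedding each get their own linear maps, and their contributions are summed before forming Q, K, and V. The module name, shapes, and fusion scheme are illustrative assumptions, not the official DiffiT implementation.

```python
import torch
import torch.nn as nn


class TimeDependentSelfAttention(nn.Module):
    """Sketch of TMSA-style attention: Q/K/V depend on both the spatial
    tokens and a timestep embedding, so attention weights vary with the
    denoising step. Hypothetical layout, not NVlabs' released code."""

    def __init__(self, dim: int, time_dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Separate projections for tokens and the time embedding;
        # their outputs are summed before splitting into Q, K, V.
        self.qkv_x = nn.Linear(dim, 3 * dim, bias=False)
        self.qkv_t = nn.Linear(time_dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) spatial tokens; t_emb: (B, time_dim)
        B, N, dim = x.shape
        qkv = self.qkv_x(x) + self.qkv_t(t_emb).unsqueeze(1)  # (B, N, 3*dim)
        q, k, v = qkv.chunk(3, dim=-1)

        def split_heads(h: torch.Tensor) -> torch.Tensor:
            # (B, N, dim) -> (B, heads, N, head_dim)
            return h.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        attn = torch.softmax(
            q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1
        )
        out = (attn @ v).transpose(1, 2).reshape(B, N, dim)
        return self.proj(out)
```

Because the time embedding enters the projections additively, the same token sequence yields different attention maps at different timesteps, which is the behavior TMSA is designed to provide.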
Quick Start & Requirements
The repository provides official PyTorch code and pretrained model checkpoints for DiffiT. Image sampling is initiated via sample.py, with example commands for ImageNet-256 and ImageNet-512 resolutions, requiring configuration of log directories and model paths. Evaluation of generated images, including FID scores, is handled by eval_run.sh, mirroring the openai/guided-diffusion evaluation protocol. Ready-to-use Slurm scripts are also available for batch processing.
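For orientation, the sampling script implements ancestral diffusion sampling: starting from Gaussian noise, it repeatedly predicts and removes noise at each timestep. The sketch below shows a generic DDPM sampling loop of this kind; it is an illustration of the procedure, not DiffiT's actual `sample.py` interface, and `model` stands in for any noise-prediction network.

```python
import torch


@torch.no_grad()
def ddpm_sample(model, shape, betas, device="cpu"):
    """Generic DDPM ancestral sampling loop (illustrative, not the
    repo's CLI). `model` maps (x_t, t) to a noise estimate."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)  # start from pure noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device)
        eps = model(x, t_batch)  # predicted noise at step t
        # Posterior mean: remove the predicted noise, rescale.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        # Add fresh noise on every step except the last.
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```

The released script wraps this kind of loop with DiffiT-specific configuration (log directories, checkpoint paths, ImageNet-256/512 settings), so consult its `--help` output and the README examples for the exact flags.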
Highlighted Details
Maintenance & Community
The project is an official release from NVIDIA Research, with code and pretrained models made available on March 8, 2026. It was accepted to ECCV 2024. No community channels (e.g., Discord, Slack) or roadmap details are provided in the README.
Licensing & Compatibility
The source code is released under the NVIDIA Source Code License-NC, which restricts commercial use. Pre-trained models are shared under the CC-BY-NC-SA-4.0 license, requiring any derivative works to be distributed under the same non-commercial, share-alike terms.
Limitations & Caveats
The primary limitation is the non-commercial (NC) nature of both the source code and pre-trained model licenses, preventing integration into commercial products or services. Derivative works must adhere to the same restrictive licensing.