nv-tlabs: Pixel diffusion decoder for high-resolution latent-to-image generation
PiD is a diffusion decoder designed to replace the VAE/RAE decoder in latent-based generative models. It recasts the decoding of latent representations into high-resolution images as a conditional pixel-space diffusion problem. The model denoises directly in high-resolution pixel space, which unifies decoding and upsampling into a single, fast generative pass. It is aimed at researchers and developers working with large-scale image generation models.
How It Works
PiD reformulates the latent-to-pixel decoding task as a conditional pixel-space diffusion model. Instead of running separate decoding and upsampling stages, PiD denoises directly in high-resolution pixel space and produces a super-resolved image in a single generative pass. This unified approach is positioned as a more efficient and potentially higher-quality alternative to traditional VAE/RAE decoders.
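The idea of denoising at the output resolution while conditioning on a low-resolution latent can be illustrated with a toy sampler. This is a minimal sketch and not PiD's implementation: the function names, the nearest-neighbour conditioning, the linear noise schedule, the Euler sampler, and the step count are all assumptions made for illustration, and toy_denoiser is a hypothetical placeholder for a trained network.

```python
import numpy as np


def upsample_nearest(latent, scale):
    """Nearest-neighbour upsampling of an (H, W, C) array by an integer factor."""
    return latent.repeat(scale, axis=0).repeat(scale, axis=1)


def toy_denoiser(x_t, t, cond):
    """Hypothetical stand-in for a learned network: predicts the clean image.

    A trained model would use the noisy pixels x_t, the noise level t, and the
    conditioning; this placeholder simply returns the conditioning.
    """
    return cond


def decode(latent, scale=4, steps=8, denoiser=toy_denoiser, seed=0):
    """Run a conditional sampler directly at the output resolution."""
    rng = np.random.default_rng(seed)
    cond = upsample_nearest(latent, scale)   # conditioning at the target size
    x = rng.standard_normal(cond.shape)      # start from pure noise (t = 1)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x0_pred = denoiser(x, t_cur, cond)   # predicted clean image
        # Assumed schedule: x_t = (1 - t) * x0 + t * noise, so the
        # velocity dx/dt equals (x_t - x0) / t.
        velocity = (x - x0_pred) / t_cur
        x = x + (t_next - t_cur) * velocity  # Euler step towards t = 0
    return x


latent = np.random.default_rng(1).random((8, 8, 3))  # toy 8x8 "latent"
image = decode(latent, scale=4)
print(image.shape)  # (32, 32, 3)
```

Because the placeholder denoiser ignores the noisy input, the sampler here simply converges to the upsampled conditioning. With a trained network in its place, the same loop would both decode and add high-resolution detail, with no separate upsampling stage.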
Quick Start & Requirements
Install with pip install -e . after installing utility dependencies, or use conda env create -f environment.yml for a full environment.
Version requirements: transformers>=4.57.x, diffusers>=0.37.
Additional dependencies include hydra-core, omegaconf, pyyaml, attrs, einops, loguru, termcolor, fvcore, iopath, wandb, imageio, opencv-python-headless, pandas, safetensors, sentencepiece, boto3, botocore.
DINOv2/SigLIP backbones require optional dependencies detailed in docs/dinov2_siglip.md.
Highlighted Details
Decoder variants: 2k (trained at 2048px) and 2kto4k (trained at up to 4K resolution).
Two pipeline families: from_clean_* (image -> encode -> PiD) and from_ldm_* (text/class -> LDM -> PiD).
Maintenance & Community
The paper, code, and model weights were released on May 25, 2026. Planned additions include PiD options for Qwen-Image and SD-XL, undistilled checkpoints, and training scripts. The README mentions no community channels (e.g., Discord, Slack) and no notable sponsorships.
Licensing & Compatibility
The PiD codebase is licensed under the Apache License 2.0. This permissive license allows commercial use and integration into closed-source projects, provided the license text and attribution notices are retained.
Limitations & Caveats
The 2kto4k decoder variant is noted to perform worse than the 2k variant at 2048px resolution. Training scripts are planned but not yet released. The Latent Diffusion Models (LDMs) for the DINOv2 and SigLIP backbones do not integrate directly with the Hugging Face diffusers library, so they require additional setup.