shallowdream204/BitDance: Powerful multimodal autoregressive model for efficient visual generation
Top 81.4% on SourcePulse
BitDance is an open-source, 14B parameter autoregressive multimodal model designed for efficient visual generation. It addresses limitations of discrete autoregressive models, such as poor tokenizer reconstruction and slow generation, by introducing a large-vocabulary binary tokenizer, a binary diffusion head, and a novel next-patch diffusion paradigm. This approach enables rapid, high-resolution, photorealistic image synthesis, targeting researchers and power users seeking scalable generative capabilities.
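The large-vocabulary binary tokenizer can be illustrated with a minimal sketch. This is a simplified stand-in, not BitDance's actual tokenizer: it assumes each spatial position's continuous latent is quantized channel-wise to ±1 bits, so k channels give an effective vocabulary of 2**k codes without a learned codebook lookup.

```python
import numpy as np

def binary_tokenize(latent: np.ndarray) -> np.ndarray:
    """Quantize a continuous latent to binary codes (+1/-1 per channel).

    With k channels per position, each position maps to one of 2**k
    binary codes -- a very large effective vocabulary with no codebook.
    """
    return np.where(latent >= 0, 1.0, -1.0)

def code_index(bits: np.ndarray) -> int:
    """Pack one position's +/-1 bits into an integer token id."""
    binary = (bits > 0).astype(np.int64)
    return int(binary.dot(2 ** np.arange(binary.size)))

# Example: 16 channels -> 2**16 = 65,536 possible codes per position.
latent = np.random.randn(16)
bits = binary_tokenize(latent)
assert set(np.unique(bits)) <= {-1.0, 1.0}
```

Binary quantization sidesteps the reconstruction bottleneck of small discrete codebooks, since vocabulary size grows exponentially in the channel count rather than linearly in codebook entries.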
How It Works
BitDance utilizes a decoder-only architecture incorporating a large-vocabulary binary tokenizer and a binary diffusion head. Its key innovation is the "next-patch diffusion paradigm," which allows for parallel prediction of up to 64 visual tokens per step. This contrasts with traditional token-by-token generation, offering a significant speedup (over 30x reported) and improved efficiency for generating high-resolution images. The unified multimodal framework is designed for scalability and simplicity.
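The decoding loop above can be sketched in a few lines. This is a hypothetical, heavily simplified simulation (the "denoising" is a toy rule pulling values toward ±1, not the real binary diffusion head), but it shows the structural difference from token-by-token generation: each autoregressive step emits a whole patch of tokens in parallel.

```python
import numpy as np

def next_patch_generate(num_patches, patch_tokens=64, dim=16, denoise_steps=4, rng=None):
    """Toy sketch of next-patch diffusion decoding (simplified stand-in).

    Instead of sampling one token per step, each autoregressive step
    denoises a patch of `patch_tokens` binary tokens in parallel,
    conditioned (in the real model) on everything generated so far.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    generated = []
    for _ in range(num_patches):
        patch = rng.standard_normal((patch_tokens, dim))  # start from noise
        for _ in range(denoise_steps):
            # stand-in for the binary diffusion head: pull toward +/-1
            patch = 0.5 * (patch + np.sign(patch))
        generated.append(np.sign(patch))  # binarize the finished patch
    return np.concatenate(generated)

tokens = next_patch_generate(num_patches=4)
print(tokens.shape)  # (256, 16): 4 steps produced 4 * 64 tokens
```

With 64 tokens per step, a 4096-token grid needs 64 sequential model calls instead of 4096, which is consistent with the reported >30x wall-clock speedup once per-step overhead is accounted for.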
Quick Start & Requirements
Clone the repository (https://github.com/shallowdream204/BitDance.git), create a Python 3.11 Conda environment, activate it, and install dependencies via `pip install -r requirements.txt` and `pip install flash_attn==2.8.2 --no-build-isolation`. Requirements include flash-attn (v2.8.2) and CUDA (implied for GPU usage). Pretrained weights are fetched with `hf download` commands for the T2I and ImageNet models.
Highlighted Details
A diffusers release is available; it may depend on recent diffusers versions.
Maintenance & Community
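The setup steps can be collected into a single shell snippet (the environment name `bitdance` is an assumption; the commands otherwise follow the README):

```shell
git clone https://github.com/shallowdream204/BitDance.git
cd BitDance
conda create -n bitdance python=3.11 -y
conda activate bitdance
pip install -r requirements.txt
pip install flash_attn==2.8.2 --no-build-isolation
```

Note that flash-attn needs a CUDA toolchain at build time, hence the `--no-build-isolation` flag so it can see the installed torch.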
Recent updates (February 2026) include the release of a diffusers version and UniWeTok, a unified binary tokenizer. A project website and interactive demo are available.
Licensing & Compatibility
Limitations & Caveats
Because this is a research project, the training code is still being organized and will be released later. The README does not detail specific limitations or known bugs.