inclusionAI / LLaDA2.0-Uni
Unified multimodal understanding and generation model
LLaDA2.0-Uni unifies multimodal understanding and generation tasks within a single Diffusion Large Language Model (dLLM) architecture. It targets researchers and developers seeking a versatile model for complex visual-linguistic applications, offering integrated capabilities for image comprehension, generation, and editing with a focus on efficiency and high fidelity.
How It Works
The project introduces a unified dLLM-based Mixture-of-Experts (MoE) backbone, built on LLaDA 2.0, which uses a Mask Token Prediction paradigm for seamless multimodal integration. Visual inputs are converted into discrete semantic tokens with SigLIP-VQ, so the same mask-prediction objective covers both text and images and strengthens understanding. High-fidelity generation comes from a specialized diffusion decoder, distilled for rapid 8-step inference.
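As a rough illustration of this decoding paradigm (not the repository's code: the model interface, mask id, and confidence-based unmasking schedule below are all assumptions), a mask-token-prediction sampler works roughly like this:

```python
import torch

MASK_ID = 0  # hypothetical mask-token id; the real id comes from the tokenizer

@torch.no_grad()
def diffusion_decode(model, prompt_ids, gen_len=64, steps=8):
    """Iteratively unmask a fully masked canvas in a fixed number of rounds.

    Each round the model predicts every masked position in parallel, then
    only the most confident predictions are committed; the rest stay masked
    for the next round. Distillation shrinks the rounds needed (e.g., to 8).
    """
    device = prompt_ids.device
    canvas = torch.full((1, gen_len), MASK_ID, dtype=torch.long, device=device)
    tokens = torch.cat([prompt_ids, canvas], dim=1)
    n_prompt = prompt_ids.shape[1]

    for step in range(steps):
        logits = model(tokens).logits            # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)  # per-position confidence
        masked = tokens.eq(MASK_ID)
        conf = conf.masked_fill(~masked, -1.0)   # rank masked slots only

        remaining = int(masked[:, n_prompt:].sum())
        k = max(1, remaining // (steps - step))  # commit k tokens this round
        idx = conf.topk(k, dim=-1).indices
        tokens.scatter_(1, idx, pred.gather(1, idx))
    return tokens[:, n_prompt:]
```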
Quick Start & Requirements
Install dependencies from requirements.txt.
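The README's own inference examples are authoritative; purely as a hypothetical sketch, assuming the checkpoint exposes a Hugging Face transformers interface (the model id, AutoModel entry point, and trust_remote_code path are all guesses):

```python
# Hypothetical quick start; defer to the repository's own inference examples.
# Setup (assumed): pip install -r requirements.txt
from transformers import AutoModel, AutoTokenizer

model_id = "inclusionAI/LLaDA2.0-Uni"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()

# The repo's understanding examples read local assets such as
# ./assets/understanding_example.png (see Limitations & Caveats below).
```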
Maintenance & Community
The initial version was released on April 23, 2026. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README. SGLang support is listed as "Coming Soon."
Licensing & Compatibility
The project is licensed under the Apache License 2.0, which is generally permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
SPRINT acceleration automatically falls back to the baseline method when using Editing CFG (three-way guidance). The provided inference code examples rely on local asset files (e.g., ./assets/understanding_example.png).
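For context, Editing CFG typically combines three model branches per denoising step. A generic sketch in the style of InstructPix2Pix follows; the actual branch definitions and guidance weights in LLaDA2.0-Uni may differ.

```python
import torch

def editing_cfg(out_uncond: torch.Tensor,
                out_img: torch.Tensor,
                out_img_txt: torch.Tensor,
                w_img: float = 1.5,
                w_txt: float = 5.0) -> torch.Tensor:
    """Generic three-way classifier-free guidance (InstructPix2Pix-style).

    Blends unconditional, source-image, and image+instruction branches.
    Three forward passes per step are needed instead of one, which may be
    why a distilled fast path like SPRINT cannot apply directly.
    Weights here are illustrative, not the model's defaults.
    """
    return (out_uncond
            + w_img * (out_img - out_uncond)
            + w_txt * (out_img_txt - out_img))
```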