bytedance
Unified multimodal AI for generation and editing
Top 94.5% on SourcePulse
Summary
MammothModa is a unified AR-Diffusion model family for multimodal understanding, generation, and editing. It targets the need for a single architecture that handles text-to-image, text-to-video, image editing, and video editing. For AI practitioners, this means one consolidated model for all of these tasks, with the efficiency gains described below.
How It Works
The project uses a Mixture-of-Experts (MoE) architecture, specifically a Fine-Grained MoE with 128 experts and Top-8 routing, so that roughly 3 billion of its 25 billion total parameters are active per token. Combined with Diffusion Transformer (DiT) models and Qwen-VL foundation models, this design enables efficient inference (reported as 12x faster) and supports multiple multimodal tasks within a single model.
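To illustrate what "128 experts with Top-8 routing" means, here is a minimal sketch of sparse expert routing for a single token. It assumes a standard softmax router whose top-k gate weights are renormalized to sum to 1, a common MoE convention that may not match MammothModa's implementation. The names `top_k_route`, `moe_forward`, and the toy `EXPERTS` (each simply scales its input by its index) are hypothetical and exist only for illustration.

```python
import math
import random

NUM_EXPERTS = 128  # expert count stated in the summary
TOP_K = 8          # Top-8 routing stated in the summary


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts for one token.

    Assumption: gates are the softmax probabilities of the chosen experts,
    renormalized so they sum to 1 (a common convention, not confirmed here).
    """
    if len(router_logits) < k:
        raise ValueError("need at least k router logits")
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / mass for i in chosen]


def moe_forward(token, router_logits, experts, k=TOP_K):
    """Run only the selected experts and mix their outputs by gate weight."""
    chosen, gates = top_k_route(router_logits, k)
    out = [0.0] * len(token)
    for idx, gate in zip(chosen, gates):
        y = experts[idx](token)  # the other 120 experts are never evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


# Hypothetical toy experts: expert s multiplies its input by s.
EXPERTS = [lambda x, s=s: [s * v for v in x] for s in range(NUM_EXPERTS)]

if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    out, chosen = moe_forward([1.0, 2.0], logits, EXPERTS)
    print("experts used:", sorted(chosen))
    print(f"expert fraction per token: {TOP_K / NUM_EXPERTS:.2%}")  # 6.25%
    print(f"reported active fraction: {3e9 / 25e9:.0%}")  # 12%
```

Selecting 8 of 128 experts touches only 6.25% of the expert weights per token. The reported ~3B active out of 25B total (about 12%) is higher, presumably because non-expert components such as attention layers run for every token.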
Quick Start & Requirements
Model weights and inference code for Mamoda2.5 (video generation/editing) and Mamoda2-Dev (image editing) are available on HuggingFace. Users may need to switch to specific branches (e.g., qwen25vl for preview versions). Explicit installation commands are not provided.
Project Page: https://mammothmoda.github.io/
Tech Reports: https://arxiv.org/abs/2511.18262, https://arxiv.org/abs/2605.02641
Highlighted Details
Maintenance & Community
The project is developed by the Moderation LLM Team at ByteDance, which has published numerous recent papers at top AI conferences (ICLR, ICCV, NeurIPS). Contact: liuchang.lab@bytedance.com. No community channels (e.g., Discord, Slack) or public roadmap are specified.
Licensing & Compatibility
The README does not specify a software license. The open-source model weights are described as "under internal review," which suggests a future release, but their terms of use remain uncertain for now.
Limitations & Caveats
Open-source model weights are currently under internal review and not fully released. Users may need specific branches for certain versions (e.g., preview models), indicating potential instability.
2 weeks ago
Inactive