Video generation model using autoregressive chunk-wise denoising
Top 14.5% on sourcepulse
MAGI-1 is an autoregressive video generation model designed for high-fidelity, temporally consistent video synthesis, particularly for text-to-video and image-to-video tasks. It targets researchers and developers seeking controllable, scalable video generation with potential for real-time streaming, offering a promising alternative to existing closed-source solutions.
How It Works
MAGI-1 couples a Transformer-based VAE, which provides strong spatial and temporal compression, with a Diffusion Transformer generator. Videos are generated autoregressively in chunks of frames: each chunk is denoised holistically, and once a chunk reaches a sufficiently low noise level the next chunk can begin, so several chunks are denoised concurrently in a pipeline, improving generation throughput. The Diffusion Transformer incorporates innovations such as Block-Causal Attention and QK-Norm for stable training at scale, and a shortcut distillation algorithm enables variable inference budgets by enforcing self-consistency across different step sizes.
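Below is a minimal, self-contained sketch (Python/NumPy) of this pipelined chunk-wise schedule, meant only to illustrate the scheduling idea rather than the MAGI-1 implementation: the chunk size, step count, pipeline offset, and the denoise_step stub are illustrative assumptions, and a real model would replace denoise_step with the Diffusion Transformer conditioned on the cleaner preceding chunks via block-causal attention.

import numpy as np

CHUNK_FRAMES = 24      # frames per chunk (assumed value)
TOTAL_STEPS = 8        # denoising steps per chunk (variable via distillation)
PIPELINE_OFFSET = 2    # steps a chunk must lead before its successor may start

def denoise_step(chunk: np.ndarray, step: int, context: list) -> np.ndarray:
    # Placeholder for one denoising step; a real model would condition on the
    # cleaner preceding chunks in `context`. Here we simply shrink the noise
    # so the sketch stays runnable.
    return chunk * (1.0 - (step + 1) / TOTAL_STEPS)

def generate(num_chunks: int, latent_shape=(CHUNK_FRAMES, 16, 16)) -> list:
    rng = np.random.default_rng(0)
    chunks = [rng.standard_normal(latent_shape) for _ in range(num_chunks)]
    steps_done = [0] * num_chunks      # denoising progress per chunk
    finished = []                      # fully denoised chunks, in temporal order

    while len(finished) < num_chunks:
        for i in range(num_chunks):
            if steps_done[i] >= TOTAL_STEPS:
                continue               # this chunk is already clean
            # A chunk may only advance once its predecessor is PIPELINE_OFFSET
            # steps ahead (or finished), so earlier chunks stay cleaner.
            if i > 0 and steps_done[i - 1] < TOTAL_STEPS \
                    and steps_done[i - 1] - steps_done[i] < PIPELINE_OFFSET:
                continue
            chunks[i] = denoise_step(chunks[i], steps_done[i], context=chunks[:i])
            steps_done[i] += 1
            if steps_done[i] == TOTAL_STEPS:
                finished.append(chunks[i])   # chunks always finish in order
    return finished

if __name__ == "__main__":
    latents = generate(num_chunks=4)
    print(f"generated {len(latents)} clean chunks, each of shape {latents[0].shape}")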
Quick Start & Requirements
docker pull sandai/magi:latest
docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 sandai/magi:latest /bin/bash
Highlighted Details
Maintenance & Community
The project is actively developed by SandAI-org, with recent updates in April 2025. Contact is available via GitHub issues or research@sand.ai.
Licensing & Compatibility
Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The 24B model has significant hardware requirements (8x H100/H800). The 4.5B model is more accessible but still needs roughly 24 GB of GPU memory. Some distilled and quantized models are listed as "Coming soon."