MAGI-1 by SandAI-org

Video generation model using autoregressive chunk-wise denoising

Created 1 year ago
3,691 stars

Top 13.0% on SourcePulse

Project Summary

MAGI-1 is an autoregressive video generation model designed for high-fidelity, temporally consistent video synthesis, particularly for text-to-video and image-to-video tasks. It targets researchers and developers seeking controllable, scalable video generation with potential for real-time streaming, offering a promising alternative to existing closed-source solutions.

How It Works

MAGI-1 pairs a Transformer-based VAE, which compresses video 8x spatially and 4x temporally, with a Diffusion Transformer denoiser. It generates video autoregressively as a sequence of chunks (fixed-length groups of consecutive frames), denoising each chunk as a whole rather than frame by frame. Because the next chunk can start once the current one is partially denoised, several chunks can be processed concurrently, which improves generation efficiency. The Diffusion Transformer incorporates Block-Causal Attention and QK-Norm, among other changes, to improve training stability at scale. A shortcut distillation algorithm supports variable inference budgets by enforcing self-consistency across different step sizes.
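
The two ideas above, block-causal attention and overlapping chunk denoising, can be illustrated with a small pure-Python sketch. This is not MAGI-1's implementation: the function names, the `lag` parameter, and the fixed step counts are hypothetical, chosen only to show the shape of the technique.

```python
def block_causal_mask(num_chunks: int, tokens_per_chunk: int) -> list[list[bool]]:
    """Build a block-causal attention mask.

    mask[q][k] is True when query token q may attend to key token k:
    every token sees all tokens in its own chunk and in earlier chunks,
    but nothing in later chunks.
    """
    total = num_chunks * tokens_per_chunk
    return [
        [(k // tokens_per_chunk) <= (q // tokens_per_chunk) for k in range(total)]
        for q in range(total)
    ]


def pipelined_schedule(
    num_chunks: int, steps_per_chunk: int, lag: int
) -> list[list[tuple[int, int]]]:
    """List the (chunk, denoising_step) pairs processed at each tick.

    Assumption for illustration: chunk i starts once chunk i-1 has
    completed `lag` denoising steps, so chunks overlap in time.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    total_ticks = (num_chunks - 1) * lag + steps_per_chunk
    schedule = []
    for tick in range(total_ticks):
        active = []
        for chunk in range(num_chunks):
            step = tick - chunk * lag  # how far this chunk has progressed
            if 0 <= step < steps_per_chunk:
                active.append((chunk, step))
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    for row in block_causal_mask(num_chunks=2, tokens_per_chunk=2):
        print(["x" if allowed else "." for allowed in row])
    for tick, active in enumerate(pipelined_schedule(3, 4, 2)):
        print(tick, active)
```

With 3 chunks, 4 steps per chunk, and a lag of 2, the schedule finishes in 8 ticks instead of the 12 a strictly sequential loop would take, with at most two chunks in flight at once.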

Quick Start & Requirements

  • Docker: pull the image, then start a container:
      docker pull sandai/magi:latest
      docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash
  • Source Code: Requires Python 3.10.12, PyTorch 2.4.0 with CUDA 12.4, ffmpeg 4.4, and the MagiAttention submodule.
  • Hardware: 24B models require 8x H100/H800 GPUs. 4.5B models require 1x RTX 4090 (24GB VRAM).
  • Links: Technical Report, MagiAttention
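
For a source install, a quick pre-flight check along these lines may help catch a missing prerequisite before setup. It is a minimal sketch using only the Python standard library; the function name and result keys are hypothetical, and it checks only that Python 3.10, ffmpeg, and a `torch` package are present, not the exact patch, CUDA, or ffmpeg versions listed above.

```python
import importlib.util
import shutil
import sys

EXPECTED_PYTHON = (3, 10)  # major.minor from the requirements list


def check_environment() -> dict[str, bool]:
    """Report which of the listed prerequisites appear to be present."""
    return {
        # Interpreter major.minor matches the listed Python 3.10.x.
        "python_3_10": sys.version_info[:2] == EXPECTED_PYTHON,
        # An ffmpeg executable is discoverable on PATH (version not checked).
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # A torch package is importable (version and CUDA build not checked).
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```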

Highlighted Details

  • Reported to achieve state-of-the-art performance on image-to-video tasks, with instruction following and motion quality cited as its main strengths relative to models like Kling and Sora.
  • Demonstrates superior precision in predicting physical behavior on the Physics-IQ benchmark via video continuation.
  • Supports controllable generation through chunk-wise prompting for smooth transitions and fine-grained text control.
  • Offers distilled and quantized models for efficient inference, including an fp8 quantized version.

Maintenance & Community

The project is actively developed by SandAI-org, with recent updates in April 2025. Contact is available via GitHub issues or research@sand.ai.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The 24B model has significant hardware requirements (8x H100/H800). While 4.5B models are more accessible, they still require substantial GPU memory (24GB). Some distilled and quantized models are listed as "Coming soon."
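
A rough back-of-the-envelope estimate shows why parameter count and precision drive these requirements. The sketch below computes weight storage only, as parameters times bytes per parameter. The precisions are assumptions for illustration (the text above names only an fp8 quantized variant), and the figures ignore activations, the VAE, the text encoder, and any cached attention state.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for label, params in [("24B", 24e9), ("4.5B", 4.5e9)]:
        for dtype in ("bf16", "fp8"):
            print(f"{label} @ {dtype}: {weight_memory_gb(params, dtype):.1f} GB")
```

Under these assumptions, 24B weights take about 48 GB at 16-bit precision and about 24 GB at fp8, while 4.5B weights take about 9 GB at 16-bit precision, which leaves headroom on a 24GB card for everything the estimate leaves out.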

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 18 stars in the last 30 days
