MAGI-1 by SandAI-org

Video generation model using autoregressive chunk-wise denoising

Created 8 months ago

3,622 stars

Top 13.2% on SourcePulse

1 Expert Loves This Project

Jiaming Song

Chief Scientist at Luma AI

Project Summary

MAGI-1 is an autoregressive video generation model designed for high-fidelity, temporally consistent video synthesis, particularly for text-to-video and image-to-video tasks. It targets researchers and developers seeking controllable, scalable video generation with potential for real-time streaming, offering a promising alternative to existing closed-source solutions.

How It Works

MAGI-1 pairs a Transformer-based VAE, which applies significant spatial and temporal compression, with a Diffusion Transformer denoiser. It generates video autoregressively, one chunk of consecutive frames at a time, and denoises each chunk as a whole rather than frame by frame. Because a later chunk can begin denoising before the earlier ones are fully clean, several chunks can be processed concurrently, which improves generation efficiency. The Diffusion Transformer incorporates Block-Causal Attention and QK-Norm, among other changes, for improved training stability at scale. A shortcut distillation algorithm enforces self-consistency across different step sizes, so one model can run under variable inference budgets.
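To make the chunk-wise ideas concrete, here is a minimal sketch of a block-causal attention mask and of a staggered denoising schedule in which neighbouring chunks overlap in time. It is an illustration only, not MAGI-1's implementation: the function names, the `tokens_per_chunk` and `stride` parameters, and the rule that each chunk starts a fixed number of ticks after the previous one are all assumptions.

```python
def block_causal_mask(num_chunks, tokens_per_chunk):
    """mask[i][j] is True when query token i may attend to key token j.

    A token attends to every token in its own chunk and in all earlier
    chunks, but never to a token in a later chunk.
    """
    n = num_chunks * tokens_per_chunk
    return [
        [(j // tokens_per_chunk) <= (i // tokens_per_chunk) for j in range(n)]
        for i in range(n)
    ]


def staggered_schedule(num_chunks, steps_per_chunk, stride):
    """Return, for each tick, the list of chunks being denoised at that tick.

    Assumption (hypothetical): chunk c starts at tick c * stride and needs
    steps_per_chunk consecutive ticks. With stride < steps_per_chunk,
    neighbouring chunks are denoised concurrently.
    """
    total_ticks = (num_chunks - 1) * stride + steps_per_chunk
    ticks = []
    for t in range(total_ticks):
        active = [
            c for c in range(num_chunks)
            if c * stride <= t < c * stride + steps_per_chunk
        ]
        ticks.append(active)
    return ticks


if __name__ == "__main__":
    for row in block_causal_mask(num_chunks=2, tokens_per_chunk=2):
        print("".join("x" if allowed else "." for allowed in row))
    for t, active in enumerate(staggered_schedule(3, steps_per_chunk=4, stride=2)):
        print(t, active)
```

In the mask, full attention inside a chunk is what lets each chunk be denoised as a whole, while the causal structure across chunks is what allows generation to proceed chunk by chunk.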

Quick Start & Requirements

Docker: docker pull sandai/magi:latest followed by docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash
Source Code: Requires Python 3.10.12, PyTorch 2.4.0 with CUDA 12.4, ffmpeg 4.4, and the MagiAttention submodule.
Hardware: 24B models require 8x H100/H800 GPUs. 4.5B models require 1x RTX 4090 (24GB VRAM).
Links: Technical Report, MagiAttention
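Before attempting a source install, it can help to confirm the basics locally. The sketch below is a hypothetical pre-flight check, not part of the MAGI-1 repository: it only compares the running Python's major and minor version with the 3.10 series listed above and looks for an `ffmpeg` binary on the PATH. It does not verify the ffmpeg version, PyTorch, CUDA, or the MagiAttention submodule.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10)  # the requirements above list Python 3.10.12


def version_matches(actual, required):
    """True when the leading components of `actual` equal `required`."""
    return tuple(actual[:len(required)]) == tuple(required)


def check_environment():
    """Return a dict mapping each check name to True or False."""
    return {
        "python": version_matches(sys.version_info, REQUIRED_PYTHON),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or mismatched'}")
```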

Highlighted Details

The project reports state-of-the-art performance on image-to-video tasks, with claimed gains over models like Kling and Sora in instruction following and motion quality.
Demonstrates superior precision in predicting physical behavior on the Physics-IQ benchmark via video continuation.
Supports controllable generation through chunk-wise prompting for smooth transitions and fine-grained text control.
Offers distilled and quantized models for efficient inference, including an fp8 quantized version.
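Chunk-wise prompting can be pictured as a schedule that says which text prompt is in effect for each chunk. The sketch below is a hypothetical illustration of that bookkeeping only; the `prompt_for_chunk` helper and the `(first_chunk, prompt)` schedule format are assumptions and do not reflect MAGI-1's actual interface.

```python
from bisect import bisect_right


def prompt_for_chunk(prompt_schedule, chunk_index):
    """Pick the prompt in effect for a given chunk.

    prompt_schedule is a list of (first_chunk, prompt) pairs sorted by
    first_chunk; each prompt applies until the next entry takes over.
    """
    starts = [start for start, _ in prompt_schedule]
    pos = bisect_right(starts, chunk_index) - 1
    if pos < 0:
        raise ValueError("no prompt covers chunk %d" % chunk_index)
    return prompt_schedule[pos][1]


if __name__ == "__main__":
    schedule = [(0, "a cat sits on a sofa"), (3, "the cat jumps to the floor")]
    for chunk in range(5):
        print(chunk, prompt_for_chunk(schedule, chunk))
```

Because generation is autoregressive over chunks, a prompt change at a chunk boundary still conditions on the chunks already generated, which is what makes smooth transitions plausible.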

Maintenance & Community

The project is actively developed by SandAI-org, with recent updates in April 2025. Contact is available via GitHub issues or research@sand.ai.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The 24B model has significant hardware requirements (8x H100/H800). While 4.5B models are more accessible, they still require substantial GPU memory (24GB). Some distilled and quantized models are listed as "Coming soon."
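A back-of-the-envelope calculation shows why the two model sizes land on such different hardware. The sketch below counts memory for the weights alone, assuming 2 bytes per parameter for bf16 and 1 byte for fp8; the bf16 storage format is an assumption, and activations, attention caches, the VAE, and the text encoder are ignored, so real usage is higher.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # assumed storage formats


def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in [("24B", 24_000_000_000), ("4.5B", 4_500_000_000)]:
        for dtype in ("bf16", "fp8"):
            print(f"{name} {dtype}: {weight_memory_gb(params, dtype):.1f} GB")
```

Under these assumptions the 24B weights take about 48 GB in bf16, more than a single 24GB card holds, while the 4.5B weights take about 9 GB and leave headroom on an RTX 4090.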

Health Check

Last Commit

6 months ago

Responsiveness

1 week

Star History

48 stars in the last 30 days