MAGI-1 by SandAI-org

Video generation model using autoregressive chunk-wise denoising

created 3 months ago
3,414 stars

Top 14.5% on sourcepulse

View on GitHub
Project Summary

MAGI-1 is an autoregressive video generation model designed for high-fidelity, temporally consistent video synthesis, particularly for text-to-video and image-to-video tasks. It targets researchers and developers seeking controllable, scalable video generation with potential for real-time streaming, offering a promising alternative to existing closed-source solutions.

How It Works

MAGI-1 pairs a Transformer-based VAE, which provides strong spatial and temporal compression, with a Diffusion Transformer backbone. It generates video autoregressively in fixed-length chunks of frames, denoising each chunk holistically; because the next chunk can begin denoising before the previous one is fully clean, several chunks are processed concurrently in a pipeline, improving generation efficiency. Architectural innovations such as Block-Causal Attention and QK-Norm improve training stability at scale, and a shortcut distillation algorithm enforces self-consistency across different step sizes, enabling variable inference budgets.
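
To make the generation loop concrete, below is a minimal Python sketch of chunk-wise autoregressive denoising. All names and sizes (denoise_step, generate_video, chunk length, step count) are illustrative assumptions for exposition, not the MAGI-1 API.

import torch

CHUNK_FRAMES = 6     # latent frames per chunk (illustrative)
NUM_CHUNKS = 4       # chunks to generate
DENOISE_STEPS = 8    # denoising steps per chunk (variable via distillation)

def denoise_step(noisy_chunk, context, step):
    """Stand-in for one denoiser call. The real model is a Diffusion
    Transformer whose block-causal attention lets a chunk attend to
    earlier chunks in `context` but never to future ones."""
    return noisy_chunk * 0.9  # placeholder update, no learned model here

def generate_video(latent_shape=(16, CHUNK_FRAMES, 32, 32)):
    chunks = []
    for i in range(NUM_CHUNKS):
        x = torch.randn(latent_shape)        # each chunk starts as noise
        context = torch.cat(chunks, dim=1) if chunks else None
        for t in range(DENOISE_STEPS):
            x = denoise_step(x, context, t)  # denoise the chunk holistically
        chunks.append(x)                     # later chunks condition on it
    return torch.cat(chunks, dim=1)          # concatenate along time axis

print(generate_video().shape)  # torch.Size([16, 24, 32, 32])

In the actual model, a chunk's successor can start denoising while the chunk itself is still partially noisy, so several chunks proceed concurrently; the strictly sequential loop above is a simplification.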

Quick Start & Requirements

  • Docker: docker pull sandai/magi:latest followed by docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 sandai/magi:latest /bin/bash
  • Source Code: Requires Python 3.10.12, PyTorch 2.4.0 with CUDA 12.4, ffmpeg 4.4, and the MagiAttention submodule (a version-check sketch follows this list).
  • Hardware: 24B models require 8x H100/H800 GPUs. 4.5B models require 1x RTX 4090 (24GB VRAM).
  • Links: Technical Report, MagiAttention
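
For a from-source setup, the short Python check below compares the local environment against the documented requirements. It is an illustrative helper, not a script shipped with the repository.

import shutil
import subprocess
import sys

import torch

def check_environment():
    # Documented requirements: Python 3.10.12, PyTorch 2.4.0 + CUDA 12.4,
    # ffmpeg 4.4.
    print(f"Python:  {sys.version.split()[0]} (want 3.10.12)")
    print(f"PyTorch: {torch.__version__} (want 2.4.0)")
    print(f"CUDA:    {torch.version.cuda} (want 12.4)")
    print(f"GPUs:    {torch.cuda.device_count()} visible "
          "(24B: 8x H100/H800; 4.5B: 1x RTX 4090)")
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg:
        first_line = subprocess.run(
            [ffmpeg, "-version"], capture_output=True, text=True
        ).stdout.splitlines()[0]
        print(f"ffmpeg:  {first_line} (want 4.4)")
    else:
        print("ffmpeg:  not found on PATH")

if __name__ == "__main__":
    check_environment()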

Highlighted Details

  • Achieves state-of-the-art performance on image-to-video tasks, outperforming models like Kling and Sora in instruction following and motion quality.
  • Demonstrates superior precision in predicting physical behavior on the Physics-IQ benchmark via video continuation.
  • Supports controllable generation through chunk-wise prompting for smooth transitions and fine-grained text control (see the sketch after this list).
  • Offers distilled and quantized models for efficient inference, including an fp8 quantized version.
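
To illustrate what chunk-wise prompting enables, here is a small, hypothetical scheduling helper; the data layout and function are assumptions for exposition, not the repository's interface. The key idea is that each chunk can carry its own text condition, which is what makes smooth scene transitions and fine-grained control possible.

# Hypothetical chunk-level prompt schedule: (starting chunk index, prompt).
chunk_prompts = [
    (0, "a red hot-air balloon lifts off at dawn"),
    (2, "the balloon drifts over a foggy valley"),
    (4, "the camera tilts up to reveal mountain peaks"),
]

def prompt_for_chunk(chunk_idx, schedule):
    """Return the most recent prompt at or before chunk_idx."""
    active = [p for start, p in schedule if start <= chunk_idx]
    return active[-1] if active else ""

for i in range(6):
    print(i, prompt_for_chunk(i, chunk_prompts))
# chunks 0-1 use the first prompt, 2-3 the second, 4-5 the third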

Maintenance & Community

The project is actively developed by SandAI-org, with recent updates in April 2025. Contact is available via GitHub issues or research@sand.ai.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The 24B model has significant hardware requirements (8x H100/H800). While 4.5B models are more accessible, they still require substantial GPU memory (24GB). Some distilled and quantized models are listed as "Coming soon."

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 6

Star History

566 stars in the last 90 days
