Open-Sora-Plan by PKU-YuanGroup

Open-source project aiming to reproduce a Sora-like text-to-video (T2V) model

Created 1 year ago

12,111 stars

Top 4.1% on SourcePulse

Project Summary

This project aims to replicate OpenAI's Sora text-to-video model, providing an open-source platform for community contributions. It targets researchers and developers interested in state-of-the-art video generation, offering a scalable framework for training and inference.

How It Works

Open-Sora Plan uses a diffusion transformer architecture that learns spatiotemporal features through 3D attention. A CausalVideoVAE compresses video into a latent space for efficient generation, reaching high compression ratios at low training cost. Recent updates add WF-VAE, prompt refiners, data filtering, sparse attention, and bucket training to improve quality and efficiency.
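To make the compression idea concrete, the arithmetic below sketches how a causal video VAE might map a clip onto a latent grid. The 4x temporal and 8x spatial strides are assumptions chosen for illustration (this summary does not state the actual ratios), and latent_shape and compression_ratio are hypothetical helpers, not part of the project's API. The sketch assumes the first frame is encoded on its own and the remaining frames are grouped in time, which is consistent with the 4n+1 frame-count pattern noted under Limitations & Caveats.

```python
def latent_shape(frames, height, width, t_stride=4, s_stride=8):
    """Return (latent_frames, latent_height, latent_width).

    Assumed strides: 4x in time, 8x in space (illustrative, not from the source).
    The first frame is kept separately, so frames must equal t_stride * n + 1.
    """
    if frames < 1 or (frames - 1) % t_stride != 0:
        raise ValueError(f"frames must be {t_stride}n+1, got {frames}")
    if height % s_stride != 0 or width % s_stride != 0:
        raise ValueError(f"height and width must be multiples of {s_stride}")
    return (frames - 1) // t_stride + 1, height // s_stride, width // s_stride


def compression_ratio(frames, height, width, t_stride=4, s_stride=8):
    """Ratio of pixel positions to latent positions (channel counts ignored)."""
    lt, lh, lw = latent_shape(frames, height, width, t_stride, s_stride)
    return (frames * height * width) / (lt * lh * lw)


if __name__ == "__main__":
    print(latent_shape(93, 352, 640))       # (24, 44, 80)
    print(compression_ratio(93, 352, 640))  # 248.0
```

Under these assumed strides, a 93-frame 352x640 clip shrinks to a 24x44x80 latent grid, which is why the diffusion transformer can attend over the whole clip at a manageable sequence length.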

Quick Start & Requirements

Installation: Clone the repository, then run pip install -e . in the repository root (requires Python >= 3.8 and PyTorch >= 2.1.0). NPU support additionally requires a specific torch_npu installation and compiling decord.
Demo: A Gradio web UI is available via python -m opensora.serve.gradio_web_server.
Resources: Pre-trained models and datasets are available on Hugging Face.
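
A small pre-flight check can confirm the stated minimums (Python >= 3.8, PyTorch >= 2.1.0) before installing. The sketch below is a convenience written for this summary, not something shipped with the repository: parse_version, meets_minimum, and check_environment are hypothetical helpers, and the PyTorch check only runs if torch is already importable.

```python
import re
import sys

MIN_PYTHON = (3, 8)    # minimum Python version stated above
MIN_TORCH = (2, 1, 0)  # minimum PyTorch version stated above


def parse_version(text):
    """Extract the leading numeric release, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (as in '2.1') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(version_text, minimum):
    """True if the parsed version is at least the required minimum tuple."""
    return parse_version(version_text) >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, MIN_TORCH):
            problems.append(f"PyTorch >= 2.1.0 is required, found {torch.__version__}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating "2.10.0" as older than "2.2.0".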

Highlighted Details

Supports image-to-video (I2V) and transition generation.
Runs inference within 24 GB of VRAM using memory-saving techniques.
Supports training on Huawei Ascend NPU systems.
The latest version, v1.3.0, supports arbitrary resolutions whose dimensions are multiples of 32.

Maintenance & Community

The project is actively developed with frequent updates and welcomes community contributions via pull requests. Links to Discord and WeChat communities are provided for engagement.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, permitting commercial use and linking with closed-source projects.

Limitations & Caveats

Some v1.2.0 models trained on Panda70M may produce watermarked output. Inference resolutions must be multiples of 32, and frame counts must follow a fixed pattern (e.g., 4n+1 for an integer n).
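
The constraints above are easy to check before launching a long inference job. The following is a minimal sketch, assuming the frame pattern is exactly 4n+1 (the source gives it only as an example) and that both height and width must be multiples of 32; validate_request and snap_request are hypothetical helpers, not functions from the project.

```python
def validate_request(num_frames, height, width, stride=32, frame_step=4):
    """Return a list of constraint violations; an empty list means valid.

    Assumes frame counts of the form frame_step * n + 1 (taken from the
    4n+1 example) and spatial sizes that are multiples of stride.
    """
    errors = []
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % stride != 0:
            errors.append(f"{name}={value} is not a positive multiple of {stride}")
    if num_frames < 1 or (num_frames - 1) % frame_step != 0:
        errors.append(f"num_frames={num_frames} is not of the form {frame_step}n+1")
    return errors


def snap_request(num_frames, height, width, stride=32, frame_step=4):
    """Round each value down to the nearest valid one (never below the minimum)."""
    snapped_h = max(stride, height // stride * stride)
    snapped_w = max(stride, width // stride * stride)
    snapped_f = max(1, (num_frames - 1) // frame_step * frame_step + 1)
    return snapped_f, snapped_h, snapped_w


if __name__ == "__main__":
    print(validate_request(96, 360, 640))  # two violations: height and num_frames
    print(snap_request(96, 360, 640))      # (93, 352, 640)
```

Rounding down rather than up keeps the request within whatever memory budget the original size was chosen for.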

Health Check

Last Commit

2 months ago

Responsiveness

1 day

Star History

38 stars in the last 30 days