Open-Sora-Plan  by PKU-YuanGroup

Open-source project aiming to reproduce a Sora-like text-to-video (T2V) model

created 1 year ago
12,009 stars

Top 4.2% on sourcepulse

Project Summary

This project aims to replicate OpenAI's Sora text-to-video model, providing an open-source platform for community contributions. It targets researchers and developers interested in state-of-the-art video generation, offering a scalable framework for training and inference.

How It Works

Open-Sora Plan utilizes a diffusion transformer architecture with a focus on spatiotemporal feature learning through 3D attention mechanisms. It incorporates a CausalVideoVAE for efficient video compression and generation, achieving high compression ratios with low training costs. Recent updates include WF-VAE, prompt refiners, data filtering, sparse attention, and bucket training strategies for improved quality and efficiency.
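
To make the compression claim concrete, the following sketch computes the latent grid a causal video VAE would produce for a given clip. The stride values (4 temporal, 8 spatial) and the helper names `latent_shape` and `compression_ratio` are assumptions chosen for illustration, not figures or functions taken from this summary or confirmed against the repository. The sketch also assumes the common causal layout in which the first frame is encoded on its own and each following group of `t_stride` frames becomes one latent frame.

```python
def latent_shape(frames, height, width, t_stride=4, s_stride=8):
    """Return (latent_frames, latent_height, latent_width).

    Assumed layout: the first frame maps to one latent frame and every
    later group of `t_stride` frames maps to one more. The default
    strides are illustrative assumptions, not confirmed project values.
    """
    if frames < 1 or (frames - 1) % t_stride != 0:
        raise ValueError(f"frames must have the form {t_stride}*n + 1")
    if height % s_stride or width % s_stride:
        raise ValueError(f"height and width must be multiples of {s_stride}")
    return (frames - 1) // t_stride + 1, height // s_stride, width // s_stride


def compression_ratio(frames, height, width, t_stride=4, s_stride=8):
    """Ratio of pixel positions to latent positions (channels ignored)."""
    t, h, w = latent_shape(frames, height, width, t_stride, s_stride)
    return (frames * height * width) / (t * h * w)


if __name__ == "__main__":
    print(latent_shape(93, 480, 640))
    print(compression_ratio(93, 480, 640))
```

Under these assumed strides, the diffusion transformer attends over far fewer positions than the raw video contains, which is where most of the training-cost saving would come from.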

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies with pip install -e . (requires Python >= 3.8 and PyTorch >= 2.1.0). NPU support additionally requires installing torch_npu and compiling decord.
  • Demo: A Gradio web UI is available via python -m opensora.serve.gradio_web_server.
  • Resources: Pre-trained models and datasets are available on Hugging Face.

Highlighted Details

  • Supports image-to-video (I2V) and transition generation.
  • Runs inference within 24 GB of VRAM by using memory-saving techniques.
  • Offers training capabilities on Huawei Ascend NPU systems.
  • Latest version v1.3.0 supports arbitrary resolutions with a stride of 32.

Maintenance & Community

The project is actively developed with frequent updates and welcomes community contributions via pull requests. Links to Discord and WeChat communities are provided for engagement.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, permitting commercial use and linking with closed-source projects.

Limitations & Caveats

Some v1.2.0 models trained on Panda70M may produce watermarks in generated videos. Inference height and width must each be a multiple of 32, and the frame count must follow a specific pattern (e.g., 4n+1).
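
The size constraints above can be checked before launching a generation job. The sketch below is a standalone helper, not part of the project's API: the names `check_request` and `snap_down` are hypothetical, and the frame rule is assumed to be exactly 4n+1, which the text gives only as an example pattern.

```python
def check_request(height, width, num_frames, stride=32, frame_step=4):
    """Return a list of problems; an empty list means the request is valid.

    Assumes the frame rule is frame_step*n + 1 (4n+1 by default).
    """
    problems = []
    if height <= 0 or height % stride:
        problems.append(f"height {height} is not a positive multiple of {stride}")
    if width <= 0 or width % stride:
        problems.append(f"width {width} is not a positive multiple of {stride}")
    if num_frames < 1 or (num_frames - 1) % frame_step:
        problems.append(f"num_frames {num_frames} is not of the form {frame_step}n+1")
    return problems


def snap_down(height, width, num_frames, stride=32, frame_step=4):
    """Round each value down to the nearest valid size (never below the minimum)."""
    h = max(stride, height // stride * stride)
    w = max(stride, width // stride * stride)
    f = max(1, (num_frames - 1) // frame_step * frame_step + 1)
    return h, w, f


if __name__ == "__main__":
    print(check_request(500, 650, 95))
    print(snap_down(500, 650, 95))
```

Rounding down rather than up keeps the request within whatever memory budget the original size was chosen for.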

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 123 stars in the last 90 days
