Open-source project aiming to reproduce Sora-like T2V model
This project aims to replicate OpenAI's Sora text-to-video model, providing an open-source platform for community contributions. It targets researchers and developers interested in state-of-the-art video generation, offering a scalable framework for training and inference.
How It Works
Open-Sora Plan utilizes a diffusion transformer architecture with a focus on spatiotemporal feature learning through 3D attention mechanisms. It incorporates a CausalVideoVAE for efficient video compression and generation, achieving high compression ratios with low training costs. Recent updates include WF-VAE, prompt refiners, data filtering, sparse attention, and bucket training strategies for improved quality and efficiency.
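For intuition, the sketch below shows the flow this description implies: a text embedding conditions a denoiser that iteratively refines a spatiotemporal latent, and a video VAE decoder maps the final latent back to frames. The module names (ToyDenoiser, ToyVideoDecoder) and the Euler-style loop are stand-ins for illustration only, not the Open-Sora Plan API.

```python
# Conceptual sketch of text-conditioned latent video diffusion (toy modules, not
# the project's CausalVideoVAE/WF-VAE or diffusion-transformer classes).
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for the diffusion transformer: predicts noise from a video latent."""
    def __init__(self, channels: int = 4, cond_dim: int = 16):
        super().__init__()
        self.proj = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.cond = nn.Linear(cond_dim, channels)

    def forward(self, z, t, text_emb):
        # Inject text conditioning as a per-channel bias (real models use cross/3D attention;
        # the timestep t is unused in this toy but would be embedded in practice).
        bias = self.cond(text_emb)[:, :, None, None, None]
        return self.proj(z) + bias

class ToyVideoDecoder(nn.Module):
    """Stand-in for the CausalVideoVAE decoder: maps latents to RGB frames."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.out = nn.Conv3d(channels, 3, kernel_size=3, padding=1)

    def forward(self, z):
        return self.out(z)

@torch.no_grad()
def sample_video(steps: int = 20, frames: int = 9, height: int = 8, width: int = 8):
    denoiser, decoder = ToyDenoiser(), ToyVideoDecoder()
    text_emb = torch.randn(1, 16)                 # pretend text-encoder output
    z = torch.randn(1, 4, frames, height, width)  # spatiotemporal latent: (B, C, T, H, W)
    for i in range(steps):                        # simple Euler-style denoising loop
        t = torch.full((1,), 1.0 - i / steps)
        eps = denoiser(z, t, text_emb)
        z = z - eps / steps
    return decoder(z)                             # (B, 3, T, H, W) "video"

print(sample_video().shape)  # torch.Size([1, 3, 9, 8, 8])
```

In the actual pipeline the VAE decoder also upsamples the latent in time and space, which is what makes the high compression ratios and lower training costs possible.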
Quick Start & Requirements
pip install -e .
(Python >= 3.8, PyTorch >= 2.1.0). NPU support requires a specific torch_npu installation and decord compilation. Launch the Gradio web demo with:
python -m opensora.serve.gradio_web_server
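A quick sanity check of the stated prerequisites can save a failed install. The snippet below only verifies the Python and PyTorch versions listed above; it is a convenience sketch, not an official project script.

```python
# Minimal environment check for the stated requirements (Python >= 3.8, PyTorch >= 2.1.0).
import sys
import torch

assert sys.version_info >= (3, 8), f"Python >= 3.8 required, found {sys.version.split()[0]}"
torch_version = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert torch_version >= (2, 1), f"PyTorch >= 2.1.0 required, found {torch.__version__}"
print("Environment OK:", sys.version.split()[0], torch.__version__)
```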
Highlighted Details
Maintenance & Community
The project is actively developed with frequent updates and welcomes community contributions via pull requests. Links to Discord and WeChat communities are provided for engagement.
Licensing & Compatibility
The project is licensed under the Apache License 2.0, permitting commercial use and linking with closed-source projects.
Limitations & Caveats
Some v1.2.0 models trained on Panda70M may produce watermarks. Inference resolutions must be multiples of 32, and frame counts must follow a specific pattern (e.g., 4n+1).
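A small helper makes those constraints concrete. The functions below simply round a requested size down to the nearest valid values; the round-down policy is a choice made here, not something the project prescribes.

```python
# Snap requested sizes to the stated constraints: resolutions must be multiples of 32,
# and frame counts must follow the 4n+1 pattern (e.g., 29, 61, 93).
from typing import Tuple

def snap_resolution(height: int, width: int) -> Tuple[int, int]:
    """Round a requested resolution down to the nearest multiples of 32."""
    return (height // 32) * 32, (width // 32) * 32

def snap_frames(num_frames: int) -> int:
    """Round a requested frame count down to the nearest value of the form 4n + 1."""
    return ((num_frames - 1) // 4) * 4 + 1

print(snap_resolution(720, 1280))  # (704, 1280)
print(snap_frames(96))             # 93
```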