Tencent-Hunyuan: Unified video generation with free-form composition and reasoning
Summary
OmniWeaving tackles unified video generation by integrating free-form multimodal composition with reasoning. It targets researchers and practitioners who need advanced video creation, producing coherent outputs from complex, interleaved text-and-image inputs by inferring the user's underlying intent.
How It Works
The architecture combines a Multimodal Large Language Model (MLLM) for semantic parsing, a Variational Autoencoder (VAE) for visual tokenization, and a Multimodal Diffusion Transformer (MMDiT) for generation. Novelties include an "Activating Thinking Mode" where the MLLM actively reasons to refine prompts, and "Hidden States DeepStacking" which injects multi-granular semantic guidance from various MLLM layers into the MMDiT. This approach yields state-of-the-art performance among open-source unified models.
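The "Hidden States DeepStacking" idea described above can be sketched minimally: hidden states tapped from several MLLM layers are each projected into the diffusion transformer's conditioning space and combined. The function and parameter names below are hypothetical illustrations, not the project's actual API.

```python
# Minimal sketch (assumed names) of hidden-state deep stacking:
# hidden states from several MLLM layers carry semantics at different
# granularities; each is projected and fed to the MMDiT as conditioning.
import numpy as np

def deep_stack(mllm_hidden_states, layer_ids, proj_weights):
    """Project and concatenate hidden states from selected MLLM layers.

    mllm_hidden_states: list of (seq_len, d_mllm) arrays, one per layer.
    layer_ids: which layers to tap for multi-granular semantic guidance.
    proj_weights: dict layer_id -> (d_mllm, d_dit) projection matrix.
    Returns one (seq_len, len(layer_ids) * d_dit) conditioning array.
    """
    conditions = [mllm_hidden_states[i] @ proj_weights[i] for i in layer_ids]
    # Concatenate along the feature axis -> one conditioning tensor for the MMDiT.
    return np.concatenate(conditions, axis=-1)
```

Concatenation is only one plausible fusion; the real model may inject each layer's projection at a different MMDiT block instead.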
Quick Start & Requirements
Installation involves cloning the repository and installing dependencies via pip install -r requirements.txt. Optional acceleration libraries like Flash Attention, Flex-Block-Attention, or SageAttention can be installed for performance gains. Model weights are available on HuggingFace. Training data construction requires a VLM server (e.g., Qwen3-VL-235B). Inference examples suggest multi-GPU setups are beneficial.
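Since the acceleration libraries are optional, a setup script may want to detect which ones are actually importable before selecting an attention backend. The sketch below probes two of them by module name; `flash_attn` and `sageattention` are the usual import names, but verify them against each library's own documentation.

```python
# Probe which optional attention-acceleration backends are installed,
# without importing them (find_spec only checks availability).
import importlib.util

def available_backends():
    # Module name -> human-readable backend name.
    candidates = {
        "flash_attn": "Flash Attention",
        "sageattention": "SageAttention",
    }
    return [label for module, label in candidates.items()
            if importlib.util.find_spec(module) is not None]
```

A launcher could fall back to plain PyTorch attention when this returns an empty list.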
Highlighted Details
Built on HunyuanVideo-1.5, OmniWeaving introduces IntelligentVBench for evaluating unified video generation. It supports diverse tasks including Text-to-Video (T2V), Image-to-Video (I2V), video editing, and compositional generation with multiple subjects and modalities.
Maintenance & Community
Developed by Tencent's HunyuanVideo team. Acknowledges contributions from key open-source projects like Transformers and Diffusers. No direct community channels (Discord/Slack) or roadmap links are provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README, posing a significant adoption barrier. Compatibility for commercial use or closed-source linking is undetermined.
Limitations & Caveats
Training data construction pipelines are provided for representative tasks, but some may be simplified or omit components. The requirement for a VLM server for data preparation is a notable setup hurdle. The absence of a clear license is a critical limitation for deployment.