bytedance/BindWeave: Unified framework for subject-consistent video generation
Top 77.4% on SourcePulse
Summary
BindWeave is a unified framework for subject-consistent video generation that handles both single-subject and multi-subject prompts. Aimed at researchers and engineers in AI video synthesis, it delivers high-fidelity video creation by integrating multimodal large language models with diffusion transformers.
How It Works
The core architecture couples a pretrained multimodal large language model (MLLM) with a Diffusion Transformer (DiT), achieving cross-modal integration through entity grounding and representation alignment. The MLLM parses complex user prompts and produces subject-aware hidden states that condition the DiT, yielding high-fidelity videos whose subjects remain consistent across generations.
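As a rough illustration of this conditioning path, the sketch below shows MLLM hidden states steering a DiT block through cross-attention. This is a minimal sketch under assumed generic shapes; every class, dimension, and variable name here is illustrative and does not reflect BindWeave's actual code.

```python
# A minimal PyTorch sketch of the described conditioning path: an MLLM's
# subject-aware hidden states condition a DiT block via cross-attention.
# All names and dimensions are illustrative assumptions, not BindWeave's API.
import torch
import torch.nn as nn

class ConditionedDiTBlock(nn.Module):
    def __init__(self, dim: int = 1024, mllm_dim: int = 4096, heads: int = 16):
        super().__init__()
        self.proj = nn.Linear(mllm_dim, dim)  # align MLLM states to the DiT width
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, video_tokens: torch.Tensor, subject_states: torch.Tensor) -> torch.Tensor:
        # subject_states: hidden states emitted by the MLLM after parsing the
        # prompt (the "subject-aware" conditioning signal the README describes).
        cond = self.proj(subject_states)
        x = video_tokens
        x = x + self.self_attn(self.n1(x), self.n1(x), self.n1(x))[0]
        x = x + self.cross_attn(self.n2(x), cond, cond)[0]  # bind video tokens to subjects
        x = x + self.mlp(self.n3(x))
        return x

# Toy shapes: batch of 2, 256 spatio-temporal video tokens, 8 subject-state tokens.
block = ConditionedDiTBlock()
out = block(torch.randn(2, 256, 1024), torch.randn(2, 8, 4096))
print(out.shape)  # torch.Size([2, 256, 1024])
```

Projecting the MLLM states into the DiT's hidden width before cross-attention is one common way to realize the representation alignment the README mentions.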
Quick Start & Requirements
Installation involves cloning the repository and running bash build_env.sh. Key prerequisites include:
- Switching to the feature_extraction and infer Git branches for the corresponding pipeline steps.
- Downloading the base model (Wan-AI/Wan2.1-I2V-14B-720P-Diffusers) and the BindWeave 14B model (ByteDance/BindWeave) via the Hugging Face CLI (see the sketch after this list).
- Converting the downloaded checkpoints with python convert_ckpt.py.
- Editing the inference config (configs/inference/inference_model_s2v.json) to point to the downloaded model components.
- Installing huggingface_hub[cli].
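For the download step, here is a minimal sketch using the huggingface_hub Python API as an alternative to the CLI invocation the README uses; the local_dir paths are placeholder assumptions:

```python
# Sketch: fetch both model repos with the huggingface_hub Python API.
# Assumes `pip install huggingface_hub`; local_dir values are placeholders.
from huggingface_hub import snapshot_download

base_dir = snapshot_download(
    repo_id="Wan-AI/Wan2.1-I2V-14B-720P-Diffusers",
    local_dir="./weights/wan2.1-i2v-14b-720p",
)
bindweave_dir = snapshot_download(
    repo_id="ByteDance/BindWeave",
    local_dir="./weights/bindweave-14b",
)

# Next, per the README: run python convert_ckpt.py on the downloaded weights,
# then point configs/inference/inference_model_s2v.json at these directories.
print(base_dir, bindweave_dir)
```

Highlighted Details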
Maintenance & Community
Information regarding community channels (e.g., Discord, Slack) or a public roadmap is not present in the README. Author details are available via the linked arXiv paper.
Licensing & Compatibility
The repository's README does not specify a software license, leaving usage rights, commercial use, and the status of derivative works ambiguous.
Limitations & Caveats
The setup process is complex, requiring manual Git branch switching, downloading and converting large model weights, and detailed configuration file adjustments, suggesting a non-trivial barrier to entry. No explicit limitations or known issues are detailed.