BindWeave by bytedance

Unified framework for subject-consistent video generation

Created 2 months ago
363 stars

Top 77.4% on SourcePulse

Project Summary

Summary

BindWeave is a unified framework for subject-consistent video generation, capable of handling both single and multi-subject prompts. It targets researchers and engineers in AI video synthesis, offering high-fidelity video creation by integrating multimodal large language models with diffusion transformers.

How It Works

The core architecture couples a pretrained multimodal large language model (MLLM) with a Diffusion Transformer (DiT). BindWeave achieves cross-modal integration through entity grounding and representation alignment. The MLLM parses complex user prompts, generating subject-aware hidden states that precisely condition the DiT for generating high-fidelity videos, ensuring subject consistency across generations.
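As an illustration of how hidden states from one model can condition another, the sketch below implements plain cross-attention in pure Python: each video token acts as a query over a set of subject-aware states. This is an assumption chosen for illustration. The summary above does not say which conditioning mechanism BindWeave uses, and the names `cross_attend`, `video_tokens`, and `subject_states` are hypothetical. Learned projections are omitted to keep it short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(video_tokens, subject_states):
    """Let each video token (query) attend over subject-aware states (keys and values).

    Assumption: identity projections instead of learned query/key/value matrices.
    """
    dim = len(subject_states[0])
    scale = 1.0 / math.sqrt(dim)
    conditioned = []
    for query in video_tokens:
        # Scaled dot-product score between this token and every subject state.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in subject_states]
        weights = softmax(scores)
        # Weighted average of the subject states.
        context = [
            sum(w * state[i] for w, state in zip(weights, subject_states))
            for i in range(dim)
        ]
        # Residual connection: keep the token and add the attended context.
        conditioned.append([q + c for q, c in zip(query, context)])
    return conditioned


# Toy data: 2 video latent tokens and 3 states from a hypothetical MLLM, width 4.
video_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
subject_states = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
conditioned = cross_attend(video_tokens, subject_states)
```

In this toy setup each token draws most strongly on the subject state it is most similar to, which is the intuition behind conditioning generation on subject-aware representations.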

Quick Start & Requirements

Installation involves cloning the repository and running bash build_env.sh. Further setup steps and resources:

  • Switch between the feature_extraction and infer Git branches for the corresponding steps.
  • Download the WanX 2.1 14B model (Wan-AI/Wan2.1-I2V-14B-720P-Diffusers) and the BindWeave 14B model (ByteDance/BindWeave) via the Hugging Face CLI.
  • Convert the downloaded weights using python convert_ckpt.py.
  • Update the configuration file (configs/inference/inference_model_s2v.json) to point to the downloaded model components.
  • Dependencies include huggingface_hub[cli].
  • Project page: https://lzy-dot.github.io/BindWeave/
  • arXiv paper: https://arxiv.org/pdf/2510.00438
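The download and configuration steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's own tooling: the local directory layout and the config keys (`dit_path`, `vae_path`) are hypothetical, and the real keys in configs/inference/inference_model_s2v.json should be taken from the repository. The commands are only built here, not executed, and any arguments that convert_ckpt.py requires are not covered.

```python
import json
import tempfile
from pathlib import Path

# Hugging Face repository IDs as listed above; the short names are hypothetical.
MODEL_REPOS = {
    "wan": "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers",
    "bindweave": "ByteDance/BindWeave",
}


def download_commands(weights_root):
    """Build one `huggingface-cli download` argv per model (nothing is executed)."""
    root = Path(weights_root)
    return [
        ["huggingface-cli", "download", repo_id, "--local-dir", str(root / name)]
        for name, repo_id in MODEL_REPOS.items()
    ]


def point_config_at(config_path, overrides):
    """Overwrite top-level keys of a JSON config with local paths and save it."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config.update(overrides)
    path.write_text(json.dumps(config, indent=2))
    return config


# Demonstration on a throwaway config file with hypothetical keys.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = Path(tmp) / "inference_model_s2v.json"
    demo_config.write_text(json.dumps({"dit_path": "", "vae_path": "", "seed": 0}))
    updated = point_config_at(
        demo_config,
        {"dit_path": "weights/bindweave", "vae_path": "weights/wan/vae"},
    )
    reloaded = json.loads(demo_config.read_text())

commands = download_commands("weights")
```

Each entry in `commands` can be passed to `subprocess.run(argv, check=True)` once huggingface_hub[cli] is installed. Keeping the commands as data makes it easy to review them before starting a multi-gigabyte download.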

Highlighted Details

  • Achieves a score of 57.61% on the OpenS2V-Eval benchmark, demonstrating competitive performance.
  • Supports generation for both single and multiple subjects within a prompt.
  • The BindWeave-Wan-14B model is available on HuggingFace.
  • Community-provided ComfyUI integration and FP8-quantized models are available.

Maintenance & Community

Information regarding community channels (e.g., Discord, Slack) or a public roadmap is not present in the README. Author details are available via the linked arXiv paper.

Licensing & Compatibility

The repository's README does not specify a software license, creating ambiguity regarding usage rights, commercial compatibility, and derivative works.

Limitations & Caveats

The setup process is complex: it requires manual Git branch switching, downloading and converting large model weights, and detailed configuration file adjustments, which together form a non-trivial barrier to entry. The README does not detail any explicit limitations or known issues.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 4

Star History

15 stars in the last 30 days
