Bernini by bytedance

AI-powered video generation and editing framework

Created 2 weeks ago

703 stars

Top 48.2% on SourcePulse

Project Summary

Summary

Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer. It targets engineers, researchers, and power users, and reports video editing performance comparable to leading commercial models. Handling planning and rendering in one system is meant to simplify complex video manipulation.

How It Works

An MLLM-based semantic planner interprets the user's intent and produces a high-level video manipulation plan. A Diffusion Transformer (DiT)-based renderer then turns that plan into high-fidelity video. The modular design uses the multimodal LLM for understanding and the diffusion model for visual synthesis, which keeps video creation flexible.
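The planner-then-renderer split described above can be illustrated with a toy sketch. Every name here (`EditPlan`, `StubPlanner`, `StubRenderer`, `generate`) is hypothetical and is not taken from the Bernini codebase; the stubs only show how a plan might flow from one stage to the next, not how either model works internally.

```python
from dataclasses import dataclass, field


@dataclass
class EditPlan:
    """Hypothetical structured plan handed from planner to renderer."""
    task: str  # e.g. "V2V" for video-to-video editing
    instructions: list[str] = field(default_factory=list)


class StubPlanner:
    """Stand-in for an MLLM planner: maps a user request to a plan."""

    def plan(self, prompt: str, has_source_video: bool) -> EditPlan:
        task = "V2V" if has_source_video else "T2V"
        # A real planner would reason over text and visual inputs; this stub
        # only splits the prompt into comma-separated steps.
        steps = [s.strip() for s in prompt.split(",") if s.strip()]
        return EditPlan(task=task, instructions=steps)


class StubRenderer:
    """Stand-in for a DiT renderer: consumes a plan and 'renders' frames."""

    def render(self, plan: EditPlan, num_frames: int = 4) -> list[str]:
        # A real renderer would run iterative diffusion denoising; here each
        # frame is just a descriptive string.
        summary = "; ".join(plan.instructions)
        return [f"{plan.task} frame {i}: {summary}" for i in range(num_frames)]


def generate(prompt: str, has_source_video: bool = False) -> list[str]:
    """Run the two stages in order: plan first, then render."""
    plan = StubPlanner().plan(prompt, has_source_video)
    return StubRenderer().render(plan)


frames = generate(
    "replace the sky with a sunset, keep the car unchanged",
    has_source_video=True,
)
print(frames[0])
```

Keeping the plan as an explicit data object is one way to let the two stages evolve independently; whether Bernini exposes its plan in this form is not stated in the summary.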

Quick Start & Requirements

Requires Python 3.11.2 and a CUDA GPU. NVIDIA Hopper (H100/H800/H200) with CUDA 12.4 is recommended for best performance with FlashAttention-3. Core dependencies: PyTorch 2.5.1+cu124, diffusers 0.35.2, accelerate 0.34.2, transformers 4.57.3.

Install:

git clone https://github.com/bytedance/Bernini.git bernini && cd bernini
pip install -r requirements.txt

Optional: Open-VeOmni for multi-GPU runs and FlashAttention-2/3 for faster attention. Model weights are available on Hugging Face (ByteDance/Bernini-R-Diffusers recommended).
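After installing, a quick check of the environment against the versions listed above can catch mismatches early. The following is a minimal sketch using only the standard library; the helper names (`PINNED`, `base_version`, `check_environment`) are made up for illustration, and the check compares only the Python minor version (3.11), which is an assumption rather than a documented requirement.

```python
import sys
from importlib import metadata

# Versions copied from the requirements listed above. The "+cu124" local
# build tag on PyTorch is ignored when comparing.
PINNED = {
    "torch": "2.5.1",
    "diffusers": "0.35.2",
    "accelerate": "0.34.2",
    "transformers": "4.57.3",
}


def base_version(version: str) -> str:
    """Drop a local build tag such as '+cu124'."""
    return version.split("+", 1)[0]


def check_environment(pinned=PINNED):
    """Return a list of human-readable mismatches (empty if all match)."""
    problems = []
    # Assumption: any 3.11.x interpreter is treated as acceptable.
    if sys.version_info[:2] != (3, 11):
        problems.append(f"Python 3.11 expected, found {sys.version.split()[0]}")
    for name, wanted in pinned.items():
        try:
            found = base_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name} {found} found, {wanted} expected")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the pinned versions"]:
        print(line)
```

The script reads installed package metadata and does not import PyTorch, so it can run before any GPU is available.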

Highlighted Details

  • Reports video editing performance on par with leading closed-source commercial models.
  • Unified framework integrating MLLM semantic planning with DiT video rendering.
  • Supports diverse tasks: T2I, I2I, T2V, V2V editing, and reference-guided video generation.

Maintenance & Community

The inference code and model weights were open-sourced on June 1, 2026, following the paper release on May 22, 2026. Key contributors: Chenchen Liu, Junyi Chen, Lei Li, Lu Chi, Mingzhen Sun, Zhuoying Li, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan. No community channels or detailed roadmap are specified.

Licensing & Compatibility

Released under the Apache License 2.0, which permits commercial use, modification, and distribution, including integration into closed-source projects, subject to the license terms.

Limitations & Caveats

Best performance and advanced features (e.g., FlashAttention-3) require NVIDIA Hopper GPUs and CUDA 12.4. Multi-GPU distributed training requires an Open-VeOmni setup. Prompt enhancement relies on an external OpenAI-compatible API endpoint and key. The project is newly released (June 2026).
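Because the fastest attention path depends on the GPU generation, it can help to check the device's CUDA compute capability before choosing a backend. The sketch below is an illustration only: the function names and the backend labels are hypothetical and are not Bernini configuration values. It relies on the facts that Hopper GPUs report compute capability 9.0 and that FlashAttention-2 supports capability 8.x (Ampere, Ada) devices; anything else falls back conservatively to PyTorch's built-in scaled dot-product attention.

```python
def pick_attention_backend(major: int, minor: int = 0) -> str:
    """Map a CUDA compute capability to a (hypothetical) backend label."""
    if major == 9:
        # Hopper (H100/H800/H200) reports compute capability 9.0.
        return "flash-attention-3"
    if major == 8:
        # Ampere (8.0/8.6) and Ada (8.9) are supported by FlashAttention-2.
        return "flash-attention-2"
    # Older or unrecognized architectures: use the conservative fallback.
    return "pytorch-sdpa"


def detect_capability():
    """Return (major, minor) for GPU 0, or None without torch or CUDA."""
    try:
        import torch  # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


capability = detect_capability()
if capability is None:
    print("no CUDA GPU detected")
else:
    print(pick_attention_backend(*capability))
```

This only inspects the hardware; it does not verify that the corresponding FlashAttention package is actually installed.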

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 7
  • Issues (30d): 14
  • Star history: 707 stars in the last 14 days
