Bernini by bytedance

AI-powered video generation and editing framework

Created 1 month ago

1,156 stars

Top 32.7% on SourcePulse

Project Summary

Summary Bernini is a unified framework for video generation and editing, combining an MLLM-based semantic planner with a DiT-based renderer. It targets engineers, researchers, and power users, offering state-of-the-art video editing performance comparable to leading commercial models. The system simplifies complex video manipulation through its integrated approach.

How It Works The framework uses an MLLM-based semantic planner for intent interpretation and high-level video manipulation plans. These plans are then rendered into high-fidelity video by a Diffusion Transformer (DiT)-based renderer. This modular design leverages LLMs for understanding and diffusion models for visual synthesis, enabling flexible video creation.

Quick Start & Requirements Requires Python 3.11.2 and a CUDA GPU; NVIDIA Hopper (H100/H800/H200) with CUDA 12.4 is recommended for optimal performance with FlashAttention-3. Core dependencies: PyTorch 2.5.1+cu124, diffusers 0.35.2, accelerate 0.34.2, transformers 4.57.3. Install:

git clone https://github.com/bytedance/Bernini.git bernini && cd bernini
pip install -r requirements.txt

Optional: Open-VeOmni for multi-GPU, FlashAttention-2/3 for faster attention. Weights via Hugging Face (ByteDance/Bernini-R-Diffusers recommended).

Project Page: https://bernini-ai.github.io/
HuggingFace Models: https://huggingface.co/ByteDance/Bernini

Highlighted Details

Achieves top-tier video editing performance, rivaling leading closed-source commercial models.
Unified framework integrating MLLM semantic planning with DiT video rendering.
Supports diverse tasks: T2I, I2I, T2V, V2V editing, and reference-guided video generation.

Maintenance & Community Inference code and model weights open-sourced June 1, 2026, following paper release May 22, 2026. Key contributors: Chenchen Liu, Junyi Chen, Lei Li, Lu Chi, Mingzhen Sun, Zhuoying Li, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan. No community channels or detailed roadmap specified.

Licensing & Compatibility Released under Apache License 2.0, permitting commercial use, modification, and distribution, including integration into closed-source projects, subject to license terms.

Limitations & Caveats Optimal performance and advanced features (e.g., FlashAttention-3) require NVIDIA Hopper GPUs and CUDA 12.4. Multi-GPU distributed training needs Open-VeOmni setup. Prompt enhancement relies on external OpenAI-compatible API endpoints/keys. Project is newly released (June 2026).

Bernini by bytedance

Explore Similar Projects

training-free-methods by littlewhitesea

mammothmoda by bytedance

RAVE by RehgLab

Kiwi-Edit by showlab

UniVideo by KlingAIResearch

kandinsky-5 by kandinskylab

Seedance-2-API by Anil-matcha

Lance by bytedance

EasyAnimate by aigc-apps

Pyramid-Flow by jy0205

Open-Sora by hpcaitech

Pixelle-Video by ATH-MaaS