Discover and explore top open-source AI tools and projects—updated daily.
bytedanceAI-powered video generation and editing framework
New!
Top 48.2% on SourcePulse
Summary Bernini is a unified framework for video generation and editing, combining an MLLM-based semantic planner with a DiT-based renderer. It targets engineers, researchers, and power users, offering state-of-the-art video editing performance comparable to leading commercial models. The system simplifies complex video manipulation through its integrated approach.
How It Works The framework uses an MLLM-based semantic planner for intent interpretation and high-level video manipulation plans. These plans are then rendered into high-fidelity video by a Diffusion Transformer (DiT)-based renderer. This modular design leverages LLMs for understanding and diffusion models for visual synthesis, enabling flexible video creation.
Quick Start & Requirements Requires Python 3.11.2 and a CUDA GPU; NVIDIA Hopper (H100/H800/H200) with CUDA 12.4 is recommended for optimal performance with FlashAttention-3. Core dependencies: PyTorch 2.5.1+cu124, diffusers 0.35.2, accelerate 0.34.2, transformers 4.57.3. Install:
git clone https://github.com/bytedance/Bernini.git bernini && cd bernini
pip install -r requirements.txt
Optional: Open-VeOmni for multi-GPU, FlashAttention-2/3 for faster attention. Weights via Hugging Face (ByteDance/Bernini-R-Diffusers recommended).
Highlighted Details
Maintenance & Community Inference code and model weights open-sourced June 1, 2026, following paper release May 22, 2026. Key contributors: Chenchen Liu, Junyi Chen, Lei Li, Lu Chi, Mingzhen Sun, Zhuoying Li, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan. No community channels or detailed roadmap specified.
Licensing & Compatibility Released under Apache License 2.0, permitting commercial use, modification, and distribution, including integration into closed-source projects, subject to license terms.
Limitations & Caveats Optimal performance and advanced features (e.g., FlashAttention-3) require NVIDIA Hopper GPUs and CUDA 12.4. Multi-GPU distributed training needs Open-VeOmni setup. Prompt enhancement relies on external OpenAI-compatible API endpoints/keys. Project is newly released (June 2026).
1 day ago
Inactive