VGen by ali-vilab

Video synthesis codebase for state-of-the-art generative models

created 1 year ago
3,124 stars

Top 15.7% on sourcepulse

View on GitHub
Project Summary

VGen is a comprehensive open-source codebase for video generation, offering implementations of state-of-the-art diffusion models for various synthesis tasks. It caters to researchers and developers in AI video generation, providing tools for training, inference, and customization with a focus on high-quality output and controllability.

How It Works

VGen leverages cascaded diffusion models and hierarchical spatio-temporal decoupling to achieve high-quality video synthesis. It supports text-to-video, image-to-video, and controllable generation driven by motion and subject customization. The codebase is designed for extensibility and includes components for managing experiments and integrating a range of diffusion-model architectures.
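
To make the cascade concrete, here is a minimal, illustrative sketch of the two-stage data flow: a base model turns a text embedding into a low-resolution clip, and a refinement stage upscales it. The function names, shapes, and modules below are placeholders, not VGen's actual API:

    import torch
    import torch.nn.functional as F

    def base_stage(prompt_embedding, num_frames=16, size=32):
        # Placeholder for the base text-to-video diffusion model: in VGen this
        # would denoise a latent conditioned on the prompt; here we just return
        # a random low-resolution clip of shape (frames, channels, H, W).
        return torch.randn(num_frames, 3, size, size)

    def refine_stage(low_res_clip, scale=8):
        # Placeholder for the super-resolution stage: upsample each frame.
        return F.interpolate(low_res_clip, scale_factor=scale,
                             mode="bilinear", align_corners=False)

    prompt_embedding = torch.randn(1, 768)   # stands in for a text-encoder output
    clip = refine_stage(base_stage(prompt_embedding))
    print(clip.shape)                        # torch.Size([16, 3, 256, 256])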

Quick Start & Requirements

  • Install:
      conda create -n vgen python=3.8
      conda activate vgen
      pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
      pip install -r requirements.txt
  • Prerequisites: Python 3.8, PyTorch 1.12.0 with CUDA 11.3, ffmpeg, libsm6, libxext6.
  • Setup: clone the repository, then run the install commands above (a quick environment sanity check in Python follows this list).
  • Docs: Modelscope T2V Technical Report, I2VGen-XL Paper.
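
After installation, the pinned versions from the prerequisites can be sanity-checked from Python (assumes the cu113 wheel listed above was installed):

    import torch

    # Verify the pinned PyTorch/CUDA combination from the prerequisites.
    assert torch.__version__.startswith("1.12"), torch.__version__
    assert torch.version.cuda == "11.3", torch.version.cuda
    assert torch.cuda.is_available(), "no CUDA device visible"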

Highlighted Details

  • Implements multiple advanced models: I2VGen-XL, VideoComposer, HiGen, InstructVideo, DreamVideo, VideoLCM, TF-T2V.
  • Supports customization via LoRA fine-tuning, subject learning, and motion learning.
  • Includes tools for metric calculation (CLIP-T, CLIP-I, DINO-I, Temporal Consistency); a CLIP-T sketch follows this list.
  • Offers Gradio demos for local testing and HuggingFace/ModelScope integration.
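
As an illustration of the first metric, the following is a hedged sketch of CLIP-T (mean text-frame CLIP similarity) using the Hugging Face transformers CLIP API; VGen's own metric script may differ in model choice and normalization:

    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def clip_t(prompt, frames):
        # frames: list of PIL.Image video frames. Returns the mean cosine
        # similarity between the prompt embedding and each frame embedding.
        inputs = processor(text=[prompt], images=frames,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        return (img @ txt.T).mean().item()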

Maintenance & Community

  • Developed by Tongyi Lab of Alibaba Group.
  • Active development with frequent releases of new models and features (e.g., InstructVideo, DreamVideo, VideoLCM).
  • Links to relevant papers and technical reports are provided.

Licensing & Compatibility

  • License: models are trained on the WebVid-10M and LAION-400M datasets and are intended for RESEARCH/NON-COMMERCIAL USE ONLY.

Limitations & Caveats

  • The current models perform inadequately on anime images and images with black backgrounds due to limited training data.
  • Super-resolution models for TF-T2V only support 32-frame input; 16-frame videos need their frames duplicated (a minimal sketch of the workaround follows).
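
A minimal sketch of that frame-duplication workaround, assuming frames are stored in (frames, channels, H, W) layout:

    import torch

    frames_16 = torch.randn(16, 3, 256, 256)           # a 16-frame clip
    frames_32 = frames_16.repeat_interleave(2, dim=0)  # duplicate along time
    print(frames_32.shape)                             # torch.Size([32, 3, 256, 256])
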
Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 34 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 1 more.

Open-Sora-Plan by PKU-YuanGroup

  • Top 0.1% on sourcepulse, 12k stars
  • Open-source project aiming to reproduce a Sora-like T2V model
  • Created 1 year ago, updated 2 weeks ago