VGen by ali-vilab

Video synthesis codebase for state-of-the-art generative models

Created 1 year ago
3,133 stars

Top 15.3% on SourcePulse

Project Summary

VGen is a comprehensive open-source codebase for video generation, offering implementations of state-of-the-art diffusion models for various synthesis tasks. It caters to researchers and developers in AI video generation, providing tools for training, inference, and customization with a focus on high-quality output and controllability.

How It Works

VGen builds on cascaded diffusion models and hierarchical spatio-temporal decoupling to achieve high-quality video synthesis. It supports text-to-video, image-to-video, and controllable generation driven by motion and subject customization. The codebase is designed to be extensible and includes components for managing experiments and integrating different diffusion model architectures.
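As a rough illustration of the cascaded design, a base diffusion model first produces a low-resolution clip from the conditioning signal, and a spatio-temporal super-resolution stage then refines it. The sketch below is a minimal stand-in: the class names, shapes, and upscaling step are assumptions for illustration, not VGen's actual API.

```python
import torch
import torch.nn.functional as F

class BaseT2V(torch.nn.Module):
    """Stage 1 (illustrative): text-conditioned base model producing a coarse clip."""
    def forward(self, text_emb, frames=16, h=32, w=32):
        # Stand-in for the full denoising loop of the base diffusion model.
        return torch.randn(frames, 3, h, w)

class VideoSR(torch.nn.Module):
    """Stage 2 (illustrative): spatio-temporal super-resolution over the clip."""
    def forward(self, video):
        f, c, h, w = video.shape
        # A real cascade denoises at the higher resolution; here we just upsample.
        return F.interpolate(video, size=(h * 4, w * 4), mode="bilinear")

text_emb = torch.randn(1, 77, 1024)  # placeholder text encoding
low_res = BaseT2V()(text_emb)        # coarse 16-frame clip
video = VideoSR()(low_res)           # refined high-resolution output
print(video.shape)                   # torch.Size([16, 3, 128, 128])
```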

Quick Start & Requirements

  • Install: create a Conda environment, then install PyTorch and the project requirements (a quick post-install sanity check appears after this list):
      conda create -n vgen python=3.8
      conda activate vgen
      pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
      pip install -r requirements.txt
  • Prerequisites: Python 3.8; PyTorch 1.12.0 with CUDA 11.3; system packages ffmpeg, libsm6, and libxext6.
  • Setup: clone the repository before installing the dependencies.
  • Docs: ModelScope T2V technical report, I2VGen-XL paper.
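After installation, a quick sanity check (hypothetical, not part of VGen's docs) can confirm the expected PyTorch build and GPU visibility:

```python
import torch

print(torch.__version__)          # expect 1.12.0+cu113
print(torch.cuda.is_available())  # True if the CUDA 11.3 build sees a GPU
```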

Highlighted Details

  • Implements multiple advanced models: I2VGen-XL, VideoComposer, HiGen, InstructVideo, DreamVideo, VideoLCM, and TF-T2V.
  • Supports customization via LoRA fine-tuning, subject learning, and motion learning.
  • Includes tools for metric calculation (CLIP-T, CLIP-I, DINO-I, temporal consistency); a sketch of the CLIP-T idea follows this list.
  • Offers Gradio demos for local testing plus Hugging Face and ModelScope integration.
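As a sketch of what a CLIP-T style metric computes (mean CLIP similarity between the prompt and each frame), the snippet below uses the Hugging Face transformers CLIP API; this reflects the metric's usual form, not VGen's own evaluation code:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_t(frames: list, prompt: str) -> float:
    """Average cosine similarity between the prompt and each video frame."""
    inputs = processor(text=[prompt], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).mean().item()

# Usage: clip_t([Image.open(f"frame_{i}.png") for i in range(16)], "a red car")
```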

Maintenance & Community

  • Developed by Tongyi Lab of Alibaba Group.
  • Frequent releases of new models and features (e.g., InstructVideo, DreamVideo, VideoLCM), though recent activity has slowed (see Health Check below).
  • Links to relevant papers and technical reports are provided.

Licensing & Compatibility

  • License: the models are trained on the WebVid-10M and LAION-400M datasets and are intended for RESEARCH/NON-COMMERCIAL USE ONLY.

Limitations & Caveats

  • The current models perform inadequately on anime images and images with black backgrounds due to limited training data.
  • The super-resolution models for TF-T2V only accept 32-frame input, so 16-frame videos must be frame-duplicated to 32 frames (see the sketch below).
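For the TF-T2V caveat above, a minimal sketch of the frame-duplication workaround (tensor layout assumed, not taken from VGen's scripts):

```python
import torch

clip16 = torch.randn(16, 3, 256, 256)        # (frames, C, H, W): a 16-frame video
clip32 = clip16.repeat_interleave(2, dim=0)  # duplicate each frame -> 32 frames
assert clip32.shape[0] == 32
```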
Health Check

  • Last commit: 8 months ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 30 days
