SVD_Xtend by pixeli99

SVD extensions for video generation research

Created 1 year ago

721 stars

Top 47.7% on SourcePulse

Project Summary

This repository provides training code and extensions for Stable Video Diffusion (SVD). It lets users fine-tune SVD models and steer video generation with tracklets, that is, sequences of bounding boxes that follow an object from frame to frame. It is aimed at researchers and developers working on video diffusion models who want finer control over object motion and a working fine-tuning setup.

How It Works

The project builds upon existing SVD models, incorporating techniques like Self-Tracking from Boximator and Instance-Enhancer from TrackDiffusion. This allows for tracklet-conditioned video generation, where the movement of objects can be guided by bounding box information. The training process involves preparing video data into a specific folder structure and configuring training parameters, with an example configuration provided for fine-tuning on datasets like BDD100K.
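To make the idea of tracklet conditioning concrete, the sketch below turns one tracklet (one bounding box per frame) into a fixed-length feature vector per frame. It is an illustration only and is not taken from the repository: the function names, the Fourier-feature encoding, and the frequency schedule are all assumptions, loosely in the spirit of GLIGEN-style box grounding. The default frame size of 512x320 matches the resolution mentioned in the README.

```python
import math
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels; this layout is an assumption.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Scale a pixel-space box to [0, 1] and clamp it to the frame."""
    x0, y0, x1, y1 = box

    def clamp(value: float) -> float:
        return min(max(value, 0.0), 1.0)

    return (clamp(x0 / width), clamp(y0 / height),
            clamp(x1 / width), clamp(y1 / height))


def fourier_embed(box: Box, num_freqs: int = 8) -> List[float]:
    """Encode each coordinate as sin/cos pairs at doubling frequencies.

    The doubling schedule is a hypothetical choice for illustration.
    """
    features: List[float] = []
    for coord in box:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * coord
            features.extend((math.sin(angle), math.cos(angle)))
    return features


def embed_tracklet(tracklet: Sequence[Box], width: int = 512,
                   height: int = 320, num_freqs: int = 8) -> List[List[float]]:
    """Return one feature vector per frame for a single tracked object."""
    return [fourier_embed(normalize_box(box, width, height), num_freqs)
            for box in tracklet]


# Two frames of an object drifting 16 pixels to the right.
embedded = embed_tracklet([(0, 0, 256, 160), (16, 0, 272, 160)])
print(len(embedded), len(embedded[0]))
```

Each frame yields 4 coordinates x 8 frequencies x 2 functions = 64 features. In a real model such vectors would typically pass through a small learned projection before being injected into the network; that step is omitted here.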

Quick Start & Requirements

Install/Run: accelerate launch train_svd.py
Prerequisites: PyTorch, accelerate, diffusers, transformers, xformers, and a CUDA-capable GPU.
Configuration: Training parameters include pretrained_model_name_or_path, per_gpu_batch_size, max_train_steps, width, height, learning_rate, and mixed_precision.
Data: Video data needs to be organized into a specific directory structure.
Links: Part 1, Part 2
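
As a rough illustration of how the parameters listed above might be combined into a launch command, the sketch below assembles an argument list for accelerate launch train_svd.py. It assumes the script accepts each parameter as a --name=value flag, which the summary does not confirm. Every value shown is a placeholder rather than a recommended setting, apart from the 512x320 resolution mentioned in the README. The helper build_train_command is hypothetical.

```python
import shlex
from typing import Dict, List


def build_train_command(params: Dict[str, object],
                        script: str = "train_svd.py") -> List[str]:
    """Build an `accelerate launch` argument list from a parameter dict.

    Assumes each parameter is passed as `--name=value` (unverified).
    """
    command = ["accelerate", "launch", script]
    for name, value in params.items():
        command.append(f"--{name}={value}")
    return command


# Placeholder values for illustration only.
example_params = {
    "pretrained_model_name_or_path": "stabilityai/stable-video-diffusion-img2vid-xt",
    "per_gpu_batch_size": 1,
    "max_train_steps": 50000,
    "width": 512,
    "height": 320,
    "learning_rate": 1e-5,
    "mixed_precision": "fp16",
}

command = build_train_command(example_params)
# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without a GPU; the resulting line can be pasted into a shell once the flags have been checked against the script's own argument parser.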

Highlighted Details

Enables fine-tuning of Stable Video Diffusion models.
Introduces tracklet-conditioned video generation for object motion control.
Leverages Self-Tracking and Instance-Enhancer techniques.
Supports custom video datasets for training.

Maintenance & Community

The project acknowledges contributions from Diffusers and Stability AI, as well as Boximator and GLIGEN.
A BibTeX entry is provided for citation.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is therefore unclear.

Limitations & Caveats

The project is still under development, and text-to-video support is marked as "WIP" (Work In Progress). The README notes that the tracklet-control sample videos may look poor because they were generated at 512x320; they are meant to demonstrate that tracklet control works, not to showcase visual quality.

