SVD extensions for video generation research
This repository provides training code and extensions for Stable Video Diffusion (SVD), enabling users to fine-tune SVD models and control video generation with tracklets (bounding boxes). It is targeted at researchers and developers working with video generation and diffusion models who want to enhance control over object motion and improve fine-tuning capabilities.
How It Works
The project builds upon existing SVD models, incorporating techniques like Self-Tracking from Boximator and Instance-Enhancer from TrackDiffusion. This allows for tracklet-conditioned video generation, where the movement of objects can be guided by bounding box information. The training process involves preparing video data into a specific folder structure and configuring training parameters, with an example configuration provided for fine-tuning on datasets like BDD100K.
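To illustrate the tracklet-conditioning idea, the sketch below shows one way per-frame bounding boxes could be normalized and packed into a conditioning tensor. This is a minimal, assumed layout for illustration only; the function name, tensor shape, and coordinate convention are not taken from the repository's actual interface.

# Illustrative sketch only: packing per-frame bounding boxes (tracklets) into a
# conditioning tensor. Names and layout are assumptions, not the repository's API.
import torch

def tracklets_to_tensor(tracklets, num_frames, max_objects, width, height):
    """tracklets: dict mapping object_id -> list of (frame_idx, x1, y1, x2, y2)."""
    cond = torch.zeros(num_frames, max_objects, 4)
    for obj_slot, (obj_id, boxes) in enumerate(sorted(tracklets.items())):
        if obj_slot >= max_objects:
            break
        for frame_idx, x1, y1, x2, y2 in boxes:
            if frame_idx < num_frames:
                # Normalize coordinates to [0, 1] so they are resolution-independent.
                cond[frame_idx, obj_slot] = torch.tensor(
                    [x1 / width, y1 / height, x2 / width, y2 / height]
                )
    return cond  # shape: (num_frames, max_objects, 4)

# Example: one object moving left to right across 14 frames at 512x320.
tracklets = {0: [(t, 10 + 30 * t, 100, 74 + 30 * t, 180) for t in range(14)]}
cond = tracklets_to_tensor(tracklets, num_frames=14, max_objects=8, width=512, height=320)
print(cond.shape)  # torch.Size([14, 8, 4])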
Quick Start & Requirements
Training is launched with accelerate launch train_svd.py. Dependencies include accelerate, diffusers, transformers, and xformers, and a GPU with CUDA is required. Key training parameters include pretrained_model_name_or_path, per_gpu_batch_size, max_train_steps, width, height, learning_rate, and mixed_precision.
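A plausible full invocation is sketched below, assuming the script exposes the parameters listed above as command-line flags of the same names. The flag spellings, the example SVD checkpoint, and the numeric values are assumptions for illustration, not taken from the repository.

accelerate launch train_svd.py \
  --pretrained_model_name_or_path=stabilityai/stable-video-diffusion-img2vid-xt \
  --per_gpu_batch_size=1 \
  --max_train_steps=50000 \
  --width=512 \
  --height=320 \
  --learning_rate=1e-5 \
  --mixed_precision=fp16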
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is still under development, with text-to-video support marked as "WIP" (Work In Progress). The README notes that video quality in the tracklet-control demos may appear poor because of the 512x320 resolution used; the low resolution is intended to demonstrate the effectiveness of tracklet control rather than visual fidelity.