SVD_Xtend  by pixeli99

SVD extensions for video generation research

created 1 year ago
707 stars

Top 49.4% on sourcepulse

Project Summary

This repository provides training code and extensions for Stable Video Diffusion (SVD), enabling users to fine-tune SVD models and control video generation with tracklets (bounding boxes). It is targeted at researchers and developers working with video generation and diffusion models who want to enhance control over object motion and improve fine-tuning capabilities.

How It Works

The project builds upon existing SVD models, incorporating techniques like Self-Tracking from Boximator and Instance-Enhancer from TrackDiffusion. This allows for tracklet-conditioned video generation, where the movement of objects can be guided by bounding box information. The training process involves preparing video data into a specific folder structure and configuring training parameters, with an example configuration provided for fine-tuning on datasets like BDD100K.
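To make the idea of a tracklet concrete, here is a minimal sketch of how per-frame bounding boxes for one object might be stored and normalized before being used as a conditioning signal. The `Tracklet` class, the `(x_min, y_min, x_max, y_max)` pixel convention, and the normalization to the range 0 to 1 are illustrative assumptions, not the repository's actual data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tracklet:
    """Hypothetical container: one tracked object, one box per video frame."""
    object_id: int
    boxes: List[Box]


def _clamp(value: float, upper: float) -> float:
    """Clamp a coordinate into [0, upper]."""
    return min(max(value, 0.0), upper)


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Clamp a pixel box to the frame and rescale it to [0, 1]."""
    x0, y0, x1, y1 = box
    return (
        _clamp(x0, width) / width,
        _clamp(y0, height) / height,
        _clamp(x1, width) / width,
        _clamp(y1, height) / height,
    )


def normalize_tracklet(tracklet: Tracklet, width: int, height: int) -> List[Box]:
    """Normalize every per-frame box of a tracklet."""
    return [normalize_box(b, width, height) for b in tracklet.boxes]
```

Normalized coordinates keep the boxes independent of the training resolution, which is one common design choice for box-conditioned models; the repository may handle this differently.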

Quick Start & Requirements

  • Install/Run: accelerate launch train_svd.py
  • Prerequisites: PyTorch, accelerate, diffusers, transformers, xformers. Requires a GPU with CUDA.
  • Configuration: Training configuration parameters include pretrained_model_name_or_path, per_gpu_batch_size, max_train_steps, width, height, learning_rate, and mixed_precision.
  • Data: Video data needs to be organized into a specific directory structure.
  • Links: Part 1, Part 2
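The run command and configuration parameters listed above could be assembled as in the sketch below. It assumes each parameter is passed to `train_svd.py` as a `--name=value` flag; the `TrainConfig` class, the `build_command` helper, and every default value other than the 512x320 resolution mentioned on this page are hypothetical placeholders to adjust for your setup.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TrainConfig:
    """Hypothetical holder for the parameters named in this summary."""
    pretrained_model_name_or_path: str = "/path/to/svd/weights"  # placeholder path
    per_gpu_batch_size: int = 1        # assumed value
    max_train_steps: int = 50000       # assumed value
    width: int = 512
    height: int = 320
    learning_rate: float = 1e-5        # assumed value
    mixed_precision: str = "fp16"      # assumed value


def build_command(cfg: TrainConfig) -> List[str]:
    """Build an argument list, assuming --name=value style flags."""
    cmd = ["accelerate", "launch", "train_svd.py"]
    for name, value in asdict(cfg).items():
        cmd.append(f"--{name}={value}")
    return cmd


if __name__ == "__main__":
    # Print the command for inspection instead of executing it.
    print(" ".join(build_command(TrainConfig())))
```

Printing the command first makes it easy to check the flags against the repository's README before starting a long training run.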

Highlighted Details

  • Enables fine-tuning of Stable Video Diffusion models.
  • Introduces tracklet-conditioned video generation for object motion control.
  • Leverages Self-Tracking and Instance-Enhancer techniques.
  • Supports custom video datasets for training.
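Before training on a custom dataset, it can help to check that the data folder is laid out as expected. The sketch below assumes one subfolder per video, each containing that video's frames as image files; this layout, the `index_videos` helper, and the accepted file suffixes are assumptions, since this summary does not spell out the required structure.

```python
from pathlib import Path
from typing import Dict, List

# Assumed frame file types; adjust to match your dataset.
FRAME_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_videos(base_folder: str, min_frames: int = 1) -> Dict[str, List[Path]]:
    """Map each video subfolder name to its sorted list of frame files.

    Subfolders with fewer than `min_frames` frames are skipped, as are
    loose files placed directly in `base_folder`.
    """
    index: Dict[str, List[Path]] = {}
    for video_dir in sorted(Path(base_folder).iterdir()):
        if not video_dir.is_dir():
            continue
        frames = sorted(
            p for p in video_dir.iterdir()
            if p.is_file() and p.suffix.lower() in FRAME_SUFFIXES
        )
        if len(frames) >= min_frames:
            index[video_dir.name] = frames
    return index
```

Frames are sorted by file name, so zero-padded names such as `0001.png` are needed for the order to match playback order.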

Maintenance & Community

  • The project acknowledges contributions from Diffusers and Stability AI, as well as Boximator and GLIGEN.
  • A BibTeX entry is provided for citation.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is therefore unclear.

Limitations & Caveats

The project is still under development, and text-to-video support is marked as "WIP" (Work In Progress). The README notes that the tracklet-control sample videos may look poor because they were generated at 512x320 resolution; they are meant to demonstrate that tracklet control works, not to showcase visual quality.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

19 stars in the last 90 days
