Tune-A-Video by showlab

Text-to-video generation via diffusion model fine-tuning

created 2 years ago
4,345 stars

Top 11.5% on sourcepulse

View on GitHub
Project Summary

Tune-A-Video enables one-shot fine-tuning of pre-trained text-to-image diffusion models for text-to-video generation. Given a single input video and a text description, it adapts models such as Stable Diffusion, or personalized DreamBooth checkpoints, to generate new videos from text prompts. This is useful for researchers and content creators who want to produce novel video content with a specific style or subject.

How It Works

The method fine-tunes a pre-trained text-to-image diffusion model on a single video-text pair. It leverages the image-generation capabilities of models like Stable Diffusion and extends them to the temporal domain of video: the 2D UNet is inflated into a spatio-temporal (3D) UNet, and only a small set of attention layers is fine-tuned. The tuned model can then generate temporally coherent video sequences from new text prompts while preserving the structure and content of the input video.
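
Concretely, the parameter selection can be sketched in a few lines. This is a minimal, unofficial sketch: it assumes the repo's UNet3DConditionModel (with its from_pretrained_2d loader) and the trainable module names used in the project's example configs, so treat the exact names, checkpoint path, and learning rate as assumptions.

```python
import torch
from tuneavideo.models.unet import UNet3DConditionModel

# Inflate a pre-trained 2D Stable Diffusion UNet into the spatio-temporal
# (3D) UNet used by Tune-A-Video. The checkpoint path is a placeholder.
unet = UNet3DConditionModel.from_pretrained_2d(
    "./checkpoints/stable-diffusion-v1-4", subfolder="unet"
)

# Freeze everything, then unfreeze only the attention query projections and
# the added temporal-attention layers (module-name suffixes assumed from the
# project's example configs).
unet.requires_grad_(False)
trainable_suffixes = ("attn1.to_q", "attn2.to_q", "attn_temp")
for name, module in unet.named_modules():
    if name.endswith(trainable_suffixes):
        for param in module.parameters():
            param.requires_grad = True

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in unet.parameters() if p.requires_grad), lr=3e-5
)
```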

Quick Start & Requirements

  • Install via pip install -r requirements.txt.
  • Requires pre-trained Stable Diffusion models (e.g., v1-4, v2-1) or personalized DreamBooth models from Hugging Face.
  • xformers is highly recommended for efficiency.
  • Training a 24-frame video takes ~10-15 minutes on an A100 GPU.
  • Colab demo available: [link to Colab demo]
  • Pre-trained models on Hugging Face: [link to Hugging Face models]
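
After fine-tuning, inference goes through the project's Python API. The sketch below follows the usage pattern in the repo's README, using its TuneAVideoPipeline, UNet3DConditionModel, and save_videos_grid helpers; the paths, prompt, and latent filename are placeholders.

```python
import torch
from tuneavideo.models.unet import UNet3DConditionModel
from tuneavideo.pipelines.pipeline_tuneavideo import TuneAVideoPipeline
from tuneavideo.util import save_videos_grid

pretrained_model_path = "./checkpoints/stable-diffusion-v1-4"  # base SD weights
my_model_path = "./outputs/man-skiing"                         # fine-tuned output dir

# Load the fine-tuned 3D UNet and build the video pipeline around it.
unet = UNet3DConditionModel.from_pretrained(
    my_model_path, subfolder="unet", torch_dtype=torch.float16
).to("cuda")
pipe = TuneAVideoPipeline.from_pretrained(
    pretrained_model_path, unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # optional but recommended

# DDIM-inverted latents of the source video serve as structure guidance,
# tying the generated motion to the input clip.
ddim_inv_latent = torch.load(
    f"{my_model_path}/inv_latents/ddim_latent-500.pt"
).to(torch.float16)

prompt = "spider man is skiing"
video = pipe(
    prompt,
    latents=ddim_inv_latent,
    video_length=24,
    height=512,
    width=512,
    num_inference_steps=50,
    guidance_scale=12.5,
).videos

save_videos_grid(video, f"./{prompt}.gif")
```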

Highlighted Details

  • Supports fine-tuning on personalized DreamBooth models for subject-specific video generation.
  • Uses DDIM inversion of the source video latents as structure guidance, improving temporal consistency.
  • Demonstrates strong results in re-rendering an input video with new text prompts, subjects, and styles.
  • Offers a Python API for inference and integration into custom pipelines.

Maintenance & Community

  • Official implementation of a paper presented at ICCV 2023.
  • Code builds upon Hugging Face's diffusers library.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. However, it relies on Stable Diffusion models, which have their own licenses. Users should verify compatibility with their intended use cases, especially for commercial applications.

Limitations & Caveats

  • The README does not specify the license for the Tune-A-Video code itself, which may impact commercial use.
  • Performance and quality are dependent on the base diffusion model and the input video.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star history: 33 stars in the last 90 days
