VideoTuna by VideoVerses

Codebase for text-to-video applications

created 9 months ago
490 stars

Top 63.8% on sourcepulse

Project Summary

VideoTuna is a comprehensive toolkit for text-to-video (T2V) and image-to-video (I2V) generation, offering unified pipelines for inference, fine-tuning, continuous training, and human preference alignment. It targets researchers and developers working with state-of-the-art video generation models, providing a flexible framework to adapt and improve existing models or train new ones.

How It Works

VideoTuna integrates multiple leading generation models within a single codebase, covering T2V, I2V, T2I, and V2V tasks. It supports both LoRA and full fine-tuning, as well as human preference alignment using RLHF. The framework also includes post-processing enhancements and a novel VideoVAE+ model for improved video reconstruction.
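
As a rough illustration of why LoRA is cheaper than full fine-tuning, the sketch below applies the standard LoRA update W + (alpha / r) * B A to a small weight matrix and compares trainable parameter counts. This is a generic, pure-Python sketch and not VideoTuna's implementation; the function names, matrix sizes, rank, and alpha value are all assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    r = len(a)              # A has shape (r, in_features)
    scale = alpha / r
    delta = matmul(b, a)    # (out, r) @ (r, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical sizes, chosen only to keep the example small.
out_features, in_features, rank = 8, 8, 2
rng = random.Random(0)

# Frozen base weight W.
w = [[rng.gauss(0, 1) for _ in range(in_features)] for _ in range(out_features)]
# Trainable adapter: A is small random, B starts at zero,
# so the adapted layer initially matches the base layer exactly.
a = [[rng.gauss(0, 0.01) for _ in range(in_features)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(out_features)]

full_params = out_features * in_features            # trained in full fine-tuning
lora_params = rank * (in_features + out_features)   # trained with LoRA

if __name__ == "__main__":
    print("full fine-tuning parameters:", full_params)
    print("LoRA parameters:", lora_params)
```

Only A and B are trained while W stays frozen, so the number of trainable parameters grows with the rank rather than with the full matrix size.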

Quick Start & Requirements

  • Installation: Uses Poetry for dependency management. Recommended: conda create -n videotuna python=3.10 -y && conda activate videotuna && pip install poetry && poetry install.
  • Prerequisites: Python 3.10, Poetry. Optional: flash-attn for Hunyuan model optimization. macOS users on Apple Silicon should use Docker Compose because of dependency compatibility issues.
  • Setup Time: ~3 minutes for basic Poetry install.
  • Resources: High GPU memory requirements for inference (e.g., 60 GB+ for HunyuanVideo T2V). Fine-tuning commands were tested on H800 80 GB GPUs.
  • Links: Checkpoints, Datasets, Fine-tuning Docs.
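
The prerequisites above can be checked before installing with a short preflight script. The following is a minimal sketch using only the Python standard library; the `preflight` function and its warning messages are hypothetical and not part of VideoTuna. It checks only the Python version, whether Poetry is on the PATH, and whether the machine is an Apple Silicon Mac.

```python
import platform
import shutil
import sys


def preflight(version=None, system=None, machine=None, has_poetry=None):
    """Return a list of warnings; an empty list means no issue was found.

    Every argument defaults to a value detected from the current machine.
    Passing explicit values makes the checks easy to try out.
    """
    version = tuple(sys.version_info[:2]) if version is None else version
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_poetry is None:
        has_poetry = shutil.which("poetry") is not None

    warnings = []
    if version != (3, 10):
        warnings.append(f"Python 3.10 expected, found {version[0]}.{version[1]}")
    if not has_poetry:
        warnings.append("Poetry not found on PATH; install it with `pip install poetry`")
    if system == "Darwin" and machine == "arm64":
        warnings.append("Apple Silicon detected; consider the Docker Compose setup")
    return warnings


if __name__ == "__main__":
    for line in preflight() or ["No issues found"]:
        print(line)
```

The script does not check GPU memory, which would require a GPU library such as PyTorch and is left out to keep the sketch dependency-free.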

Highlighted Details

  • Supports inference for 10+ T2V/I2V/T2I models (e.g., HunyuanVideo, Wan2.1, Step Video, Mochi, CogVideoX, OpenSora, Flux).
  • Enables fine-tuning for VideoCrafter, DynamiCrafter, Open-Sora, CogVideoX, HunyuanVideo, and Flux.
  • Introduces VideoVAE+ for state-of-the-art video reconstruction.
  • Includes VBench evaluation support for T2V performance assessment.

Maintenance & Community

  • Managed via Poetry for streamlined dependency and script management.
  • Active development with recent updates including new model support and code formatting.
  • Project contributors are listed in the README.

Licensing & Compatibility

  • License: CC-BY-NC-ND.
  • Restrictions: Non-commercial use only. Commercial licensing requires contacting project leads.

Limitations & Caveats

  • The CC-BY-NC-ND license restricts commercial use.
  • Some dependencies have compatibility issues on macOS ARM64 (Apple Silicon), so Docker is needed there.
  • Installation of certain optional dependencies like flash-attn or swissarmytransformer might require retries.

Health Check

  • Last commit: 2 months ago.
  • Responsiveness: 1 week.
  • Pull requests (30d): 0.
  • Issues (30d): 0.
  • Star history: 36 stars in the last 90 days.
