Codebase for text-to-video applications
VideoTuna is a comprehensive toolkit for text-to-video (T2V) and image-to-video (I2V) generation, offering unified pipelines for inference, fine-tuning, continuous training, and human preference alignment. It targets researchers and developers working with state-of-the-art video generation models, providing a flexible framework to adapt and improve existing models or train new ones.
How It Works
VideoTuna integrates multiple leading video generation models, covering T2V, I2V, T2I, and V2V tasks, within a single codebase. It supports advanced training techniques such as LoRA and full fine-tuning, as well as human preference alignment via RLHF. The framework also includes post-processing enhancements and a novel VideoVAE+ model for improved video reconstruction.
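To illustrate the fine-tuning side, below is a minimal, generic sketch of how LoRA adapts a frozen layer with a low-rank trainable update. `LoRALinear`, its parameters, and the wiring are illustrative assumptions, not VideoTuna's actual modules or trainer API.

```python
# A minimal sketch of LoRA-style adaptation, assuming a PyTorch backbone.
# `LoRALinear` and its wiring are illustrative only, not VideoTuna's modules.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pretrained weights frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen projection + scaled low-rank correction.
        return self.base(x) + self.scale * self.up(self.down(x))

# Usage: swap attention projections (e.g. query/value) of a video diffusion
# backbone for LoRALinear wrappers, then optimize only `down`/`up`.
layer = LoRALinear(nn.Linear(1024, 1024))
out = layer(torch.randn(2, 1024))
```

Because only the low-rank `down`/`up` matrices are trained, the adapter adds a small fraction of the base parameter count, which is what makes fine-tuning large video backbones tractable.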
Quick Start & Requirements
Set up the environment with conda and Poetry:

```bash
conda create -n videotuna python=3.10 -y && conda activate videotuna && pip install poetry && poetry install
```

Install `flash-attn` for Hunyuan model optimization. macOS users with Apple Silicon should use Docker Compose due to dependency compatibility issues.

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Installation of `flash-attn` or `swissarmytransformer` may require retries.
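As a workaround, a small retry wrapper around pip can help. This is a generic sketch, not a VideoTuna utility; the function name, attempt count, and wait time are arbitrary assumptions.

```python
# Hypothetical workaround, not a VideoTuna utility: retry a flaky source
# build such as flash-attn a few times before giving up.
import subprocess
import sys
import time

def pip_install_with_retries(package: str, attempts: int = 3, wait_s: float = 15.0) -> None:
    """Invoke `pip install` up to `attempts` times, pausing between failures."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run([sys.executable, "-m", "pip", "install", package])
        if result.returncode == 0:
            return
        print(f"{package}: attempt {attempt}/{attempts} failed, retrying in {wait_s}s")
        time.sleep(wait_s)
    raise RuntimeError(f"{package} failed to install after {attempts} attempts")

if __name__ == "__main__":
    for pkg in ("flash-attn", "swissarmytransformer"):
        pip_install_with_retries(pkg)
```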