VideoX-Fun by aigc-apps

Flexible framework for advanced AI video generation

Created 1 year ago
1,506 stars

Top 27.4% on SourcePulse

Summary

VideoX-Fun is a flexible AI video generation framework that lets users create videos at arbitrary resolutions and durations and train custom Diffusion Transformer (DiT) baseline and LoRA models. It targets researchers and power users seeking advanced video synthesis, offering custom style transfer and precise control over video output.

How It Works

The project leverages Diffusion Transformer (DiT) architectures for video generation. It supports direct prediction from pre-trained models with variable resolutions, durations, and frame rates. Users can train their own baseline and LoRA models for specific style transformations. Advanced features include multiple control conditions (Canny, Depth, Pose, MLSD) and camera trajectory control, enhancing creative flexibility.
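The cost of "arbitrary resolutions and durations" in a DiT follows directly from how video is tokenized. A minimal sketch of that arithmetic is below; the 8x spatial / 4x temporal VAE compression and 2x2 patch size are common defaults in DiT video models, used here as assumptions rather than VideoX-Fun's documented values.

```python
# Sketch: how a DiT video model's transformer sequence length scales with
# resolution and frame count. Compression factors are assumed defaults.

def dit_token_count(height, width, num_frames,
                    spatial_down=8, temporal_down=4, patch=2):
    """Number of tokens the transformer attends over for one video."""
    lat_h = height // spatial_down
    lat_w = width // spatial_down
    lat_t = 1 + (num_frames - 1) // temporal_down  # first frame kept whole
    return lat_t * (lat_h // patch) * (lat_w // patch)

# Doubling the resolution quadruples the token count, which is why
# resolution and duration translate directly into memory and compute cost.
print(dit_token_count(512, 512, 49))    # → 13312
print(dit_token_count(1024, 1024, 49))  # → 53248
```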

Quick Start & Requirements

Installation is supported via Docker or local setup. Local setup requires Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1, and cuDNN 8+. An NVIDIA GPU is required (e.g., RTX 3060 12 GB or A100 40 GB), along with roughly 60 GB of disk space for model weights. Cloud deployment via Aliyun PAI-DSW is also an option. Detailed instructions and model weights are available via Hugging Face and ModelScope.
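A local setup following the stated requirements might look like the sketch below. The exact package extras and requirements file are assumptions; the repo's own install docs are authoritative.

```shell
# Hypothetical local install (Python 3.10, PyTorch 2.2.0 + CUDA 11.8);
# adjust the CUDA index URL to cu121 for CUDA 12.1.
conda create -n videox-fun python=3.10 -y
conda activate videox-fun
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/aigc-apps/VideoX-Fun.git
cd VideoX-Fun
pip install -r requirements.txt
```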

Highlighted Details

  • Extensive Model Support: Integrates multiple versions of CogVideoX-Fun and Wan models (e.g., Wan2.2-Fun-A14B, CogVideoX-Fun-V1.5-5b), offering text-to-video, image-to-video, and control-based generation.
  • Advanced Control: Features robust control mechanisms including Canny, Depth, Pose, MLSD, reference images, and camera trajectory control for precise video manipulation.
  • Custom Training: Facilitates training of custom baseline and LoRA models, including reward-based LoRA optimization for aligning generated content with human preferences.
  • Resolution & FPS Flexibility: Supports arbitrary video resolutions (e.g., 256x256 to 1024x1024) and variable frame rates (e.g., 8 FPS, 16 FPS, 24 FPS) depending on the chosen model.
  • Memory Optimization: Offers strategies like model_cpu_offload and qfloat8 quantization to manage GPU memory for large models on consumer hardware.
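Back-of-envelope arithmetic makes clear why the large models need those memory strategies. The sketch below counts weight memory only (activations and caches add more on top) and uses the 14B parameter count mentioned above; the precisions are standard dtype sizes, not VideoX-Fun-specific figures.

```python
# Approximate GPU memory for model weights at different precisions,
# illustrating why a 14B-parameter model (e.g. Wan2.2-Fun-A14B) needs
# offloading or qfloat8 quantization on consumer cards.

def weight_gib(params_billion, bytes_per_param):
    """Weight memory in GiB for a model of the given size and dtype width."""
    return params_billion * 1e9 * bytes_per_param / 2**30

for name, nbytes in [("fp32", 4), ("bf16", 2), ("qfloat8", 1)]:
    print(f"14B @ {name}: {weight_gib(14, nbytes):.1f} GiB")
# bf16 alone (~26 GiB) already exceeds a 12 GB card, so CPU offload
# and/or 8-bit quantization are needed below A100-class hardware.
```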

Maintenance & Community

No specific community channels (e.g., Discord, Slack), roadmap, or prominent maintainer/sponsor information was found in the provided README.

Licensing & Compatibility

The project is licensed under the Apache License 2.0. However, the CogVideoX-5B model's Transformers module is released under a separate "CogVideoX LICENSE," which may impose additional restrictions. Commercial use compatibility should be verified against this specific model license.

Limitations & Caveats

Large parameter models (e.g., 14B) demand significant GPU memory, often requiring the use of provided offloading and quantization techniques. Multi-GPU inference necessitates specific versions of xfuser and yunchang. The CogVideoX-5B model's distinct license requires careful review for commercial applications.

Health Check
Last Commit

12 hours ago

Responsiveness

Inactive

Pull Requests (30d)
9
Issues (30d)
13
Star History
54 stars in the last 30 days

