VideoX-Fun (aigc-apps): Flexible framework for advanced AI video generation
Top 27.4% on SourcePulse
Summary
VideoX-Fun is a flexible AI video generation framework that lets users create videos at arbitrary resolutions and durations and train custom Diffusion Transformer (DiT) baseline and LoRA models. It targets researchers and power users seeking advanced video synthesis capabilities, offering custom style transfer and precise control over video output.
How It Works
The project leverages Diffusion Transformer (DiT) architectures for video generation. It supports direct prediction from pre-trained models with variable resolutions, durations, and FPS, and users can train their own baseline and LoRA models for specific style transformations. Advanced features include support for various control conditions (Canny, Depth, Pose, MLSD) and camera trajectory control, enhancing creative flexibility.
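As a rough illustration of what a DiT video-generation call looks like, the sketch below drives CogVideoX-5B (one of the model families referenced in the licensing notes) through the Hugging Face diffusers pipeline; the prompt, frame count, and output path are placeholders, and VideoX-Fun's own prediction scripts may expose different entry points.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Illustrative only: CogVideoX-5B via diffusers, not VideoX-Fun's own entry point.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

result = pipe(
    prompt="A panda playing guitar in a bamboo forest",  # placeholder prompt
    num_frames=49,               # clip duration is set by the number of frames generated
    num_inference_steps=50,
    guidance_scale=6.0,
)
export_to_video(result.frames[0], "output.mp4", fps=8)  # FPS is chosen at export time
```

In practice, the attainable resolution and clip length depend on the specific checkpoint and available VRAM.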
Quick Start & Requirements
Installation is supported via Docker or a local setup. Local setup requires Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1, and cuDNN 8+. An NVIDIA GPU (e.g., RTX 3060 12GB, A100 40GB) is required, along with roughly 60GB of disk space for model weights. Cloud deployment via Aliyun PAI-DSW is also an option. Detailed instructions and model weights are available via Hugging Face and ModelScope links.
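As a quick sanity check before installing the repository, a minimal script like the following (assuming PyTorch is already installed) confirms that the interpreter, PyTorch build, and GPU match the requirements above.

```python
import sys
import torch

# Verify the stack against the stated requirements:
# Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1.
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
```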
Highlighted Details
Provides model_cpu_offload and qfloat8 quantization to manage GPU memory for large models on consumer hardware.
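A minimal sketch of that memory-saving combination, shown here with the diffusers CogVideoXPipeline and optimum-quanto; the model id and module names are assumptions for illustration, and the project's own scripts may wire this up differently.

```python
import torch
from diffusers import CogVideoXPipeline
from optimum.quanto import freeze, qfloat8, quantize

# Assumed model id for illustration; VideoX-Fun ships its own checkpoints.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Quantize the DiT transformer weights to float8, roughly halving their memory footprint.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

# Offload idle sub-models (text encoder, VAE, transformer) to CPU between steps,
# so only the active module resides in GPU memory.
pipe.enable_model_cpu_offload()
```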
Maintenance & Community
No specific community channels (e.g., Discord, Slack), roadmap, or prominent maintainer/sponsor information was found in the provided README.
Licensing & Compatibility
The project is licensed under the Apache License 2.0. However, the CogVideoX-5B model's Transformers module is released under a separate "CogVideoX LICENSE," which may impose additional restrictions. Commercial use compatibility should be verified against this specific model license.
Limitations & Caveats
Large models (e.g., 14B parameters) demand significant GPU memory and often require the provided offloading and quantization techniques. Multi-GPU inference requires specific versions of xfuser and yunchang. The CogVideoX-5B model's distinct license requires careful review for commercial applications.