LightX2V by ModelTC

Video generation inference framework for efficient synthesis

Created 7 months ago
741 stars

Top 46.7% on SourcePulse

View on GitHub
Project Summary

Summary

LightX2V is a lightweight, high-performance inference framework for video generation, unifying multiple state-of-the-art techniques for tasks like text-to-video (T2V) and image-to-video (I2V). It targets researchers and developers seeking efficient video synthesis solutions, offering significant speedups and reduced resource requirements.

How It Works

The framework integrates diverse video generation models and techniques into a unified platform. Its core innovation is a 4-step distillation process that compresses the traditional 40-50 inference steps down to 4 while eliminating the need for Classifier-Free Guidance (CFG). Combined with system-level optimizations and support for optimized attention operators such as Sage Attention and Flash Attention, as well as vLLM, this achieves up to ~20x inference acceleration on a single GPU.
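The back-of-the-envelope arithmetic behind that claim is worth spelling out: CFG roughly doubles the number of denoiser evaluations per step, so 50 steps with CFG costs about 100 forward passes, while 4 steps without CFG costs 4. The sketch below is purely illustrative; `forward_passes` is a hypothetical helper, not part of LightX2V's API.

```python
# Illustrative only: back-of-the-envelope count of denoiser forward passes,
# not LightX2V code. CFG roughly doubles the passes per step because it
# evaluates both a conditional and an unconditional branch.

def forward_passes(steps: int, use_cfg: bool) -> int:
    """Number of denoiser evaluations for one generated video."""
    return steps * (2 if use_cfg else 1)

baseline = forward_passes(steps=50, use_cfg=True)    # 100 passes
distilled = forward_passes(steps=4, use_cfg=False)   # 4 passes

print(f"baseline:  {baseline} passes")
print(f"distilled: {distilled} passes")
print(f"reduction: ~{baseline / distilled:.0f}x fewer denoiser calls")
```

The gap between ~25x fewer denoiser calls and the reported ~20x end-to-end speedup is plausibly accounted for by fixed costs such as text/image encoding and VAE decoding, which distillation does not remove.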

Quick Start & Requirements

Comprehensive installation and usage instructions are available in the official documentation, and a Docker image is provided for streamlined deployment. For first-time users, a Windows One-Click Deployment package automates environment configuration. A GPU is required; the framework can run 14B models for 480P/720P video generation on systems with as little as 8GB VRAM and 16GB RAM.
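Before installing, it can be useful to verify that a machine meets the stated minimums. The following is a minimal sketch, not part of LightX2V; it assumes PyTorch is installed and uses psutil (an assumed extra dependency) to read system RAM, with the 8GB/16GB thresholds taken from the figures above.

```python
# Hypothetical pre-flight check against the minimums quoted above
# (8 GB VRAM, 16 GB RAM); not part of LightX2V itself.
import psutil
import torch

MIN_VRAM_GB = 8
MIN_RAM_GB = 16

ram_gb = psutil.virtual_memory().total / 1024**3
print(f"System RAM: {ram_gb:.1f} GB (need >= {MIN_RAM_GB} GB)")

if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU VRAM:   {vram_gb:.1f} GB (need >= {MIN_VRAM_GB} GB)")
else:
    print("No CUDA GPU detected; LightX2V requires GPU support.")
```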

Highlighted Details

  • Performance: Achieves ~20x inference speedup through 4-step distillation and system optimizations.
  • Resource Efficiency: Capable of running 14B models for 480P/720P video generation with as little as 8GB VRAM and 16GB RAM.
  • Advanced Features: Supports intelligent parameter offloading (disk-CPU-GPU), comprehensive quantization (w8a8-int8, w8a8-fp8, w4a4-nvfp4), smart feature caching, and multi-GPU parallel inference; see the memory sketch after this list.
  • Deployment Flexibility: Offers Gradio, ComfyUI, and Windows one-click deployment options. Includes RIFE-based video frame interpolation.
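The quantization formats above are what make the 8GB VRAM figure plausible. The sketch below is rough weight-only arithmetic under stated assumptions (nvfp4 taken as ~4 bits per weight; activations, caches, the VAE, and quantization scale overheads ignored); it is not a measurement from LightX2V.

```python
# Rough weight-only memory estimates for a 14B-parameter model under the
# precisions listed above. Assumes nvfp4 stores ~4 bits per weight and
# ignores activations, caches, and quantization scale overheads.
PARAMS = 14e9

bits_per_weight = {
    "fp16/bf16 (unquantized)": 16,
    "w8a8-int8 / w8a8-fp8": 8,
    "w4a4-nvfp4": 4,
}

for name, bits in bits_per_weight.items():
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{name:<26} ~{gib:5.1f} GiB of weights")
# Weights alone exceed 8 GB except at 4-bit, which suggests why quantization
# is paired with disk-CPU-GPU offloading to fit the 8 GB VRAM budget.
```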

Maintenance & Community

The project is maintained by the "LightX2V Contributors" and encourages community engagement through GitHub Issues for bug reports and feature requests, and GitHub Discussions for general Q&A.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license. This license is permissive and generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The framework is designed for inference only, not model training. The Windows One-Click Deployment is highlighted as the recommended path for first-time users, which suggests that manual setup on other operating systems or environments may be more involved.

Health Check

  • Last Commit: 9 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 67
  • Issues (30d): 19
  • Star History: 129 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

FastVideo by hao-ai-lab
1.2% · 3k stars
Framework for accelerated video generation
Created 1 year ago · Updated 2 days ago
Starred by Alex Yu (Research Scientist at OpenAI; Cofounder of Luma AI), Lianmin Zheng (Coauthor of SGLang, vLLM), and 2 more.

HunyuanVideo by Tencent-Hunyuan
0.3% · 11k stars
PyTorch code for video generation research
Created 11 months ago · Updated 2 months ago