LTX-Video by Lightricks

DiT-based video generation model for high-quality, real-time video creation

Created 1 year ago

9,047 stars

Top 5.7% on SourcePulse

4 Experts Love This Project

Gabriel Almeida

Cofounder of Langflow

Alex Yu

Research Scientist at OpenAI; Cofounder of Luma AI

Jesse Clark

Cofounder of Marqo

Jiaming Song

Chief Scientist at Luma AI

Project Summary

LTX-Video is a DiT-based video generation model designed for real-time, high-quality video creation. It targets researchers and developers interested in advanced video synthesis, offering capabilities like text-to-video, image-to-video, and video extension, with a focus on speed and resolution.

How It Works

LTX-Video uses a Diffusion Transformer (DiT) architecture to generate high-resolution video at 30 FPS in real time. In other words, a clip is produced faster than it takes to watch, a notable speedup over earlier methods. The model is trained on a large, diverse video dataset, which helps it produce realistic and varied content.
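
To make "faster than watch time" concrete, here is a minimal Python sketch of the arithmetic. The function names and the 2-second generation time are hypothetical, chosen only for illustration; they are not measurements or part of LTX-Video.

```python
def clip_duration_seconds(num_frames: int, fps: float = 30.0) -> float:
    """Playback length of a clip: frames divided by frames per second."""
    return num_frames / fps


def real_time_factor(num_frames: int, generation_seconds: float, fps: float = 30.0) -> float:
    """Ratio of playback time to generation time.

    A value above 1.0 means the clip was generated faster than it plays back.
    """
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return clip_duration_seconds(num_frames, fps) / generation_seconds


# Hypothetical example: 120 frames at 30 FPS play for 4 seconds.
# If generation took 2 seconds (an assumed figure), the factor is 2.0.
factor = real_time_factor(num_frames=120, generation_seconds=2.0)
```

A factor of exactly 1.0 is the break-even point; anything the hardware delivers below that is slower than real time.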

Quick Start & Requirements

Installation: Clone the repository, create a virtual environment, and install with pip install -e .[inference-script].
Dependencies: Python 3.10.5+, CUDA 12.2+, PyTorch >= 2.1.2. MPS support for macOS requires PyTorch 2.3.0 or >= 2.6.
Model Download: Use hf_hub_download from Hugging Face to get the distilled or full model checkpoints.
Inference: Run the inference.py script for text-to-video, image-to-video, and video extension.
ComfyUI/Diffusers: Integrations available via separate repositories and official documentation.
Resources: Requires significant GPU resources for local inference.
Links: Website, Model, Demo, Paper.
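
The model download step might look roughly like the sketch below. hf_hub_download is the real huggingface_hub function, but the repository id and checkpoint filename used here are assumptions; confirm both against the model page before relying on them. The helper name download_checkpoint is hypothetical.

```python
from pathlib import Path

# Assumed values for illustration only: verify them on the model page.
REPO_ID = "Lightricks/LTX-Video"                      # assumed repository id
CHECKPOINT_FILE = "ltx-video-2b-v0.9.5.safetensors"   # hypothetical filename


def download_checkpoint(filename: str = CHECKPOINT_FILE,
                        local_dir: str = "models/ltx-video") -> Path:
    """Fetch a single checkpoint file from the Hugging Face Hub.

    Returns the local path of the downloaded file.
    """
    # Imported lazily so this module loads even if huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=filename,
        local_dir=str(target),
        repo_type="model",
    )
    return Path(local_path)
```

Calling download_checkpoint() requires network access and the huggingface_hub package. Downloading one file rather than the whole repository keeps disk usage down when only the distilled or only the full checkpoint is needed.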

Highlighted Details

Generates 30 FPS video at 1216x704 resolution in real time.
Supports text-to-video, image-to-video, keyframe animation, video extension (forward/backward), and video-to-video transformations.
The distilled model runs inference 15x faster, needs fewer diffusion steps, and does not use classifier-free guidance.
Automatically enhances short prompts.

Maintenance & Community

Active development with regular updates and new checkpoints.
Community contributions are encouraged, with projects like ComfyUI-LTXTricks and LTX-VideoQ8 highlighted.
The repository links to community discussions and a careers page.

Licensing & Compatibility

Newer checkpoints (v0.9.6, v0.9.5) are released under an "Open Weights" or "OpenRail-M" license, allowing commercial use. Earlier versions may have different terms.

Limitations & Caveats

Input video segments for extension must contain a multiple of 8 frames plus 1 (for example 9, 17, or 25 frames).
The model works best at resolutions under 720x1280 and frame counts below 257.
Generation is described as real-time, but actual performance depends heavily on hardware, especially at higher resolutions and frame counts.
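
The frame-count and resolution caveats above can be checked before launching a run. The following is a minimal sketch with hypothetical helper names; it is not code from the LTX-Video repository. Two assumptions are made: the 257-frame bound is treated as inclusive (257 is itself 8 x 32 + 1), and "under 720x1280" is interpreted as a total pixel budget regardless of orientation.

```python
MAX_FRAMES = 257            # assumed inclusive upper bound
MAX_PIXELS = 720 * 1280     # assumed pixel budget for the recommended range


def is_valid_frame_count(num_frames: int) -> bool:
    """True when num_frames has the form 8k + 1 (1, 9, 17, 25, ...)."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


def snap_frame_count(num_frames: int, max_frames: int = MAX_FRAMES) -> int:
    """Clamp to max_frames, then round down to the nearest 8k + 1 value."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    clamped = min(num_frames, max_frames)
    return ((clamped - 1) // 8) * 8 + 1


def within_recommended_resolution(height: int, width: int) -> bool:
    """True when the frame fits inside the assumed 720x1280 pixel budget."""
    return height * width <= MAX_PIXELS


# A request for 100 frames is rounded down to 97 (8 * 12 + 1).
frames = snap_frame_count(100)
```

Rounding down rather than up keeps the snapped value within the stated limits; a caller that prefers longer clips could round up instead and then clamp.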

Health Check

Last Commit

6 days ago

Responsiveness

1 week

Star History

183 stars in the last 30 days