LTX-2 by Lightricks

DiT-based audio-video foundation model for generative tasks

Created 6 months ago

8,186 stars

Top 6.3% on SourcePulse

Project Summary

LTX-2 is an open-access, DiT-based audio-video foundation model that generates video with synchronized audio at high fidelity. It provides production-ready outputs, multiple performance modes, and API access, and is aimed at researchers and developers working on video synthesis.

How It Works

LTX-2 uses a Diffusion Transformer (DiT) architecture to generate audio and video streams that stay synchronized with each other. The design emphasizes high fidelity and offers several performance modes, including fast inference and upscaling.

Quick Start & Requirements

Install: Clone the repository, run uv sync --frozen to install the locked dependencies, then run source .venv/bin/activate to activate the virtual environment.
Prerequisites: Download the required model checkpoints from Hugging Face (LTX-2 Model Checkpoint, Spatial Upscaler, Temporal Upscaler, Distilled LoRA, Gemma Text Encoder, Gemma 3 LoRAs).
Links:
- Repository: https://github.com/Lightricks/LTX-2.git
- Prompting Guide: https://ltx.video/blog/how-to-prompt-for-ltx-2
- ComfyUI Integration: https://github.com/Lightricks/ComfyUI-LTXVideo/
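
Because setup depends on several separately downloaded checkpoints, it can help to confirm that everything is in place before launching a pipeline. The following is a minimal Python sketch, not part of the LTX-2 repository. The file and directory names in REQUIRED_ASSETS are hypothetical placeholders; the real names come from the Hugging Face model pages and are not given in this summary.

```python
from pathlib import Path

# Labels follow the prerequisites listed above. The file and directory
# names are HYPOTHETICAL placeholders, not the real checkpoint names.
REQUIRED_ASSETS = {
    "LTX-2 model checkpoint": "ltx2_checkpoint.safetensors",
    "Spatial upscaler": "spatial_upscaler.safetensors",
    "Temporal upscaler": "temporal_upscaler.safetensors",
    "Distilled LoRA": "distilled_lora.safetensors",
    "Gemma text encoder": "gemma_text_encoder",
}


def find_missing_assets(model_dir, required=REQUIRED_ASSETS):
    """Return the labels of assets that are not present under model_dir."""
    root = Path(model_dir)
    return [
        label
        for label, name in required.items()
        if not (root / name).exists()  # True for both files and directories
    ]


if __name__ == "__main__":
    missing = find_missing_assets("models")
    if missing:
        print("Missing assets:", ", ".join(missing))
    else:
        print("All listed assets found.")
```

Running the check once after downloading gives a readable list of what is still absent, which is easier to act on than a load error partway through pipeline startup.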

Highlighted Details

Pipelines: Offers diverse generation modes including TI2VidTwoStagesPipeline (production, recommended), TI2VidOneStagePipeline (prototyping), DistilledPipeline (fastest inference), ICLoraPipeline (video-to-video), and KeyframeInterpolationPipeline.
Optimization: Supports FP8 transformers for reduced memory, xFormers/Flash Attention integration, gradient estimation for fewer inference steps, and single-stage pipelines for speed.
Prompting: Expects detailed, cinematographer-style prompts of at most 200 words, with an optional automatic prompt enhancement feature (enhance_prompt).
ComfyUI Integration: Integrates with ComfyUI through a dedicated repository (Lightricks/ComfyUI-LTXVideo).
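
The pipeline choices and the 200-word prompt limit described above can be captured in a small helper. This is an illustrative Python sketch only: the pipeline class names are taken from this summary, while the use-case keys, the function names, and the whitespace-based word count are assumptions made for the example and are not part of the LTX-2 API.

```python
# Pipeline class names are taken from the summary above. The use-case
# keys are HYPOTHETICAL labels invented for this sketch.
PIPELINE_BY_USE_CASE = {
    "production": "TI2VidTwoStagesPipeline",
    "prototyping": "TI2VidOneStagePipeline",
    "fastest": "DistilledPipeline",
    "video_to_video": "ICLoraPipeline",
    "keyframe_interpolation": "KeyframeInterpolationPipeline",
}

MAX_PROMPT_WORDS = 200  # limit stated in the prompting notes


def choose_pipeline(use_case):
    """Map a use-case label to the name of the matching pipeline class."""
    try:
        return PIPELINE_BY_USE_CASE[use_case]
    except KeyError:
        options = ", ".join(sorted(PIPELINE_BY_USE_CASE))
        raise ValueError(
            f"Unknown use case {use_case!r}; expected one of: {options}"
        ) from None


def check_prompt(prompt):
    """Return the word count, or raise if the prompt exceeds the limit.

    Words are counted by splitting on whitespace, which is an assumption;
    the summary does not say how the limit is measured.
    """
    word_count = len(prompt.split())
    if word_count > MAX_PROMPT_WORDS:
        raise ValueError(
            f"Prompt has {word_count} words; limit is {MAX_PROMPT_WORDS}"
        )
    return word_count


if __name__ == "__main__":
    print(choose_pipeline("production"))
    print(check_prompt("A slow dolly shot across a rain-soaked street at dusk"))
```

Keeping the mapping in one table makes the trade-off explicit: the two-stage pipeline is the recommended default, and the others are deliberate choices for speed or for a specific task.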

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord/Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The license type and any compatibility notes for commercial or closed-source use are not specified in the provided README.

Limitations & Caveats

The temporal upscaler is described as supported and as required for future pipeline implementations, which suggests that the current pipelines may not yet use it. Optimizations such as FP8 and Flash Attention may depend on specific hardware (e.g., NVIDIA GPUs). Setup requires downloading several large model files.

Health Check

Last Commit: 3 days ago

Responsiveness: Inactive

Star History

973 stars in the last 30 days