LTX-Video by Lightricks

DiT-based video generation model for high-quality, real-time video creation

created 8 months ago
7,654 stars

Top 6.9% on sourcepulse

Project Summary

LTX-Video is a DiT-based video generation model designed for real-time, high-quality video creation. It targets researchers and developers interested in advanced video synthesis, offering capabilities like text-to-video, image-to-video, and video extension, with a focus on speed and resolution.

How It Works

LTX-Video uses a Diffusion Transformer (DiT) architecture and is described as generating high-resolution 30 FPS video in real time, meaning a clip can be produced in less time than it takes to watch it. The project presents this as a significant improvement over previous methods. The model is trained on a large, diverse video dataset, which helps it produce realistic and varied content.
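
The "faster than watch time" claim can be checked with simple arithmetic: divide the playback duration of a clip by the time it took to generate. The sketch below illustrates that ratio; the function name and the timing numbers are hypothetical and are not measurements of LTX-Video.

```python
def realtime_factor(num_frames: int, fps: float, generation_seconds: float) -> float:
    """Return playback duration divided by generation time.

    A value above 1.0 means the clip was generated faster than it plays back.
    """
    if num_frames <= 0 or fps <= 0 or generation_seconds <= 0:
        raise ValueError("all inputs must be positive")
    playback_seconds = num_frames / fps
    return playback_seconds / generation_seconds


# Hypothetical timing: 150 frames at 30 FPS take 5.0 s to watch.
factor = realtime_factor(num_frames=150, fps=30, generation_seconds=2.5)
print(f"real-time factor: {factor:.1f}x")
```

Measuring generation_seconds on your own hardware and plugging it in shows whether a given resolution and frame count actually runs in real time for you.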

Quick Start & Requirements

  • Installation: Clone the repository, create a virtual environment, and install with pip install -e ".[inference-script]" (the quotes keep shells such as zsh from interpreting the brackets).
  • Dependencies: Python 3.10.5+, CUDA 12.2+, PyTorch >= 2.1.2. MPS support for macOS requires PyTorch 2.3.0 or >= 2.6.
  • Model Download: Use hf_hub_download from Hugging Face to get the distilled or full model checkpoints.
  • Inference: Run the inference.py script for text-to-video, image-to-video, and video extension.
  • ComfyUI/Diffusers: Integrations available via separate repositories and official documentation.
  • Resources: Requires significant GPU resources for local inference.
  • Links: Website, Model, Demo, Paper.
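
The model download step can be sketched as follows. This is a minimal example, not the project's official script: it assumes the huggingface_hub package is installed and that the checkpoints live in the Lightricks/LTX-Video repository on Hugging Face. The download_checkpoint helper and the "models" directory are hypothetical names, and the checkpoint file name is deliberately left as a parameter because it is not given here.

```python
from pathlib import Path

# Assumption: repository id of the model on Hugging Face.
REPO_ID = "Lightricks/LTX-Video"


def download_checkpoint(filename: str, local_dir: str = "models") -> Path:
    """Download one checkpoint file and return its local path.

    `filename` must be a file that exists in the repository, for example
    the distilled or the full checkpoint as listed on the model page.
    """
    # Imported lazily so the module can be loaded without the package.
    from huggingface_hub import hf_hub_download

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=filename,
        local_dir=local_dir,
        repo_type="model",
    )
    return Path(local_path)
```

Call download_checkpoint with the exact file name shown on the model page, then point the inference script at the returned path.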

Highlighted Details

  • Generates 30 FPS videos at 1216x704 resolution in real time.
  • Supports text-to-video, image-to-video, keyframe animation, video extension (forward/backward), and video-to-video transformations.
  • Distilled model offers 15x faster inference, supports fewer diffusion steps, and omits classifier-free guidance.
  • Features automatic prompt enhancement for shorter prompts.
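
To see why fewer diffusion steps and no classifier-free guidance (CFG) reduce cost, it helps to count denoiser forward passes. Standard CFG evaluates a conditional and an unconditional prediction at each step, so it roughly doubles the per-step work. The sketch below uses hypothetical step counts; it is an illustration only and does not reproduce the 15x figure above, which also depends on factors not modeled here.

```python
def denoiser_calls(num_steps: int, use_cfg: bool) -> int:
    """Count model forward passes for one sampling run.

    With classifier-free guidance each step needs two predictions
    (conditional and unconditional); without it, one.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    passes_per_step = 2 if use_cfg else 1
    return num_steps * passes_per_step


# Hypothetical settings, chosen only for illustration.
full_calls = denoiser_calls(num_steps=40, use_cfg=True)
distilled_calls = denoiser_calls(num_steps=8, use_cfg=False)
print(f"full: {full_calls} calls, distilled: {distilled_calls} calls")
print(f"ratio: {full_calls / distilled_calls:.1f}x fewer forward passes")
```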

Maintenance & Community

  • Active development with regular updates and new checkpoints.
  • Community contributions are encouraged, with projects like ComfyUI-LTXTricks and LTX-VideoQ8 highlighted.
  • Links to community discussions and careers page available.

Licensing & Compatibility

  • Newer checkpoints (v0.9.6, v0.9.5) are released under an "Open Weights" or "OpenRail-M" license, allowing commercial use. Earlier versions may have different terms.

Limitations & Caveats

  • Input video segments for extension must have a frame count of the form 8n + 1 (for example 9, 17, or 25).
  • The model works best at resolutions under 720x1280 and with fewer than 257 frames.
  • Although generation is described as real-time, actual speed depends heavily on hardware, especially at higher resolutions and frame counts.
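
The frame-count rule above (a multiple of 8, plus 1) is easy to get wrong when trimming clips, so a small helper can validate a count or round it down to the nearest accepted value. This is a sketch with hypothetical function names, not part of the LTX-Video codebase. Whether a single frame (n = 0) is accepted by the model is not stated here, so treating 1 as valid is an assumption.

```python
FRAME_STRIDE = 8  # frame counts must be a multiple of 8, plus 1


def is_valid_frame_count(num_frames: int) -> bool:
    """Return True if num_frames has the form 8n + 1 (n >= 0)."""
    return num_frames >= 1 and num_frames % FRAME_STRIDE == 1


def snap_frame_count(num_frames: int) -> int:
    """Round down to the largest count of the form 8n + 1 that fits."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return ((num_frames - 1) // FRAME_STRIDE) * FRAME_STRIDE + 1


for requested in (100, 120, 257):
    print(requested, "->", snap_frame_count(requested))
```

Rounding down rather than up avoids asking for frames that the input segment does not contain.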

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull requests (30d): 4
  • Issues (30d): 13
  • Star history: 4,194 stars in the last 90 days
