raylight  by komikndr

Multi-GPU parallelism for ComfyUI

Created 7 months ago
296 stars

Top 89.7% on SourcePulse

Project Summary

Raylight addresses the need for true multi-GPU parallelism in ComfyUI, enabling users to scale image and video generation beyond a single high-end GPU. By combining Ray for distributed computing, xDiT's Unified Sequence Parallelism (USP) for splitting tensor computation, and PyTorch's Fully Sharded Data Parallel (FSDP) for sharding model weights, it pools VRAM across multiple GPUs. This allows more efficient resource utilization and makes it possible to run larger models or generate higher-resolution content.

How It Works

Raylight integrates as a ComfyUI node, orchestrating GPU workers via the Ray framework. It uses USP to split tensor computation across GPUs, so each one processes part of a sequence. Crucially, FSDP shards the model weights themselves across the available GPUs, removing the constraint that each GPU must load the entire model. Together, USP and FSDP maximize GPU utilization, effectively pooling VRAM and enabling scalable inference.
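The split-process-gather flow described above can be sketched in a few lines of plain Python. This is a conceptual toy only (the names split_sequence, worker_step, and parallel_forward are illustrative, not Raylight's API): each worker handles one contiguous shard of the sequence and the outputs are gathered back in order, whereas the real system runs the shards concurrently on GPUs via Ray actors, with xDiT's USP attention kernels and FSDP-sharded weights.

```python
# Toy illustration of sequence parallelism: the sequence is split into
# contiguous chunks, each "worker" (standing in for a GPU) transforms its
# chunk independently, and the results are gathered back in order.
# Hypothetical helper names; not Raylight's actual implementation.

def split_sequence(seq, num_workers):
    """Split seq into num_workers contiguous chunks (last may be shorter)."""
    chunk = -(-len(seq) // num_workers)  # ceiling division
    return [seq[i * chunk:(i + 1) * chunk] for i in range(num_workers)]

def worker_step(chunk):
    """Stand-in for per-GPU computation on one sequence shard."""
    return [x * 2 for x in chunk]

def parallel_forward(seq, num_workers):
    shards = split_sequence(seq, num_workers)
    outputs = [worker_step(s) for s in shards]  # would run concurrently on GPUs
    return [x for out in outputs for x in out]  # "all-gather" back into one sequence

print(parallel_forward(list(range(8)), 4))  # → [0, 2, 4, 6, 8, 10, 12, 14]
```

FSDP is orthogonal to this: instead of splitting the sequence, it splits the model parameters across the same workers, so no single GPU ever holds the full set of weights.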

Quick Start & Requirements

Installation is straightforward: clone the repository into ComfyUI/custom_nodes and run pip install -r requirements.txt, or use the ComfyUI Manager. Key dependencies include nvidia-nccl-cu12==2.28.9 for FSDP with fp8 models on NVIDIA GPUs, plus optional FlashAttention 2. Windows users are advised to use WSL due to installation complexities.

Highlighted Details

  • Extensive model support including Wan, Flux, Chroma, Qwen, Z Image, Lumina 2, Hunyuan Video, Kandinsky5, LTX-2, SD1.5, and SDXL.
  • Multiple operational modes: Sequence Parallelism (USP), Data Parallel (DP), FSDP, and combined Sequence + FSDP.
  • FSDP CPU Offload is available for extremely low VRAM scenarios.
  • Supports AMD ROCm architectures (MI3XX, MI210).
  • Benchmarks demonstrate significant speedups and VRAM efficiency gains with multi-GPU configurations.

Maintenance & Community

The project acknowledges contributions from specific users (rmatif, City96) and provides a PayPal link for support. No community channels (Discord, Slack) or detailed roadmap are mentioned in the provided text.

Licensing & Compatibility

No license information is explicitly stated in the provided README text, which is a critical omission for assessing commercial use or derivative works.

Limitations & Caveats

Known issues include potential dequantization errors with ComfyUI's mixed-precision fp8 models and VRAM leaks with specific Ring/Ulysses configurations. PyTorch 2.8.1+ is recommended for FSDP stability. Some models have limited or no USP/FSDP support. Windows installation is complex, with WSL being the recommended workaround.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 11
  • Star History: 30 stars in the last 30 days

Explore Similar Projects

Starred by Amanpreet Singh (Cofounder of Contextual AI) and Ross Taylor (Cofounder of General Reasoning; Cocreator of Papers with Code).

torchshard by kaiyuyue
300 stars
PyTorch engine for tensor slicing into parallel shards
Created 4 years ago · Updated 8 months ago
Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai
309 stars
Framework for large-scale transformer optimization
Created 4 years ago · Updated 3 years ago
Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai
791 stars
Toolkit for easy model parallelization
Created 4 years ago · Updated 2 years ago
Starred by Alex Yu (Research Scientist at OpenAI; Cofounder of Luma AI) and Cody Yu (Coauthor of vLLM; MTS at OpenAI).

xDiT by xdit-project
3k stars
Inference engine for parallel Diffusion Transformer (DiT) deployment
Created 1 year ago · Updated 4 days ago