komikndr
Multi-GPU parallelism for ComfyUI
Top 89.7% on SourcePulse
Summary
Raylight addresses the need for true multi-GPU parallelism in ComfyUI, enabling users to scale image and video generation beyond a single high-end GPU. It uses Ray for distributed computing, xDiT's Unified Sequence Parallelism (USP) for tensor splitting, and PyTorch's Fully Sharded Data Parallel (FSDP) for model weight sharding, which together pool VRAM across multiple GPUs. This makes it possible to run larger models or generate higher-resolution content than a single GPU could hold.
How It Works
Raylight integrates as a ComfyUI node, orchestrating GPU workers via the Ray framework. It employs USP to split tensor computations across GPUs, allowing each to process parts of a sequence. Crucially, FSDP shards the model weights themselves across available GPUs, overcoming the limitation where each GPU must load the entire model. This dual approach of USP and FSDP maximizes GPU utilization, effectively pooling VRAM and enabling scalable inference.
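The division of labour described above can be illustrated with a toy planner. This is a minimal sketch in plain Python, not Raylight's code: the names `GpuShare`, `split_evenly`, and `plan` are hypothetical, and it only models how a sequence and a set of weights would be divided, without touching Ray, xDiT, or PyTorch.

```python
from dataclasses import dataclass


@dataclass
class GpuShare:
    rank: int
    token_range: tuple  # half-open [start, stop) slice of the sequence (USP-style split)
    weight_bytes: int   # bytes of model weights resident on this GPU (FSDP-style shard)


def split_evenly(total: int, parts: int) -> list:
    """Split range(total) into `parts` contiguous half-open ranges of near-equal size."""
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for rank in range(parts):
        # The first `extra` ranks take one additional element each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def plan(seq_len: int, model_bytes: int, num_gpus: int) -> list:
    """Assign each GPU a slice of the sequence and a shard of the weights."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    token_ranges = split_evenly(seq_len, num_gpus)
    weight_ranges = split_evenly(model_bytes, num_gpus)
    return [
        GpuShare(rank, token_ranges[rank], weight_ranges[rank][1] - weight_ranges[rank][0])
        for rank in range(num_gpus)
    ]


if __name__ == "__main__":
    # Illustrative numbers only: a 10-token sequence and 100 bytes of weights on 4 GPUs.
    for share in plan(seq_len=10, model_bytes=100, num_gpus=4):
        print(share)
```

The sketch shows resident shard sizes only. In real FSDP, each GPU temporarily gathers the full parameters of the layer it is computing, so peak memory per GPU is higher than its shard alone.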
Quick Start & Requirements
Installation is straightforward via cloning to ComfyUI/custom_nodes and running pip install -r requirements.txt, or by using the ComfyUI Manager. Key dependencies include nvidia-nccl-cu12==2.28.9 for FSDP with fp8 models on Nvidia GPUs, and optional FlashAttention 2. Windows users are advised to use WSL due to installation complexities.
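After installing, one way to confirm that the pinned NCCL package is present is a small check with the standard library's `importlib.metadata`. This is a sketch under the assumption that the pin is installed as a regular pip distribution in the same environment ComfyUI runs in; the helper name `check_pin` is hypothetical and not part of Raylight.

```python
from importlib import metadata


def check_pin(package: str, expected: str):
    """Return (installed_version_or_None, matches_expected) for a pip distribution."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None, False
    return installed, installed == expected


if __name__ == "__main__":
    # Package name and version are taken from the dependency note above.
    installed, ok = check_pin("nvidia-nccl-cu12", "2.28.9")
    if installed is None:
        print("nvidia-nccl-cu12 is not installed")
    elif ok:
        print("nvidia-nccl-cu12 matches the pinned version")
    else:
        print(f"nvidia-nccl-cu12 is {installed}, expected 2.28.9")
```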
Highlighted Details
Maintenance & Community
The project acknowledges contributions from specific users (rmatif, City96) and provides a PayPal link for support. The provided text mentions no community channels (Discord, Slack) and no roadmap.
Licensing & Compatibility
No license information is explicitly stated in the provided README text, which is a critical omission for assessing commercial use or derivative works.
Limitations & Caveats
Known issues include potential dequantization errors with ComfyUI's mixed-precision fp8 models and VRAM leakage with specific Ring/Ulysses configurations. PyTorch 2.8.1+ is recommended for FSDP stability. Some models have limited or no USP/FSDP support. Windows installation is complex, and WSL is the recommended workaround.
1 week ago
Inactive