ComfyUI-MultiGPU by pollockjj

ComfyUI extension for advanced multi-GPU memory management

Created 1 year ago
729 stars

Top 47.4% on SourcePulse

Project Summary

This custom node for ComfyUI addresses VRAM limitations by implementing "Virtual VRAM" and multi-GPU distribution for model components. It targets ComfyUI users seeking to maximize latent space processing, run larger models, or leverage multiple GPUs by intelligently offloading model layers (UNet, CLIP, VAE) to system RAM or secondary GPUs, thereby freeing up primary GPU VRAM for computation.

How It Works

The core of the project utilizes DisTorch (distributed torch) to manage model component placement across available devices. Instead of loading entire models onto a single GPU, DisTorch allows static parts of models to be offloaded to slower memory like CPU DRAM or other GPUs. This enhances memory management, not parallel processing, as workflow steps remain sequential. Users can select donor devices and specify offload amounts via a simple slider ("Normal Mode") or precise allocation strings ("Expert Mode"). Expert modes include 'bytes' for exact GB/MB allocation per device, 'ratio' for percentage-based splitting, and 'fraction' for device VRAM utilization percentages, enabling fine-grained control over model distribution.
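As a rough illustration of the expert modes described above, an allocation string could map each donor device to a budget. The `device,amount;...` string format and the `parse_allocation` helper below are hypothetical sketches, not the extension's actual API; device names follow PyTorch conventions.

```python
# Hypothetical sketch: parsing a 'bytes'-style expert allocation string.
# The string format and this helper are illustrative assumptions only.

def parse_allocation(spec: str) -> dict[str, float]:
    """Parse e.g. "cuda:0,4.0;cuda:1,2.0;cpu,8.0" into {device: gigabytes}."""
    alloc = {}
    for entry in spec.split(";"):
        device, amount = entry.split(",")
        alloc[device.strip()] = float(amount)
    return alloc

# Keep 4 GB of model weights on cuda:0, 2 GB on cuda:1, and offload 8 GB to CPU RAM.
plan = parse_allocation("cuda:0,4.0;cuda:1,2.0;cpu,8.0")
print(plan)  # {'cuda:0': 4.0, 'cuda:1': 2.0, 'cpu': 8.0}
```

A 'ratio' mode would interpret the same per-device numbers as relative shares of the model, and a 'fraction' mode as a percentage of each device's own VRAM, rather than absolute gigabytes.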

Quick Start & Requirements

Installation is preferably done via the ComfyUI-Manager by searching for ComfyUI-MultiGPU. Manual installation involves cloning the repository into the ComfyUI/custom_nodes/ directory. The extension automatically creates multi-GPU versions of existing ComfyUI loader nodes and integrates with specific libraries like WanVideoWrapper, ComfyUI-GGUF, and others for expanded functionality. Tested setups include multi-GPU configurations (e.g., 2x 3090 + 1060ti, 4070, 3090/1070ti).

Highlighted Details

  • Supports universal .safetensors and GGUF model formats.
  • DisTorch 2.0 offers up to 10% faster GGUF inference than DisTorch V1.
  • Provides tight integration with WanVideoWrapper, adding eight bespoke MultiGPU nodes.
  • Features intuitive 'bytes' and 'ratio' expert allocation modes for precise device placement.
  • Enables offloading of the entire model while still allowing compute operations on the primary CUDA device.

Maintenance & Community

The project is currently maintained by pollockjj and was originally created by Alexander Dzhoganov. No specific community channels (like Discord/Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The provided README does not specify a software license. This lack of explicit licensing information may pose a barrier to adoption for users requiring clear commercial or open-source usage terms.

Limitations & Caveats

The extension focuses on memory management and component offloading, not parallelizing workflow execution steps, which remain sequential. Compatibility with specific model loaders depends on their availability and integration within the ComfyUI ecosystem. The absence of a stated license is a significant caveat for assessing adoption suitability.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
13
Star History
62 stars in the last 30 days
