ComfyUI-MultiGPU by pollockjj

ComfyUI extension for advanced multi-GPU memory management

Created 1 year ago
729 stars

Top 47.4% on SourcePulse

Project Summary

This custom node for ComfyUI addresses VRAM limitations by implementing "Virtual VRAM" and multi-GPU distribution for model components. It targets ComfyUI users seeking to maximize latent space processing, run larger models, or leverage multiple GPUs by intelligently offloading model layers (UNet, CLIP, VAE) to system RAM or secondary GPUs, thereby freeing up primary GPU VRAM for computation.

How It Works

The core of the project utilizes DisTorch (distributed torch) to manage model component placement across available devices. Instead of loading entire models onto a single GPU, DisTorch allows static parts of models to be offloaded to slower memory like CPU DRAM or other GPUs. This enhances memory management, not parallel processing, as workflow steps remain sequential. Users can select donor devices and specify offload amounts via a simple slider ("Normal Mode") or precise allocation strings ("Expert Mode"). Expert modes include 'bytes' for exact GB/MB allocation per device, 'ratio' for percentage-based splitting, and 'fraction' for device VRAM utilization percentages, enabling fine-grained control over model distribution.
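As a rough illustration of the expert modes described above, an allocation string could map each donor device to a budget. The `device,amount;...` string format and the `parse_allocation` helper below are hypothetical sketches, not the extension's actual API; device names follow PyTorch conventions.

```python
# Hypothetical sketch: parsing a 'bytes'-style expert allocation string.
# The string format and this helper are illustrative assumptions only.

def parse_allocation(spec: str) -> dict[str, float]:
    """Parse e.g. "cuda:0,4.0;cuda:1,2.0;cpu,8.0" into {device: gigabytes}."""
    alloc = {}
    for entry in spec.split(";"):
        device, amount = entry.split(",")
        alloc[device.strip()] = float(amount)
    return alloc

# Keep 4 GB of model weights on cuda:0, 2 GB on cuda:1, and offload 8 GB to CPU RAM.
plan = parse_allocation("cuda:0,4.0;cuda:1,2.0;cpu,8.0")
print(plan)  # {'cuda:0': 4.0, 'cuda:1': 2.0, 'cpu': 8.0}
```

A 'ratio' mode would interpret the same per-device numbers as relative shares of the model, and a 'fraction' mode as a percentage of each device's own VRAM, rather than absolute gigabytes.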

Quick Start & Requirements

Installation is preferably done via the ComfyUI-Manager by searching for ComfyUI-MultiGPU. Manual installation involves cloning the repository into the ComfyUI/custom_nodes/ directory. The extension automatically creates multi-GPU versions of existing ComfyUI loader nodes and integrates with specific libraries like WanVideoWrapper, ComfyUI-GGUF, and others for expanded functionality. Tested setups include multi-GPU configurations (e.g., 2x 3090 + 1060ti, 4070, 3090/1070ti).

Highlighted Details

  • Supports universal .safetensors and GGUF model formats.
  • DisTorch 2.0 offers up to 10% faster GGUF inference than DisTorch V1.
  • Provides tight integration with WanVideoWrapper, adding eight bespoke MultiGPU nodes.
  • Features intuitive 'bytes' and 'ratio' expert allocation modes for precise device placement.
  • Enables offloading of the entire model while still allowing compute operations on the primary CUDA device.

Maintenance & Community

The project is currently maintained by pollockjj and was originally created by Alexander Dzhoganov. No specific community channels (like Discord/Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The provided README does not specify a software license. This lack of explicit licensing information may pose a barrier to adoption for users requiring clear commercial or open-source usage terms.

Limitations & Caveats

The extension focuses on memory management and component offloading, not parallelizing workflow execution steps, which remain sequential. Compatibility with specific model loaders depends on their availability and integration within the ComfyUI ecosystem. The absence of a stated license is a significant caveat for assessing adoption suitability.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
13
Star History
62 stars in the last 30 days
