ComfyUI-nunchaku by nunchaku-tech

ComfyUI plugin for efficient 4-bit neural network inference

created 4 months ago
1,774 stars

Top 24.7% on sourcepulse

View on GitHub
Project Summary

This repository provides ComfyUI nodes for Nunchaku, an efficient inference engine for 4-bit neural networks quantized with SVDQuant. It targets ComfyUI users who want to run highly optimized, memory-efficient diffusion models with significant speedups and reduced VRAM requirements.

How It Works

Nunchaku uses SVDQuant for 4-bit quantization, enabling efficient inference on consumer hardware. The ComfyUI nodes integrate this engine, providing specialized loaders for diffusion models, LoRAs, and text encoders. Key advantages include a custom FP16 attention implementation that outperforms FlashAttention-2 on compatible hardware and a First-Block Cache mechanism that further accelerates inference.
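
For context, the underlying engine also exposes a Python API through diffusers. The sketch below shows roughly how a SVDQuant 4-bit FLUX model is loaded outside ComfyUI; the repo ID, class name, and sampler settings follow Nunchaku's documentation at the time of writing and may differ across releases.

```python
# Minimal sketch of Nunchaku's diffusers integration (not the ComfyUI node
# path). Repo IDs and defaults are illustrative; check the project docs.
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load the SVDQuant 4-bit transformer.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-schnell"
)

# Swap the 4-bit transformer into a standard FLUX pipeline; the other
# components (text encoders, VAE) stay in their usual precision.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipeline(
    "a photo of a cat",
    num_inference_steps=4,  # schnell is distilled for few-step sampling
    guidance_scale=0.0,
).images[0]
image.save("out.png")
```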

Quick Start & Requirements

  • Installation: Install via ComfyUI Manager or manually clone into ComfyUI/custom_nodes.
  • Prerequisites: ComfyUI, Python, and optionally comfy-cli. Specific models (e.g., FLUX.1-schnell and its text encoders) must be downloaded from HuggingFace or ModelScope; a download sketch follows this list.
  • Compatibility: Supports NVIDIA 20-series (Turing) GPUs and newer. FP16 attention is required for 20-series GPUs.
  • Resources: Detailed installation tutorials (video/text) are available.
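
As referenced above, the required model files can be fetched programmatically. A minimal sketch using huggingface_hub follows; the repo IDs and target directories under ComfyUI/models are assumptions based on common FLUX setups, so check the plugin's README for the exact names and locations.

```python
# Hypothetical download helper: repo IDs and the ComfyUI directory layout
# are assumptions; consult the plugin README for the authoritative paths.
from huggingface_hub import snapshot_download

# SVDQuant 4-bit FLUX.1-schnell weights
snapshot_download(
    repo_id="mit-han-lab/svdq-int4-flux.1-schnell",
    local_dir="ComfyUI/models/diffusion_models/svdq-int4-flux.1-schnell",
)

# Text encoders used by FLUX workflows
snapshot_download(
    repo_id="comfyanonymous/flux_text_encoders",
    local_dir="ComfyUI/models/text_encoders",
    allow_patterns=["clip_l.safetensors", "t5xxl_fp16.safetensors"],
)
```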

Highlighted Details

  • Nunchaku-FP16 attention is ~1.2x faster than FlashAttention-2 with no loss of precision.
  • Supports multi-LoRA and ControlNet integration.
  • Includes CPU offloading options to reduce GPU memory usage.
  • LoRA loading works directly from standard weight files, with no pre-conversion step (see the sketch after this list).
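
A rough sketch of how CPU offloading and direct LoRA loading look in the engine's Python API. The offload flag and the update_lora_params/set_lora_strength method names follow Nunchaku's documentation at the time of writing and should be treated as assumptions that may change between releases.

```python
# Sketch only: the flag and method names below are taken from Nunchaku's
# docs and may differ in newer releases.
from nunchaku import NunchakuFluxTransformer2dModel

transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-schnell",
    offload=True,  # stream weights from CPU to cut GPU memory usage
)

# LoRA weights load directly from a standard .safetensors file;
# no pre-conversion step is needed.
transformer.update_lora_params("loras/my_style_lora.safetensors")
transformer.set_lora_strength(0.8)
```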

Maintenance & Community

  • Active development with regular updates and roadmap publications.
  • Community support available via Slack, Discord, and WeChat.

Licensing & Compatibility

  • The README does not explicitly state a license. The project is associated with MIT-HAN-LAB, which suggests a permissive license, but verify the terms before using it in commercial or closed-source projects.

Limitations & Caveats

  • Loading the 4-bit T5 text encoder currently consumes excessive memory; optimizations are planned.
  • The FLUX.1 Depth Preprocessor node is deprecated.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 13
  • Issues (30d): 103
Star History

939 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jaret Burkett (founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

High-performance 4-bit diffusion model inference engine

created 8 months ago
updated 16 hours ago
3k stars

Top 2.1% on sourcepulse