FastAPI for text-to-image diffusion using FP8
This repository provides a FastAPI implementation of the Flux diffusion model, optimized for speed through FP8 matrix multiplication and other quantization techniques. It targets users seeking faster image generation on consumer hardware, offering a ~2x speedup over baseline implementations.
How It Works
The speedup comes from running the Flux model's matrix multiplications in FP8 precision, which significantly accelerates computation. The implementation can also compile specific model blocks and additional layers for further gains, while the remaining layers use faster half-precision accumulation.
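For intuition, below is a minimal sketch of an FP8 matrix multiplication with per-tensor scaling using PyTorch's internal torch._scaled_mm. It is not this repository's actual code: the helper names (quantize_fp8, fp8_linear) are illustrative, and it assumes PyTorch 2.4+ and an FP8-capable GPU (Ada or Hopper).

```python
# Minimal sketch of FP8 matmul with per-tensor scaling.
# Assumes PyTorch >= 2.4 (where torch._scaled_mm takes explicit scales)
# and a CUDA GPU with native FP8 support; not the repo's exact implementation.
import torch


def quantize_fp8(x: torch.Tensor):
    """Quantize a tensor to float8_e4m3fn and return it with its per-tensor scale."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = x.abs().max().clamp(min=1e-12) / fp8_max
    x_fp8 = (x / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return x_fp8, scale.float()


@torch.no_grad()
def fp8_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Compute y = x @ weight.T with FP8 inputs and a bfloat16 output."""
    x_fp8, x_scale = quantize_fp8(x)
    w_fp8, w_scale = quantize_fp8(weight)
    # _scaled_mm expects the second operand in column-major layout,
    # so pass the transposed FP8 weight.
    return torch._scaled_mm(
        x_fp8,
        w_fp8.t(),
        scale_a=x_scale,
        scale_b=w_scale,
        out_dtype=torch.bfloat16,
    )


x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)
w = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
y = fp8_linear(x, w)  # (16, 4096) bfloat16
```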
Quick Start & Requirements
Create an environment with mamba (or conda):
mamba create -n flux-fp8-matmul-api python=3.11 pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
Or install PyTorch with the CUDA 12.4 wheels via pip:
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Then install the remaining dependencies and start the server:
python -m pip install -r requirements.txt
python main.py --config-path <path_to_config> --port <port_number>
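Once the server is running, it can be called over HTTP. The snippet below is a hedged example using Python's requests library; the /generate route, the JSON fields, the port, and the raw-image response are assumptions, so check main.py for the actual endpoint and schema.

```python
# Hypothetical client call against a locally running server.
# Endpoint name, request fields, and response format are assumptions;
# the port must match the --port value passed to main.py.
import requests

resp = requests.post(
    "http://localhost:8088/generate",
    json={
        "prompt": "a misty forest at dawn, volumetric light",
        "width": 1024,
        "height": 1024,
        "num_steps": 24,
        "guidance": 3.5,
    },
    timeout=300,
)
resp.raise_for_status()
with open("output.jpg", "wb") as f:
    f.write(resp.content)  # assumes the server streams back raw image bytes
```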
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Setting flow_dtype to bfloat16 is recommended for quality but may slightly slow down generation on consumer GPUs.
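As a hedged illustration of how that caveat maps onto configuration, the sketch below copies a JSON config and switches the flow model to bfloat16. Only the flow_dtype key comes from the caveat above; the file names and the rest of the config schema are assumptions, so consult the repo's config files for the real layout.

```python
# Hypothetical sketch: derive a bfloat16-flow config from an existing JSON config.
# File paths and all keys other than "flow_dtype" are assumptions.
import json

with open("configs/config-dev.json") as f:
    cfg = json.load(f)

cfg["flow_dtype"] = "bfloat16"  # better quality, possibly slower on consumer GPUs

with open("configs/config-dev-bf16.json", "w") as f:
    json.dump(cfg, f, indent=2)
```

The derived file can then be passed to the server via --config-path.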