aredden
FastAPI for text-to-image diffusion using FP8
Top 92.0% on SourcePulse
This repository provides a FastAPI implementation of the Flux diffusion model, optimized for speed through FP8 matrix multiplication and other quantization techniques. It targets users seeking faster image generation on consumer hardware, offering a ~2x speedup over baseline implementations.
How It Works
The core innovation lies in leveraging FP8 precision for matrix multiplications within the Flux model, significantly accelerating computation. The implementation also supports compiling specific model blocks and additional layers for further performance gains. Remaining layers utilize faster half-precision accumulation.
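As a rough illustration of the idea of FP8 matrix multiplication, the sketch below emulates it in pure Python: each matrix is scaled so its largest entry maps to the top of the FP8 range, rounded to the e4m3 grid (4 exponent bits, 3 mantissa bits), multiplied with higher-precision accumulation, and rescaled. This is not the repository's code. The per-tensor scaling scheme and the helper names `round_to_e4m3`, `quantize`, and `fp8_matmul` are assumptions made for the example; a real implementation would run on GPU FP8 kernels.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the float8 e4m3fn format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in e4m3 (emulated)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)          # saturate instead of overflowing
    _, ex = math.frexp(a)              # a = m * 2**ex with 0.5 <= m < 1
    e = max(ex - 1, -6)                # exponents below -6 are subnormal
    step = 2.0 ** (e - 3)              # 3 mantissa bits: 8 steps per binade
    return sign * round(a / step) * step


def quantize(matrix):
    """Per-tensor scaling (an assumption): map the largest |value| to E4M3_MAX."""
    amax = max(abs(v) for row in matrix for v in row)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [[round_to_e4m3(v * scale) for v in row] for row in matrix], scale


def fp8_matmul(a, b):
    """Multiply emulated-FP8 operands, accumulate in float, undo both scales."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    cols_b = list(zip(*qb))
    return [
        [sum(x * y for x, y in zip(row, col)) / (scale_a * scale_b) for col in cols_b]
        for row in qa
    ]


def matmul(a, b):
    """Full-precision reference for comparison."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


if __name__ == "__main__":
    a = [[0.5, -1.25], [2.0, 0.75]]
    b = [[1.0, 0.1], [-0.3, 4.0]]
    print("reference:", matmul(a, b))
    print("fp8 (emulated):", fp8_matmul(a, b))
```

The emulated result differs from the reference by a few percent, which is the precision cost that the speedup is traded against.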
Quick Start & Requirements
mamba create -n flux-fp8-matmul-api python=3.11 pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
python -m pip install -r requirements.txt
python main.py --config-path <path_to_config> --port <port_number>
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Setting flow_dtype to bfloat16 is recommended for quality, but it may run slightly slower on consumer GPUs.
1 year ago
Inactive