flux2.c  by antirez

Pure C inference engine for AI image generation

Created 1 week ago

1,393 stars

Top 29.0% on SourcePulse

Project Summary

This project provides a pure C inference engine for the FLUX.2-klein-4B image generation model, enabling AI image generation without Python, PyTorch, or CUDA dependencies. It targets engineers and researchers seeking to integrate AI into C/C++ projects or run models on diverse platforms, offering a lightweight, portable, and accessible solution for AI image synthesis.

How It Works

The core of the project is a complete implementation of FLUX.2-klein-4B inference, written entirely in C and requiring only the standard C library. Acceleration is optional: BLAS (e.g., OpenBLAS) provides a large speedup, and Metal Performance Shaders (MPS) on Apple Silicon gives the best performance. The Qwen3-4B text encoder is built in, and a single self-contained executable handles model loading, text encoding, diffusion, and VAE decoding. This bypasses the typical Python AI ecosystem, making the engine easier to integrate and more broadly portable.
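
As a rough illustration of that pipeline, the sketch below drives the same stages from C. The flux_* functions, the flux_ctx and related types, and the "flux.h" header are hypothetical names invented for this example; they are not the actual libflux.a API (see the C Library API bullet under Highlighted Details).

    /*
     * Hypothetical sketch only: the flux_* names, the flux_ctx type, and the
     * "flux.h" header are invented for illustration and are not the real
     * libflux.a API. The stages mirror the pipeline described above: load
     * weights, encode the prompt with Qwen3-4B, release the encoder, run
     * diffusion, VAE-decode, and write the PNG.
     */
    #include <stdio.h>
    #include "flux.h"   /* assumed header exposing the library */

    int main(void) {
        /* Load the safetensors weights from the model directory. */
        flux_ctx *ctx = flux_load("flux-klein-model");
        if (!ctx) {
            fprintf(stderr, "failed to load model\n");
            return 1;
        }

        /* Encode the prompt with the built-in Qwen3-4B text encoder, then
         * release the encoder weights to conserve memory. */
        flux_embedding *emb = flux_encode_prompt(ctx, "A woman wearing sunglasses");
        flux_release_text_encoder(ctx);

        /* Run the diffusion loop and decode the latents with the VAE. */
        flux_image *img = flux_generate(ctx, emb, 1024, 1024);

        /* Save the result and clean up. */
        flux_save_png(img, "output.png");
        flux_image_free(img);
        flux_free(ctx);
        return 0;
    }

The real library may differ substantially; the point is only that the whole pipeline fits behind a handful of C calls with no Python runtime involved.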

Quick Start & Requirements

  • Primary install / run command:
    • Build: make mps (Apple Silicon), make blas (Intel Mac or Linux with OpenBLAS), or make generic (pure C, no acceleration).
    • Model download: pip install huggingface_hub, then python download_model.py.
    • Generate an image: ./flux -d flux-klein-model -p "A woman wearing sunglasses" -o output.png.
  • Non-default prerequisites and dependencies:
    • Model weights (~16GB).
    • Optional: BLAS library (OpenBLAS on Linux) or Apple's Accelerate framework for make blas.
    • Optional: Apple Silicon for make mps.
    • huggingface_hub Python package for model download.
  • Estimated setup time or resource footprint: The model download is ~16GB. Normal inference requires ~16GB of RAM; the --mmap flag reduces peak memory to ~4-5GB, making 16GB systems workable.
  • Links: A download_model.py script is provided for downloading the model weights.

Highlighted Details

  • Zero Dependencies: Pure C implementation, works standalone without external libraries beyond the C standard library.
  • Optional Acceleration: Supports BLAS for ~30x speedup and Metal GPU acceleration on Apple Silicon for the fastest inference.
  • Integrated Text Encoder: The Qwen3-4B text encoder is built-in and automatically released after encoding to conserve memory.
  • Memory Efficient: The --mmap flag enables on-demand weight loading, reducing peak memory usage from ~16GB to ~4-5GB, making it suitable for systems with 16GB RAM (a generic sketch of the technique follows this list).
  • Direct Model Usage: Runs directly with safetensors models without requiring quantization or conversion.
  • C Library API: Offers a libflux.a library for seamless integration into custom C/C++ projects.
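
The --mmap behaviour noted above rests on a standard operating-system technique: mapping the weights file instead of reading it into heap memory, so pages are faulted in only when touched and can be reclaimed under memory pressure. The snippet below is a generic POSIX illustration of that idea, not flux2.c's actual loader, and the model.safetensors path is a placeholder.

    /*
     * Generic illustration of on-demand weight loading with POSIX mmap();
     * this is not flux2.c's actual loader, only the underlying technique.
     * Mapping the file lets the kernel page weights in as they are touched
     * and drop clean pages under memory pressure, so peak RSS stays well
     * below the ~16GB file size.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a weights file read-only; returns a pointer to its bytes. */
    static const void *map_weights(const char *path, size_t *len) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return NULL;

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return NULL; }
        *len = (size_t)st.st_size;

        /* PROT_READ + MAP_PRIVATE: pages load lazily on first access. */
        void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  /* the mapping stays valid after closing the descriptor */
        return p == MAP_FAILED ? NULL : p;
    }

    int main(void) {
        size_t len = 0;
        /* "model.safetensors" is a placeholder path for this example. */
        const void *weights = map_weights("model.safetensors", &len);
        if (!weights) { perror("map_weights"); return 1; }

        printf("mapped %zu bytes; pages load on demand\n", len);

        munmap((void *)weights, len);
        return 0;
    }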

Maintenance & Community

The project was developed rapidly with AI assistance, indicating a focus on demonstrating feasibility. No specific community channels (Discord, Slack), roadmap links, or notable contributor/sponsorship information are provided in the README.

Licensing & Compatibility

  • License type: MIT License.
  • Compatibility notes: The permissive MIT license allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The pure C (make generic) backend is considerably slower than the accelerated builds, and the --mmap mode trades inference speed for a smaller memory footprint. Benchmarks indicate that PyTorch implementations can still be faster thanks to better handling of activations on the GPU. The maximum supported resolution is 1024x1024 pixels, and dimensions must be multiples of 16. The inference engine is specific to the FLUX.2-klein-4B model.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 10
  • Issues (30d): 11
  • Star History: 1,399 stars in the last 9 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

diffusers-rs by LaurentMazare

0% · 583 stars
Rust implementation of the Diffusers API for generative models
Created 3 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.3% · 5k stars
Image synthesis research paper using a linear diffusion transformer
Created 1 year ago
Updated 1 week ago
Starred by Alex Yu (Research Scientist at OpenAI; Cofounder of Luma AI), Lianmin Zheng (Coauthor of SGLang, vLLM), and 2 more.

HunyuanVideo by Tencent-Hunyuan

0.3% · 12k stars
PyTorch code for video generation research
Created 1 year ago
Updated 2 months ago