antirez: Pure C inference engine for AI image generation
This project provides a pure C inference engine for the FLUX.2-klein-4B image generation model, enabling AI image generation without Python, PyTorch, or CUDA dependencies. It targets engineers and researchers seeking to integrate AI into C/C++ projects or run models on diverse platforms, offering a lightweight, portable, and accessible solution for AI image synthesis.
How It Works
The core of the project is a complete implementation of FLUX.2-klein-4B inference written entirely in C, requiring only the standard C library. Optional acceleration is available via BLAS (e.g., OpenBLAS) for significant speedups, or via Metal Performance Shaders (MPS) on Apple Silicon for maximum performance. The engine embeds the Qwen3-4B text encoder and handles model loading, text encoding, diffusion, and VAE decoding within a single, self-contained executable. This bypasses the typical Python AI stack, making the engine broadly portable and easy to integrate.
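Concretely, the flow can be pictured as four stages behind one context object. The sketch below is illustrative only: every flux_* name is a hypothetical stand-in for whatever the real engine exposes, not its actual API.

```c
/* Illustrative sketch of the four-stage pipeline described above.
 * All flux_* names are hypothetical; the real engine's API may differ. */
typedef struct flux_ctx flux_ctx;       /* opaque: weights + scratch buffers */
typedef struct flux_emb flux_emb;       /* text-encoder output */
typedef struct flux_latent flux_latent; /* diffusion output, pre-VAE */

flux_ctx    *flux_load(const char *model_dir);                  /* load weights   */
flux_emb    *flux_encode_text(flux_ctx *c, const char *prompt); /* Qwen3-4B stage */
flux_latent *flux_diffuse(flux_ctx *c, const flux_emb *emb);    /* denoising loop */
int          flux_decode_vae(flux_ctx *c, const flux_latent *lat,
                             const char *png_path);             /* VAE -> PNG     */
void         flux_free(flux_ctx *c);

int generate(const char *model_dir, const char *prompt, const char *out_png)
{
    flux_ctx *ctx = flux_load(model_dir);
    if (!ctx) return -1;
    flux_emb *emb = flux_encode_text(ctx, prompt);
    flux_latent *lat = flux_diffuse(ctx, emb);
    int rc = flux_decode_vae(ctx, lat, out_png);
    flux_free(ctx);
    return rc;
}
```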
Quick Start & Requirements
- Build: make mps (Apple Silicon), make blas (Intel Mac/Linux with OpenBLAS), or make generic (pure C, no dependencies).
- Download the model: pip install huggingface_hub, then python download_model.py. The huggingface_hub Python package is needed only for this step; the download_model.py script is provided for model acquisition.
- Generate an image: ./flux -d flux-klein-model -p "A woman wearing sunglasses" -o output.png.
- The --mmap flag reduces peak memory to ~4-5GB, making 16GB RAM systems viable.
Highlighted Details
- The --mmap flag enables on-demand weight loading, reducing peak memory usage from ~16GB to ~4-5GB, making it suitable for systems with 16GB RAM; the sketch after this list shows the underlying idea.
- A libflux.a library is built for integration into custom C/C++ projects.
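As a rough picture of how on-demand loading works: mapping the weight file read-only with POSIX mmap(2) lets the OS page tensors in only as inference first touches them, so peak resident memory tracks the working set rather than the full ~16GB file. A minimal sketch of the idea, assuming a single packed weight file; map_weights is a hypothetical helper, not the project's actual loader.

```c
/* On-demand weight loading in the spirit of --mmap (illustrative only). */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static const void *map_weights(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }
    *len = (size_t)st.st_size;

    /* PROT_READ + MAP_PRIVATE: pages are faulted in on first access,
     * so resident memory tracks the working set, not the file size. */
    void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after close */
    return (p == MAP_FAILED) ? NULL : p;
}
```

The cost is page-fault latency the first time each tensor is touched, which matches the speed-for-memory trade noted under Limitations & Caveats.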
Maintenance & Community
The project was developed rapidly with AI assistance, indicating a focus on demonstrating feasibility. No specific community channels (Discord, Slack), roadmap links, or notable contributor or sponsorship information are provided in the README.
Licensing & Compatibility
Limitations & Caveats
The pure C (make generic) backend is considerably slower than the accelerated builds, and --mmap mode trades inference speed for a reduced memory footprint. Benchmarks indicate that PyTorch implementations can still be faster, owing to better handling of GPU activations. The maximum supported resolution is 1024x1024 pixels, and both dimensions must be multiples of 16. The inference engine is specific to the FLUX.2-klein-4B model.
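Those resolution rules reduce to a simple validity check. An illustrative helper (not part of the project), assuming only the constraints stated above:

```c
#include <stdbool.h>

/* Validate output dimensions against the documented constraints:
 * width and height must be multiples of 16, at most 1024x1024. */
static bool dims_ok(int w, int h)
{
    return w > 0 && h > 0 &&
           w <= 1024 && h <= 1024 &&
           w % 16 == 0 && h % 16 == 0;
}
```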