flux2.c  by antirez

Pure C inference engine for AI image generation

Created 1 week ago

1,393 stars

Top 29.0% on SourcePulse

Project Summary

This project provides a pure C inference engine for the FLUX.2-klein-4B image generation model, enabling AI image generation without Python, PyTorch, or CUDA dependencies. It targets engineers and researchers seeking to integrate AI into C/C++ projects or run models on diverse platforms, offering a lightweight, portable, and accessible solution for AI image synthesis.

How It Works

The core of the project is a complete implementation of FLUX.2-klein-4B inference, written entirely in C and requiring only the standard C library. Acceleration is optional: BLAS (e.g., OpenBLAS) provides a large speedup, and Metal Performance Shaders (MPS) on Apple Silicon gives the best performance. The Qwen3-4B text encoder is built in, and a single self-contained executable handles model loading, text encoding, diffusion, and VAE decoding. This bypasses the typical Python AI ecosystem, making the engine easier to integrate and more broadly portable.
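
As a rough illustration of that pipeline, the sketch below drives the same stages from C. The flux_* functions, the flux_ctx and related types, and the "flux.h" header are hypothetical names invented for this example; they are not the actual libflux.a API (see the C Library API bullet under Highlighted Details).

    /*
     * Hypothetical sketch only: the flux_* names, the flux_ctx type, and the
     * "flux.h" header are invented for illustration and are not the real
     * libflux.a API. The stages mirror the pipeline described above: load
     * weights, encode the prompt with Qwen3-4B, release the encoder, run
     * diffusion, VAE-decode, and write the PNG.
     */
    #include <stdio.h>
    #include "flux.h"   /* assumed header exposing the library */

    int main(void) {
        /* Load the safetensors weights from the model directory. */
        flux_ctx *ctx = flux_load("flux-klein-model");
        if (!ctx) {
            fprintf(stderr, "failed to load model\n");
            return 1;
        }

        /* Encode the prompt with the built-in Qwen3-4B text encoder, then
         * release the encoder weights to conserve memory. */
        flux_embedding *emb = flux_encode_prompt(ctx, "A woman wearing sunglasses");
        flux_release_text_encoder(ctx);

        /* Run the diffusion loop and decode the latents with the VAE. */
        flux_image *img = flux_generate(ctx, emb, 1024, 1024);

        /* Save the result and clean up. */
        flux_save_png(img, "output.png");
        flux_image_free(img);
        flux_free(ctx);
        return 0;
    }

The real library may differ substantially; the point is only that the whole pipeline fits behind a handful of C calls with no Python runtime involved.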

Quick Start & Requirements

  • Primary install / run command:
    • Build: make mps (Apple Silicon), make blas (Intel Mac or Linux with OpenBLAS), or make generic (pure C, no acceleration).
    • Model download: pip install huggingface_hub, then python download_model.py.
    • Generate an image: ./flux -d flux-klein-model -p "A woman wearing sunglasses" -o output.png.
  • Non-default prerequisites and dependencies:
    • Model weights (~16GB).
    • Optional: BLAS library (OpenBLAS on Linux) or Apple's Accelerate framework for make blas.
    • Optional: Apple Silicon for make mps.
    • huggingface_hub Python package for model download.
  • Estimated setup time or resource footprint: The model download is ~16GB. Normal inference requires ~16GB of RAM; the --mmap flag reduces peak memory to ~4-5GB, making 16GB systems workable.
  • Links: A download_model.py script is provided for downloading the model weights.

Highlighted Details

  • Zero Dependencies: Pure C implementation, works standalone without external libraries beyond the C standard library.
  • Optional Acceleration: Supports BLAS for ~30x speedup and Metal GPU acceleration on Apple Silicon for the fastest inference.
  • Integrated Text Encoder: The Qwen3-4B text encoder is built-in and automatically released after encoding to conserve memory.
  • Memory Efficient: The --mmap flag enables on-demand weight loading, reducing peak memory usage from ~16GB to ~4-5GB, making it suitable for systems with 16GB RAM (a generic sketch of the technique follows this list).
  • Direct Model Usage: Runs directly with safetensors models without requiring quantization or conversion.
  • C Library API: Offers a libflux.a library for seamless integration into custom C/C++ projects.
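
The --mmap behaviour noted above rests on a standard operating-system technique: mapping the weights file instead of reading it into heap memory, so pages are faulted in only when touched and can be reclaimed under memory pressure. The snippet below is a generic POSIX illustration of that idea, not flux2.c's actual loader, and the model.safetensors path is a placeholder.

    /*
     * Generic illustration of on-demand weight loading with POSIX mmap();
     * this is not flux2.c's actual loader, only the underlying technique.
     * Mapping the file lets the kernel page weights in as they are touched
     * and drop clean pages under memory pressure, so peak RSS stays well
     * below the ~16GB file size.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a weights file read-only; returns a pointer to its bytes. */
    static const void *map_weights(const char *path, size_t *len) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return NULL;

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return NULL; }
        *len = (size_t)st.st_size;

        /* PROT_READ + MAP_PRIVATE: pages load lazily on first access. */
        void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  /* the mapping stays valid after closing the descriptor */
        return p == MAP_FAILED ? NULL : p;
    }

    int main(void) {
        size_t len = 0;
        /* "model.safetensors" is a placeholder path for this example. */
        const void *weights = map_weights("model.safetensors", &len);
        if (!weights) { perror("map_weights"); return 1; }

        printf("mapped %zu bytes; pages load on demand\n", len);

        munmap((void *)weights, len);
        return 0;
    }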

Maintenance & Community

The project was developed rapidly with AI assistance, indicating a focus on demonstrating feasibility. No specific community channels (Discord, Slack), roadmap links, or notable contributor/sponsorship information are provided in the README.

Licensing & Compatibility

  • License type: MIT License.
  • Compatibility notes: The permissive MIT license allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The pure C (make generic) backend is considerably slower than the accelerated builds, and the --mmap mode trades inference speed for a smaller memory footprint. Benchmarks indicate that PyTorch implementations can still be faster thanks to better handling of activations on the GPU. The maximum supported resolution is 1024x1024 pixels, and dimensions must be multiples of 16. The inference engine is specific to the FLUX.2-klein-4B model.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 10
  • Issues (30d): 11
  • Star History: 1,399 stars in the last 9 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

diffusers-rs by LaurentMazare

0% · 583 stars
Rust implementation of the Diffusers API for generative models
Created 3 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.3% · 5k stars
Image synthesis research paper using a linear diffusion transformer
Created 1 year ago
Updated 1 week ago
Starred by Alex Yu (Research Scientist at OpenAI; Cofounder of Luma AI), Lianmin Zheng (Coauthor of SGLang, vLLM), and 2 more.

HunyuanVideo by Tencent-Hunyuan

0.3% · 12k stars
PyTorch code for video generation research
Created 1 year ago
Updated 2 months ago