clip.cpp by monatis

Plain C/C++ CLIP inference, dependency-free

Created 2 years ago
520 stars

Top 60.5% on SourcePulse

View on GitHub
Project Summary

This project provides a dependency-free C/C++ implementation of OpenAI's CLIP model, enabling efficient inference on resource-constrained devices. It targets developers and researchers who need to integrate CLIP for tasks like semantic search or zero-shot labeling without the overhead of large ML frameworks. The result is a lightweight, fast inference engine with no heavyweight runtime dependencies.

How It Works

Leveraging the GGML tensor library, clip.cpp offers optimized inference with support for 4-bit, 5-bit, and 8-bit quantization. Quantization significantly reduces model size and memory footprint, making the engine suitable for edge devices and serverless deployments. It supports text-only, vision-only, and two-tower CLIP variants, providing flexibility for various applications; a usage sketch follows.
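
For a concrete sense of the workflow, here is a minimal sketch of a two-tower text–image similarity query through the project's Python bindings with a quantized GGUF model. This is not code from the repository: the model path is a placeholder, and the method names (tokenize, encode_text, load_preprocess_encode_image, calculate_similarity) are assumptions about the clip_cpp binding API; check the bindings' README for the exact signatures.

  # Minimal sketch (assumed API): two-tower similarity with the clip_cpp bindings.
  # The model path is a placeholder and the method names are assumptions.
  from clip_cpp import Clip

  MODEL_PATH = "./models/clip-vit-base-patch32_q4_0.gguf"  # placeholder GGUF file

  model = Clip(MODEL_PATH, verbosity=1)

  # Text tower: tokenize the prompt, then encode it into an embedding.
  tokens = model.tokenize("a photo of a cat")
  text_embedding = model.encode_text(tokens)

  # Vision tower: load, preprocess, and encode the image in one call.
  image_embedding = model.load_preprocess_encode_image("cat.jpg")

  # Cosine similarity between the two embeddings (higher = better match).
  score = model.calculate_similarity(text_embedding, image_embedding)
  print(f"text-image similarity: {score:.3f}")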

Quick Start & Requirements

  • Install: pip install clip_cpp (packaged for x86-64 Linux with AVX2). For other systems or instruction sets, build from source with cmake -DBUILD_SHARED_LIBS=ON .. followed by make.
  • Prerequisites: Python and pip for the package install; building from source requires CMake and a C/C++ toolchain. Models are available on HuggingFace (tagged clip-cpp-gguf); a download-and-load sketch follows this list.
  • Resources: 4-bit quantized models are approximately 85.6 MB.
  • Links: Colab Notebook, HuggingFace Models
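
If you prefer to script the model download rather than fetch a GGUF file from HuggingFace manually, something along these lines works. Note that huggingface_hub is an extra dependency not required by clip_cpp itself, and the repo_id and filename below are placeholders, not real checkpoints; browse models tagged clip-cpp-gguf for actual repositories.

  # Sketch: fetch a quantized GGUF checkpoint, then load it with the bindings.
  # huggingface_hub is an optional extra dependency (pip install huggingface_hub);
  # repo_id and filename are placeholders -- substitute a real clip-cpp-gguf model.
  from huggingface_hub import hf_hub_download
  from clip_cpp import Clip

  model_path = hf_hub_download(
      repo_id="someuser/clip-vit-base-patch32-gguf",  # placeholder repo id
      filename="model_q4_0.gguf",                     # placeholder file name
  )
  model = Clip(model_path, verbosity=1)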

Highlighted Details

  • Dependency-free C/C++ inference engine.
  • Supports 4-bit, 5-bit, and 8-bit quantization.
  • Python bindings available with no external Python package dependencies.
  • Includes examples for basic inference, zero-shot labeling, and semantic image search (a rough zero-shot sketch follows this list).
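
As a rough illustration of how zero-shot labeling works conceptually, the sketch below scores one image against several candidate prompts and applies a scaled softmax. It is a hedged approximation rather than the repository's own example code: the clip_cpp method names and model path are assumptions, and the logit scale of 100 mirrors the value typically learned by trained CLIP models.

  # Hedged zero-shot labeling sketch: score an image against candidate prompts.
  # Method names and model path are assumptions; prefer the repo's own example.
  import math
  from clip_cpp import Clip

  model = Clip("./models/clip-vit-base-patch32_q4_0.gguf", verbosity=0)  # placeholder
  labels = ["cat", "dog", "car"]

  image_embedding = model.load_preprocess_encode_image("photo.jpg")
  sims = [
      model.calculate_similarity(
          model.encode_text(model.tokenize(f"a photo of a {label}")),
          image_embedding,
      )
      for label in labels
  ]

  # Softmax over scaled similarities (CLIP's learned logit scale is roughly 100).
  exps = [math.exp(100.0 * s) for s in sims]
  probs = [e / sum(exps) for e in exps]
  for label, p in zip(labels, probs):
      print(f"{label}: {p:.2%}")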

Maintenance & Community

The project is actively maintained. Recent updates include Clojure bindings and a switch to the GGUF model format. Discussions and support can be found via GitHub issues.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Image preprocessing uses bilinear interpolation, which may differ numerically from PIL's bicubic interpolation with antialiasing, so embeddings can deviate slightly from the reference PyTorch implementation. The move to the GGUF format is a breaking change: model files in the older .bin format are no longer compatible.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Eugene Yan (AI Scientist at AWS), and 2 more.

starcoder.cpp by bigcode-project
C++ example for StarCoder inference
456 stars · Created 2 years ago · Updated 2 years ago

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai
Toolkit for easy model parallelization
790 stars · Created 4 years ago · Updated 2 years ago

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel
Python library for model compression (quantization, pruning, distillation, NAS)
2k stars · Created 5 years ago · Updated 16 hours ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google
PyTorch implementation for Google's Gemma models
6k stars · Created 1 year ago · Updated 3 months ago