clip-as-service by jina-ai

Scalable CLIP embedding service for images and text

Created 7 years ago

12,799 stars

Top 3.9% on SourcePulse

Project Summary

CLIP-as-service is a scalable, low-latency microservice that generates embeddings from images and text using CLIP models. It is designed to plug into neural search solutions, so cross-modal and multi-modal applications can be built quickly. It targets developers and researchers building AI-powered search and reasoning systems.

How It Works

The service uses CLIP models for embedding generation and cross-modal reasoning. It supports three serving backends: PyTorch (with or without JIT), ONNX Runtime, and TensorRT, so users can pick the runtime that best fits their hardware and latency requirements. The architecture supports non-blocking streaming and scales horizontally across multiple GPUs for high throughput.

Quick Start & Requirements

Install Server: pip install clip-server (or clip-server[onnx], clip-server[tensorrt]). Requires Python 3.7+.
Install Client: pip install clip-client. Requires Python 3.7+.
Run Server: python -m clip_server.
Dependencies: ONNX Runtime and TensorRT are optional and are installed through the extras listed under Install Server. A GPU is recommended for best speed.
Docs: https://github.com/jina-ai/clip-as-service
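
The steps above can be tied together with a short client-side sketch in Python. It assumes a server was started with python -m clip_server and is reachable on port 51000 (an assumption about the default; check the server's startup log), and that the URL schemes for the TLS variants are grpcs, https, and wss. The helper names build_server_url and embed are hypothetical and exist only for this illustration; Client and its encode method come from the clip-client package.

```python
def build_server_url(host, port=51000, protocol="grpc", tls=False):
    """Build a server address string for the client.

    Hypothetical helper. The scheme names and the default port are
    assumptions to verify against the clip-client documentation.
    """
    schemes = {
        "grpc": ("grpc", "grpcs"),
        "http": ("http", "https"),
        "websocket": ("ws", "wss"),
    }
    if protocol not in schemes:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    plain, secure = schemes[protocol]
    return f"{secure if tls else plain}://{host}:{port}"


def embed(inputs, url):
    """Send sentences or image URIs to a running server and return embeddings.

    Hypothetical wrapper. clip_client is imported lazily so that this
    module can be loaded on machines where the package is not installed.
    """
    from clip_client import Client

    client = Client(url)
    # encode() accepts an iterable of strings; each item is either a
    # sentence or an image URI, and one embedding row is returned per item.
    return client.encode(inputs)
```

With a server running, a call such as embed(["a photo of a dog", "a photo of a cat"], build_server_url("0.0.0.0")) should return an array with one row per input. The embedding width depends on the CLIP model the server loads.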

Highlighted Details

Achieves up to 800 QPS with default configuration (single replica, PyTorch no JIT) on a GeForce RTX 3090.
Supports gRPC, HTTP, and WebSocket protocols with TLS and compression.
Offers a /rank endpoint for re-ranking cross-modal matches based on CLIP scores.
Integrates smoothly with Jina and DocArray for building complex search pipelines.
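
To make the re-ranking idea concrete, here is a minimal local sketch in Python that orders candidates by cosine similarity to a query embedding. It illustrates the general technique only and is not the service's implementation of /rank; the function names and the two-dimensional toy vectors are made up for this example, and real CLIP embeddings have several hundred dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Return (label, score) pairs sorted from best to worst match.

    `candidates` maps a label to its embedding. Hypothetical helper for
    illustration only.
    """
    scored = [(label, cosine_similarity(query, vec))
              for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings: in practice these would come from the embedding service.
query_vec = [1.0, 0.0]
candidate_vecs = {
    "close match": [0.9, 0.1],
    "unrelated": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}
ranking = rank_candidates(query_vec, candidate_vecs)
```

Because cosine similarity ignores vector length, the ordering depends only on direction, which is why it is a common scoring choice for comparing text and image embeddings in a shared space.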

Maintenance & Community

Backed by Jina AI.
Community support via Discord.
YouTube channel for tutorials.

Licensing & Compatibility

Licensed under Apache-2.0.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README's performance benchmarks were measured on specific hardware (a GeForce RTX 3090) and specific configurations, so they may not be representative of other deployments. Although several runtimes are supported, the fastest options depend on particular hardware: TensorRT, for example, requires an NVIDIA GPU.

Health Check

Last Commit

1 year ago

Responsiveness

1 day

Star History

17 stars in the last 30 days