clip-as-service by jina-ai

Scalable CLIP embedding service for images and text

Created 6 years ago
12,742 stars

Top 3.9% on SourcePulse

Project Summary

CLIP-as-service provides a scalable, low-latency microservice for generating embeddings from images and text using CLIP models. It's designed for seamless integration into neural search solutions, enabling rapid development of cross-modal and multi-modal applications. The service targets developers and researchers building AI-powered search and reasoning systems.

How It Works

The service leverages CLIP models for embedding generation and cross-modal reasoning. It supports multiple serving backends including PyTorch (with or without JIT), ONNX Runtime, and TensorRT for optimized performance. This flexibility allows users to choose the best runtime based on their hardware and latency requirements. The architecture supports non-blocking streaming and horizontal scaling across multiple GPUs for high throughput.
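Choosing a runtime can be sketched as selecting which flow configuration the server is started with. The snippet below is a minimal illustration, not the project's own tooling: the helper `server_command` is hypothetical, and the YAML file names (`torch-flow.yml`, `onnx-flow.yml`, `tensorrt-flow.yml`) are assumed from the project's documentation and should be verified against the installed `clip-server` package.

```python
import sys

# Assumed mapping from runtime name to the flow YAML shipped with clip-server.
# These file names are an assumption; check them against your installed version.
RUNTIME_FLOWS = {
    "torch": "torch-flow.yml",
    "onnx": "onnx-flow.yml",
    "tensorrt": "tensorrt-flow.yml",
}


def server_command(runtime="torch"):
    """Return the argv list that would start clip_server with the given runtime."""
    if runtime not in RUNTIME_FLOWS:
        raise ValueError(
            f"unknown runtime {runtime!r}; expected one of {sorted(RUNTIME_FLOWS)}"
        )
    # Hypothetical helper: it only builds the command, it does not launch anything.
    return [sys.executable, "-m", "clip_server", RUNTIME_FLOWS[runtime]]


if __name__ == "__main__":
    # Print the command so it can be inspected or passed to subprocess.run().
    print(" ".join(server_command("onnx")))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without shell quoting, and an unknown runtime fails early instead of starting the server with a missing file.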

Quick Start & Requirements

  • Install Server: pip install clip-server (or clip-server[onnx], clip-server[tensorrt]). Requires Python 3.7+.
  • Install Client: pip install clip-client. Requires Python 3.7+.
  • Run Server: python -m clip_server.
  • Dependencies: ONNX Runtime and TensorRT are optional and only needed for the corresponding backends. A GPU is recommended for best speed.
  • Docs: https://github.com/jina-ai/clip-as-service
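Once the server is running, the client side might look like the sketch below. It assumes the server listens on `grpc://0.0.0.0:51000` (the address used in the project's README example; adjust it to whatever your server prints at startup). The helpers `server_url` and `embed` are hypothetical names introduced for illustration; `Client` and `encode` come from the `clip-client` package.

```python
def server_url(protocol="grpc", host="0.0.0.0", port=51000):
    """Build the address string passed to the client, e.g. grpc://0.0.0.0:51000."""
    # The default host and port are assumptions taken from the README example.
    return f"{protocol}://{host}:{port}"


def embed(inputs, url=None):
    """Send sentences or image paths/URLs to a running server and return embeddings."""
    # Imported lazily so this module loads even when clip-client is not installed.
    from clip_client import Client

    client = Client(url or server_url())
    return client.encode(inputs)


if __name__ == "__main__":
    # Requires `pip install clip-client` and a server started with `python -m clip_server`.
    vectors = embed(["a photo of a cat", "a photo of a dog"])
    print(vectors.shape)
```

Text and images go through the same `encode` call, so a single list can mix sentences with image paths or URLs.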

Highlighted Details

  • Achieves up to 800 QPS with default configuration (single replica, PyTorch no JIT) on a GeForce RTX 3090.
  • Supports gRPC, HTTP, and WebSocket protocols with TLS and compression.
  • Offers a /rank endpoint for re-ranking cross-modal matches based on CLIP scores.
  • Integrates smoothly with Jina and DocArray for building complex search pipelines.
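The idea behind re-ranking by CLIP score can be illustrated without a server. The sketch below is not the service's implementation: it uses toy three-dimensional vectors in place of real embeddings, and it assumes scores are a softmax over cosine similarities scaled by 100 (roughly CLIP's logit scale). The function names `cosine` and `rerank` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(query_vec, candidates, scale=100.0):
    """Sort (label, vector) candidates by softmax over scaled cosine similarity.

    The scale of 100 is an assumption for illustration, not a documented
    constant of the service.
    """
    if not candidates:
        return []
    logits = [scale * cosine(query_vec, vec) for _, vec in candidates]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    scored = [(label, e / total) for (label, _), e in zip(candidates, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for an image embedding and three text embeddings.
query = [1.0, 0.0, 0.0]
candidates = [
    ("a photo of a cat", [0.1, 0.9, 0.0]),
    ("a photo of a dog", [0.9, 0.1, 0.0]),
    ("a photo of a car", [0.5, 0.5, 0.0]),
]
ranked = rerank(query, candidates)

if __name__ == "__main__":
    for label, score in ranked:
        print(f"{score:.4f}  {label}")
```

In a real pipeline the vectors would come from the service's embeddings, and the server-side endpoint would do the scoring.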

Maintenance & Community

  • Backed by Jina AI.
  • Community support via Discord.
  • YouTube channel for tutorials.

Licensing & Compatibility

  • Licensed under Apache-2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README's performance benchmarks were measured on specific hardware (a GeForce RTX 3090) and a specific configuration, so they may not be representative of other deployments. Although the service supports multiple runtimes, optimal performance often requires specific hardware, such as NVIDIA GPUs for TensorRT.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 30 days
