infinity by michaelfeil

REST API for high-throughput, low-latency embedding and reranking

created 1 year ago
2,336 stars

Top 19.9% on sourcepulse

Project Summary

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking, CLIP, CLAP, and ColPali models. It targets developers and researchers needing efficient inference for RAG and multimodal AI tasks, offering compatibility with Hugging Face models and OpenAI API specifications.

How It Works

Infinity uses multiple fast inference backends, including PyTorch, Optimum (ONNX/TensorRT), and CTranslate2, with optimizations such as FlashAttention. It runs on NVIDIA CUDA, AMD ROCm, CPU, AWS INF2, and Apple MPS. Dynamic batching and tokenization run in dedicated worker threads, so incoming requests are grouped efficiently before they reach the model. The engine can orchestrate multiple models at the same time, which lets you mix and match models across diverse AI pipelines.
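
To make the batching idea concrete, here is a minimal Python sketch of dynamic batching with a dedicated worker thread. It is not Infinity's implementation. The class `DynamicBatcher`, its `max_batch_size` and `max_wait_s` parameters, and the stand-in model function are all hypothetical. They only illustrate how single requests can be grouped into batches before they reach a model.

```python
import queue
import threading
import time
from concurrent.futures import Future
from typing import Any, Callable, List


class DynamicBatcher:
    """Toy dynamic batcher (hypothetical, not Infinity's code).

    Callers submit single items. A background worker thread groups them
    into batches of at most `max_batch_size`, waiting up to `max_wait_s`
    after the first item for more items to arrive.
    """

    def __init__(self, batch_fn: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 8, max_wait_s: float = 0.01) -> None:
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = queue.Queue()
        self.batch_sizes: List[int] = []  # recorded for inspection only
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item: Any) -> Future:
        """Enqueue one item and return a Future for its result."""
        fut: Future = Future()
        self._queue.put((item, fut))
        return fut

    def _run(self) -> None:
        while True:
            batch = [self._queue.get()]  # block until the first request arrives
            deadline = time.monotonic() + self._max_wait_s
            # Collect more requests until the batch is full or the deadline passes.
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            items = [item for item, _ in batch]
            self.batch_sizes.append(len(items))
            try:
                results = self._batch_fn(items)  # one model call per batch
                for (_, fut), result in zip(batch, results):
                    fut.set_result(result)
            except Exception as exc:  # propagate a failure to every caller
                for _, fut in batch:
                    fut.set_exception(exc)


# Usage with a stand-in "model" that returns each text's length.
batcher = DynamicBatcher(lambda texts: [len(t) for t in texts], max_batch_size=4)
futures = [batcher.submit(t) for t in ["a", "bb", "ccc", "dddd", "eeeee"]]
print([f.result(timeout=1) for f in futures])  # [1, 2, 3, 4, 5]
```

Waiting briefly for more requests trades a few milliseconds of latency for larger batches. This is typically how a batching server raises throughput on accelerators that process many inputs per call more efficiently than one at a time.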

Quick Start & Requirements

  • Install via pip: pip install infinity-emb[all]
  • Docker is recommended for deployment.
  • Supports NVIDIA CUDA, AMD ROCm, CPU, AWS INF2, and Apple MPS accelerators.
  • Requires Python 3.11+ for development.
  • Official documentation: https://michaelfeil.github.io/infinity
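
Once a server is running, any HTTP client can call it. The sketch below uses only the Python standard library. It assumes, without confirmation from this summary, that the server listens on `localhost:7997` and exposes an OpenAI-style `POST /embeddings` route. The helper names `build_embedding_request` and `request_embeddings` are hypothetical. Check the official documentation for the actual launch command, port, and routes.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: local server address; adjust to your deployment.
BASE_URL = "http://localhost:7997"


def build_embedding_request(model: str, inputs: List[str],
                            base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style embeddings request (hypothetical helper)."""
    body = json.dumps({"model": model, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",  # assumed route; verify against the docs
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_embeddings(model: str, inputs: List[str],
                       base_url: str = BASE_URL) -> Dict[str, Any]:
    """Send the request to a running server and return the decoded JSON."""
    req = build_embedding_request(model, inputs, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

With a server up, a call such as `request_embeddings("BAAI/bge-small-en-v1.5", ["hello world"])` would return the decoded JSON response. The model ID is only an example of a Hugging Face model name and must match a model the server actually loaded.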

Highlighted Details

  • Deploys any model from Hugging Face.
  • Supports text embeddings, reranking, multimodal (CLIP, CLAP), and text classification.
  • Experimental support for INT8 (CPU/CUDA) and FP8 (H100/MI300).
  • OpenAI API compatible REST API.
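
Because responses follow the OpenAI embeddings format, client code written for that format should carry over. The sketch below extracts vectors from such a response and ranks documents by cosine similarity, as in a simple RAG retrieval step. The payload is toy data and the helper names are hypothetical. Ranking by embedding similarity is also a different technique from scoring pairs with a dedicated reranking model.

```python
import math
from typing import Any, Dict, List


def vectors_from_openai_response(payload: Dict[str, Any]) -> List[List[float]]:
    """Return embeddings in input order from an OpenAI-format response."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: List[float], docs: List[List[float]]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy response: entries may arrive out of order, so sort by "index".
payload = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
        {"object": "embedding", "index": 2, "embedding": [0.7, 0.7]},
    ],
}
query, *docs = vectors_from_openai_response(payload)
print(rank_by_similarity(query, docs))  # [1, 0]
```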

Maintenance & Community

  • Developed by Michael Feil.
  • Active development with recent updates in late 2024.
  • Community links are not explicitly provided in the README.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Specialized Docker images for ROCm and TensorRT/ONNX are not continuously built via CI/CD and may require pinning to exact versions.
  • CTranslate2 engine only supports BERT models.
  • Plain vision models (e.g., nomic-ai/nomic-embed-vision-v1.5) are not supported for multimodal tasks.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 6
  • Issues (30d): 1

Star History

  • 220 stars in the last 90 days