Model serving library for language, speech, and multimodal models
Xorbits Inference (Xinference) is a versatile library for serving large language, speech recognition, and multimodal models, letting developers integrate a wide range of open-source AI models into their applications. It aims to simplify model deployment and inference, giving researchers, developers, and data scientists access to cutting-edge AI without vendor lock-in.
How It Works
Xinference provides a unified serving layer that supports multiple inference engines (such as vLLM, GGML, TensorRT, and MLX) and heterogeneous hardware (CPU, GPU, Apple Silicon). It exposes an OpenAI-compatible RESTful API alongside RPC, CLI, and WebUI interfaces, so it integrates smoothly with tools and frameworks like LangChain and LlamaIndex. Its distributed deployment capabilities allow models to run across multiple devices or machines.
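Because the REST API mirrors OpenAI's, the standard `openai` Python client can talk to a local Xinference server directly. Below is a minimal sketch; the endpoint assumes the default local port (9997), and `my-llm` is a placeholder for the UID of a model you have already launched:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Xinference server.
# The endpoint and model UID below are illustrative assumptions.
client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my-llm",  # UID of a model previously launched in Xinference
    messages=[{"role": "user", "content": "What is Xinference?"}],
)
print(response.choices[0].message.content)
```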
Quick Start & Requirements
Install the library with all optional backends, then start a local server:

```bash
pip install "xinference[all]"
xinference-local
```
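Once the server is running, models can be launched and queried programmatically through Xinference's Python client. The sketch below is illustrative only: the model name, engine, and size are placeholder assumptions, and the exact `chat` signature may differ across versions:

```python
from xinference.client import Client

# Connect to the server started by `xinference-local` (default port 9997).
client = Client("http://127.0.0.1:9997")

# Launch a model; the name, engine, and size here are placeholders.
model_uid = client.launch_model(
    model_name="qwen2.5-instruct",
    model_engine="transformers",
    model_size_in_billions=7,
)

# Get a handle to the running model and send a chat request
# (recent versions accept an OpenAI-style message list).
model = client.get_model(model_uid)
print(model.chat(messages=[{"role": "user", "content": "Hello!"}]))
```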
Highlighted Details
- OpenAI-compatible RESTful API, plus RPC, CLI, and WebUI interfaces
- Multiple inference engines (vLLM, GGML, TensorRT, MLX) across CPU, GPU, and Apple Silicon
- Distributed deployment across multiple devices or machines
- Integrations with ecosystem tools such as LangChain and LlamaIndex
Maintenance & Community
The project is actively maintained with recent updates and contributions. Community engagement is encouraged via Discord and Twitter.
Licensing & Compatibility
The project is licensed under the Apache License 2.0, which permits commercial use and integration with closed-source applications.
Limitations & Caveats
While Xinference supports numerous backends and platforms, performance and compatibility vary by engine. The project is under active development, and some features may be experimental.