tensorzero by tensorzero

LLMOps framework for optimizing LLM applications via production data feedback

created 1 year ago
9,162 stars

Top 5.6% on sourcepulse

Project Summary

TensorZero is an open-source framework designed to create a feedback loop for optimizing Large Language Model (LLM) applications. It targets engineers and researchers building production-grade LLM systems, enabling them to leverage production data for smarter, faster, and cheaper models.

How It Works

TensorZero unifies several key components of the LLMOps lifecycle: an LLM gateway for accessing diverse models, an observability layer to capture inference metrics and feedback, an optimization engine for prompts and models (including fine-tuning and RL), and an evaluation framework for comparing different strategies. This integrated approach aims to create a compounding data and learning flywheel, allowing systems to improve over time based on real-world usage. The core gateway is built in Rust for low-latency performance.
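The flywheel described above can be sketched in miniature. The snippet below is a conceptual illustration only, with hypothetical names rather than TensorZero's actual API: serve competing prompt variants, attach production feedback to each inference, and route future traffic toward whichever variant performs best.

```python
import random
from collections import defaultdict

# Conceptual sketch only -- variant names and functions are hypothetical,
# not TensorZero's API. It illustrates the flywheel the framework automates:
# serve variants, record production feedback, route toward what works.
VARIANTS = ["prompt_v1", "prompt_v2"]
feedback = defaultdict(list)  # variant -> recorded metric values

def record_feedback(variant, metric_value):
    """Attach a production metric (thumbs-up, task success, ...) to a variant."""
    feedback[variant].append(metric_value)

def best_variant():
    """Route future traffic to the variant with the best observed mean metric."""
    means = {v: sum(s) / len(s) for v, s in feedback.items() if s}
    return max(means, key=means.get) if means else random.choice(VARIANTS)

# Simulated A/B traffic: prompt_v2 succeeds more often than prompt_v1.
random.seed(0)
for i in range(200):
    v = VARIANTS[i % 2]
    record_feedback(v, 1.0 if random.random() < (0.8 if v == "prompt_v2" else 0.4) else 0.0)

print(best_variant())
```

In TensorZero itself, the "record feedback" and "optimize" steps are backed by the ClickHouse observability store and the optimization engine rather than an in-memory dictionary, but the loop has the same shape.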

Quick Start & Requirements

  • Install: pip install tensorzero
  • Prerequisites: A ClickHouse database is required for observability. The gateway integrates with any OpenAI-compatible API.
  • Setup: The Quick Start guide claims a 5-minute setup, going from a basic OpenAI wrapper to a production-ready application with observability and fine-tuning.
  • Links: Quick Start, Comprehensive Tutorial, Deployment Guide.

Highlighted Details

  • Unified gateway supports numerous LLM providers (Anthropic, AWS Bedrock, Azure OpenAI, Gemini, Mistral, vLLM, etc.) and OpenAI-compatible APIs.
  • Rust-based gateway boasts <1ms P99 latency overhead at 10k QPS.
  • Features include A/B testing, fallbacks, prompt templating, batch inference, multimodal support, and GitOps configuration.
  • Optimization capabilities extend to supervised fine-tuning (SFT), preference fine-tuning (DPO), and inference-time optimizations like Best-of-N sampling and Dynamic In-Context Learning (DICL).
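Of these, Best-of-N sampling is easy to illustrate in a few lines. The sketch below is conceptual, with hypothetical `generate` and `score` callables rather than TensorZero's implementation: generate several candidate completions and keep the one a scoring function ranks highest.

```python
# Minimal Best-of-N sampling sketch -- the generate/score functions are
# hypothetical stand-ins, not TensorZero's implementation. Generate N
# candidates, score each (judge model, heuristic, ...), return the best.
def best_of_n(generate, score, prompt, n=4):
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

# Toy example: "generation" cycles canned drafts; "scoring" prefers longer answers.
drafts = iter(["ok", "a fuller answer", "short", "the most complete answer here"])
result = best_of_n(lambda p: next(drafts), len, "What is TensorZero?", n=4)
print(result)  # -> "the most complete answer here"
```

In practice the scorer would be a judge model or task metric rather than string length, and the trade-off is straightforward: N inference calls buy one higher-quality response.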

Maintenance & Community

  • Backed by investors of prominent open-source projects and AI labs.
  • Team includes a former Rust compiler maintainer and researchers from top universities.
  • Community channels available via Slack and Discord.

Licensing & Compatibility

  • The project is 100% open-source and self-hosted, with no paid features. The README does not explicitly state the license; check the repository's LICENSE file before assuming commercial compatibility.

Limitations & Caveats

  • The gateway itself is written in Rust, but the client library and other integrations are Python-based. A ClickHouse database is a mandatory dependency for the observability features.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 198
  • Issues (30d): 63
  • Star History: 5,511 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

serve by pytorch

Top 0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
created 5 years ago · updated 3 weeks ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

Top 0.4% · 15k stars
Framework for LLM inference optimization experimentation
created 1 year ago · updated 2 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

Top 0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 10 hours ago