serving by tensorflow

Serving system for machine learning models in production

created 9 years ago · 6,310 stars

Top 8.3% on sourcepulse

Project Summary

TensorFlow Serving is a high-performance system for deploying machine learning models in production, primarily targeting ML engineers and researchers. It addresses the challenge of efficiently serving inference requests by managing model lifecycles and providing versioned access, enabling seamless updates and A/B testing without client code changes.

How It Works

TensorFlow Serving is built for flexibility and performance, supporting multiple models, and multiple versions of the same model, simultaneously. It exposes both gRPC and HTTP endpoints for inference. A key feature is its batching scheduler, which groups incoming inference requests for joint execution, particularly on GPUs, with configurable latency controls. Batching trades a small, bounded queuing delay per request for substantially better hardware utilization and overall throughput.
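Batching is controlled through server flags and a parameters file. The sketch below assumes the batching parameters text-proto format described in the serving batching guide; the values are illustrative placeholders, not tuned recommendations, and the model path is hypothetical.

    # Write an illustrative batching config (field names per the batching guide;
    # values here are assumptions, not tuned defaults)
    cat <<'EOF' > /tmp/batching.config
    max_batch_size { value: 32 }          # largest batch the scheduler may form
    batch_timeout_micros { value: 5000 }  # max time a request waits for a batch
    num_batch_threads { value: 4 }        # threads that execute formed batches
    max_enqueued_batches { value: 100 }   # backpressure limit on queued batches
    EOF

    # Start the server with batching enabled (model path is a placeholder)
    docker run -t --rm -p 8501:8501 \
        -v "$PWD/my_model:/models/my_model" \
        -v /tmp/batching.config:/config/batching.config \
        -e MODEL_NAME=my_model \
        tensorflow/serving \
        --enable_batching=true \
        --batching_parameters_file=/config/batching.config &

A larger batch_timeout_micros raises the latency ceiling in exchange for fuller batches; the right trade-off depends on the model and the traffic pattern.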

Quick Start & Requirements

  • Install/Run: Docker is the recommended installation method; a sample request follows this list.
    # Pull the serving image and fetch the repo's demo models
    docker pull tensorflow/serving
    git clone https://github.com/tensorflow/serving
    TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
    # Start the container and open the REST API port
    docker run -t --rm -p 8501:8501 \
        -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
        -e MODEL_NAME=half_plus_two \
        tensorflow/serving &
    
  • Prerequisites: Docker, SavedModel format for TensorFlow models.
  • Resources: Docker setup takes minutes; serving itself needs CPU/GPU capacity commensurate with model size and request volume.
  • Docs: Official TensorFlow Serving Documentation (https://www.tensorflow.org/tfx/guide/serving)
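To verify the server started above, query the REST predict endpoint. The request and expected response follow the half_plus_two example from the official README.

    # Query the model's predict API on the REST port (8501)
    curl -d '{"instances": [1.0, 2.0, 5.0]}' \
        -X POST http://localhost:8501/v1/models/half_plus_two:predict
    # Returns => { "predictions": [2.5, 3.0, 4.5] }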

Highlighted Details

  • Serves multiple models or versions concurrently.
  • Supports both gRPC and RESTful HTTP inference APIs.
  • Enables canarying new versions and A/B testing (a config sketch follows this list).
  • Features a request batching scheduler for optimized GPU utilization.
  • Extensible to serve non-TensorFlow models and custom servables.
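As referenced above, here is a minimal sketch of how concurrent versions and canarying are typically wired up, assuming a models.config file in the ModelServerConfig text-proto format; the model name, paths, and version numbers are hypothetical placeholders.

    # Hypothetical models.config: pin two versions and label them for canarying
    cat <<'EOF' > /tmp/models.config
    model_config_list {
      config {
        name: "my_model"               # placeholder model name
        base_path: "/models/my_model"  # contains numeric version subdirectories
        model_platform: "tensorflow"
        model_version_policy { specific { versions: 1 versions: 2 } }
        version_labels { key: "stable" value: 1 }
        version_labels { key: "canary" value: 2 }
      }
    }
    EOF

    # Serve from the config file; gRPC on 8500, REST on 8501.
    # The last flag permits assigning labels at startup, per the docs.
    docker run -t --rm -p 8500:8500 -p 8501:8501 \
        -v "$PWD/my_model:/models/my_model" \
        -v /tmp/models.config:/config/models.config \
        tensorflow/serving \
        --model_config_file=/config/models.config \
        --allow_version_labels_for_unavailable_models=true &

Clients can then address a label such as "canary" instead of a hard-coded version number, so shifting traffic becomes a config change rather than a client change.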

Maintenance & Community

Maintained by the TensorFlow team at Google. Contribution guidelines are available.

Licensing & Compatibility

Apache 2.0 License. Compatible with commercial use and closed-source applications.

Limitations & Caveats

While the architecture is extensible, serving non-TensorFlow models requires implementing custom servables. Performance tuning may require understanding SavedModel warmup and signature definitions.
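Before tuning, it can help to inspect which signatures a SavedModel actually exports; the saved_model_cli tool that ships with TensorFlow dumps them (the path below is a placeholder).

    # List signature defs, input/output tensors, dtypes, and shapes
    saved_model_cli show --dir /models/my_model/1 --all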

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 2

Star History

  • 48 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

GPU cluster manager for AI model deployment
Top 1.6% on sourcepulse · 3k stars · created 1 year ago · updated 2 days ago
Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

dynamo by ai-dynamo

Inference framework for distributed generative AI model serving
Top 1.1% on sourcepulse · 5k stars · created 5 months ago · updated 9 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

LLM inference optimization SDK for NVIDIA GPUs
Top 0.6% on sourcepulse · 11k stars · created 1 year ago · updated 11 hours ago