LLM inference engine optimized for throughput and memory on XLA devices
JetStream is a high-throughput, memory-optimized inference engine for Large Language Models (LLMs) targeting XLA devices, with initial support for TPUs and future expansion to GPUs. It aims to provide efficient LLM serving for researchers and developers working with large models on specialized hardware.
How It Works
JetStream offers two distinct engine implementations: one built on JAX (leveraging MaxText) and another for PyTorch. This dual approach allows users to choose the framework that best suits their existing model development pipeline. The engine is designed for throughput and memory efficiency, crucial for deploying large LLMs.
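The throughput-oriented design described above typically separates a one-time *prefill* step (processing the prompt) from batched *decode* steps that advance many requests at once. The sketch below illustrates that split in plain Python; the class and method names are illustrative assumptions for this summary, not JetStream's actual API.

```python
# Illustrative sketch of the prefill/decode split common to LLM serving
# engines such as JetStream. All names here (ToyEngine, DecodeSlot, etc.)
# are hypothetical and do not reflect JetStream's real interfaces.

from dataclasses import dataclass, field


@dataclass
class DecodeSlot:
    """One active request occupying a batch slot."""
    prompt: str
    tokens: list = field(default_factory=list)


class ToyEngine:
    """Admits requests via prefill(), then advances the whole batch
    one token per generate() call (throughput over latency)."""

    def __init__(self, max_slots: int = 4):
        self.slots: dict[int, DecodeSlot] = {}
        self.max_slots = max_slots
        self._next_id = 0

    def prefill(self, prompt: str) -> int:
        # Admit a new request if a decode slot is free.
        if len(self.slots) >= self.max_slots:
            raise RuntimeError("no free decode slot")
        rid = self._next_id
        self._next_id += 1
        self.slots[rid] = DecodeSlot(prompt)
        return rid

    def generate(self) -> None:
        # One batched decode step: every active request emits a token.
        for slot in self.slots.values():
            slot.tokens.append(f"tok{len(slot.tokens)}")


engine = ToyEngine()
rid = engine.prefill("hello")
engine.generate()
engine.generate()
print(engine.slots[rid].tokens)
```

Batching every active request into a single decode step is what makes this style of engine throughput-efficient on accelerators, at the cost of per-request latency.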
Quick Start & Requirements
```shell
make install-deps
```
(followed by running server/testing commands).

Highlighted Details
Maintenance & Community
The project is hosted by AI-Hypercomputer, a Google entity. Specific community channels or active contributor details are not provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Current support is primarily focused on TPUs, with GPU support listed as a future goal and open to contributions. The setup guide points to Cloud TPU VMs, suggesting that local development without TPU hardware is limited out of the box.