JetStream by AI-Hypercomputer

LLM inference engine optimized for throughput and memory on XLA devices

Created 1 year ago, 364 stars, top 78.4% on sourcepulse

Project Summary

JetStream is a high-throughput, memory-optimized inference engine for Large Language Models (LLMs) targeting XLA devices, with initial support for TPUs and future expansion to GPUs. It aims to provide efficient LLM serving for researchers and developers working with large models on specialized hardware.

How It Works

JetStream offers two distinct engine implementations: one built on JAX (leveraging MaxText) and another for PyTorch. This dual approach allows users to choose the framework that best suits their existing model development pipeline. The engine is designed for throughput and memory efficiency, crucial for deploying large LLMs.
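The dual-engine design implies a shared serving interface that both the JAX and PyTorch backends implement. This summary does not reproduce JetStream's actual API, so the sketch below is a hypothetical illustration of the idea only (all class and method names here are assumptions, not JetStream identifiers):

```python
from abc import ABC, abstractmethod
from typing import List


class InferenceEngine(ABC):
    """Hypothetical common interface a JAX or PyTorch backend could implement."""

    @abstractmethod
    def prefill(self, prompt_tokens: List[int]) -> object:
        """Process the prompt once and return a decoding state."""

    @abstractmethod
    def generate(self, state: object, max_new_tokens: int) -> List[int]:
        """Autoregressively produce up to max_new_tokens token ids."""


class EchoEngine(InferenceEngine):
    """Toy backend that repeats the last prompt token, standing in for a
    real JAX- or PyTorch-backed implementation."""

    def prefill(self, prompt_tokens):
        return {"last": prompt_tokens[-1]}

    def generate(self, state, max_new_tokens):
        return [state["last"]] * max_new_tokens


engine = EchoEngine()
state = engine.prefill([101, 7592, 102])
print(engine.generate(state, 4))  # -> [102, 102, 102, 102]
```

Swapping `EchoEngine` for a JAX- or PyTorch-backed class is the kind of choice the dual-engine design leaves to the user.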

Quick Start & Requirements
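The summary does not reproduce the README's setup steps, which target Cloud TPU VMs. As a minimal sketch of obtaining the source, assuming the standard GitHub clone URL implied by the org and project names on this page, and assuming an editable pip install works (the README's own guide is authoritative):

```shell
# Assumed clone URL, derived from the AI-Hypercomputer org and project name.
git clone https://github.com/AI-Hypercomputer/JetStream.git
cd JetStream

# Assumed: editable install into a fresh virtual environment.
python -m venv .venv && source .venv/bin/activate
pip install -e .
```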

Highlighted Details

  • Optimized for throughput and memory efficiency on XLA devices.
  • Supports both JAX (via MaxText) and PyTorch model engines.
  • Includes tools for server observability and profiling.
  • Provides a mock server for local testing and development.
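The mock-server bullet suggests a workflow worth sketching: exercise client code against a local stand-in before targeting real TPU hardware. Everything below is a hypothetical illustration of that pattern (the `MockServer` class and its methods are assumptions, not JetStream's actual tooling):

```python
import queue
import threading


class MockServer:
    """Hypothetical local stand-in for an inference server: replies to every
    request with a canned token list, so client code can be tested offline."""

    def __init__(self, canned_response):
        self._canned = canned_response
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)

    def start(self):
        self._thread.start()

    def _serve(self):
        # Serve requests until the shutdown sentinel arrives.
        while True:
            prompt, reply_q = self._requests.get()
            if prompt is None:
                return
            reply_q.put(self._canned)

    def request(self, prompt):
        reply_q = queue.Queue()
        self._requests.put((prompt, reply_q))
        return reply_q.get(timeout=1.0)

    def stop(self):
        self._requests.put((None, None))


server = MockServer(canned_response=[42, 43, 44])
server.start()
print(server.request("hello"))  # -> [42, 43, 44]
server.stop()
```

Once client code works against the stand-in, pointing it at a real serving endpoint is a configuration change rather than a rewrite.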

Maintenance & Community

The project is hosted by AI-Hypercomputer, a Google entity. Specific community channels or active contributor details are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Current support is primarily focused on TPUs, with GPU support listed as a future goal and open to contributions. The setup guide points to Cloud TPU VMs, suggesting that local development without TPU hardware is limited out of the box.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 43 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 2 more.

LightLLM by ModelTC: Python framework for LLM inference and serving. Top 0.7%, 3k stars, created 2 years ago, updated 10 hours ago.