JetStream by AI-Hypercomputer

LLM inference engine optimized for throughput and memory on XLA devices

Created 1 year ago
375 stars

Top 75.7% on SourcePulse

View on GitHub
Project Summary

JetStream is a high-throughput, memory-optimized inference engine for Large Language Models (LLMs) targeting XLA devices, with initial support for TPUs and future expansion to GPUs. It aims to provide efficient LLM serving for researchers and developers working with large models on specialized hardware.

How It Works

JetStream offers two distinct engine implementations: one built on JAX (leveraging MaxText) and another for PyTorch. This dual approach allows users to choose the framework that best suits their existing model development pipeline. The engine is designed for throughput and memory efficiency, crucial for deploying large LLMs.
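
The sketch below illustrates this dual-backend pattern: two framework-specific engines behind one shared interface that the serving loop depends on. All names here are hypothetical and do not reflect JetStream's actual engine API.

    # Illustrative-only sketch of a dual-backend design: two framework-specific
    # engines behind one interface. Names are hypothetical, NOT JetStream's API.
    from typing import Protocol


    class Engine(Protocol):
        def prefill(self, prompt: list[int]) -> dict:
            """Process the full prompt once; return mutable decode state."""
            ...

        def generate(self, state: dict) -> int:
            """Run one decode step; append and return the next token id."""
            ...


    class JaxMaxTextEngine:
        """Would wrap a JAX/MaxText model; stubbed for illustration."""

        def prefill(self, prompt: list[int]) -> dict:
            return {"tokens": list(prompt)}

        def generate(self, state: dict) -> int:
            token = 0  # a real engine would run the compiled model here
            state["tokens"].append(token)
            return token


    class PyTorchEngine:
        """Would wrap a PyTorch model; same contract, different framework."""

        def prefill(self, prompt: list[int]) -> dict:
            return {"tokens": list(prompt)}

        def generate(self, state: dict) -> int:
            token = 0  # a real engine would run the PyTorch model here
            state["tokens"].append(token)
            return token


    def decode(engine: Engine, prompt: list[int], steps: int) -> list[int]:
        """The serving loop depends only on the shared interface."""
        state = engine.prefill(prompt)
        return [engine.generate(state) for _ in range(steps)]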

Quick Start & Requirements

The README's setup guide targets Cloud TPU VMs; for local development without TPU hardware, the project provides a mock server (see Highlighted Details below). A sketch of a local smoke test follows.
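
A minimal sketch of such a smoke test. The module paths below are assumptions inferred from the README's mention of a mock server and may not match the repository's actual entry points:

    # Hypothetical local smoke test: start the bundled mock server, probe it
    # with an assumed test-client module, then shut it down. Verify both
    # module paths against the current JetStream repository before use.
    import subprocess
    import time

    server = subprocess.Popen(
        ["python", "-m", "jetstream.core.implementations.mock.server"]  # assumed
    )
    try:
        time.sleep(5)  # give the server time to start listening
        # Assumed test-client entry point; replace with the repo's actual tool.
        subprocess.run(["python", "-m", "jetstream.tools.requester"], check=False)
    finally:
        server.terminate()
        server.wait()
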
Highlighted Details

  • Optimized for throughput and memory efficiency on XLA devices.
  • Supports both JAX (via MaxText) and PyTorch model engines.
  • Includes tools for server observability and profiling (a generic measurement sketch follows this list).
  • Provides a mock server for local testing and development.
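
Since the observability tooling itself is not described in this summary, the sketch below shows a generic way to measure the two headline metrics, throughput and per-step latency, around any decode step; it illustrates the kind of signal such tooling collects, not JetStream's actual profiling API.

    # Generic throughput / step-latency measurement around a token-generation
    # callable. Illustrative only; not JetStream's built-in profiling tools.
    import statistics
    import time
    from typing import Callable


    def measure(step: Callable[[], int], num_steps: int) -> None:
        """Time num_steps decode steps; report tokens/s and median latency."""
        latencies = []
        start = time.perf_counter()
        for _ in range(num_steps):
            t0 = time.perf_counter()
            step()  # one decode step, producing one token
            latencies.append(time.perf_counter() - t0)
        elapsed = time.perf_counter() - start
        print(f"throughput: {num_steps / elapsed:.1f} tokens/s")
        print(f"p50 step latency: {statistics.median(latencies) * 1e3:.3f} ms")


    measure(lambda: 0, num_steps=1000)  # stand-in for a real decode step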

Maintenance & Community

The project is hosted by AI-Hypercomputer, a Google entity. Specific community channels or active contributor details are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Current support is focused on TPUs, with GPU support listed as a future goal that is open to contributions. The setup guide points to Cloud TPU VMs, so out-of-the-box local development is limited without TPU hardware; the bundled mock server is the main workaround.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

7 stars in the last 30 days

Explore Similar Projects

Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai

Top 0.3% on SourcePulse
15k stars
Framework for LLM inference optimization experimentation
Created 1 year ago
Updated 2 days ago