tpu-inference by vllm-project

Unified LLM inference on TPUs

Created 11 months ago
286 stars

Top 91.6% on SourcePulse

Project Summary

vLLM TPU offers a unified backend (tpu-inference) for accelerating large language model inference on Google TPUs. It targets developers and researchers using JAX or PyTorch, enabling cost reduction and performance gains by allowing PyTorch models to run on TPUs with minimal code changes, while also enhancing native JAX support. The project aims to maximize open-source TPU hardware utilization.

How It Works

A novel hardware plugin unifies JAX and PyTorch under a single lowering path within vLLM. This design lets PyTorch model definitions run efficiently on TPUs with little to no code modification, while extending native JAX support. The unified backend simplifies deployment and improves hardware efficiency by abstracting framework differences behind a common inference engine.

Quick Start & Requirements

Begin by following the project's quickstart guide. Recommended TPU generations are v7x, v6e, and v5e, with experimental support for v3, v4, and v5p. Consult the official documentation for detailed setup instructions.
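As a rough sketch of what the quickstart flow looks like (the package name, model, and flags below are assumptions; follow the official quickstart guide for the exact commands):

```shell
# Hypothetical quickstart sketch on a TPU VM.
# "tpu-inference" as a pip package name is an assumption; the project's
# quickstart is the authoritative source for installation steps.
pip install tpu-inference

# Serve a model with vLLM's standard CLI; the backend plugin is expected
# to make vLLM target the TPU automatically (assumed behavior).
vllm serve Qwen/Qwen2.5-1.5B-Instruct --max-model-len 4096
```

The intent of the plugin design is that this is the same `vllm serve` invocation used on GPUs, with the TPU backend selected transparently.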

Highlighted Details

  • Unified backend for JAX and PyTorch on TPUs.
  • Focus on maximizing open-source TPU hardware performance for LLM inference.
  • Maintains a "Recommended Models and Features" page for validated configurations.
  • Provides specific "Recipes" for v7x (Ironwood) and v6e (Trillium) TPUs.

Maintenance & Community

Community contributions are encouraged via GitHub Issues; newcomers can look for issues tagged "good first issue". Technical questions and feature requests should also be filed as GitHub Issues. User discussions take place on the vLLM Forum's TPU support topic, development coordination happens in the Developer Slack (#sig-tpu), and collaborations can be arranged via vllm-tpu@google.com.

Licensing & Compatibility

The README does not explicitly state the project's open-source license. The project supports running PyTorch models on TPUs with minimal code changes and offers native JAX support.

Limitations & Caveats

The project acknowledges that core components are still under development. Users are directed to a specific page for validated models and features, indicating that broader support is evolving. Experimental support is available for older TPU generations.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 299
  • Issues (30d): 17
  • Star History: 31 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

jaxformer by salesforce

0%
299
JAX library for LLM training on TPUs
Created 3 years ago
Updated 2 years ago
Starred by Matthew Johnson (Coauthor of JAX; Research Scientist at Google Brain), Roy Frostig (Coauthor of JAX; Research Scientist at Google DeepMind), and 3 more.

sglang-jax by sgl-project

1.5%
264
High-performance LLM inference engine for JAX/TPU serving
Created 8 months ago
Updated 1 day ago
Starred by Lianmin Zheng (Coauthor of SGLang, vLLM), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB

0.3%
9k
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 2 years ago
Updated 1 month ago