tpu-inference by vllm-project

Unified LLM inference on TPUs

Created 11 months ago
286 stars

Top 91.6% on SourcePulse

Project Summary

vLLM TPU offers a unified backend (tpu-inference) for accelerating large language model inference on Google TPUs. It targets developers and researchers using JAX or PyTorch, enabling cost reduction and performance gains by allowing PyTorch models to run on TPUs with minimal code changes, while also enhancing native JAX support. The project aims to maximize open-source TPU hardware utilization.

How It Works

A novel hardware plugin unifies JAX and PyTorch under a single lowering path within vLLM. This design lets PyTorch model definitions run efficiently on TPUs with little to no code modification, while extending native JAX support. The unified backend simplifies deployment and improves hardware efficiency by abstracting framework differences behind a common inference engine.

Quick Start & Requirements

Begin by following the project's quickstart guide. Recommended TPU generations are v7x, v6e, and v5e, with experimental support for v3, v4, and v5p. Consult the official documentation for detailed setup instructions.
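As a rough sketch of what the quickstart flow looks like (the package name, model, and flags below are assumptions; follow the official quickstart guide for the exact commands):

```shell
# Hypothetical quickstart sketch on a TPU VM.
# "tpu-inference" as a pip package name is an assumption; the project's
# quickstart is the authoritative source for installation steps.
pip install tpu-inference

# Serve a model with vLLM's standard CLI; the backend plugin is expected
# to make vLLM target the TPU automatically (assumed behavior).
vllm serve Qwen/Qwen2.5-1.5B-Instruct --max-model-len 4096
```

The intent of the plugin design is that this is the same `vllm serve` invocation used on GPUs, with the TPU backend selected transparently.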

Highlighted Details

  • Unified backend for JAX and PyTorch on TPUs.
  • Focus on maximizing open-source TPU hardware performance for LLM inference.
  • Maintains a "Recommended Models and Features" page for validated configurations.
  • Provides specific "Recipes" for v7x (Ironwood) and v6e (Trillium) TPUs.

Maintenance & Community

Community contributions are encouraged via GitHub Issues; newcomers can look for issues tagged "good first issue". Technical questions and feature requests should also be filed as GitHub Issues. User discussions take place on the vLLM Forum's TPU support topic, development coordination happens in the Developer Slack (#sig-tpu), and collaborations can be arranged via vllm-tpu@google.com.

Licensing & Compatibility

The README does not explicitly state the project's open-source license. The project supports running PyTorch models on TPUs with minimal code changes and offers native JAX support.

Limitations & Caveats

The project acknowledges that core components are still under development. Users are directed to a specific page for validated models and features, indicating that broader support is evolving. Experimental support is available for older TPU generations.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 299
  • Issues (30d): 17
  • Star History: 31 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

jaxformer by salesforce

0%
299
JAX library for LLM training on TPUs
Created 3 years ago
Updated 2 years ago
Starred by Matthew Johnson (Coauthor of JAX; Research Scientist at Google Brain), Roy Frostig (Coauthor of JAX; Research Scientist at Google DeepMind), and 3 more.

sglang-jax by sgl-project

1.5%
264
High-performance LLM inference engine for JAX/TPU serving
Created 8 months ago
Updated 1 day ago
Starred by Lianmin Zheng (Coauthor of SGLang, vLLM), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB

0.3%
9k
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 2 years ago
Updated 1 month ago