vllm-project: Unified LLM inference on TPUs
Top 91.6% on SourcePulse
Summary
vLLM TPU offers a unified backend (tpu-inference) for accelerating large language model inference on Google TPUs. It targets developers and researchers using JAX or PyTorch, cutting cost and improving performance by letting PyTorch models run on TPUs with minimal code changes while also strengthening native JAX support. The project aims to maximize utilization of TPU hardware through open-source software.
How It Works
A novel hardware plugin unifies JAX and PyTorch under a single lowering path within vLLM. This design allows PyTorch model definitions to run on TPUs with high performance and few to no code modifications, while extending native JAX support. The unified backend simplifies deployment and improves hardware efficiency by abstracting framework differences behind a common inference engine.
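The single-lowering-path idea can be illustrated with a small sketch. This is a conceptual toy only: the names `BackendRegistry` and `tpu_lowering` are hypothetical and are not vLLM's actual plugin API; the point is that model definitions from either framework dispatch through one hardware backend.

```python
from typing import Callable, Dict

# Conceptual sketch only: these names are hypothetical, not vLLM internals.
class BackendRegistry:
    """Maps a hardware name to a single lowering function."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str, str], str]] = {}

    def register(self, hardware: str, lower: Callable[[str, str], str]) -> None:
        self._backends[hardware] = lower

    def lower(self, hardware: str, framework: str, model: str) -> str:
        # JAX and PyTorch models take the same path for a given backend.
        return self._backends[hardware](framework, model)


def tpu_lowering(framework: str, model: str) -> str:
    # A real backend would compile the model for TPU execution;
    # here we only record that both frameworks share one path.
    return f"{model} ({framework}) -> unified TPU lowering"


registry = BackendRegistry()
registry.register("tpu", tpu_lowering)

print(registry.lower("tpu", "pytorch", "llama"))  # llama (pytorch) -> unified TPU lowering
print(registry.lower("tpu", "jax", "llama"))      # llama (jax) -> unified TPU lowering
```

The design choice this models is that framework-specific concerns stay above the registry boundary, so the inference engine below it only ever sees one lowering interface per device type.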
Quick Start & Requirements
Begin with the project's quickstart guide; the full documentation covers detailed setup. Recommended TPU generations are v7x, v5e, and v6e, with experimental support for v3, v4, and v5p.
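Once the TPU plugin is installed per the quickstart guide, serving works through vLLM's standard CLI. A minimal sketch, assuming a TPU VM with vLLM already set up (the model name is illustrative; the exact install command for the TPU backend is in the quickstart, not shown here):

```shell
# Install vLLM; follow the project's quickstart for the TPU-specific setup.
pip install vllm

# Start an OpenAI-compatible server; with the TPU backend installed,
# vLLM runs the model on the attached TPUs.
vllm serve Qwen/Qwen2.5-1.5B-Instruct

# Query it from another terminal.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "prompt": "Hello", "max_tokens": 32}'
```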
Maintenance & Community
Community contributions are encouraged via GitHub Issues (those tagged "good first issue" suit newcomers). Technical questions and feature requests should also be filed as GitHub Issues. User discussions take place on the vLLM Forum's TPU support topic, and development is coordinated in the Developer Slack (#sig-tpu). Collaborations can be arranged via vllm-tpu@google.com.
Licensing & Compatibility
The specific open-source license is not explicitly stated in the README. The project supports PyTorch models on TPUs with minimal code changes and offers native JAX support.
Limitations & Caveats
The project acknowledges that core components are still under development. Users are directed to a specific page for validated models and features, indicating that broader support is evolving. Experimental support is available for older TPU generations.