TileRT (tile-ai): Ultra-low-latency LLM inference runtime
Top 97.5% on SourcePulse
Summary
TileRT targets ultra-low-latency inference for Large Language Models (LLMs), aimed at applications that demand extreme responsiveness, such as interactive AI and real-time decision-making. It provides a tile-based runtime engine designed to push the limits of LLM latency, with the goal of letting large models reach millisecond-level response times without compromising output quality.
How It Works
The core innovation is a compiler-driven, tile-level runtime engine. LLM operators are decomposed into fine-grained tile-level tasks, and the runtime then reschedules compute, I/O, and communication across multiple devices so that these activities overlap as much as possible. This minimizes hardware idle time and maximizes utilization, which is where the latency reductions come from.
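To make the overlap idea concrete, here is a toy Python model. It is a sketch under stated assumptions, not TileRT's scheduler: the names (`TileTask`, `decompose_matmul`, `overlapped_latency`) are hypothetical, every tile is given identical, made-up costs, and the I/O, compute, and communication stages are modeled as a simple in-order three-stage pipeline. It compares running each tile's stages back to back against letting the three stages work on different tiles at the same time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TileTask:
    """A hypothetical tile-level task: one output tile and its per-stage costs."""
    row: int
    col: int
    load: float     # I/O: time to fetch this tile's inputs
    compute: float  # time to compute the output tile
    comm: float     # time to ship the result to other devices


def decompose_matmul(m: int, n: int, tile: int,
                     costs: Tuple[float, float, float]) -> List[TileTask]:
    """Split an m x n output matrix into tile x tile blocks (edge blocks may be smaller).

    Every tile gets the same (load, compute, comm) costs to keep the toy model simple.
    """
    load, compute, comm = costs
    return [
        TileTask(r // tile, c // tile, load, compute, comm)
        for r in range(0, m, tile)
        for c in range(0, n, tile)
    ]


def serial_latency(tasks: List[TileTask]) -> float:
    """No overlap: each tile runs load -> compute -> comm before the next tile starts."""
    return sum(t.load + t.compute + t.comm for t in tasks)


def overlapped_latency(tasks: List[TileTask]) -> float:
    """Three-stage pipeline: each stage processes one tile at a time, in order,
    but different stages can work on different tiles concurrently."""
    stage_free = [0.0, 0.0, 0.0]  # time at which each stage finished its last tile
    for t in tasks:
        ready = 0.0  # when this tile's previous stage completed
        for s, cost in enumerate((t.load, t.compute, t.comm)):
            start = max(stage_free[s], ready)
            stage_free[s] = start + cost
            ready = stage_free[s]
    return stage_free[2]  # the last tile leaves the comm stage last


tasks = decompose_matmul(256, 256, tile=128, costs=(1.0, 2.0, 1.0))
print(len(tasks), serial_latency(tasks), overlapped_latency(tasks))  # 4 16.0 10.0
```

With four 128×128 tiles of a 256×256 output and costs of 1, 2, and 1 time units, the serial schedule takes 16 units while the overlapped one takes 10: once the pipeline fills, only the slowest stage (compute) sets the pace, plus one unit each to fill and drain.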
Quick Start & Requirements
Installation is recommended via Docker (tileai/tilert:v0.1.0). Prerequisites include an 8-GPU NVIDIA B200 system running a supported Linux environment and CUDA version; the model weights (Tile-AI/DeepSeek-V3.2-Exp-TileRT) must be downloaded from HuggingFace.
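The two setup steps mentioned in the quick start (running the tileai/tilert:v0.1.0 Docker image and fetching the Tile-AI/DeepSeek-V3.2-Exp-TileRT weights from HuggingFace) could be scripted roughly as follows. This is a minimal sketch assuming the `huggingface_hub` package and Docker with the NVIDIA container toolkit are installed; the helper names, the local path, and the `/weights` mount point are hypothetical, and the README remains the authority on how the container should actually be launched.

```python
import shlex
from typing import List

IMAGE = "tileai/tilert:v0.1.0"                     # image tag from the README
WEIGHTS_REPO = "Tile-AI/DeepSeek-V3.2-Exp-TileRT"  # HuggingFace repo from the README


def download_weights(local_dir: str) -> str:
    """Fetch the model weights with huggingface_hub (pip install huggingface_hub)."""
    from huggingface_hub import snapshot_download  # imported lazily: needs network access
    return snapshot_download(repo_id=WEIGHTS_REPO, local_dir=local_dir)


def docker_run_command(weights_dir: str, mount_point: str = "/weights") -> List[str]:
    """Build a `docker run` invocation that exposes all GPUs and mounts the weights.

    The mount point is a hypothetical choice; check the TileRT README for the
    path and entry point the image actually expects.
    """
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",                      # requires the NVIDIA container toolkit
        "-v", f"{weights_dir}:{mount_point}",  # bind-mount the downloaded weights
        IMAGE,
    ]


cmd = docker_run_command("/data/tilert-weights")
print(shlex.join(cmd))
```

Calling `download_weights("/data/tilert-weights")` first and then passing `cmd` to `subprocess.run` would perform both steps. The command is built as an argument list rather than a shell string so that paths containing spaces need no extra quoting.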
Maintenance & Community
TileRT is presented as an experimental, continuously evolving project with a preview release. Future updates are anticipated to enhance performance and expand support. Specific community channels (e.g., Discord, Slack) or a public roadmap are not detailed in the README.
Licensing & Compatibility
The README does not specify a software license. This omission needs to be resolved before any adoption decision, particularly for commercial use or integration into closed-source projects. Compatibility is currently limited to specific Linux environments, high-end NVIDIA hardware, and particular CUDA versions.
Limitations & Caveats
As an experimental preview, TileRT has significant limitations. The current build is restricted to an 8-GPU NVIDIA B200 setup, and users must meet specific hardware, OS, and CUDA version requirements. Installing within the provided Docker image is strongly advised for stability.
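Because the current build only supports an 8-GPU B200 machine, a quick preflight check before pulling the image can save time. The sketch below is an illustration under assumptions, not a TileRT tool: it shells out to `nvidia-smi` (using its `--query-gpu=name` and `--format=csv,noheader` options) and assumes B200 GPUs report a name containing "B200". The helper names are hypothetical, and the check does not cover the OS or CUDA version, which still have to be compared against the README's requirements.

```python
import subprocess
from typing import List

REQUIRED_GPUS = 8        # the current build targets an 8-GPU B200 setup
REQUIRED_MODEL = "B200"  # assumed substring of the name nvidia-smi reports


def query_gpu_names() -> List[str]:
    """List GPU names via nvidia-smi (requires the NVIDIA driver to be installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def check_gpus(names: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the GPU requirement looks met."""
    problems = []
    if len(names) != REQUIRED_GPUS:
        problems.append(f"expected {REQUIRED_GPUS} GPUs, found {len(names)}")
    other = sorted({n for n in names if REQUIRED_MODEL not in n})
    if other:
        problems.append(f"non-{REQUIRED_MODEL} GPUs present: {', '.join(other)}")
    return problems


print(check_gpus(["NVIDIA B200"] * 8))  # []
print(check_gpus(["NVIDIA H100 80GB HBM3"] * 4))
```

On the target machine, `check_gpus(query_gpu_names())` should return an empty list; the pure `check_gpus` function is kept separate from the `nvidia-smi` call so it can be tested without GPUs.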
Similar Projects
cfregly
ByteDance-Seed
ztxz16
FMInference
ai-dynamo