tract by sonos

Tiny, self-contained inference engine for diverse hardware and modalities

Created 8 years ago
2,922 stars

Top 15.9% on SourcePulse

Project Summary

tract is a tiny, no-nonsense, self-contained inference engine for TensorFlow and ONNX models, developed at Sonos. It is a high-performance, minimal engine designed for deploying neural networks across diverse hardware and environments. It targets engineers and researchers who need efficient, small-footprint model execution, and offers a "translate-once, ship-tiny-runtime" approach for applications ranging from embedded systems to web browsers.

How It Works

tract loads models from the ONNX, NNEF, and TensorFlow Lite formats and optimizes them through its NNEF-based intermediate representation, tract-OPL. Because the optimized model is expressed in this portable representation, the same model can run across platforms with a small runtime. A key feature is "pulsification," which lets a model designed for whole-sequence processing consume fixed-size chunks of input ("pulses") instead, enabling low-latency streaming inference for real-time applications such as wake-word detection.
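The idea behind pulsification can be illustrated with a small, dependency-free sketch. This is a conceptual illustration only, not tract's API or its implementation: the names causal_conv and PulsedCausalConv are hypothetical, and a single causal 1-D convolution stands in for a full sequence model. It shows how an operator defined over a whole sequence can be run on fixed-size pulses by carrying a small amount of state between calls, while producing the same output.

```python
# Conceptual sketch only: NOT tract's API or implementation.
from typing import List


def causal_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """Whole-sequence reference: y[t] = sum_k kernel[k] * x[t - k], with x[t < 0] = 0."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * signal[t - k]
        out.append(acc)
    return out


class PulsedCausalConv:
    """Streaming version: consumes fixed-size pulses and keeps state between calls."""

    def __init__(self, kernel: List[float], pulse: int) -> None:
        self.kernel = kernel
        self.pulse = pulse
        # The last len(kernel) - 1 input samples, zero-initialised.
        self.history = [0.0] * (len(kernel) - 1)

    def step(self, chunk: List[float]) -> List[float]:
        if len(chunk) != self.pulse:
            raise ValueError(f"expected a pulse of {self.pulse} samples")
        window = self.history + chunk
        offset = len(self.history)
        out = []
        for i in range(self.pulse):
            t = offset + i
            out.append(sum(w * window[t - k] for k, w in enumerate(self.kernel)))
        # Keep only the samples the next pulse will need.
        self.history = window[len(window) - offset:]
        return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    kernel = [0.5, 0.25, 0.25]
    streamer = PulsedCausalConv(kernel, pulse=2)
    streamed: List[float] = []
    for start in range(0, len(signal), 2):
        streamed.extend(streamer.step(signal[start:start + 2]))
    print(streamed == causal_conv(signal, kernel))
```

The per-call cost depends only on the pulse size and the kernel length, not on how much audio has already been seen, which is what makes this style of execution suitable for always-on streaming workloads.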

Quick Start & Requirements

Python users can install the bindings with pip install tract; a Rust API is also available. tract supports several backends, including CPU (x86, ARM), Apple Metal, NVIDIA CUDA, and WebAssembly, and specific hardware is required only when targeting the corresponding backend. Official documentation is available at sonos.github.io/tract.
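After running the install command, a quick way to confirm that the package landed in the active environment is to query its metadata with the Python standard library. The sketch below assumes the distribution is named tract, as in the pip command above; the helper name installed_version is hypothetical, and nothing here calls into tract itself.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "tract") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tract is not installed in this environment; try: pip install tract")
    else:
        print(f"tract {version} is installed")
```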

Highlighted Details

  • Multi-Platform Backends: Supports CPU (x86, ARMv6/7/8, ARM SVE), Apple Metal GPUs, NVIDIA CUDA GPUs, and WebAssembly for browser/WASI deployment.
  • Streaming & Pulsification: First-class support for real-time, low-latency inference on sequence models by processing fixed-size "pulses."
  • Format Versatility: Imports ONNX, NNEF (with tract-OPL extensions), and legacy TensorFlow Lite/TF1 frozen graphs. PyTorch models can be converted via torch-to-nnef.
  • Optimized Runtime: Utilizes tract-OPL to minimize runtime footprint by excluding unnecessary framework components.

Maintenance & Community

The project is used in production at Sonos. Specific details regarding community channels (e.g., Discord/Slack), active contributors, or a public roadmap are not detailed in the README.

Licensing & Compatibility

Original work is dual-licensed under Apache License 2.0 or MIT. Note that files originating from TensorFlow and ONNX projects may be subject to their respective licenses. The permissive licenses generally allow for commercial use and integration into closed-source projects.

Limitations & Caveats

TensorFlow 2 models require conversion to ONNX before use. Support for TensorFlow Lite and TensorFlow 1 is marked as legacy. Internal crates are considered unstable APIs. While tract-OPL extensions aim for stability within minor versions (0.x.y to 0.x.z), applications may need to manage version compatibility.
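One way an application might manage this is to record the tract version used to export a model and compare it with the version used to load it. The sketch below is an assumption about how to read the "0.x.y to 0.x.z" rule (same major and minor number, any patch), not an official compatibility check; the function names and the version strings in the usage note are illustrative only.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'major.minor.patch' string; anything else raises ValueError."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def same_release_line(written_with: str, loaded_with: str) -> bool:
    """True when both versions share major and minor, i.e. 0.x.y versus 0.x.z."""
    return parse_version(written_with)[:2] == parse_version(loaded_with)[:2]
```

For example, same_release_line("0.21.3", "0.21.7") returns True, while same_release_line("0.21.3", "0.22.0") returns False, in which case re-exporting the model with the newer version may be the safer choice.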

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 113
  • Issues (30d): 9
  • Star History: 50 stars in the last 30 days
