tract by sonos

Tiny, self-contained inference engine for diverse hardware and modalities

Created 9 years ago

2,987 stars

Top 15.5% on SourcePulse

7 Experts Love This Project

Alex Cheema

Cofounder of EXO Labs

Georgios Konstantopoulos

CTO, General Partner at Paradigm

Luca Antiga

CTO of Lightning AI

Jason Knight

Director AI Compilers at NVIDIA; Cofounder of OctoML

and 3 more!

Project Summary

Tiny, no-nonsense, self-contained TensorFlow and ONNX inference. Sonos tract is a high-performance, minimal inference engine for deploying neural networks across diverse hardware and environments. It targets engineers and researchers who need efficient, small-footprint model execution, and offers a "translate-once, ship-tiny-runtime" workflow for applications ranging from embedded systems to web browsers.

How It Works

tract loads models from ONNX and NNEF, plus legacy TensorFlow Lite and TensorFlow 1 frozen graphs, and optimizes them through its NNEF-based intermediate representation, tract-OPL. Because the optimized model no longer depends on the framework that produced it, the same model can run on every supported platform with a small runtime. A key feature is "pulsification": a model written to process a whole sequence is transformed into one that consumes fixed-size chunks ("pulses") as they arrive. This gives the low-latency streaming inference that real-time applications such as wake-word detection depend on.
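The idea behind pulsification can be illustrated without tract itself. The sketch below is not tract's API or implementation; it is a minimal pure-Python model of the general technique, assuming a single causal 1-D convolution as the "network". The names `causal_conv` and `PulsedConv` are hypothetical. The streaming version consumes fixed-size pulses and carries the last few input samples as state, so that its concatenated outputs match the whole-sequence result.

```python
from typing import List


def causal_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """Whole-sequence reference: y[t] = sum_j kernel[j] * x[t - j], with x[t] = 0 for t < 0."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:
                acc += w * signal[t - j]
        out.append(acc)
    return out


class PulsedConv:
    """Streaming counterpart (hypothetical name): one fixed-size pulse per call."""

    def __init__(self, kernel: List[float], pulse: int) -> None:
        self.kernel = kernel
        self.pulse = pulse
        # State carried between pulses: the last len(kernel) - 1 input samples,
        # initialised to zeros to match the zero padding of the reference.
        self.history = [0.0] * (len(kernel) - 1)

    def step(self, chunk: List[float]) -> List[float]:
        if len(chunk) != self.pulse:
            raise ValueError("each pulse must contain exactly `pulse` samples")
        window = self.history + chunk
        offset = len(self.history)
        out = []
        for t in range(self.pulse):
            acc = 0.0
            for j, w in enumerate(self.kernel):
                acc += w * window[offset + t - j]
            out.append(acc)
        # Keep only the samples the next pulse will need.
        keep = len(self.kernel) - 1
        self.history = window[len(window) - keep:]
        return out


# Feed a six-sample signal two samples at a time and compare with the batch result.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [0.5, 0.25]
streamer = PulsedConv(kernel, pulse=2)
streamed: List[float] = []
for start in range(0, len(signal), 2):
    streamed.extend(streamer.step(signal[start:start + 2]))
assert streamed == causal_conv(signal, kernel)
```

Each call to `step` does a fixed amount of work on a fixed-size input, which is what makes the latency predictable; the only cost is the small `history` buffer kept between calls.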

Quick Start & Requirements

Python users can install tract with pip install tract; a Rust API is also available. tract supports several backends, including CPU (x86, ARM), Apple Metal, NVIDIA CUDA, and WebAssembly. Apple or NVIDIA GPU hardware is needed only when targeting the corresponding backend. Official documentation is available at sonos.github.io/tract.

Highlighted Details

Multi-Platform Backends: Supports CPU (x86, ARMv6/7/8, ARM SVE), Apple Metal GPUs, NVIDIA CUDA GPUs, and WebAssembly for browser/WASI deployment.
Streaming & Pulsification: First-class support for real-time, low-latency inference on sequence models by processing fixed-size "pulses."
Format Versatility: Imports ONNX, NNEF (with tract-OPL extensions), and legacy TensorFlow Lite/TF1 frozen graphs. PyTorch models can be converted via torch-to-nnef.
Optimized Runtime: Utilizes tract-OPL to minimize runtime footprint by excluding unnecessary framework components.

Maintenance & Community

The project is used in production at Sonos. The README does not describe community channels (e.g., Discord/Slack), active contributors, or a public roadmap.

Licensing & Compatibility

Original work is dual-licensed under Apache License 2.0 or MIT. Note that files originating from TensorFlow and ONNX projects may be subject to their respective licenses. The permissive licenses generally allow for commercial use and integration into closed-source projects.

Limitations & Caveats

TensorFlow 2 models require conversion to ONNX before use. Support for TensorFlow Lite and TensorFlow 1 is marked as legacy. Internal crates are considered unstable APIs. tract-OPL extensions aim for stability only within a minor version series (0.x.y to 0.x.z), so applications may need to manage compatibility across minor versions.
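One way an application might manage this is to record which tract version produced a stored model and compare it with the version in use at load time. The sketch below is an assumption about how such a check could look, not something tract provides: `parse_version` and `same_minor_series` are hypothetical helpers, and they assume plain "major.minor.patch" version strings with no pre-release suffix.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a plain 'major.minor.patch' string into integers (no suffixes handled)."""
    major, minor, patch = text.split(".")
    return int(major), int(minor), int(patch)


def same_minor_series(produced_with: str, running: str) -> bool:
    """True when both versions share major and minor, i.e. 0.x.y versus 0.x.z."""
    return parse_version(produced_with)[:2] == parse_version(running)[:2]
```

For example, `same_minor_series("0.21.7", "0.21.13")` is true, while `same_minor_series("0.21.7", "0.22.0")` is false and would signal that the model should be re-exported.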

Health Check

Last Commit

14 hours ago

Responsiveness

Inactive

Star History

48 stars in the last 30 days