This repository provides TT-NN, a Python and C++ neural network operator library, and TT-Metalium, a low-level kernel programming model for Tenstorrent hardware. It targets AI researchers and developers looking to optimize and deploy models on Tenstorrent's specialized AI accelerators, offering high performance for LLMs, CNNs, and other AI workloads.
How It Works
TT-NN is built on TT-Metalium, a C++ framework for programming Tenstorrent's hardware directly, including its matrix and vector engines. This low-level control enables fine-grained optimization of data movement, computation, and parallelism (Tensor Parallelism and Data Parallelism) across the chip's architecture, with the aim of higher performance than approaches that cannot control the hardware at this level.
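The difference between the two parallelism strategies named above can be illustrated with a minimal, hardware-free sketch. It uses plain Python lists in place of tensors and loop iterations in place of devices; it does not call any TT-NN or TT-Metalium API, and the helper names (`split_rows`, `split_cols`, `data_parallel`, `tensor_parallel`) are hypothetical. It also assumes the batch size and output width divide evenly by the device count.

```python
# Conceptual sketch only: lists stand in for tensors, loop iterations for devices.
# None of these helpers belong to TT-NN or TT-Metalium.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def split_rows(m, parts):
    """Split a matrix into `parts` contiguous row blocks (assumes even division)."""
    step = len(m) // parts
    return [m[i * step:(i + 1) * step] for i in range(parts)]

def split_cols(m, parts):
    """Split a matrix into `parts` contiguous column blocks (assumes even division)."""
    step = len(m[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(parts)]

def data_parallel(x, w, devices):
    """Each device gets a slice of the batch and a full copy of the weights."""
    out = []
    for shard in split_rows(x, devices):
        out.extend(matmul(shard, w))  # concatenate along the batch axis
    return out

def tensor_parallel(x, w, devices):
    """Each device gets the full batch and a column slice of the weights."""
    partials = [matmul(x, w_shard) for w_shard in split_cols(w, devices)]
    # Concatenate the per-device outputs along the feature axis.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4], [5, 6], [7, 8]]  # batch of 4 samples, 2 features each
w = [[1, 0, 2, 1], [0, 1, 1, 3]]      # 2 inputs -> 4 outputs

reference = matmul(x, w)
dp_result = data_parallel(x, w, devices=2)
tp_result = tensor_parallel(x, w, devices=2)
```

Both strategies reproduce the single-device result; they differ in what is replicated. Data parallelism copies the weights and divides the batch, while tensor parallelism divides the weights and copies the batch, which matters when a model's weights do not fit on one device.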
Quick Start & Requirements
- Install: Refer to INSTALLING.md.
- Prerequisites: Tenstorrent hardware (e.g., Grayskull, Wormhole). CUDA is not relevant, since the library targets Tenstorrent accelerators rather than NVIDIA GPUs. Mandatory Python versions for the core library are not explicitly stated, but specific demos or integrations may have their own requirements.
- Resources: Performance benchmarks for various LLMs, CNNs, and NLP models are provided for different Tenstorrent hardware configurations (n150, n300, QuietBox, Galaxy).
- Links: API Reference, Model Demos, TT-Metalium API Reference.
Highlighted Details
- Reports throughput of 1443.2 tokens/s (t/s) for Llama 3.1 70B and 96,800 fps for ResNet-50 on Tenstorrent hardware.
- Supports advanced parallelism techniques including Tensor Parallelism (TP) and Data Parallelism (DP).
- Includes a comprehensive set of programming guides and tech reports for kernel development and model optimization.
- Features a bounty program to incentivize community contributions.
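To put the throughput figures in the list above into more familiar terms, the arithmetic below converts them into amortized time per unit of work. This is a rough sketch that assumes the figures are aggregate throughput; it says nothing about the latency of a single request, and the function name `seconds_for` is only illustrative.

```python
# Figures quoted in the list above. Treating them as aggregate throughput
# is an assumption, not something the source states.
LLAMA_TOKENS_PER_S = 1443.2    # Llama 3.1 70B
RESNET_FRAMES_PER_S = 96_800   # ResNet-50

def seconds_for(items, throughput_per_s):
    """Amortized time to process `items` units at a given throughput."""
    return items / throughput_per_s

time_1000_tokens = seconds_for(1000, LLAMA_TOKENS_PER_S)
us_per_frame = seconds_for(1, RESNET_FRAMES_PER_S) * 1e6

print(f"{time_1000_tokens:.3f} s per 1000 tokens")
print(f"{us_per_frame:.2f} us per frame")
```

Under that assumption, 1000 tokens take roughly 0.69 s and each frame roughly 10 microseconds of amortized time. As the caveats note, these numbers depend on the hardware configuration and model version measured.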
Maintenance & Community
- Active development with frequent releases (e.g., v0.57.0-rc71).
- Community engagement via Discord.
- Opportunities for contribution and employment are highlighted.
Licensing & Compatibility
- The README does not explicitly state the license (e.g., MIT, Apache). The repository takes part in Tenstorrent's bounty program and accepts community contributions, which suggests an open-source model, but suitability for commercial use depends on the actual license terms.
Limitations & Caveats
- The primary limitation is the requirement for Tenstorrent-specific hardware: the library cannot be used without access to their accelerators. Reported performance figures apply only to the specific hardware configurations and model versions measured.