tt-forge by tenstorrent

Tenstorrent's MLIR compiler stack for AI hardware

Created 1 year ago
274 stars

Top 94.3% on SourcePulse

Project Summary

TT-Forge is Tenstorrent's open-source, MLIR-based compiler stack for running and training AI workloads on Tenstorrent hardware. It aims to be a general, performant path for deploying complex models from frameworks such as PyTorch, JAX, and ONNX across the range of Tenstorrent hardware configurations.

How It Works

TT-Forge integrates several components. Frontends convert models into MLIR dialects (StableHLO, TTIR): TT-XLA handles PyTorch and JAX, and TT-Forge-ONNX handles ONNX, TensorFlow, and PaddlePaddle. The core TT-MLIR compiler optimizes these graphs and lowers them to the TTNN and TTKernel dialects, which the TT-Metalium runtime then executes on Tenstorrent hardware. Alongside this pipeline, TT-Lang offers a Python DSL for writing custom, high-performance kernels while abstracting away low-level hardware details.
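The lowering path described above can be summarized as a small Python sketch. It is not part of TT-Forge and calls no Tenstorrent API: `FRONTENDS` and `lowering_path` are hypothetical names, and two details are assumptions rather than facts from this summary. First, TT-XLA is assumed to emit StableHLO and TT-Forge-ONNX to emit TTIR. Second, TT-MLIR is assumed to convert StableHLO into TTIR before lowering further. The summary itself only says that the frontends emit StableHLO or TTIR.

```python
# Illustrative model of the TT-Forge compilation flow; NOT the TT-Forge API.
# Hypothetical routing table: framework -> (frontend, entry MLIR dialect).
# The per-frontend dialect assignment is an assumption.
FRONTENDS = {
    "pytorch": ("TT-XLA", "StableHLO"),
    "jax": ("TT-XLA", "StableHLO"),
    "onnx": ("TT-Forge-ONNX", "TTIR"),
    "tensorflow": ("TT-Forge-ONNX", "TTIR"),
    "paddle": ("TT-Forge-ONNX", "TTIR"),
}


def lowering_path(framework: str) -> list[str]:
    """Return the ordered stages a model from `framework` passes through."""
    try:
        frontend, entry_dialect = FRONTENDS[framework.lower()]
    except KeyError:
        raise ValueError(f"no TT-Forge frontend listed for {framework!r}") from None
    stages = [frontend, entry_dialect]
    if entry_dialect != "TTIR":
        # Assumption: TT-MLIR converts StableHLO into TTIR first.
        stages.append("TTIR")
    # TT-MLIR lowers to TTNN/TTKernel; TT-Metalium executes the result.
    stages += ["TTNN/TTKernel", "TT-Metalium runtime"]
    return stages


for fw in ("PyTorch", "ONNX"):
    print(fw, "->", " -> ".join(lowering_path(fw)))
```

Each printed line is one path through the stack, which makes the split between the two frontends and their shared TT-MLIR backend easy to see.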

Quick Start & Requirements

Installation uses Tenstorrent's own package index in addition to PyPI: pip install tt-forge --extra-index-url https://pypi.eng.aws.tenstorrent.com/. The setup guide specifies Ubuntu 24.04 and Python 3.12. Some examples need additional dependencies, such as torchvision. Tenstorrent publishes official documentation and hardware details.
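Before running the pip command, a quick pre-flight check against the two stated requirements can save a failed install. This is a minimal sketch using only the Python standard library. `parse_os_release` and `check_environment` are hypothetical helpers, not part of TT-Forge. Treating anything other than exactly Python 3.12 as a mismatch is an assumption based on the setup guide's wording. The check does not look for Tenstorrent hardware, drivers, or firmware.

```python
import sys

# Requirements as stated in the setup guide (Ubuntu 24.04, Python 3.12).
REQUIRED_PYTHON = (3, 12)
REQUIRED_OS_ID = "ubuntu"
REQUIRED_OS_VERSION = "24.04"


def parse_os_release(text: str) -> dict[str, str]:
    """Hypothetical helper: parse KEY=value lines of an os-release file."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")  # drop surrounding quotes
    return info


def check_environment(os_release_text: str, python_version=sys.version_info[:2]) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means OK."""
    problems = []
    # Assumption: only exactly Python 3.12 is treated as supported.
    if tuple(python_version) != REQUIRED_PYTHON:
        found = ".".join(map(str, python_version))
        problems.append(f"Python {found} found, 3.12 expected")
    info = parse_os_release(os_release_text)
    if info.get("ID") != REQUIRED_OS_ID or info.get("VERSION_ID") != REQUIRED_OS_VERSION:
        name = info.get("PRETTY_NAME", "unknown OS")
        problems.append(f"{name} found, Ubuntu 24.04 expected")
    return problems


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as f:
            release = f.read()
    except OSError:
        release = ""  # not a systemd-style Linux; the OS check will fail
    issues = check_environment(release)
    print("\n".join(issues) or "Environment matches the documented setup.")
```

Passing the os-release text and Python version as arguments keeps the logic testable without depending on the machine it runs on.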

Highlighted Details

  • More than 800 model variants are tested in CI, including large models such as Llama 3 70B and Stable Diffusion XL.
  • Covers both inference and training.
  • Provides multi-chip support on specific hardware configurations (e.g., N300+).
  • TT-Lang provides a Python-based approach for custom kernel development.

Maintenance & Community

Community support is available via Discord. Tenstorrent also runs a bounty program for contributions, with details in the repository's Issues tab.

Licensing & Compatibility

The repository's README does not explicitly state a software license. Confirm the licensing terms before adopting the project, particularly for commercial use or derivative works.

Limitations & Caveats

The TT-Lang Python DSL for custom kernel development is currently in an "early preview" state. Installation relies on a custom package index rather than the public PyPI alone, which may indicate a less mature or less widely available distribution channel.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 12
  • Issues (30d): 19
  • Star history: 53 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

0.8%
2k
Framework for scaling multimodal model training across accelerators
Created 1 year ago
Updated 16 hours ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 7 more.

executorch by pytorch

0.2%
5k
On-device AI framework for PyTorch inference and training
Created 4 years ago
Updated 15 hours ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.1%
10k
PyTorch training helper for distributed execution
Created 5 years ago
Updated 1 day ago