xla by pytorch

PyTorch on XLA devices

Created 6 years ago
2,677 stars

Top 17.6% on SourcePulse

View on GitHub
Project Summary

This repository provides PyTorch/XLA, a Python package enabling PyTorch to run on XLA-accelerated hardware, primarily Google Cloud TPUs and NVIDIA GPUs. It targets researchers and engineers looking to leverage specialized hardware for faster deep learning model training and inference, offering significant performance gains over standard CPU or GPU setups.

How It Works

PyTorch/XLA integrates PyTorch with the XLA (Accelerated Linear Algebra) compiler, which fuses and compiles PyTorch operations into efficient kernels for specific hardware backends. The library supports several execution modes, including single-process, multi-process, and SPMD (Single Program, Multiple Data), allowing flexible scaling across multiple accelerators. It employs lazy tensor evaluation and asynchronous execution to maximize hardware utilization: operations are recorded into a graph and only compiled and executed when results are actually needed.
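The lazy-evaluation idea can be illustrated with a toy sketch in pure Python. Note this is only an analogy for how PyTorch/XLA traces operations into a graph before materializing results; `LazyTensor`, `constant`, and `materialize` are invented names for illustration, not part of the torch_xla API:

```python
# Toy illustration of lazy evaluation: operations build an expression
# graph, and nothing is computed until a result is requested. This is
# loosely analogous to how PyTorch/XLA records ops for the XLA compiler
# and materializes tensors at a step boundary (names are invented).

class LazyTensor:
    def __init__(self, op, args=(), value=None):
        self.op, self.args, self.value = op, args, value

    @staticmethod
    def constant(v):
        # Leaf node holding a concrete value.
        return LazyTensor("const", value=v)

    def __add__(self, other):
        return LazyTensor("add", (self, other))

    def __mul__(self, other):
        return LazyTensor("mul", (self, other))

    def materialize(self):
        # Executed only on demand, so the whole graph is visible to an
        # optimizer/compiler before any work happens.
        if self.op == "const":
            return self.value
        lhs, rhs = (a.materialize() for a in self.args)
        return lhs + rhs if self.op == "add" else lhs * rhs

a = LazyTensor.constant(2)
b = LazyTensor.constant(3)
c = (a + b) * a          # builds a graph; no arithmetic has run yet
print(c.materialize())   # prints 10
```

Because evaluation is deferred, an entire training step can be compiled as one unit, which is what lets XLA fuse kernels aggressively.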

Quick Start & Requirements

  • Installation: Use pip install torch==<version> 'torch_xla[tpu]==<version>' for stable builds on TPU VMs. Nightly builds and specific CUDA versions require direct wheel installation from provided GCS URLs.
  • Prerequisites: a Google Cloud TPU VM or compatible GPU environment. GPU builds require specific CUDA versions (e.g., 12.1, 12.6). Supported Python versions (3.8-3.11) depend on the release.
  • Resources: Requires access to TPU or GPU hardware. Setup involves installing PyTorch and PyTorch/XLA wheels.
  • Documentation: PyTorch/XLA Docs
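For concreteness, a stable-build installation on a TPU VM might look like the following. The version number is an illustrative placeholder (the README's install command uses `<version>`); substitute a supported release and check the release notes for matching torch/torch_xla wheels:

```shell
# Illustrative only: install matching PyTorch and PyTorch/XLA stable
# wheels with the TPU extra on a Cloud TPU VM. "2.7.0" is a placeholder
# version; pick the release you actually need.
pip install torch==2.7.0 'torch_xla[tpu]==2.7.0'
```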

Highlighted Details

  • Offers C++11 ABI builds for improved lazy tensor tracing performance, showing up to 39% MFU on Mixtral 8x7B compared to 33% for pre-C++11 ABI.
  • Supports distributed training paradigms like DistributedDataParallel (DDP) and FullyShardedDataParallel (FSDP).
  • Provides comprehensive documentation on performance tuning, distributed execution, and specific features like Pallas and Triton integration.
  • Includes reference implementations for large models in the AI-Hypercomputer/tpu-recipes repository.

Maintenance & Community

Jointly maintained by Google and Meta, with contributions from individual developers. Feedback and bug reports are encouraged via GitHub issues.

Licensing & Compatibility

The repository is open source. The README does not state the license explicitly, but the project is generally aligned with PyTorch's permissive licensing, which allows commercial use.

Limitations & Caveats

The README notes that as of release 2.7, only C++11 ABI builds are provided, which may impact compatibility with older pre-C++11 ABI setups. Specific Python and CUDA version compatibility must be carefully checked when selecting wheels or Docker images.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 73
  • Issues (30d): 18
  • Star History: 27 stars in the last 30 days

Explore Similar Projects

Starred by Chris Lattner (author of LLVM, Clang, Swift, Mojo, MLIR; cofounder of Modular), Vincent Weisser (cofounder of Prime Intellect), and 18 more.

open-infra-index by deepseek-ai

Top 0.1% · 8k stars
AI infrastructure tools for efficient AGI development
Created 6 months ago
Updated 4 months ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

Top 0.4% · 4k stars
High-performance C++ LLM inference library
Created 2 years ago
Updated 1 week ago
Starred by Luis Capelo (cofounder of Lightning AI), Alex Yu (research scientist at OpenAI; former cofounder of Luma AI), and 7 more.

TransformerEngine by NVIDIA

Top 0.4% · 3k stars
Library for Transformer model acceleration on NVIDIA GPUs
Created 3 years ago
Updated 19 hours ago
Starred by Clement Delangue (cofounder of Hugging Face), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

Top 0.3% · 9k stars
PyTorch training helper for distributed execution
Created 4 years ago
Updated 1 day ago