xla by pytorch

PyTorch on XLA devices

created 6 years ago
2,648 stars

Top 18.2% on sourcepulse

Project Summary

This repository provides PyTorch/XLA, a Python package enabling PyTorch to run on XLA-accelerated hardware, primarily Google Cloud TPUs and NVIDIA GPUs. It targets researchers and engineers looking to leverage specialized hardware for faster deep learning model training and inference, offering significant performance gains over standard CPU or GPU setups.

How It Works

PyTorch/XLA integrates PyTorch with the XLA (Accelerated Linear Algebra) compiler. XLA optimizes PyTorch operations into efficient kernels for specific hardware backends. The library supports various execution modes, including single-process, multi-process, and SPMD (Single Program, Multiple Data), allowing flexible scaling across multiple accelerators. It employs lazy tensor evaluation and asynchronous execution to maximize hardware utilization.
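The lazy-evaluation idea can be illustrated with a small, self-contained sketch (a toy, not the torch_xla API): operations are recorded into a graph and only computed when a result is actually requested, analogous to how XLA traces PyTorch ops and compiles them at a step barrier.

```python
# Toy illustration of lazy tensor evaluation (NOT the torch_xla API).
# Operations build a graph; nothing is computed until materialize() is
# called, mirroring how XLA traces ops and compiles them at a step barrier.

class LazyTensor:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    @staticmethod
    def constant(v):
        return LazyTensor("const", value=v)

    def __add__(self, other):
        return LazyTensor("add", (self, other))

    def __mul__(self, other):
        return LazyTensor("mul", (self, other))

    def materialize(self):
        # "Compile and run" the recorded graph in one pass.
        if self.op == "const":
            return self.value
        a, b = (t.materialize() for t in self.inputs)
        return a + b if self.op == "add" else a * b

x = LazyTensor.constant(3)
y = LazyTensor.constant(4)
z = x * y + x           # no arithmetic has happened yet, only graph building
print(z.materialize())  # -> 15
```

Deferring execution this way lets the compiler see whole subgraphs at once and fuse them into efficient kernels, which is where the performance gains over eager per-op dispatch come from.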

Quick Start & Requirements

  • Installation: Use pip install torch==<version> 'torch_xla[tpu]==<version>' for stable builds on TPU VMs. Nightly builds and specific CUDA versions require direct wheel installation from provided GCS URLs.
  • Prerequisites: a Google Cloud TPU VM or compatible GPU environment. GPU builds require specific CUDA versions (e.g., 12.1, 12.6). Python 3.8–3.11 is supported, depending on the release.
  • Resources: Requires access to TPU or GPU hardware. Setup involves installing PyTorch and PyTorch/XLA wheels.
  • Documentation: PyTorch/XLA Docs
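Concretely, a stable TPU-VM install might look like the following; the version pin and the libtpu wheel index URL are illustrative, so check the release notes for the exact values matching your release.

```shell
# Illustrative TPU-VM install; substitute the versions for your release.
pip install torch==2.7.0 'torch_xla[tpu]==2.7.0' \
  -f https://storage.googleapis.com/libtpu-releases/index.html
```

The `-f` flag points pip at an extra wheel index; nightly and CUDA-specific builds follow the same pattern with different wheel URLs.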

Highlighted Details

  • Offers C++11 ABI builds for improved lazy tensor tracing performance, showing up to 39% MFU on Mixtral 8x7B compared to 33% for pre-C++11 ABI.
  • Supports distributed training paradigms like DistributedDataParallel (DDP) and FullyShardedDataParallel (FSDP).
  • Provides comprehensive documentation on performance tuning, distributed execution, and specific features like Pallas and Triton integration.
  • Includes reference implementations for large models in the AI-Hypercomputer/tpu-recipes repository.
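The core idea behind DDP — each replica computes gradients on its own data shard, the gradients are averaged across replicas, and every replica applies the same update — can be sketched in plain Python (a conceptual toy, not the torch_xla DDP implementation):

```python
# Toy data-parallel step (conceptual sketch, not torch_xla's DDP).
# Each "replica" computes a gradient on its data shard; gradients are
# then averaged (the all-reduce) so every replica applies the same update.

def grad_mse(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over this replica's shard
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def ddp_step(w, shards, lr=0.01):
    local_grads = [grad_mse(w, xs, ys) for xs, ys in shards]
    avg_grad = sum(local_grads) / len(local_grads)  # the "all-reduce"
    return w - lr * avg_grad  # identical update on every replica

# Two replicas, each holding a shard of y = 2x data
shards = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
w = 0.0
for _ in range(200):
    w = ddp_step(w, shards)
print(round(w, 3))  # -> 2.0
```

FSDP extends this by also sharding the parameters and optimizer state themselves across replicas, trading extra communication for a much smaller per-device memory footprint.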

Maintenance & Community

Jointly maintained by Google and Meta, with contributions from individual developers. Feedback and bug reports are encouraged via GitHub issues.

Licensing & Compatibility

The repository is open source. The README does not explicitly state licensing details, but the project generally aligns with PyTorch's permissive licensing, which allows commercial use.

Limitations & Caveats

The README notes that as of release 2.7, only C++11 ABI builds are provided, which may impact compatibility with older pre-C++11 ABI setups. Specific Python and CUDA version compatibility must be carefully checked when selecting wheels or Docker images.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 74
  • Issues (30d): 18
  • Star History: 63 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

Top 0.6% · 11k stars
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago, updated 18 hours ago
Starred by Peter Norvig (author of Artificial Intelligence: A Modern Approach; research director at Google), Didier Lopes (founder of OpenBB), and 15 more.

llm.c by karpathy

Top 0.2% · 27k stars
LLM training in pure C/CUDA, no PyTorch needed
created 1 year ago, updated 1 month ago