optimum-neuron by huggingface

Transformers training and inference acceleration for AWS accelerators

Created 2 years ago
253 stars

Top 99.4% on SourcePulse

Project Summary

Summary

Optimum Neuron bridges the 🤗 Transformers library with AWS's Trainium and Inferentia accelerators, enabling efficient training and inference. It targets developers seeking to leverage specialized AWS hardware, offering a drop-in replacement for standard Transformers components to minimize code modifications and accelerate model deployment on these platforms.

How It Works

The project provides optimized versions of 🤗 Transformers models and training utilities, such as NeuronModelForCausalLM and NeuronSFTTrainer. It integrates directly with the AWS Neuron SDK, allowing users to compile models for specific accelerators and use features like tensor parallelism for distributed training. This abstracts away much of the complexity of hardware-specific optimization, making AWS accelerators more accessible to the existing Transformers ecosystem.
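
As a concrete illustration, here is a minimal inference sketch. It assumes the NeuronModelForCausalLM export API (export=True plus static input shapes and compiler options) described in the optimum-neuron documentation; the model ID, shapes, and core count are illustrative placeholders.

    # A minimal sketch, assuming the NeuronModelForCausalLM export API;
    # model ID, shapes, and core count are illustrative placeholders.
    from transformers import AutoTokenizer
    from optimum.neuron import NeuronModelForCausalLM

    # export=True compiles the model for Neuron with static input shapes.
    model = NeuronModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-8B",  # placeholder model ID
        export=True,
        batch_size=1,
        sequence_length=2048,
        auto_cast_type="bf16",  # run compute in bfloat16
        num_cores=2,            # shard across two NeuronCores
    )

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
    inputs = tokenizer("Hello, Neuron!", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

Apart from the export-time shape and compiler arguments, the generate call matches the standard Transformers API, which is what makes the class a drop-in replacement.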

Quick Start & Requirements

Installation is typically done via pip; for AWS Trainium/Inferentia2:

    pip install --upgrade-strategy eager optimum-neuron[neuronx]

Additional components for training or vLLM inference can be installed with the [training] or [vllm] extras, respectively. The Neuron driver and tools must be installed separately before installing optimum-neuron; the README references a detailed setup guide for this. The project primarily targets PyTorch and requires access to AWS Trainium or Inferentia hardware. Guides for compilation options and advanced usage are also referenced.

Highlighted Details

  • Offers drop-in replacements for standard Transformers training and inference classes.
  • Supports distributed training with minimal code changes, including tensor parallelism (see the training sketch after this list).
  • Provides optimized models compiled for AWS Trainium and Inferentia accelerators.
  • Enables production-ready inference through model compilation with static shapes, supporting bf16 precision and flash_attention_2.
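
As referenced above, here is a minimal training sketch. It assumes NeuronSFTConfig and NeuronSFTTrainer mirror trl's SFTConfig/SFTTrainer and that tensor parallelism is requested via a tensor_parallel_size argument; the model ID, dataset, and hyperparameters are illustrative placeholders.

    # A minimal sketch, assuming NeuronSFTTrainer/NeuronSFTConfig mirror
    # trl's SFTTrainer/SFTConfig; model ID, dataset, and hyperparameters
    # are illustrative placeholders.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer

    model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder model ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

    config = NeuronSFTConfig(
        output_dir="llama_sft",
        tensor_parallel_size=8,  # assumed knob for tensor-parallel sharding
        bf16=True,
        per_device_train_batch_size=1,
        max_steps=100,
    )

    trainer = NeuronSFTTrainer(
        model=model,
        args=config,
        tokenizer=tokenizer,
        train_dataset=dataset,
    )
    trainer.train()

Such a script would typically be launched with torchrun so that one process runs per NeuronCore, e.g. torchrun --nproc_per_node=32 train.py on a trn1.32xlarge instance.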

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, or community channels (e.g., Discord, Slack). It directs users to open issues or pull requests for support.

Licensing & Compatibility

The license type is not explicitly stated in the provided README. Verify the license in the repository before committing to commercial use or closed-source integration.

Limitations & Caveats

Adoption requires access to AWS Trainium or Inferentia hardware. Installing the Neuron driver and tools separately adds an initial setup step. Inference compilation requires fixed static shapes (batch size, sequence length), so varying inference parameters may force recompilation; a common mitigation is sketched below.
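
The mitigation is to compile once per shape configuration and reuse the serialized artifact. The sketch below assumes compiled models can be saved and reloaded with the usual save_pretrained/from_pretrained pattern; the model ID, path, and shapes are placeholders.

    # A minimal sketch: export once for a fixed shape configuration, save the
    # compiled artifact, and reload it later without recompiling.
    # Model ID, path, and shapes are illustrative placeholders.
    from optimum.neuron import NeuronModelForCausalLM

    # One-time export for batch_size=1, sequence_length=2048.
    model = NeuronModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-8B",
        export=True,
        batch_size=1,
        sequence_length=2048,
    )
    model.save_pretrained("llama_neuron_bs1_seq2048")

    # Later: load the precompiled model directly; no export step is repeated.
    model = NeuronModelForCausalLM.from_pretrained("llama_neuron_bs1_seq2048")

Serving a different batch size or a longer sequence length requires a separate export, so production setups typically keep one saved artifact per shape configuration.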

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 15
Issues (30d): 0
Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface
Top 0.1% · 9k stars
PyTorch training helper for distributed execution
Created 5 years ago · Updated 2 days ago