batch_invariant_ops by thinking-machines-lab

Enhance LLM inference determinism

Created 1 week ago · 636 stars · Top 52.1% on SourcePulse

Project Summary

Batch Invariant Ops addresses non-determinism in LLM inference, particularly within PyTorch. It provides batch-invariant kernels to ensure consistent outputs across different batch sizes, benefiting researchers and engineers seeking reproducible machine learning results. The library offers a low-overhead, non-intrusive method to enhance the determinism of existing PyTorch models.

How It Works

The library leverages PyTorch's torch.Library mechanism to substitute standard PyTorch kernels with custom, batch-invariant implementations, allowing seamless integration into existing PyTorch workflows with minimal code modifications. Stock kernels may choose different tiling and reduction strategies depending on batch size, which changes the floating-point summation order and therefore the results; by replacing operations like matrix multiplication and log-softmax with variants whose reduction strategy is fixed, the library eliminates this source of batch-size-dependent variation.
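
A minimal sketch of the substitution mechanism, assuming a CUDA build of PyTorch. The registration calls below (torch.library.Library, Library.impl) are real PyTorch APIs, but the kernel body is a stand-in for illustration, not the library's actual implementation:

    import torch

    # Open the aten namespace for implementation overrides.
    lib = torch.library.Library("aten", "IMPL")

    def mm_batch_invariant(a, b):
        # Stand-in body: computes a @ b from elementwise ops and a
        # single sum, sidestepping cuBLAS heuristics. A real
        # batch-invariant kernel fixes the reduction order explicitly.
        return (a.unsqueeze(2) * b.unsqueeze(0)).sum(dim=1)

    # Route aten::mm on CUDA tensors through the custom function.
    lib.impl("mm", mm_batch_invariant, "CUDA")

    a = torch.randn(8, 16, device="cuda")
    b = torch.randn(16, 4, device="cuda")
    print(torch.mm(a, b).shape)  # torch.Size([8, 4]), via the override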

Quick Start & Requirements
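
No installation steps or requirements are captured in this summary. As a gap-filler, here is a minimal usage sketch; it assumes the package is importable as batch_invariant_ops and exposes a set_batch_invariant_mode context manager (treat the exact API as an assumption):

    import torch

    # Assumed import path for a from-source install of the repo.
    from batch_invariant_ops import set_batch_invariant_mode

    a = torch.randn(32, 64, device="cuda")
    b = torch.randn(64, 16, device="cuda")

    # While the context is active, supported ops (mm, addmm,
    # log_softmax, mean) dispatch to batch-invariant kernels.
    with set_batch_invariant_mode():
        out = torch.mm(a, b)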

Highlighted Details

  • Supports key operations: torch.mm, torch.addmm, torch.log_softmax, torch.mean.
  • Demonstrates deterministic vLLM inference as a proof of concept: 1,000 identical trials produced 18 unique samples with stock kernels versus 1 with batch-invariant kernels.
  • Provides a testing utility to verify the batch invariance of operations (see the sketch below).
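
For illustration, a batch-invariance check typically compares one row of a batched operation against the same operation run on that row alone. This is a hedged sketch of the idea, not the library's own test helper:

    import torch

    torch.manual_seed(0)
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")

    # Row 0 of a batched matmul vs. the same matmul on that row alone.
    full = torch.mm(a, b)
    single = torch.mm(a[:1], b)

    # With stock CUDA kernels this can print False: different batch
    # sizes may trigger different reduction strategies, changing the
    # floating-point summation order. With batch-invariant kernels
    # enabled, it should print True.
    print(torch.equal(full[:1], single))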

Maintenance & Community

No specific details on maintainers, community channels (Discord/Slack), or roadmap are provided in the README.

Licensing & Compatibility

The README does not specify the license type or compatibility notes for commercial use.

Limitations & Caveats

The library currently supports a limited set of PyTorch operations. Its effectiveness and integration may depend on the specific model architecture and PyTorch version used. The vLLM example requires an upstream PR for full integration.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 639 stars in the last 8 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Ying Sheng (Coauthor of SGLang), and 2 more.

LookaheadDecoding by hao-ai-lab

1k stars · 0.2%
Parallel decoding algorithm for faster LLM inference
Created 1 year ago · Updated 6 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

2k stars · 10.6%
Speculative decoding research paper for faster LLM inference
Created 1 year ago · Updated 1 week ago