batch_invariant_ops by thinking-machines-lab

Enhance LLM inference determinism

Created 1 week ago · 636 stars · Top 52.1% on SourcePulse

Project Summary

Batch Invariant Ops addresses non-determinism in LLM inference, particularly within PyTorch. It provides batch-invariant kernels to ensure consistent outputs across different batch sizes, benefiting researchers and engineers seeking reproducible machine learning results. The library offers a low-overhead, non-intrusive method to enhance the determinism of existing PyTorch models.

How It Works

The library leverages PyTorch's torch.Library mechanism to substitute standard PyTorch kernels with custom, batch-invariant implementations, allowing seamless integration into existing PyTorch workflows with minimal code modifications. Stock kernels may choose different tiling and reduction strategies depending on batch size, which changes the floating-point summation order and therefore the results; by replacing operations like matrix multiplication and log-softmax with variants whose reduction strategy is fixed, the library eliminates this source of batch-size-dependent variation.
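
A minimal sketch of the substitution mechanism, assuming a CUDA build of PyTorch. The registration calls below (torch.library.Library, Library.impl) are real PyTorch APIs, but the kernel body is a stand-in for illustration, not the library's actual implementation:

    import torch

    # Open the aten namespace for implementation overrides.
    lib = torch.library.Library("aten", "IMPL")

    def mm_batch_invariant(a, b):
        # Stand-in body: computes a @ b from elementwise ops and a
        # single sum, sidestepping cuBLAS heuristics. A real
        # batch-invariant kernel fixes the reduction order explicitly.
        return (a.unsqueeze(2) * b.unsqueeze(0)).sum(dim=1)

    # Route aten::mm on CUDA tensors through the custom function.
    lib.impl("mm", mm_batch_invariant, "CUDA")

    a = torch.randn(8, 16, device="cuda")
    b = torch.randn(16, 4, device="cuda")
    print(torch.mm(a, b).shape)  # torch.Size([8, 4]), via the override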

Quick Start & Requirements
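
No installation steps or requirements are captured in this summary. As a gap-filler, here is a minimal usage sketch; it assumes the package is importable as batch_invariant_ops and exposes a set_batch_invariant_mode context manager (treat the exact API as an assumption):

    import torch

    # Assumed import path for a from-source install of the repo.
    from batch_invariant_ops import set_batch_invariant_mode

    a = torch.randn(32, 64, device="cuda")
    b = torch.randn(64, 16, device="cuda")

    # While the context is active, supported ops (mm, addmm,
    # log_softmax, mean) dispatch to batch-invariant kernels.
    with set_batch_invariant_mode():
        out = torch.mm(a, b)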

Highlighted Details

  • Supports key operations: torch.mm, torch.addmm, torch.log_softmax, torch.mean.
  • Demonstrates deterministic vLLM inference as a proof of concept: 1,000 identical trials produced 18 unique samples with stock kernels versus 1 with batch-invariant kernels.
  • Provides a testing utility to verify the batch invariance of operations (see the sketch below).
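
For illustration, a batch-invariance check typically compares one row of a batched operation against the same operation run on that row alone. This is a hedged sketch of the idea, not the library's own test helper:

    import torch

    torch.manual_seed(0)
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")

    # Row 0 of a batched matmul vs. the same matmul on that row alone.
    full = torch.mm(a, b)
    single = torch.mm(a[:1], b)

    # With stock CUDA kernels this can print False: different batch
    # sizes may trigger different reduction strategies, changing the
    # floating-point summation order. With batch-invariant kernels
    # enabled, it should print True.
    print(torch.equal(full[:1], single))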

Maintenance & Community

No specific details on maintainers, community channels (Discord/Slack), or roadmap are provided in the README.

Licensing & Compatibility

The README does not specify the license type or compatibility notes for commercial use.

Limitations & Caveats

The library currently supports a limited set of PyTorch operations. Its effectiveness and integration may depend on the specific model architecture and PyTorch version used. The vLLM example requires an upstream PR for full integration.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 639 stars in the last 8 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Ying Sheng (Coauthor of SGLang), and 2 more.

LookaheadDecoding by hao-ai-lab

1k stars · 0.2%
Parallel decoding algorithm for faster LLM inference
Created 1 year ago · Updated 6 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

2k stars · 10.6%
Speculative decoding research paper for faster LLM inference
Created 1 year ago · Updated 1 week ago