PyTorch library for k-bit quantization, enabling accessible LLMs
bitsandbytes provides efficient k-bit quantization for large language models in PyTorch, enabling accessible deployment on consumer hardware. It targets researchers and developers working with LLMs who need to reduce memory footprint and improve inference speed.
How It Works
The library wraps custom CUDA functions for 8-bit optimizers, matrix multiplication (LLM.int8()), and 8- and 4-bit quantization. It offers bitsandbytes.nn.Linear8bitLt and bitsandbytes.nn.Linear4bit as quantization-aware linear layers, and bitsandbytes.optim for 8-bit optimizers, reducing memory usage and potentially speeding up computation.
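To make the core idea concrete, here is a minimal sketch of absmax int8 quantization, the basic scheme underlying the library's 8-bit paths, written in plain NumPy. This is illustrative only: the function names are hypothetical, and bitsandbytes itself implements this (plus outlier handling in LLM.int8()) in fused CUDA kernels.

```python
# Illustrative absmax int8 quantization round-trip (NOT the bitsandbytes API).
import numpy as np

def quantize_absmax(x: np.ndarray):
    # Scale so the largest magnitude maps to 127, then round to int8.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float values.
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 2.0, -3.2], dtype=np.float32)
q, scale = quantize_absmax(x)
x_hat = dequantize(q, scale)
# Reconstruction error is bounded by half a quantization step.
assert np.max(np.abs(x - x_hat)) <= scale / 2 + 1e-6
```

Storing int8 values plus one float scale per tensor (or per block, as bitsandbytes does) is what cuts memory roughly 4x versus fp32 weights.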
Quick Start & Requirements
pip install bitsandbytes
Limitations & Caveats
The library primarily targets NVIDIA GPUs with CUDA. Support for other hardware backends is under development and may not be production-ready.