bitsandbytes by bitsandbytes-foundation

PyTorch library for k-bit quantization, enabling accessible LLMs

created 4 years ago
7,416 stars

Top 7.1% on sourcepulse

Project Summary

bitsandbytes provides efficient k-bit quantization for large language models in PyTorch, enabling accessible deployment on consumer hardware. It targets researchers and developers working with LLMs who need to reduce memory footprint and improve inference speed.

How It Works

The library wraps custom CUDA kernels for 8-bit optimizers, matrix multiplication (LLM.int8()), and 8-bit and 4-bit quantization. It provides bitsandbytes.nn.Linear8bitLt and bitsandbytes.nn.Linear4bit as drop-in quantized linear layers, and bitsandbytes.optim for 8-bit optimizers, reducing memory usage and potentially speeding up computation.
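The core idea behind these k-bit schemes can be illustrated with a toy absmax int8 quantizer in plain Python (a conceptual sketch for intuition only, not the library's actual CUDA kernels):

```python
def quantize_absmax(values):
    """Map floats onto the int8 range [-127, 127], using the absolute maximum as the scale."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax else 1.0
    return [round(v / scale) for v in values], scale

def dequantize(quants, scale):
    """Recover approximate floats from the int8 codes and the stored scale."""
    return [q * scale for q in quants]

weights = [0.5, -1.2, 0.03, 1.19]
q, scale = quantize_absmax(weights)
deq = dequantize(q, scale)
# Each weight is now a single int8 code plus one shared float scale;
# the round-trip error is bounded by half a quantization step (scale / 2).
```

bitsandbytes builds on refinements of this idea, such as block-wise scales, outlier handling in LLM.int8(), and the NF4 data type for 4-bit, implemented in fused CUDA kernels.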

Quick Start & Requirements
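A minimal install sketch, following the official Hugging Face documentation (an NVIDIA GPU with CUDA is the primary supported target; other backends are experimental):

```shell
# Install from PyPI; prebuilt wheels bundle the CUDA kernels.
pip install bitsandbytes

# Run the built-in diagnostic to verify the detected CUDA setup.
python -m bitsandbytes
```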

Highlighted Details

  • Enables 8-bit and 4-bit quantization for LLMs.
  • Includes 8-bit optimizers.
  • Supports LLM.int8() matrix multiplication.

Maintenance & Community

  • Ongoing efforts to support Intel CPU+GPU, AMD GPU, Apple Silicon, and NPUs.
  • Official documentation hosted on Hugging Face.

Licensing & Compatibility

  • MIT licensed.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

The library primarily targets NVIDIA GPUs with CUDA. Support for other hardware backends is under development and may not be production-ready.

Health Check
  • Last commit: 3 days ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 13
  • Issues (30d): 11
  • Star History: 466 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jaret Burkett (founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

Top 2.1% · 3k stars

High-performance 4-bit diffusion model inference engine

created 8 months ago · updated 11 hours ago