bitsandbytes by bitsandbytes-foundation

PyTorch library for k-bit quantization, enabling accessible LLMs

Created 4 years ago · 7,714 stars · Top 6.7% on SourcePulse

Project Summary

bitsandbytes provides efficient k-bit quantization for large language models in PyTorch, enabling accessible deployment on consumer hardware. It targets researchers and developers working with LLMs who need to reduce memory footprint and improve inference speed.

How It Works

The library wraps custom CUDA kernels for 8-bit optimizers, LLM.int8() matrix multiplication, and 8-bit and 4-bit quantization. It exposes bitsandbytes.nn.Linear8bitLt and bitsandbytes.nn.Linear4bit as quantized linear layers and bitsandbytes.optim for 8-bit optimizers, reducing memory usage and potentially speeding up computation.
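
A minimal sketch of the layer-swap pattern, closely following the library's documented usage and assuming a CUDA GPU; the toy two-layer model is illustrative only:

    import torch
    import torch.nn as nn
    import bitsandbytes as bnb

    # Reference model in full precision and an int8 counterpart with matching shapes.
    fp16_model = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64))
    int8_model = nn.Sequential(
        bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False),
        bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False),
    )

    # Copy the original weights; quantization happens when the model moves to the GPU.
    int8_model.load_state_dict(fp16_model.state_dict())
    int8_model = int8_model.to("cuda")

    out = int8_model(torch.randn(1, 64, dtype=torch.float16, device="cuda"))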

Quick Start & Requirements

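The package is published on PyPI and installed with pip install bitsandbytes. As a starting point, it requires a recent PyTorch build and, for the mainline release, an NVIDIA GPU with CUDA (see Limitations & Caveats below); consult the official documentation on Hugging Face for exact version requirements and for the status of other hardware backends.
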
Highlighted Details

  • Enables 8-bit and 4-bit quantization for LLMs.
  • Includes 8-bit optimizers (a short sketch follows this list).
  • Supports LLM.int8() matrix multiplication.
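
A minimal sketch of swapping in an 8-bit optimizer, assuming a CUDA GPU; the single linear layer and hyperparameters are illustrative only:

    import torch
    import bitsandbytes as bnb

    model = torch.nn.Linear(1024, 1024).cuda()

    # Drop-in replacement for torch.optim.Adam; optimizer state is kept in 8 bits,
    # which cuts optimizer memory substantially for large models.
    optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

    loss = model(torch.randn(8, 1024, device="cuda")).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()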

Maintenance & Community

  • Ongoing efforts to support Intel CPUs and GPUs, AMD GPUs, Apple Silicon, and NPU backends.
  • Official documentation hosted on Hugging Face.

Licensing & Compatibility

  • MIT licensed.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

The library primarily targets NVIDIA GPUs with CUDA. Support for other hardware backends is under development and may not be production-ready.

Health Check

  • Last commit: 15 hours ago
  • Responsiveness: 1 week
  • Pull requests (30d): 9
  • Issues (30d): 9
  • Star history: 97 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI) and Jeremy Howard (Cofounder of fast.ai).

QuaRot by spcl

Top 1.6% on SourcePulse · 442 stars
Code for a NeurIPS 2024 research paper on LLM quantization
Created 1 year ago · Updated 11 months ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

Top 0.2% on SourcePulse · 3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago · Updated 3 months ago