PyTorch implementation of the BitNet research paper
This repository provides a PyTorch implementation of BitNet, a 1-bit Transformer architecture designed for scaling Large Language Models. It offers a drop-in replacement for standard nn.Linear layers, enabling significant model compression and potential performance gains. The target audience includes researchers and engineers working on efficient LLMs and those looking to experiment with low-bit quantization techniques.
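As a minimal sketch of the drop-in idea, the snippet below swaps an nn.Linear layer for BitLinear. It assumes BitLinear mirrors nn.Linear's (in_features, out_features) constructor, which is how the repository presents it, but the exact signature should be checked against the installed version.

```python
import torch
from torch import nn
from bitnet import BitLinear  # installed via `pip3 install bitnet`

# Full-precision baseline projection
fp_layer = nn.Linear(512, 256)

# 1-bit drop-in replacement; constructor assumed to mirror nn.Linear(in_features, out_features)
bit_layer = BitLinear(512, 256)

x = torch.randn(8, 512)
print(fp_layer(x).shape)   # torch.Size([8, 256])
print(bit_layer(x).shape)  # torch.Size([8, 256])
```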
How It Works
BitNet replaces standard linear projections with a custom BitLinear module. This module binarizes weights to 1 bit and quantizes activations to low precision, using a process that combines layer normalization, weight binarization, and absolute-maximum (absmax) activation quantization. This approach aims to drastically reduce memory footprint and computational cost while maintaining competitive performance, as demonstrated by the paper's findings on scaling 1-bit Transformers.
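To make the process concrete, here is a minimal sketch of such a quantized linear layer, written for exposition rather than taken from the library: the straight-through-estimator details, epsilon values, and scaling choices are assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

class BitLinearSketch(nn.Module):
    """Illustrative sketch of the quantization scheme described above.
    Not the library's implementation; details are assumptions."""

    def __init__(self, in_features, out_features, bits=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.norm = nn.LayerNorm(in_features)
        self.q_max = 2 ** (bits - 1) - 1  # range used for absmax activation quantization

    def forward(self, x):
        # 1. Layer normalization before quantization
        x = self.norm(x)

        # 2. Absmax quantization of activations (straight-through estimator)
        gamma = x.abs().max().clamp(min=1e-5)
        x_q = (x * self.q_max / gamma).round().clamp(-self.q_max, self.q_max)
        x_q = x + (x_q * gamma / self.q_max - x).detach()  # quantized forward, identity backward

        # 3. Binarize weights to +-1 around their mean, keeping a scaling factor beta
        w = self.weight
        beta = w.abs().mean()
        w_b = torch.sign(w - w.mean())
        w_b = w + (w_b * beta - w).detach()  # straight-through estimator for weights

        return F.linear(x_q, w_b)
```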
Quick Start & Requirements
pip3 install bitnet
The package exposes BitLinear, BitNetTransformer, BitMGQA, BitFeedForward, BitLora, BitMamba, BitMoE, and OneBitViT. Hugging Face integration and a drop-in replacement for PyTorch models are also available. To build extensions in place:
python setup.py build_ext --inplace
Highlighted Details
- A drop-in BitLinear layer and a full BitNetTransformer.
- BitAttention with Multi-Grouped Query Attention (MGQA); a hedged usage sketch follows this list.
- 1-bit variants of Vision Transformers (OneBitViT), LoRA (BitLora), Mamba (BitMamba), and Mixture of Experts (BitMoE).
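As a rough illustration of the MGQA attention module, the sketch below assumes BitMGQA takes (embed_dim, query_heads, kv_heads) and returns an (output, attention_weights) tuple, as in the repository's examples; both the signature and the return format are assumptions to verify against the installed version.

```python
import torch
from bitnet import BitMGQA

# Assumed signature (embed_dim, query_heads, kv_heads) and tuple return value;
# check the installed version before relying on either.
attn = BitMGQA(512, 8, 4)

x = torch.randn(1, 10, 512)                # (batch, sequence, embed_dim)
out, _ = attn(x, x, x, need_weights=True)  # self-attention: query, key, value all set to x
print(out.shape)                           # expected: torch.Size([1, 10, 512])
```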
Maintenance & Community
The project is actively developed, with a new iteration ("The Era of 1-bit LLMs") in progress. Community contributions are encouraged via an Agora Discord server.
Licensing & Compatibility
Limitations & Caveats
Models using BitLinear must be trained from scratch or fine-tuned with the quantized layers in place; simply swapping layers into a pre-trained model will not yield correct results. Some components, such as BitLinear 1.5, are still in progress and have known bugs. A CUDA implementation for BitNet15b is planned as future work.