PyTorch implementation of the BitNet research paper
This repository provides a PyTorch implementation of BitNet, a 1-bit Transformer architecture designed for scaling Large Language Models. It offers a drop-in replacement for standard nn.Linear layers, enabling significant model compression and potential performance gains. The target audience includes researchers and engineers working on efficient LLMs and those looking to experiment with low-bit quantization techniques.
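As a minimal sketch of the drop-in idea, the snippet below swaps an nn.Linear layer for BitLinear. It assumes BitLinear mirrors nn.Linear's (in_features, out_features) constructor, which is how the repository presents it, but the exact signature should be checked against the installed version.

```python
import torch
from torch import nn
from bitnet import BitLinear  # installed via `pip3 install bitnet`

# Full-precision baseline projection
fp_layer = nn.Linear(512, 256)

# 1-bit drop-in replacement; constructor assumed to mirror nn.Linear(in_features, out_features)
bit_layer = BitLinear(512, 256)

x = torch.randn(8, 512)
print(fp_layer(x).shape)   # torch.Size([8, 256])
print(bit_layer(x).shape)  # torch.Size([8, 256])
```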
How It Works
BitNet replaces standard linear projections with a custom BitLinear module. This module binarizes weights to 1 bit and quantizes activations to low precision, using a process that combines layer normalization, weight binarization, and absolute-maximum (absmax) activation quantization. This approach aims to drastically reduce memory footprint and computational cost while maintaining competitive performance, as demonstrated by the paper's findings on scaling 1-bit Transformers.
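To make the process concrete, here is a minimal sketch of such a quantized linear layer, written for exposition rather than taken from the library: the straight-through-estimator details, epsilon values, and scaling choices are assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

class BitLinearSketch(nn.Module):
    """Illustrative sketch of the quantization scheme described above.
    Not the library's implementation; details are assumptions."""

    def __init__(self, in_features, out_features, bits=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.norm = nn.LayerNorm(in_features)
        self.q_max = 2 ** (bits - 1) - 1  # range used for absmax activation quantization

    def forward(self, x):
        # 1. Layer normalization before quantization
        x = self.norm(x)

        # 2. Absmax quantization of activations (straight-through estimator)
        gamma = x.abs().max().clamp(min=1e-5)
        x_q = (x * self.q_max / gamma).round().clamp(-self.q_max, self.q_max)
        x_q = x + (x_q * gamma / self.q_max - x).detach()  # quantized forward, identity backward

        # 3. Binarize weights to +-1 around their mean, keeping a scaling factor beta
        w = self.weight
        beta = w.abs().mean()
        w_b = torch.sign(w - w.mean())
        w_b = w + (w_b * beta - w).detach()  # straight-through estimator for weights

        return F.linear(x_q, w_b)
```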
Quick Start & Requirements
pip3 install bitnet
The package exposes BitLinear, BitNetTransformer, BitMGQA, BitFeedForward, BitLora, BitMamba, BitMoE, and OneBitViT. Hugging Face integration and a drop-in replacement for PyTorch models are also available. To build extensions in place:
python setup.py build_ext --inplace
Highlighted Details
- A drop-in BitLinear layer and a full BitNetTransformer.
- BitAttention with Multi-Grouped Query Attention (MGQA); a hedged usage sketch follows this list.
- 1-bit variants of Vision Transformers (OneBitViT), LoRA (BitLora), Mamba (BitMamba), and Mixture of Experts (BitMoE).
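As a rough illustration of the MGQA attention module, the sketch below assumes BitMGQA takes (embed_dim, query_heads, kv_heads) and returns an (output, attention_weights) tuple, as in the repository's examples; both the signature and the return format are assumptions to verify against the installed version.

```python
import torch
from bitnet import BitMGQA

# Assumed signature (embed_dim, query_heads, kv_heads) and tuple return value;
# check the installed version before relying on either.
attn = BitMGQA(512, 8, 4)

x = torch.randn(1, 10, 512)                # (batch, sequence, embed_dim)
out, _ = attn(x, x, x, need_weights=True)  # self-attention: query, key, value all set to x
print(out.shape)                           # expected: torch.Size([1, 10, 512])
```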
Maintenance & Community
The project is actively developed, with a new iteration ("The Era of 1-bit LLMs") in progress. Community contributions are encouraged via an Agora Discord server.
Licensing & Compatibility
Limitations & Caveats
Models using BitLinear must be trained from scratch or fine-tuned with the quantized layers in place; simply swapping layers into a pre-trained model will not yield correct results. Some components, such as BitLinear 1.5, are still in progress and have known bugs. A CUDA implementation for BitNet15b is planned as future work.