BitNet by kyegomez

PyTorch implementation of BitNet research paper

created 1 year ago
1,866 stars

Top 23.8% on sourcepulse

View on GitHub
Project Summary

This repository provides a PyTorch implementation of BitNet, a 1-bit Transformer architecture designed for scaling Large Language Models. It offers a drop-in replacement for standard nn.Linear layers, enabling significant model compression and lower memory and compute costs. The target audience includes researchers and engineers working on efficient LLMs and those looking to experiment with low-bit quantization techniques.
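
As a rough illustration of the drop-in idea, the sketch below uses BitLinear where nn.Linear would normally go. The constructor signature (in_features, out_features) follows the repository's README example and should be treated as an assumption.

    import torch
    from bitnet import BitLinear  # import path as documented in the README

    # Used like nn.Linear: BitLinear(in_features, out_features)
    layer = BitLinear(512, 256)

    x = torch.randn(8, 512)   # a batch of 8 activation vectors
    y = layer(x)              # quantization happens inside the forward pass
    print(y.shape)            # expected: torch.Size([8, 256])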

How It Works

BitNet replaces standard linear projections with a custom BitLinear module. This module binarizes weights to 1-bit and quantizes activations to low precision (8-bit in the paper), using a process of layer normalization, weight binarization, and absolute-maximum (absmax) activation quantization. This approach aims to drastically reduce memory footprint and computational cost while maintaining competitive performance, as demonstrated by the paper's findings on scaling 1-bit Transformers.
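
A minimal, self-contained sketch of that forward pass is shown below. It follows the paper's description rather than the repository's code, simulates quantization in floating point, and uses illustrative names (binarize_weights, absmax_quantize) and an assumed 8-bit activation range.

    import torch
    import torch.nn.functional as F

    def binarize_weights(w: torch.Tensor) -> torch.Tensor:
        # 1-bit weights: sign of the zero-centered matrix, rescaled by its mean magnitude
        alpha = w.mean()
        beta = (w - alpha).abs().mean()
        return torch.sign(w - alpha) * beta

    def absmax_quantize(x: torch.Tensor, bits: int = 8):
        # Scale activations into the signed b-bit integer range by their absolute maximum
        q_max = 2 ** (bits - 1) - 1
        scale = q_max / x.abs().max().clamp(min=1e-5)
        return (x * scale).round().clamp(-q_max, q_max), scale

    def bitlinear_forward(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
        # LayerNorm -> absmax-quantize activations -> multiply by binarized weights -> dequantize
        x = F.layer_norm(x, x.shape[-1:])
        x_q, scale = absmax_quantize(x)
        return F.linear(x_q / scale, binarize_weights(weight))

    # Toy usage: a (batch, in_features) input against a (out_features, in_features) weight
    out = bitlinear_forward(torch.randn(4, 512), torch.randn(256, 512))
    print(out.shape)  # torch.Size([4, 256])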

Quick Start & Requirements

  • Install: pip3 install bitnet
  • Prerequisites: PyTorch, Python. CUDA is required for optimized kernels.
  • Usage: Examples for BitLinear, BitNetTransformer, BitMGQA, BitFeedForward, BitLora, BitMamba, BitMoE, and OneBitViT are provided; a hedged BitNetTransformer sketch follows this list. Hugging Face integration and a drop-in replacement for PyTorch models are also available.
  • CUDA Kernel: An optimized CUDA GEMM kernel is available via python setup.py build_ext --inplace.
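
For orientation, here is a hedged sketch of a BitNetTransformer quick start. The constructor arguments (num_tokens, dim, depth, heads, ff_mult) mirror the README's example and may differ between versions.

    import torch
    from bitnet import BitNetTransformer  # import path as documented in the README

    # Hyperparameters follow the README's example; treat exact names as assumptions.
    model = BitNetTransformer(
        num_tokens=20_000,  # vocabulary size
        dim=512,            # model width
        depth=6,            # number of transformer blocks
        heads=8,            # attention heads
        ff_mult=4,          # feed-forward expansion factor
    )

    tokens = torch.randint(0, 20_000, (1, 512))  # (batch, sequence length)
    logits = model(tokens)
    print(logits.shape)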

Highlighted Details

  • Implements the core BitLinear layer and a full BitNetTransformer.
  • Includes optimized BitAttention with Multi-Grouped Query Attention (MGQA).
  • Offers drop-in replacement utilities for Hugging Face models and general PyTorch models (an illustrative sketch follows this list).
  • Features experimental implementations for Vision Transformers (OneBitViT), LoRA (BitLora), Mamba (BitMamba), and Mixture of Experts (BitMoE).
  • Includes an optimized CUDA GEMM kernel for low-bit operations.
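
The repository ships its own replacement utilities; the sketch below only illustrates the general technique with a hypothetical replace_linears helper, not the library's API.

    import torch.nn as nn
    from bitnet import BitLinear  # import path as documented in the README

    def replace_linears(module: nn.Module) -> None:
        # Hypothetical helper: recursively swap every nn.Linear for a BitLinear of
        # the same shape. The library's own utilities should be preferred in practice.
        for name, child in module.named_children():
            if isinstance(child, nn.Linear):
                setattr(module, name, BitLinear(child.in_features, child.out_features))
            else:
                replace_linears(child)

    # Example: quantize a small MLP in place, then train it with the BitLinear layers present.
    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
    replace_linears(model)
    print(model)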

Maintenance & Community

The project is actively developed, with a new iteration ("The Era of 1-bit LLMs") in progress. Community contributions are encouraged via an Agora Discord server.

Licensing & Compatibility

  • License: MIT
  • Compatibility: Suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Models built with BitLinear must be trained, or at least fine-tuned, with the quantized layers in place; simply swapping layers into a pre-trained model will not yield correct results without retraining. Some components, like BitLinear 1.5, are still in progress with known bugs. A CUDA implementation for BitNet15b is listed as a future task.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 63 stars in the last 90 days
