HuggingFace Transformers implementation of BitNet scaling for LLMs
This repository provides a PyTorch implementation of BitNet, a 1-bit transformer architecture for large language models, integrated with the Hugging Face Transformers library. It targets researchers and engineers looking to explore significant memory savings and potential performance gains in LLMs by quantizing weights to 1-bit.
How It Works
The core innovation is the BitLinear layer, which replaces the standard linear layers in the Llama 2 architecture. This layer quantizes weights to 1-bit, drastically reducing memory footprint, while employing a mixed-precision approach for activations and other parameters to maintain performance. The implementation integrates directly into Hugging Face's Llama model by patching the modeling_llama.py file.
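For intuition, here is a minimal sketch of a weight-binarizing linear layer in this spirit. It is not the repository's actual BitLinear (which follows the BitNet paper and also covers activation handling); the class name and the exact binarization recipe are illustrative assumptions:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinearSketch(nn.Module):
    """Toy 1-bit linear layer illustrating the weight-binarization idea."""

    def __init__(self, in_features: int, out_features: int, bias: bool = False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Center the full-precision "shadow" weights, then binarize by sign.
        w = self.weight - self.weight.mean()
        beta = w.abs().mean()      # per-tensor scale that restores weight magnitude
        w_bin = torch.sign(w)      # values in {-1, +1} (0 only for exact zeros)
        # Straight-through estimator: the forward pass uses the binary weights,
        # while gradients flow back to the full-precision weights.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, beta * w_q, self.bias)


# Drop-in usage in place of a standard nn.Linear of the same shape.
layer = BitLinearSketch(16, 32)
y = layer(torch.randn(4, 16))
print(y.shape)  # torch.Size([4, 32])
```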
Quick Start & Requirements
Install the dependencies (pip install -r clm_requirements.txt), clone the Hugging Face Transformers repo, and install it in editable mode (pip install -e transformers). Then, replace the original Llama modeling file with the BitNet version.
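Once the patched modeling_llama.py is in place, loading a model goes through the ordinary Transformers API. The checkpoint path below is a placeholder, not something shipped by this repository:

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Placeholder path; substitute a Llama checkpoint trained or converted
# with the BitNet-patched modeling code.
checkpoint = "path/to/bitnet-llama-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LlamaForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16)

inputs = tokenizer("BitNet replaces linear layers with", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```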
Highlighted Details
Planned updates include storing 1-bit weights in uint8 and custom CUDA kernels for 1-bit weights.
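To illustrate why uint8 storage matters, here is one possible way to pack sign bits into bytes with PyTorch. This is an assumption about how such packing could look, not code from the repository:

```python
import torch


def pack_sign_bits(w: torch.Tensor) -> torch.Tensor:
    """Pack the signs of a weight tensor into a uint8 tensor (8 weights per byte)."""
    bits = (w >= 0).to(torch.uint8).flatten()  # negative -> 0, non-negative -> 1
    pad = (-bits.numel()) % 8                  # pad so the bit count is a multiple of 8
    if pad:
        bits = torch.cat([bits, bits.new_zeros(pad)])
    groups = bits.view(-1, 8)
    shifts = torch.arange(8, dtype=torch.uint8)
    return (groups << shifts).sum(dim=1).to(torch.uint8)


w = torch.randn(4096, 4096)
packed = pack_sign_bits(w)
# fp32 storage vs. packed 1-bit storage: roughly a 32x reduction.
print(w.element_size() * w.numel(), "bytes ->", packed.numel(), "bytes")
```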
Maintenance & Community
The project is maintained by Beomi. There are no explicit links to community channels or roadmaps provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility with commercial or closed-source projects is not specified.
Limitations & Caveats
The implementation is still under active development, with several planned updates including full 1-bit weight usage and custom CUDA kernels. The current version uses a mixed-precision approach for weights.