Beomi: Hugging Face Transformers implementation of BitNet scaling for LLMs
This repository provides a PyTorch implementation of BitNet, a 1-bit transformer architecture for large language models, integrated with the Hugging Face Transformers library. It targets researchers and engineers looking to explore significant memory savings and potential performance gains in LLMs by quantizing weights to 1-bit.
How It Works
The core innovation is the BitLinear layer, which replaces standard linear layers in the Llama 2 architecture. This layer quantizes weights to 1-bit, drastically reducing memory footprint while employing a mixed-precision approach for activations and other parameters to maintain performance. The implementation integrates directly into Hugging Face's Llama model by patching the modeling_llama.py file.
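The README does not reproduce the BitLinear code, but the idea can be sketched as follows. This is a simplified illustration of 1-bit weight quantization with a straight-through estimator, not the repository's implementation; the class name and initialization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Simplified 1-bit linear layer in the spirit of BitNet (illustrative only)."""

    def __init__(self, in_features: int, out_features: int, bias: bool = False):
        super().__init__()
        # Latent weights stay in full precision; only the forward pass sees 1-bit values.
        self.weight = nn.Parameter(torch.empty(out_features, in_features).normal_(std=0.02))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        beta = w.abs().mean()  # per-layer scale to recover the original magnitude
        w_bin = torch.where(w > w.mean(), torch.ones_like(w), -torch.ones_like(w))
        # Straight-through estimator: the forward pass uses binarized weights,
        # while gradients flow back to the latent full-precision weights.
        w_q = w + (w_bin * beta - w).detach()
        return F.linear(x, w_q, self.bias)
```

In the actual repository, activations and other parameters are handled in higher precision, which is the mixed-precision approach described above.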
Quick Start & Requirements
Install the dependencies (pip install -r clm_requirements.txt), clone the Hugging Face Transformers repository, and install it in editable mode (pip install -e transformers). Then, replace the original Llama modeling file (modeling_llama.py) with the BitNet version.
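Once the patched library is installed, models load through the standard Transformers API. A minimal usage sketch, assuming a Llama 2 checkpoint you have access to (the checkpoint name below is only an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # example checkpoint; substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Because modeling_llama.py has been replaced, the Llama layers are built with BitLinear.
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("BitNet quantizes weights to 1-bit.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```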
Highlighted Details
Planned updates include uint8 storage and custom CUDA kernels for 1-bit weights.
Maintenance & Community
The project is maintained by Beomi. There are no explicit links to community channels or roadmaps provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility with commercial or closed-source projects is not specified.
Limitations & Caveats
The implementation is still under active development, with several planned updates including full 1-bit weight usage and custom CUDA kernels. The current version uses a mixed-precision approach for weights.
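To make the planned uint8 storage concrete, here is a hypothetical packing helper (not code from the repository) showing why full 1-bit weights would cut memory roughly 16x versus fp16:

```python
import torch

def pack_signs(w: torch.Tensor) -> torch.Tensor:
    """Pack weight signs into uint8, 8 weights per byte (illustrative only)."""
    bits = (w > 0).to(torch.uint8).flatten()
    pad = (-bits.numel()) % 8          # pad so the length is a multiple of 8
    bits = torch.cat([bits, bits.new_zeros(pad)])
    powers = 2 ** torch.arange(8, dtype=torch.int64)
    return (bits.view(-1, 8).to(torch.int64) * powers).sum(dim=1).to(torch.uint8)

w = torch.randn(4096, 4096)            # one Llama-sized weight matrix
packed = pack_signs(w)
print(f"fp16: {w.numel() * 2 / 1e6:.1f} MB -> packed 1-bit: {packed.numel() / 1e6:.1f} MB")
```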