BitNet-Transformers by Beomi

HuggingFace Transformers implementation of BitNet scaling for LLMs

created 1 year ago
305 stars

Top 88.8% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of BitNet, a 1-bit transformer architecture for large language models, integrated with the Hugging Face Transformers library. It targets researchers and engineers looking to explore significant memory savings and potential performance gains in LLMs by quantizing weights to 1-bit.

How It Works

The core innovation is the BitLinear layer, which replaces the standard linear layers in the Llama 2 architecture. This layer quantizes weights to 1 bit, drastically reducing the memory footprint, while keeping activations and other parameters in higher precision (a mixed-precision approach) to maintain performance. The implementation integrates directly into Hugging Face's Llama model by patching the modeling_llama.py file.
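As a rough illustration, a minimal BitLinear-style layer might look like the sketch below. It follows the general BitNet recipe (sign-binarized weights with a per-tensor scale and a straight-through estimator for gradients); the class name and details are assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Illustrative drop-in replacement for nn.Linear with 1-bit weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean()                    # per-tensor scaling factor
        w_bin = torch.sign(w - w.mean()) * scale  # weights collapsed to {-scale, +scale}
        # Straight-through estimator: the forward pass sees the binarized
        # weights, while gradients flow to the latent full-precision weights.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, w_q, self.bias)

# Example: swap a 1-bit layer in place of a standard projection.
layer = BitLinear(4096, 4096, bias=False)
y = layer(torch.randn(1, 16, 4096))
```

Only the weight matrix is binarized here; activations stay in the input's precision, which matches the mixed-precision idea described above.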

Quick Start & Requirements

  • Install: Clone the repo, install the requirements (pip install -r clm_requirements.txt), clone the Hugging Face Transformers repo, and install it in editable mode (pip install -e transformers). Then replace the original Llama modeling file with the BitNet version, as in the sketch after this list.
  • Prerequisites: PyTorch, Hugging Face Transformers, Python. A GPU is assumed for training.
  • Resources: The README shows memory usage comparisons: 1-bit BitLLAMA uses ~100MB for model weights compared to ~250MB for 16-bit Llama.
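The file swap in the install step amounts to overwriting Transformers' Llama modeling file with the repo's BitNet variant. A minimal sketch, assuming the BitNet file lives at bitnet_llama/modeling_llama.py and that transformers was cloned alongside it (both paths are assumptions; adjust to your checkout):

```python
import shutil
from pathlib import Path

# Assumed locations; the repo layout and clone directories may differ.
src = Path("BitNet-Transformers/bitnet_llama/modeling_llama.py")
dst = Path("transformers/src/transformers/models/llama/modeling_llama.py")

shutil.copyfile(src, dst)  # overwrite HF's Llama implementation with the BitLinear version
print(f"Patched {dst}")
```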

Highlighted Details

  • Implements BitLinear layer for Llama 2 architecture.
  • Demonstrates memory savings: ~100MB for 1-bit BitLLAMA vs. ~250MB for 16-bit Llama.
  • Includes sample code for Language Model training (e.g., Wikitext-103).
  • Planned updates include uint8 storage and custom CUDA kernels for the 1-bit weights (see the packing sketch after this list).
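The uint8 plan presumably means packing eight sign bits into each byte, since PyTorch has no native 1-bit dtype. A speculative illustration of such packing (not the repository's code; the function below is hypothetical):

```python
import torch

def pack_signs(w: torch.Tensor) -> torch.Tensor:
    """Pack a weight tensor's signs (>= 0 -> 1, < 0 -> 0) into uint8, 8 per byte."""
    bits = (w >= 0).to(torch.uint8).flatten()
    pad = (-bits.numel()) % 8                 # pad so the length is a multiple of 8
    bits = torch.cat([bits, bits.new_zeros(pad)])
    place_values = torch.tensor([1, 2, 4, 8, 16, 32, 64, 128], dtype=torch.uint8)
    return (bits.view(-1, 8) * place_values).sum(dim=1).to(torch.uint8)

packed = pack_signs(torch.randn(16, 16))      # 256 signs -> 32 bytes
```

Stored this way, each weight occupies a single bit plus a shared per-tensor scale.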

Maintenance & Community

The project is maintained by Beomi. There are no explicit links to community channels or roadmaps provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The implementation is a work in progress, with several planned updates still open, including true 1-bit weight storage and custom CUDA kernels. The current version keeps weights in a mixed-precision representation rather than pure 1-bit storage, and the last commit was about a year ago.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days
