BitNet-Transformers by Beomi

HuggingFace Transformers implementation of BitNet scaling for LLMs

Created 1 year ago
307 stars

Top 87.3% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of BitNet, a 1-bit transformer architecture for large language models, integrated with the Hugging Face Transformers library. It targets researchers and engineers looking to explore significant memory savings and potential efficiency gains in LLMs by quantizing weights to 1 bit.

How It Works

The core innovation is the BitLinear layer, which replaces the standard linear layers in the Llama 2 architecture. BitLinear quantizes weights to 1 bit, drastically reducing the memory footprint, while keeping activations and other parameters in higher precision (a mixed-precision approach) to maintain performance. The implementation integrates directly with Hugging Face's Llama model by patching the modeling_llama.py file.
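As an illustration, here is a minimal PyTorch sketch of a BitNet-style BitLinear layer, following the description in the BitNet paper rather than the repository's exact code: weights are binarized to ±1 with a per-tensor scale, and a straight-through estimator keeps the layer trainable. Class and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Minimal BitNet-style 1-bit linear layer (illustrative, not the repo's code)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        alpha = w.mean()                     # centering term
        beta = (w - alpha).abs().mean()      # per-tensor scale to restore magnitude
        w_bin = torch.sign(w - alpha)        # 1-bit (+/-1) weights
        # Straight-through estimator: the forward pass uses the binarized weights,
        # the backward pass flows gradients into the full-precision latent weights.
        w_quant = w + (w_bin * beta - w).detach()
        return F.linear(x, w_quant, self.bias)

# Drop-in replacement for an nn.Linear inside a transformer block:
layer = BitLinear(4096, 4096, bias=False)
out = layer(torch.randn(2, 16, 4096))
```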

Quick Start & Requirements

  • Install: Clone the repo, install the requirements (pip install -r clm_requirements.txt), clone the Hugging Face Transformers repo, and install it in editable mode (pip install -e transformers). Then replace the original Llama modeling file with the BitNet version; a minimal smoke-test sketch follows this list.
  • Prerequisites: Python, PyTorch, and Hugging Face Transformers; a GPU is assumed for training.
  • Resources: The README shows memory usage comparisons: 1-bit BitLLAMA uses ~100MB for model weights compared to ~250MB for 16-bit Llama.
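For orientation, here is a minimal sketch of a smoke test once the patched modeling_llama.py is in place. The configuration values are illustrative and are not taken from the repository's training scripts; the standard Transformers API is used as-is.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny, illustrative config (not the repo's Wikitext-103 training settings).
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=4,
    num_attention_heads=8,
)

# With the BitNet version of modeling_llama.py installed, the standard
# Transformers API builds a Llama model whose linear layers are BitLinear.
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")

# Forward pass on dummy token ids as a quick sanity check.
input_ids = torch.randint(0, config.vocab_size, (1, 16))
logits = model(input_ids).logits
print(logits.shape)
```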

Highlighted Details

  • Implements BitLinear layer for Llama 2 architecture.
  • Demonstrates memory savings: ~100MB for 1-bit BitLLAMA vs. ~250MB for 16-bit Llama (see the back-of-the-envelope arithmetic after this list).
  • Includes sample code for Language Model training (e.g., Wikitext-103).
  • Planned updates include using uint8 and custom CUDA kernels for 1-bit weights.
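The reported figures are easier to interpret with a quick back-of-the-envelope calculation. The parameter count below is illustrative, and the README's measured 1-bit number is higher than the theoretical minimum because the current version keeps some tensors in higher precision.

```python
def weight_memory_mb(num_params: int, bits_per_weight: float) -> float:
    """Memory (in MB) needed to store num_params weights at a given bit width."""
    return num_params * bits_per_weight / 8 / 1024**2

n = 110_000_000  # illustrative parameter count, not the repo's exact model size

for bits in (16, 8, 1):
    print(f"{bits:>2}-bit weights: {weight_memory_mb(n, bits):7.1f} MB")
```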

Maintenance & Community

The project is maintained by Beomi. The README provides no explicit links to community channels or a roadmap.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The implementation is a work in progress: planned updates include storing the 1-bit weights in uint8 and adding custom CUDA kernels, and the current version keeps quantized weights in a mixed-precision format. A sketch of one possible packing scheme follows.
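As a rough illustration of what uint8 storage could look like, here is one way ±1 weights can be packed eight-to-a-byte in plain PyTorch. This is a hypothetical sketch, not the repository's planned implementation; function names are made up for this example.

```python
import torch

_BYTE_WEIGHTS = torch.tensor([1, 2, 4, 8, 16, 32, 64, 128], dtype=torch.uint8)

def pack_signs(w: torch.Tensor) -> tuple[torch.Tensor, torch.Size]:
    """Pack a tensor of +/-1 values into uint8, eight weights per byte."""
    bits = (w.flatten() > 0).to(torch.uint8)        # map -1 -> 0, +1 -> 1
    pad = (-bits.numel()) % 8
    bits = torch.cat([bits, bits.new_zeros(pad)])   # pad to a byte boundary
    packed = (bits.view(-1, 8) * _BYTE_WEIGHTS).sum(dim=1).to(torch.uint8)
    return packed, w.shape

def unpack_signs(packed: torch.Tensor, shape: torch.Size) -> torch.Tensor:
    """Inverse of pack_signs: recover the +/-1 tensor."""
    bits = (packed.unsqueeze(1) & _BYTE_WEIGHTS).ne(0).to(torch.int8)
    flat = bits.flatten()[: shape.numel()]
    return (flat * 2 - 1).reshape(shape).float()

# Round-trip check on a random +/-1 matrix.
w = torch.randint(0, 2, (4, 16)).float() * 2 - 1
packed, shape = pack_signs(w)
assert torch.equal(unpack_signs(packed, shape), w)
```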

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

3 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 1 more.

blt by facebookresearch

0.1%
2k
Code for Byte Latent Transformer research paper
Created 9 months ago
Updated 3 months ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems").

airllm by lyogavin

0.1%
6k
Inference optimization for LLMs on low-resource hardware
Created 2 years ago
Updated 2 weeks ago