BitNet by microsoft

Inference framework for 1-bit LLMs

created 1 year ago
20,630 stars

Top 2.2% on sourcepulse

Project Summary

This repository provides the official inference framework for 1-bit Large Language Models (LLMs), specifically BitNet b1.58. It enables fast, energy-efficient LLM inference on CPUs, with NPU and GPU support planned, and targets researchers and users who want to run LLMs locally on modest hardware.

How It Works

BitNet uses optimized C++ kernels, building on the llama.cpp framework and the Lookup Table methodologies from T-MAC. This enables lossless inference of ternary (1.58-bit) models, achieving significant speedups and energy reductions on CPUs through specialized quantization techniques.
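To make the "1.58-bit" idea concrete, the sketch below shows absmean ternary quantization as described in the BitNet b1.58 paper: weights are scaled by their mean absolute value, rounded, and clipped to {-1, 0, +1}. This is an illustrative NumPy version, not the framework's optimized C++ kernels; the function name and per-tensor scaling granularity are assumptions for the example.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Illustrative absmean quantization to ternary weights {-1, 0, +1}.

    Follows the BitNet b1.58 paper's scheme: scale by the mean absolute
    value of the tensor, round, then clip to [-1, 1].
    """
    gamma = np.abs(w).mean() + eps               # per-tensor absmean scale
    w_q = np.clip(np.round(w / gamma), -1, 1)    # ternary weight matrix
    return w_q.astype(np.int8), gamma            # dequantize as w_q * gamma

# Example: quantize a small random weight matrix
w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = absmean_ternary_quantize(w)
assert set(np.unique(w_q).tolist()).issubset({-1, 0, 1})
```

Because every weight is one of three values, matrix multiplication reduces to additions, subtractions, and skips, which is what the lookup-table kernels exploit on CPUs.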

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n bitnet-cpp python=3.9, conda activate bitnet-cpp), install dependencies (pip install -r requirements.txt), and build the project.
  • Prerequisites: Python >= 3.9, CMake >= 3.22, Clang >= 18. For Windows, Visual Studio 2022 with "Desktop development with C++" and "C++ CMake tools for Windows" is required.
  • Model Download: Use huggingface-cli download to get models, then run python setup_env.py to prepare them for inference.
  • Inference: Execute python run_inference.py -m <model_path> -p "Your prompt" for text generation.
  • Resources: An official demo is available. See the README for detailed build instructions for each supported OS.
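The steps above can be condensed into one shell session. The model ID and quantization flag below are illustrative examples; check the README for the exact released checkpoints and per-OS build instructions.

```shell
# Clone the repository with its submodules
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet

# Create a Python 3.9 environment and install dependencies
conda create -n bitnet-cpp python=3.9 -y
conda activate bitnet-cpp
pip install -r requirements.txt

# Download a model (ID is an example; see README for available models)
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
    --local-dir models/BitNet-b1.58-2B-4T

# Prepare the model and build the inference kernels
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

# Run text generation
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "Your prompt"
```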

Highlighted Details

  • Achieves 1.37x-5.07x speedups on ARM CPUs and 2.37x-6.17x on x86 CPUs.
  • Reduces energy consumption by 55.4%-70.0% on ARM and 71.9%-82.2% on x86.
  • Supports running a 100B BitNet b1.58 model on a single CPU at 5-7 tokens/sec.
  • Includes scripts for benchmarking and generating dummy models for performance testing.

Maintenance & Community

This project is built on llama.cpp. Recent updates include official 2B-parameter models on Hugging Face and efficient edge inference for ternary LLMs.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

NPU and GPU support are listed as "coming next." The README mentions that tested models are dummy setups used in a research context, and some specific model configurations (e.g., BitNet-b1.58-3B) may not support all quantization types (e.g., i2_s on x86).

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 3
  • Issues (30d): 2

Star History

2,757 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems).

  • JittorLLMs by Jittor: low-resource LLM inference library (2k stars; created 2 years ago, updated 5 months ago).