DFloat11 by LeanModels

Lossless compression framework for efficient LLM GPU inference

created 3 months ago
457 stars

Top 67.1% on sourcepulse

Project Summary

DFloat11 is a lossless compression framework that reduces the size of Large Language Model (LLM) weights by approximately 30%, so models can run on GPUs with limited memory. It targets researchers and engineers who need to cut memory usage and keep inference fast without changing model outputs.

How It Works

DFloat11 compresses model weights with a dynamic-length floating-point representation. The encoding is lossless, so the compressed model produces outputs that are bit-for-bit identical to those of the original BFloat16 model. The framework integrates with the HuggingFace ecosystem, so it can be used with existing LLM pipelines.
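The summary does not describe the encoding itself, so the following is only a sketch of why a variable-length float can be lossless and still smaller. It assumes, for illustration, that the sign and 7 mantissa bits of each BFloat16 value are stored raw and only the 8-bit exponent is Huffman-coded; the helper names and the Gaussian test weights are hypothetical and this is not DFloat11's actual implementation.

```python
import heapq
import random
import struct
from collections import Counter


def to_bf16_bits(x: float) -> int:
    """Top 16 bits of x's float32 encoding (BFloat16 obtained by truncation)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


def estimate_bits_per_weight(weights):
    """Average bits per weight if only the exponent field is entropy-coded."""
    # BFloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    exponents = [(to_bf16_bits(w) >> 7) & 0xFF for w in weights]
    lengths = huffman_code_lengths(Counter(exponents))
    coded_exponent_bits = sum(lengths[e] for e in exponents) / len(exponents)
    return 1 + 7 + coded_exponent_bits  # sign + mantissa stay uncompressed


random.seed(0)
# Hypothetical stand-in for trained weights: small zero-mean Gaussian values.
weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]
bits = estimate_bits_per_weight(weights)
print(f"{bits:.2f} bits per weight vs 16 for BFloat16")
```

Because a Huffman code is a prefix code, every exponent can be decoded exactly, which is what makes this kind of scheme lossless rather than a form of quantization. The saving comes from the exponent values being concentrated in a narrow range.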

Quick Start & Requirements

  • Installation: pip install dfloat11[cuda12] or pip install dfloat11[cuda11]
  • Prerequisites: CUDA-compatible GPU, PyTorch.
  • Usage: Inference script provided (inference.py) or via HuggingFace from_pretrained with DFloat11ModelForCausalLM.
  • Links: Pre-compressed Models, Code Repository
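The HuggingFace-style usage mentioned above might look like the sketch below. Only the class name DFloat11ModelForCausalLM and its from_pretrained method come from this summary; the function name, the model_id argument, and the assumption that the returned object behaves like a standard transformers causal LM (with .device and .generate) are illustrative, and the installed release may require extra arguments.

```python
def generate_with_dfloat11(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load a pre-compressed DFloat11 checkpoint and generate a completion.

    model_id is a placeholder for the HuggingFace ID of a pre-compressed model.
    """
    # Imported lazily so the module can be imported on a machine without a GPU.
    from dfloat11 import DFloat11ModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: from_pretrained accepts a model ID, as in transformers.
    model = DFloat11ModelForCausalLM.from_pretrained(model_id)

    # Assumption: the result exposes the usual transformers generation API.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Check the repository's inference.py for the arguments the current release expects before relying on this shape.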

Highlighted Details

  • ~30% lossless size reduction for LLM weights.
  • Bit-for-bit identical outputs to original BFloat16 models.
  • Up to 38.8x faster generation compared to CPU offloading.
  • Enables up to 13.17x longer context length within the same GPU memory budget.
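The context-length gain follows from simple memory arithmetic: whatever the weights no longer occupy can hold KV cache. The sketch below uses the ~30% figure from this summary; the parameter count, GPU budget, and attention shape are hypothetical, decompression scratch memory is ignored, and the result is not meant to reproduce the 13.17x figure.

```python
def kv_cache_tokens(gpu_bytes: int, weight_bytes: int, kv_bytes_per_token: int) -> int:
    """Number of tokens of KV cache that fit once the weights are loaded."""
    free_bytes = max(gpu_bytes - weight_bytes, 0)
    return free_bytes // kv_bytes_per_token


# Hypothetical model and hardware, chosen only to make the arithmetic concrete.
params = 8_000_000_000
gpu_bytes = 17_000_000_000
layers, kv_heads, head_dim = 32, 8, 128

bf16_weight_bytes = params * 2                  # 2 bytes per BFloat16 weight
df11_weight_bytes = bf16_weight_bytes * 7 // 10 # ~30% smaller, per the summary

# Per token: K and V tensors, for every layer, stored in 2-byte values.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2

bf16_tokens = kv_cache_tokens(gpu_bytes, bf16_weight_bytes, kv_bytes_per_token)
df11_tokens = kv_cache_tokens(gpu_bytes, df11_weight_bytes, kv_bytes_per_token)
print(f"BFloat16: {bf16_tokens} tokens, compressed: {df11_tokens} tokens")
print(f"context ratio: {df11_tokens / bf16_tokens:.1f}x")
```

The ratio grows quickly when the uncompressed weights nearly fill the GPU, because the freed 30% is then large compared with the little memory that was left for the cache.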

Maintenance & Community

Developed by Rice University and xMAD.ai. GPU kernel designed by Tianyi Zhang.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Requires a CUDA-compatible GPU. Because the README states no license, suitability for commercial use cannot be confirmed from it.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 160 stars in the last 90 days
