DFloat11 by LeanModels

Lossless compression framework for efficient LLM GPU inference

Created 10 months ago
608 stars

Top 53.8% on SourcePulse

1 Expert Loves This Project
Project Summary

DFloat11 is a lossless compression framework that reduces the size of Large Language Model (LLM) weights by approximately 30%, so that models fit and run on GPUs with limited memory. It targets researchers and engineers who need to cut memory usage, and avoid the slowdown of CPU offloading, without changing model outputs.

How It Works

DFloat11 compresses model weights with a dynamic-length floating-point representation: each weight is stored in a variable number of bits rather than the fixed 16 bits of BFloat16. Because the encoding is lossless, the decompressed weights, and therefore the model outputs, are bit-for-bit identical to those of the original BFloat16 model. The framework integrates with the HuggingFace ecosystem, so it can be adopted in existing LLM pipelines.
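To make the idea of a lossless variable-length encoding concrete, here is a minimal sketch. It assumes the variable-length part is an entropy code (Huffman, chosen here for illustration) over the 8-bit exponent field of each BFloat16 value, with the sign and mantissa bits stored unchanged. The summary above does not specify the encoding, so treat that choice, the synthetic Gaussian weights, and the helper names `bf16_fields` and `huffman_code` as assumptions; this is not the library's GPU kernel.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_fields(x: float) -> tuple[int, int, int]:
    """Split x, truncated to BFloat16, into (sign, exponent, mantissa)."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16  # BFloat16 is the top 16 bits of a float32
    return bits16 >> 15, (bits16 >> 7) & 0xFF, bits16 & 0x7F


def huffman_code(freqs: Counter) -> dict[int, str]:
    """Build a prefix-free code in which frequent symbols get shorter bit strings."""
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)  # unique tie-breaker so the dicts are never compared
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


# Synthetic stand-in for model weights (assumption: small, roughly Gaussian values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]
exponents = [bf16_fields(w)[1] for w in weights]

# Encode only the exponent field with the variable-length code.
code = huffman_code(Counter(exponents))
bitstream = "".join(code[e] for e in exponents)

# Decode greedily; this works because no codeword is a prefix of another.
decode_table = {bits: sym for sym, bits in code.items()}
decoded, current = [], ""
for bit in bitstream:
    current += bit
    if current in decode_table:
        decoded.append(decode_table[current])
        current = ""

assert decoded == exponents  # lossless round trip

# 1 sign bit + 7 mantissa bits stay fixed; the exponent cost is now variable.
avg_bits = 1 + 7 + len(bitstream) / len(exponents)
print(f"average bits per weight: {avg_bits:.2f} (BFloat16 uses 16)")
```

The round-trip assertion is the point of the sketch: every exponent is recovered exactly, so reconstructing the original 16-bit values loses nothing, while the average storage per weight drops below 16 bits whenever the exponent distribution is skewed.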

Quick Start & Requirements

  • Installation: pip install dfloat11[cuda12] or pip install dfloat11[cuda11]
  • Prerequisites: CUDA-compatible GPU, PyTorch.
  • Usage: Run the provided inference script (inference.py), or load a model through the HuggingFace-style from_pretrained method of DFloat11ModelForCausalLM.
  • Links: Pre-compressed Models, Code Repository
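The loading path named in the list above might look roughly like the sketch below. Only the class name `DFloat11ModelForCausalLM` and the `from_pretrained` entry point come from this page; the import path `dfloat11`, the assumption that the returned object behaves like a HuggingFace causal LM (with `generate`), the use of the same model ID for the tokenizer, and the helper name `generate_with_df11` are all assumptions to check against the repository's README for the installed version.

```python
def generate_with_df11(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load a pre-compressed model and generate a completion (hedged sketch)."""
    # Imports live inside the function so this module can be imported
    # on a machine without a GPU or without dfloat11 installed.
    import torch
    from transformers import AutoTokenizer
    from dfloat11 import DFloat11ModelForCausalLM  # class name as given on this page

    # Assumption: the pre-compressed repository also ships tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: from_pretrained returns a model already placed on the GPU.
    model = DFloat11ModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        # Assumption: the wrapper exposes the standard HuggingFace generate API.
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call would pass the ID of one of the pre-compressed models linked above, for example `generate_with_df11("<pre-compressed-model-id>", "Hello")`, where the ID is a placeholder.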

Highlighted Details

  • ~30% lossless size reduction for LLM weights.
  • Bit-for-bit identical outputs to original BFloat16 models.
  • Up to 38.8x faster generation compared to CPU offloading.
  • Enables up to 13.17x longer context length within the same GPU memory budget.
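As a rough check on the ~30% figure, the arithmetic below assumes an average of about 11 bits per weight instead of 16, which is what the name DFloat11 suggests but is not stated on this page. The 8-billion-parameter model size and the helper `weight_bytes` are hypothetical, chosen only to make the numbers easy to follow.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


num_params = 8_000_000_000            # hypothetical 8B-parameter model
bf16 = weight_bytes(num_params, 16)   # original BFloat16 weights
df11 = weight_bytes(num_params, 11)   # assumption: ~11 bits per weight on average

saving = 1 - df11 / bf16
freed_gb = (bf16 - df11) / 1e9

print(f"BFloat16: {bf16 / 1e9:.1f} GB, compressed: {df11 / 1e9:.1f} GB")
print(f"saving: {saving:.2%}, freed: {freed_gb:.1f} GB")
```

Under these assumptions the saving is 1 - 11/16 = 31.25%, in line with the ~30% claim. The freed memory is what becomes available for other uses on the same GPU, such as a larger KV cache for longer contexts.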

Maintenance & Community

Developed by Rice University and xMAD.ai. GPU kernel designed by Tianyi Zhang.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Requires a CUDA-compatible GPU. The reported speedup is relative to CPU offloading. The license is unstated, so the terms for commercial use are unknown.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2

Star History

13 stars in the last 30 days
