gguf-docs by iuliaturc

GGUF quantization explained

Created 9 months ago
416 stars

Top 70.5% on SourcePulse

Project Summary

This repository provides unofficial documentation for the GGUF quantization ecosystem, which encompasses the GGML tensor library, the llama.cpp inference engine, and the GGUF binary file format. It aims to clarify the various quantization algorithms and settings for users, particularly those looking to run large language models on consumer-grade hardware by reducing model memory footprint through post-training quantization (PTQ); for example, a 7-billion-parameter model shrinks from roughly 14 GB at 16-bit precision to around 4 GB at 4 bits per weight.

How It Works

GGUF quantization is a Post-Training Quantization (PTQ) method applied to pre-trained, high-precision LLMs. It works by reducing the bit width of individual model weights. This process significantly decreases the memory requirements of the model, enabling inference on less powerful, consumer-grade hardware. The ecosystem includes the GGML tensor library and the llama.cpp inference engine, which is optimized for CPU-based LLM inference.
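The bit-width reduction described above can be illustrated with a minimal sketch of symmetric block quantization in the spirit of GGUF's Q8_0 type (blocks of 32 weights, one scale per block, int8 values). The block size matches Q8_0, but the function names and details here are simplified assumptions for illustration, not the actual llama.cpp implementation.

```python
import numpy as np

BLOCK = 32  # Q8_0 groups weights into blocks of 32

def quantize_q8_0(weights: np.ndarray):
    """Symmetric 8-bit block quantization: one scale per 32-weight block."""
    w = weights.reshape(-1, BLOCK).astype(np.float32)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # per-block scale
    scale[scale == 0] = 1.0                               # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)

# Memory: fp32 costs 4 bytes/weight; this scheme costs 1 byte/weight
# plus a 2-byte scale shared across every 32 weights.
bytes_fp32 = w.nbytes
bytes_q8 = q.nbytes + s.nbytes
print(f"fp32: {bytes_fp32} B, q8_0-style: {bytes_q8} B "
      f"(~{bytes_fp32 / bytes_q8:.1f}x smaller)")
print(f"max abs reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

The roughly 3.8x size reduction (rather than a clean 4x) reflects the per-block scale overhead; lower-bit GGUF types such as Q4_K trade more reconstruction error for further compression.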

Quick Start & Requirements

This repository is documentation-focused and has no installation or execution commands of its own; for practical implementation it points to the llama.cpp repository. Typical requirements there are a C++ compiler and, for the associated conversion scripts, Python, depending on how llama.cpp is used.

Highlighted Details

  • GGUF is an evolution of PTQ methods like GPTQ, AWQ, QLoRA, and QuIP#.
  • The ecosystem was primarily developed by Georgi Gerganov and collaborators, with a focus on practical implementation over formal documentation.
  • This documentation is human-written, with AI-generated sections clearly flagged.

Maintenance & Community

Contributions are welcomed via pull requests, provided they are supported by reliable references from official sources like the llama.cpp repository. The project emphasizes human-written content.

Licensing & Compatibility

The repository itself is documentation, and its licensing is not specified. However, it pertains to the GGUF ecosystem, which is closely tied to llama.cpp. Users should refer to the llama.cpp repository for licensing details relevant to the underlying technologies.

Limitations & Caveats

As unofficial documentation, there may be omissions or inaccuracies. The rapid evolution of the GGUF ecosystem means that documentation may lag behind the latest developments. Contributions are subject to review to ensure quality and adherence to guidelines.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google (6k stars, 0.3%)
PyTorch implementation for Google's Gemma models
Created 2 years ago; updated 10 months ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems").

airllm by lyogavin (15k stars, 2.4%)
Inference optimization for LLMs on low-resource hardware
Created 2 years ago; updated 1 month ago