gguf-docs by iuliaturc

GGUF quantization explained

Created 9 months ago
416 stars

Top 70.5% on SourcePulse

Project Summary

This repository provides unofficial documentation for the GGUF quantization ecosystem, which encompasses the GGML tensor library, the llama.cpp inference engine, and the GGUF binary file format. It aims to clarify the various quantization algorithms and settings for users, particularly those looking to run large language models on consumer-grade hardware by reducing model memory footprint through post-training quantization (PTQ); for example, a 7-billion-parameter model shrinks from roughly 14 GB at 16-bit precision to around 4 GB at 4 bits per weight.

How It Works

GGUF quantization is a Post-Training Quantization (PTQ) method applied to pre-trained, high-precision LLMs. It works by reducing the bit width of individual model weights. This process significantly decreases the memory requirements of the model, enabling inference on less powerful, consumer-grade hardware. The ecosystem includes the GGML tensor library and the llama.cpp inference engine, which is optimized for CPU-based LLM inference.
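The bit-width reduction described above can be illustrated with a minimal sketch of symmetric block quantization in the spirit of GGUF's Q8_0 type (blocks of 32 weights, one scale per block, int8 values). The block size matches Q8_0, but the function names and details here are simplified assumptions for illustration, not the actual llama.cpp implementation.

```python
import numpy as np

BLOCK = 32  # Q8_0 groups weights into blocks of 32

def quantize_q8_0(weights: np.ndarray):
    """Symmetric 8-bit block quantization: one scale per 32-weight block."""
    w = weights.reshape(-1, BLOCK).astype(np.float32)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # per-block scale
    scale[scale == 0] = 1.0                               # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)

# Memory: fp32 costs 4 bytes/weight; this scheme costs 1 byte/weight
# plus a 2-byte scale shared across every 32 weights.
bytes_fp32 = w.nbytes
bytes_q8 = q.nbytes + s.nbytes
print(f"fp32: {bytes_fp32} B, q8_0-style: {bytes_q8} B "
      f"(~{bytes_fp32 / bytes_q8:.1f}x smaller)")
print(f"max abs reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

The roughly 3.8x size reduction (rather than a clean 4x) reflects the per-block scale overhead; lower-bit GGUF types such as Q4_K trade more reconstruction error for further compression.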

Quick Start & Requirements

This repository is documentation-focused and has no installation or execution commands of its own; for practical implementation it points to the llama.cpp repository. Typical requirements there are a C++ compiler and, for the associated conversion scripts, Python, depending on how llama.cpp is used.

Highlighted Details

  • GGUF is an evolution of PTQ methods like GPTQ, AWQ, QLoRA, and QuIP#.
  • The ecosystem was primarily developed by Georgi Gerganov and collaborators, with a focus on practical implementation over formal documentation.
  • This documentation is human-written, with AI-generated sections clearly flagged.

Maintenance & Community

Contributions are welcomed via pull requests, provided they are supported by reliable references from official sources like the llama.cpp repository. The project emphasizes human-written content.

Licensing & Compatibility

The repository itself is documentation, and its licensing is not specified. However, it pertains to the GGUF ecosystem, which is closely tied to llama.cpp. Users should refer to the llama.cpp repository for licensing details relevant to the underlying technologies.

Limitations & Caveats

As unofficial documentation, there may be omissions or inaccuracies. The rapid evolution of the GGUF ecosystem means that documentation may lag behind the latest developments. Contributions are subject to review to ensure quality and adherence to guidelines.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google (6k stars, 0.3%)
PyTorch implementation for Google's Gemma models
Created 2 years ago; updated 10 months ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems").

airllm by lyogavin (15k stars, 2.4%)
Inference optimization for LLMs on low-resource hardware
Created 2 years ago; updated 1 month ago