Curated list of resources for LLM quantization research
This repository is a curated list of resources focused on Large Language Model (LLM) quantization techniques. It aims to provide researchers and engineers with a comprehensive overview of papers, methods, and tools for reducing LLM size and computational requirements, enabling deployment on resource-constrained devices.
How It Works
The list primarily categorizes and summarizes key research papers on LLM quantization. It highlights techniques such as GPTQ (post-training quantization using Hessian information for 3-4 bit weights), SmoothQuant (migrating quantization difficulty from activations to weights via a per-channel scaling transformation for W8A8), and AWQ (activation-aware weight quantization that protects salient weights for W4A16). These methods reduce model footprint and computational load while aiming to preserve accuracy.
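To give a flavor of the ideas above, the sketch below illustrates two of the core mechanics in simplified form: per-channel symmetric low-bit weight quantization (the basic operation GPTQ and AWQ build on) and a SmoothQuant-style scaling that shifts dynamic range from activations to weights. This is an illustrative NumPy example, not code from any of the listed papers; the function names and the choice of INT4 and alpha=0.5 are assumptions for demonstration only.

```python
# Illustrative sketch only: simplified per-channel INT4 weight quantization
# and a SmoothQuant-style activation-to-weight scaling migration.
import numpy as np

def quantize_weights_int4(w: np.ndarray) -> np.ndarray:
    """Symmetric per-row quantization of a weight matrix to 4 bits (simulated)."""
    qmax = 7  # symmetric signed 4-bit range [-7, 7]
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax  # one scale per row
    q = np.clip(np.round(w / scale), -qmax, qmax)        # quantize
    return q * scale                                      # dequantize back to float

def smoothquant_scale(x: np.ndarray, w: np.ndarray, alpha: float = 0.5):
    """SmoothQuant-style migration: divide activations by s, multiply weights by s.

    s is chosen per input channel so activation outliers are flattened while
    the weights absorb part of the dynamic range (alpha balances the two sides).
    The product (x / s) @ (s * w) equals x @ w, so the layer output is unchanged.
    """
    act_max = np.abs(x).max(axis=0)   # per-input-channel activation max
    w_max = np.abs(w).max(axis=1)     # per-input-channel weight max
    s = (act_max ** alpha) / (w_max ** (1 - alpha) + 1e-8)
    return x / s, w * s[:, None]

# Toy usage: quantize a small linear layer after smoothing.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 64))         # activations: (batch, in_features)
w = rng.normal(size=(64, 128))        # weights: (in_features, out_features)
x_s, w_s = smoothquant_scale(x, w)
w_q = quantize_weights_int4(w_s.T).T  # per-output-channel quantization
print("max output error:", np.abs(x @ w - x_s @ w_q).max())
```

In practice the listed methods add much more machinery (Hessian-guided rounding in GPTQ, salient-channel selection in AWQ, calibration data for the scaling statistics), but the scale-then-round structure above is the common starting point.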
Highlighted Details
Maintenance & Community
Contributions are welcome via issues or pull requests for new resources, broken links, or outdated information.
Licensing & Compatibility
This repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The list focuses on research papers and techniques; it does not provide direct code implementations or pre-quantized models. Information may become outdated as the field rapidly evolves.