LLM compression papers and tools for efficient training/inference
This repository is a curated collection of research papers and tools for Large Language Model (LLM) compression. It gives researchers and practitioners a single reference for accelerating LLM training and inference by reducing model size and compute requirements.
How It Works
The repository organizes LLM compression techniques into several key areas: Quantization, Pruning and Sparsity, Distillation, Efficient Prompting, KV Cache Compression, and Other methods. Each category lists relevant papers with links to their publications or code repositories, giving quick access to state-of-the-art techniques and their implementations.
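To give a concrete taste of the first category, the sketch below shows symmetric per-tensor int8 weight quantization, the simplest form of the quantization methods catalogued here. It is an illustrative example, not code from any listed tool; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0   # one float scale stored per tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale                         # ~4x smaller than float32, plus one scalar

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix; the error is bounded by roughly scale/2.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize_int8(q, s)).max())
```

The papers in the quantization section refine this basic idea with per-channel or per-group scales, lower bit widths (4-bit, 2-bit), and calibration or training to recover accuracy.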
Quick Start & Requirements
Because this repository is a curated list, there is nothing to install or run. Follow the links for individual papers and tools to find their respective requirements and setup instructions.
Maintenance & Community
The repository is actively maintained and welcomes community contributions. Users are encouraged to add new papers and tools related to LLM compression so the list stays current with the latest advancements.
Licensing & Compatibility
The repository itself is a collection of links and does not declare a license of its own. Users must adhere to the licenses of the individual papers and tools they access.
Limitations & Caveats
As a curated list, the repository provides no tooling or implementations of its own. Users must independently evaluate and integrate the referenced papers and tools, which vary widely in maturity, documentation, and compatibility.