Curated list for efficient LLMs
This repository is a curated list of papers and projects focused on making Large Language Models (LLMs) more efficient. It serves researchers and practitioners looking to reduce computational costs, memory footprint, and latency in LLM deployment and training. The list covers a wide range of techniques, including pruning, quantization, knowledge distillation, and architectural modifications.
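To give a flavor of one of these techniques, below is a minimal sketch of post-training symmetric int8 weight quantization, assuming PyTorch is available; the function names and tensor shapes are illustrative and not taken from any project in the list.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: w is approximated as scale * q."""
    scale = w.abs().max() / 127.0                      # map the largest magnitude to 127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the int8 codes."""
    return q.float() * scale

# Hypothetical weight matrix, standing in for a real LLM layer.
w = torch.randn(1024, 1024)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"max rounding error: {(w - w_hat).abs().max():.4f}")
```

Real quantization schemes covered in the list (per-channel scales, group-wise quantization, activation-aware methods) are considerably more involved; this only shows the core idea of trading precision for a 4x reduction in weight storage.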
How It Works
The repository organizes research papers into distinct categories such as Network Pruning/Sparsity, Knowledge Distillation, Quantization, Inference Acceleration, Efficient MoE, Efficient Architecture, KV Cache Compression, Text Compression, Low-Rank Decomposition, Hardware/System/Serving, Efficient Fine-tuning, and Efficient Training. Each entry typically includes a title, authors, a brief introduction, and links to the paper or code. The list is updated regularly, with recent papers highlighted on the main page.
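As a concrete example of the kind of technique catalogued under Network Pruning/Sparsity, here is a minimal sketch of unstructured magnitude pruning, again assuming PyTorch; it is not drawn from any specific paper in the list.

```python
import torch

def magnitude_prune(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights (unstructured pruning)."""
    k = int(w.numel() * sparsity)                      # number of weights to remove
    if k == 0:
        return w.clone()
    # k-th smallest absolute value serves as the pruning threshold;
    # ties at the threshold may zero slightly more than k weights.
    threshold = w.abs().flatten().kthvalue(k).values
    return torch.where(w.abs() > threshold, w, torch.zeros_like(w))

w = torch.randn(1024, 1024)
w_sparse = magnitude_prune(w, sparsity=0.5)
print(f"achieved sparsity: {(w_sparse == 0).float().mean():.2f}")
```

The papers in this category go well beyond this baseline, covering structured sparsity patterns, one-shot pruning of pretrained LLMs, and sparsity-aware kernels that turn zeroed weights into actual speedups.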
Quick Start & Requirements
This is a curated list of research papers and projects, not a runnable software package. No installation or specific requirements are needed to browse the content.
Maintenance & Community
The project is community-driven; contributions are welcomed via pull requests or email, and the README reflects active, ongoing maintenance.
Licensing & Compatibility
The repository itself is a collection of links and information, not software with a specific license. Individual papers and projects linked within the repository will have their own licenses.
Limitations & Caveats
As a curated list, it does not provide direct implementations or benchmarks. Users must refer to the linked papers and projects for practical application and performance details.