LLM paper list focused on efficient inference and compression
This repository is a curated reading list focused on Large Language Model (LLM) inference and model compression techniques. It serves as a personal knowledge base for researchers and engineers working on optimizing LLM performance, efficiency, and deployment.
How It Works
The list categorizes papers by key LLM components and optimization strategies, including Transformer architectures, foundation models, position encoding, KV cache, activation functions, pruning, quantization, normalization, sparsity, rank compression, fine-tuning methods (LoRA, QLoRA), sampling, scaling strategies (tensor/pipeline parallelism), Mixture of Experts (MoE), and watermarking. This structured approach allows for systematic exploration of advancements in LLM efficiency.
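For illustration only, here is a minimal Python sketch of how such a categorized list could be represented programmatically. The category names follow the topics above, but the Paper and ReadingList structures and the example entry are hypothetical and are not part of the repository itself.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    """A single reading-list entry (illustrative structure, not from the repo)."""
    title: str
    url: str
    year: int

@dataclass
class ReadingList:
    """Papers grouped by optimization topic (e.g. 'KV cache', 'quantization')."""
    categories: dict[str, list[Paper]] = field(default_factory=dict)

    def add(self, category: str, paper: Paper) -> None:
        # File the paper under its topic, creating the category on first use.
        self.categories.setdefault(category, []).append(paper)

# Hypothetical usage: placeholder title and URL, not a real entry from the list.
papers = ReadingList()
papers.add("quantization", Paper(
    title="An example quantization paper",
    url="https://arxiv.org/abs/XXXX.XXXXX",
    year=2023,
))
```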
Maintenance & Community
This is a personal reading list maintained by Evan Miller. No community or active maintenance is indicated.
Licensing & Compatibility
The repository itself contains no code, only links to research papers; each paper's licensing is governed by its original publication venue.
Limitations & Caveats
This is a static list of papers and does not provide code implementations, benchmarks, or direct analysis. It requires users to access and review the papers independently.