LLM-Reading-List by evanmiller

LLM paper list focused on efficient inference and compression

Created 2 years ago · 736 stars · Top 48.0% on sourcepulse

Project Summary

This repository is a curated reading list focused on Large Language Model (LLM) inference and model compression techniques. It serves as a personal knowledge base for researchers and engineers working on optimizing LLM performance, efficiency, and deployment.

How It Works

The list categorizes papers by key LLM components and optimization strategies, including Transformer architectures, foundation models, position encoding, KV cache, activation functions, pruning, quantization, normalization, sparsity, rank compression, fine-tuning methods (LoRA, QLoRA), sampling, scaling strategies (tensor/pipeline parallelism), Mixture of Experts (MoE), and watermarking. This structured approach allows for systematic exploration of advancements in LLM efficiency.
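
For instance, the fine-tuning entries (LoRA, QLoRA, DyLoRA) center on adding a small trainable low-rank correction to frozen pretrained weights. The sketch below illustrates that core idea in plain NumPy; the layer dimensions, rank, and scaling factor are illustrative assumptions, and the code comes neither from this repository (which contains no code) nor from any listed paper.

    import numpy as np

    # Illustrative sizes: one linear layer with a rank-8 LoRA adapter.
    d_out, d_in, r = 1024, 1024, 8      # r << d_in is the low-rank bottleneck
    alpha = 16                          # LoRA scaling hyperparameter (assumed value)

    rng = np.random.default_rng(0)
    W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
    A = 0.01 * rng.standard_normal((r, d_in))     # trainable down-projection
    B = np.zeros((d_out, r))                      # trainable up-projection, zero at init

    def lora_forward(x):
        """Forward pass with the low-rank update: W x + (alpha / r) * B A x."""
        return W @ x + (alpha / r) * (B @ (A @ x))

    x = rng.standard_normal(d_in)
    print(np.allclose(lora_forward(x), W @ x))    # True: B == 0, so output is unchanged at init

Only A and B would be trained, so the number of updated parameters drops from d_out * d_in to r * (d_out + d_in).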

Highlighted Details

  • Covers foundational papers like "Attention Is All You Need" and "Transformer-XL."
  • Includes recent advancements in KV cache optimization (H2O, vLLM, FlashAttention).
  • Features a comprehensive section on pruning techniques (OBD, OBS, SparseGPT, Wanda).
  • Details various quantization methods (LLM.int8(), SmoothQuant, QuIP, SqueezeLLM); a toy int8 sketch follows this list.
  • Explores parameter-efficient fine-tuning (LoRA, QLoRA, DyLoRA).
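
To make the quantization bullet concrete, here is a minimal sketch of symmetric absmax int8 weight quantization, the elementary round-trip that methods such as LLM.int8() and SmoothQuant build on and refine with outlier handling and activation scaling. It is a toy example under assumed shapes, not code from the listed papers.

    import numpy as np

    def quantize_int8(w):
        """Symmetric absmax quantization: int8 weights plus one float scale per tensor."""
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        """Recover an approximation of the original float weights."""
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 8)).astype(np.float32)   # assumed toy weight matrix
    q, scale = quantize_int8(w)
    print(np.abs(w - dequantize(q, scale)).max())        # small reconstruction error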

Maintenance & Community

This is a personal reading list maintained by Evan Miller. No community or active maintenance is indicated.

Licensing & Compatibility

The repository itself contains no code, only links to research papers. Licensing is determined by the original publication venues of the papers.

Limitations & Caveats

This is a static list of papers and does not provide code implementations, benchmarks, or direct analysis. It requires users to access and review the papers independently.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days
