LLM-Reading-List by evanmiller

LLM paper list focused on efficient inference and compression

Created 2 years ago · 742 stars · Top 46.7% on SourcePulse

Project Summary

This repository is a curated reading list focused on Large Language Model (LLM) inference and model compression techniques. It serves as a personal knowledge base for researchers and engineers working on optimizing LLM performance, efficiency, and deployment.

How It Works

The list categorizes papers by key LLM components and optimization strategies, including Transformer architectures, foundation models, position encoding, KV cache, activation functions, pruning, quantization, normalization, sparsity, rank compression, fine-tuning methods (LoRA, QLoRA), sampling, scaling strategies (tensor/pipeline parallelism), Mixture of Experts (MoE), and watermarking. This structured approach allows for systematic exploration of advancements in LLM efficiency.
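
Of these categories, quantization is among the easiest to illustrate concretely. Below is a minimal sketch of symmetric absmax int8 weight quantization in NumPy. This is the generic textbook scheme, not the method of any specific paper on the list (LLM.int8(), for example, adds mixed-precision handling of outlier features); the function names are illustrative.

```python
import numpy as np

def absmax_quantize(w: np.ndarray):
    """Symmetric per-tensor int8 quantization via absmax scaling."""
    scale = 127.0 / np.max(np.abs(w))            # map the largest |w| to 127
    q = np.clip(np.round(w * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 values."""
    return q.astype(np.float32) / scale

# Toy usage: quantize a random weight matrix and measure the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = absmax_quantize(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```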

Highlighted Details

  • Covers foundational papers like "Attention Is All You Need" and "Transformer-XL."
  • Includes recent work on KV-cache and attention efficiency (H2O, vLLM, FlashAttention).
  • Features a comprehensive section on pruning techniques (OBD, OBS, SparseGPT, Wanda).
  • Details various quantization methods (LLM.int8(), SmoothQuant, QuIP, SqueezeLLM).
  • Explores parameter-efficient fine-tuning (LoRA, QLoRA, DyLoRA); a minimal LoRA sketch follows this list.
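
As referenced in the last bullet, here is a minimal NumPy sketch of the LoRA idea: the pretrained weight W is frozen, and only a low-rank update (alpha / r) * B @ A is trained. The shapes and zero-initialization follow the LoRA paper's formulation, but the LoRALinear class itself is hypothetical, not code from the repository or from any particular library.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer with a trainable low-rank (LoRA) update."""

    def __init__(self, W: np.ndarray, r: int = 8, alpha: float = 16.0):
        d_out, d_in = W.shape
        self.W = W                                # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01  # trainable, small random init
        self.B = np.zeros((d_out, r))             # trainable, zero init
        self.scale = alpha / r                    # update starts as exactly zero

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); only A and B would be trained.
        return x @ (self.W + self.scale * self.B @ self.A).T

layer = LoRALinear(np.random.randn(16, 32))
print(layer(np.random.randn(4, 32)).shape)  # (4, 16)
```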

Maintenance & Community

This is a personal reading list maintained by Evan Miller. No community or active maintenance is indicated.

Licensing & Compatibility

The repository itself contains no code, only links to research papers. Licensing is determined by the original publication venues of the papers.

Limitations & Caveats

This is a static list of papers and does not provide code implementations, benchmarks, or direct analysis. It requires users to access and review the papers independently.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 2 more.

sparsegpt by IST-DASLab

0.5% · 836 stars
Code for massive language model one-shot pruning (ICML 2023 paper)
Created 2 years ago · Updated 1 year ago
Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

wanda by locuslab

0.4% · 802 stars
LLM pruning research paper implementation
Created 2 years ago · Updated 1 year ago