LLM-Reading-List by evanmiller

LLM paper list focused on efficient inference and compression

Created 2 years ago · 742 stars · Top 46.7% on SourcePulse

Project Summary

This repository is a curated reading list focused on Large Language Model (LLM) inference and model compression techniques. It serves as a personal knowledge base for researchers and engineers working on optimizing LLM performance, efficiency, and deployment.

How It Works

The list categorizes papers by key LLM components and optimization strategies, including Transformer architectures, foundation models, position encoding, KV cache, activation functions, pruning, quantization, normalization, sparsity, rank compression, fine-tuning methods (LoRA, QLoRA), sampling, scaling strategies (tensor/pipeline parallelism), Mixture of Experts (MoE), and watermarking. This structured approach allows for systematic exploration of advancements in LLM efficiency.
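
Of these categories, quantization is among the easiest to illustrate concretely. Below is a minimal sketch of symmetric absmax int8 weight quantization in NumPy. This is the generic textbook scheme, not the method of any specific paper on the list (LLM.int8(), for example, adds mixed-precision handling of outlier features); the function names are illustrative.

```python
import numpy as np

def absmax_quantize(w: np.ndarray):
    """Symmetric per-tensor int8 quantization via absmax scaling."""
    scale = 127.0 / np.max(np.abs(w))            # map the largest |w| to 127
    q = np.clip(np.round(w * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 values."""
    return q.astype(np.float32) / scale

# Toy usage: quantize a random weight matrix and measure the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = absmax_quantize(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```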

Highlighted Details

  • Covers foundational papers like "Attention Is All You Need" and "Transformer-XL."
  • Includes recent work on KV-cache and attention efficiency (H2O, vLLM, FlashAttention).
  • Features a comprehensive section on pruning techniques (OBD, OBS, SparseGPT, Wanda).
  • Details various quantization methods (LLM.int8(), SmoothQuant, QuIP, SqueezeLLM).
  • Explores parameter-efficient fine-tuning (LoRA, QLoRA, DyLoRA); a minimal LoRA sketch follows this list.
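
As referenced in the last bullet, here is a minimal NumPy sketch of the LoRA idea: the pretrained weight W is frozen, and only a low-rank update (alpha / r) * B @ A is trained. The shapes and zero-initialization follow the LoRA paper's formulation, but the LoRALinear class itself is hypothetical, not code from the repository or from any particular library.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer with a trainable low-rank (LoRA) update."""

    def __init__(self, W: np.ndarray, r: int = 8, alpha: float = 16.0):
        d_out, d_in = W.shape
        self.W = W                                # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01  # trainable, small random init
        self.B = np.zeros((d_out, r))             # trainable, zero init
        self.scale = alpha / r                    # update starts as exactly zero

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); only A and B would be trained.
        return x @ (self.W + self.scale * self.B @ self.A).T

layer = LoRALinear(np.random.randn(16, 32))
print(layer(np.random.randn(4, 32)).shape)  # (4, 16)
```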

Maintenance & Community

This is a personal reading list maintained by Evan Miller. No community or active maintenance is indicated.

Licensing & Compatibility

The repository itself contains no code, only links to research papers. Licensing is determined by the original publication venues of the papers.

Limitations & Caveats

This is a static list of papers and does not provide code implementations, benchmarks, or direct analysis. It requires users to access and review the papers independently.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 2 more.

sparsegpt by IST-DASLab

0.5% · 836 stars
Code for massive language model one-shot pruning (ICML 2023 paper)
Created 2 years ago · Updated 1 year ago
Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

wanda by locuslab

0.4% · 802 stars
LLM pruning research paper implementation
Created 2 years ago · Updated 1 year ago