Curated list of LLM post-training resources
This repository serves as a comprehensive survey and guide to post-training methodologies for Large Language Models (LLMs), with a particular focus on enhancing reasoning capabilities. It targets researchers and practitioners in AI and NLP, offering a curated collection of papers, code, benchmarks, and tutorials to facilitate understanding and implementation of advanced LLM training techniques.
How It Works
The project categorizes LLM post-training approaches into Fine-tuning, Reinforcement Learning (RL), and Test-time Scaling methods. It delves into specific techniques like RLHF, reward learning, policy optimization, and LLM-augmented RL, providing a structured overview of how these methods improve LLM reasoning, decision-making, and generalization. The repository highlights the integration of LLMs with RL frameworks and explores applications in areas like autonomous agents and complex problem-solving.
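Of the three families above, test-time scaling is the easiest to illustrate in isolation. The sketch below shows best-of-N sampling, one common test-time scaling technique: draw several candidate responses and keep the one a reward model scores highest. All names here (`best_of_n`, `toy_generate`, `toy_reward`) are hypothetical stand-ins for an LLM sampler and a learned reward model, not APIs from any repository linked in this list.

```python
import random

def best_of_n(prompt, generate, reward, n=8, seed=0):
    """Test-time scaling via best-of-N sampling: draw n candidate
    responses for `prompt` and return the one the reward function
    scores highest. `generate` and `reward` are caller-supplied
    stand-ins for an LLM sampler and a reward model."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: the "model" emits noisy numeric answers, and the
# "reward model" prefers answers close to the true value 42.
def toy_generate(prompt, rng):
    return rng.gauss(42.0, 5.0)

def toy_reward(answer):
    return -abs(answer - 42.0)

best = best_of_n("What is 6 * 7?", toy_generate, toy_reward, n=16)
```

Increasing `n` spends more inference compute to improve the selected answer without touching the model's weights, which is the core trade-off test-time scaling methods explore.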
Quick Start & Requirements
This repository is a curated collection of resources, not a runnable software package. It links to various papers (arXiv, Springer, Oxford Academic), code repositories (GitHub), and tutorials (websites). No direct installation or execution commands are provided.
Maintenance & Community
The repository is actively maintained by mbzuai-oryx and encourages community contributions via pull requests. It cites a primary paper and provides a BibTeX entry for academic use. Feedback and issues can be raised directly in the repository.
Licensing & Compatibility
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which prohibits commercial use and requires derivative works to be distributed under the same terms.
Limitations & Caveats
As a curated resource list, this repository does not provide executable code or direct implementations. Users must independently access and integrate the linked papers, code, and benchmarks. The focus is on research and academic exploration, not a production-ready framework.