LLM-with-RL-papers by floodsung

Paper list for LLMs using reinforcement learning

Created 2 years ago

278 stars

Top 93.6% on SourcePulse

1 Expert Loves This Project

Vincent Weisser

Cofounder of Prime Intellect

Project Summary

This repository is a curated collection of research papers at the intersection of Large Language Models (LLMs) and Reinforcement Learning (RL). It targets researchers and practitioners in AI, NLP, and RL, and offers a single place to follow work on instruction following, reasoning, and self-improvement through RL techniques.

How It Works

The collection categorizes papers into key themes: RL without Human Feedback, RL with Human Feedback (RLHF), and Prompt-based RL-related methods. This structure allows users to navigate the landscape of LLM-RL research, from foundational RL applications to advanced human-in-the-loop training paradigms and prompt optimization strategies.

Quick Start & Requirements

This repository is a collection of links to research papers and associated code repositories. No installation or specific requirements are needed to browse the paper list.

Code Repositories: Links to relevant codebases like DeepSpeed Chat RLHF, TRLX, PKU-Beaver, and ColossalAI are provided.

Highlighted Details

Comprehensive coverage of RLHF, including seminal works and recent advancements in alignment and safety.
Inclusion of papers on novel RL applications for LLMs, such as code generation and mathematical reasoning.
Categorization of prompt-based RL methods, highlighting techniques like Self-Refine and ReAct.
Links to foundational review papers on LLMs for decision-making.

Maintenance & Community

The repository is maintained by floodsung. The README does not mention community channels or say whether development is active.

Licensing & Compatibility

The repository itself, as a collection of links, does not specify a license. Check the licenses of the linked papers and code repositories individually.

Limitations & Caveats

This is a curated list of papers; the repository itself contains no implementations or runnable code for the research described. Code is available only through the linked external repositories, and the focus is on the academic literature.

Health Check

Last Commit: 1 year ago
Responsiveness: 1 day

Star History

0 stars in the last 30 days