LLM-with-RL-papers by floodsung

Paper list for LLMs using reinforcement learning

created 2 years ago
276 stars

Top 94.7% on sourcepulse

Project Summary

This repository serves as a curated collection of research papers focused on the intersection of Large Language Models (LLMs) and Reinforcement Learning (RL). It targets researchers and practitioners in AI, NLP, and RL, providing a centralized resource for understanding advancements in areas like instruction following, reasoning, and self-improvement through RL techniques.

How It Works

The collection categorizes papers into key themes: RL without Human Feedback, RL with Human Feedback (RLHF), and Prompt-based RL-related methods. This structure allows users to navigate the landscape of LLM-RL research, from foundational RL applications to advanced human-in-the-loop training paradigms and prompt optimization strategies.

Quick Start & Requirements

This repository is a collection of links to research papers and associated code repositories. No installation or specific requirements are needed to browse the paper list.

  • Code repositories: the list links to related codebases such as DeepSpeed Chat RLHF, trlX, PKU-Beaver, and ColossalAI.

Highlighted Details

  • Comprehensive coverage of RLHF, including seminal works and recent advancements in alignment and safety.
  • Inclusion of papers on novel RL applications for LLMs, such as code generation and mathematical reasoning.
  • Categorization of prompt-based RL methods, highlighting techniques like Self-Refine and ReAct.
  • Links to foundational review papers on LLMs for decision-making.

Maintenance & Community

The repository is maintained by floodsung. The README does not mention community channels or the current state of development.

Licensing & Compatibility

The repository itself, as a collection of links, does not specify a license. Check the licenses of the linked papers and code repositories individually.

Limitations & Caveats

This is a curated list of papers. It contains no implementations or runnable code of its own for the research described, only links to external codebases. The focus is on the academic literature.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 7 stars in the last 90 days
