Coding model replication research paper
O1-CODER aims to replicate OpenAI's o1 model for coding tasks, targeting researchers and developers who want to improve AI's systematic reasoning in code generation. It combines Reinforcement Learning (RL) with Monte Carlo Tree Search (MCTS) to strengthen "System-2" thinking — deliberate, step-by-step reasoning — with the goal of producing more logically structured and efficient code.
How It Works
The project employs a Test Case Generator (TCG) to produce standardized evaluations for generated code. Its core innovation is a self-play and RL loop: the model generates reasoning data, then uses RL and MCTS to iteratively refine its policy. Each cycle further optimizes the model for systematic reasoning and code quality, deriving diverse supervision signals from only a small amount of ground-truth code.
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project appears to be actively maintained with recent updates to its reward aggregator, training code, and technical report. Planned updates include RL code, curated datasets, and a Reinforcement Fine-Tuning (RFT) version. Community channels are not specified in the README.
Licensing & Compatibility
This work is released under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The README indicates that the Reinforcement Learning code and curated datasets are still under development. The planned RFT version skips chain-of-thought (CoT) data initialization, which might affect initial model performance or stability.