O1-Journey by GAIR-NLP

Research paper on replicating O1 via "journey learning"

Created 1 year ago

2,000 stars

Top 21.7% on SourcePulse

6 Experts Love This Project

Elvis Saravia

Founder of DAIR.AI

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Dmytro Ivchenko

Cofounder of Fireworks AI

Binyuan Hui

Research Scientist at Alibaba Qwen

and 2 more!

Project Summary

This project documents a transparent, real-time replication effort of OpenAI's O1 model, focusing on a novel "journey learning" paradigm. It targets AI researchers and practitioners interested in understanding and reproducing advanced LLM capabilities, particularly in complex reasoning tasks. The primary benefit is the open sharing of methodologies, datasets, and findings for advancing AI research.

How It Works

The project introduces "journey learning," a paradigm in which a model learns from the full exploration process, including reflection, backtracking, and refinement, rather than only from a finished solution. This approach is applied to replicating O1. Part 2 demonstrates that simple distillation from O1's API, combined with supervised fine-tuning, can surpass O1-preview performance on mathematical reasoning. Part 3 explores inference-time scaling for medical reasoning and reports performance gains of 6%-11% when the model is given extended reasoning time.
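To make the idea of learning from backtracking and refinement concrete, the following is a minimal sketch of how a training target could keep a failed step and its correction instead of discarding them. The `Step` class, the two function names, and the "Wait, ... Let me backtrack." wording are hypothetical choices for illustration; they are not taken from the project's code or dataset format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One reasoning step; hypothetical structure used only for this sketch."""
    text: str
    correct: bool = True
    reflection: str = ""  # why the step was abandoned (used for incorrect steps)


def shortcut_target(steps: List[Step], answer: str) -> str:
    """Build a target that keeps only the correct steps."""
    lines = [s.text for s in steps if s.correct]
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)


def journey_target(steps: List[Step], answer: str) -> str:
    """Build a target that keeps the whole exploration, including the
    failed step, the reflection on it, and the backtracking."""
    lines = []
    for s in steps:
        lines.append(s.text)
        if not s.correct:
            lines.append(f"Wait, {s.reflection} Let me backtrack.")
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)


# Toy example: solve 2x + 3 = 11 with one mistaken step along the way.
steps = [
    Step("Let x be the number; 2x + 3 = 11."),
    Step("So 2x = 14.", correct=False,
         reflection="I added 3 instead of subtracting it."),
    Step("So 2x = 8, hence x = 4."),
]

if __name__ == "__main__":
    print(journey_target(steps, "4"))
```

The only difference between the two targets is whether the mistaken step and its correction survive into the training text, which is the distinction the journey-learning description above relies on.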

Quick Start & Requirements

The journey thought training dataset is available on Hugging Face.
Specific installation or execution commands are not detailed in the README.
Requirements likely include significant computational resources for LLM training and inference, and access to relevant medical and mathematical benchmarks.
Links:
- Part 1 Report: https://arxiv.org/abs/2410.18982
- Part 2 Report: https://arxiv.org/abs/2411.16489
- Part 3 Report: https://arxiv.org/abs/2501.06458
- Dataset: https://huggingface.co/datasets/GAIR-NLP/journey-thought

Highlighted Details

Introduces "journey learning," a paradigm for continuous AI progress with reflection and adaptation.
Demonstrates simple distillation from O1's API can outperform O1-preview on mathematical reasoning.
Shows inference-time scaling improves medical reasoning performance by 6%-11% with minimal training data.
The resulting models are reported to show improved generalization, reduced hallucination, and better safety.
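As an illustration of what spending more compute at inference time can look like, the sketch below uses majority voting over several sampled answers. This is one common form of inference-time scaling and is shown as an assumption; the summary above describes extended reasoning time and does not say this is the exact procedure used. `sample_fn` stands in for a model call, and `toy_sampler` is a hypothetical stub so the example runs without a model.

```python
from collections import Counter
from typing import Callable, Iterable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def answer_with_budget(sample_fn: Callable[[str], str],
                       question: str,
                       n_samples: int) -> str:
    """Sample n_samples answers and return the majority answer.
    A larger n_samples means more inference-time compute."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    return majority_vote(answers)


def toy_sampler(outputs: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays a fixed answer sequence."""
    it = iter(outputs)
    return lambda question: next(it)


if __name__ == "__main__":
    replay = ["B", "A", "A", "B", "A"]
    # With a single sample the first (possibly wrong) answer is returned.
    print(answer_with_budget(toy_sampler(replay), "toy question", 1))
    # With five samples the majority answer is returned.
    print(answer_with_budget(toy_sampler(replay), "toy question", 5))
```

In a real setting `sample_fn` would call a model with a nonzero sampling temperature, and the answers would need to be normalized before counting.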

Maintenance & Community

The core development team consists of undergraduate and PhD students from Shanghai Jiao Tong University's GAIR research group, guided by researchers from NYU and MBZUAI. Those interested in joining can contact the team by email.

Licensing & Compatibility

The README does not explicitly state a license. Given the nature of replicating proprietary models and the academic context, users should verify licensing for any derived works or commercial use.

Limitations & Caveats

The project is presented as an ongoing "journey," with resources released gradually. The README does not include specific implementation details or code for direct replication, so users need to consult the linked papers and may have to wait for further releases.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days