nano-aha-moment by McGill-NLP

Single-file library for "RL for LLMs" training

created 5 months ago
509 stars

Top 62.1% on sourcepulse

View on GitHub
Project Summary

This library provides a single-file, single-GPU implementation of reinforcement learning (RL) training for Large Language Models (LLMs), targeting researchers and practitioners who want to understand and experiment with R1-Zero-style RL training from scratch. It offers an efficient, full-parameter tuning approach, making complex RL training for LLMs accessible and understandable.

How It Works

The library implements a DeepSeek R1-Zero-style training pipeline, focusing on simplicity and efficiency. It avoids external RL libraries, integrating all necessary components within a single Jupyter notebook or Python script. This design gives complete visibility into the RL training loop, from data handling to model fine-tuning, and facilitates rapid iteration and debugging.
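To make the pipeline concrete, here is a minimal, hypothetical sketch of the group-relative policy update at the heart of R1-Zero-style training (GRPO): sample several completions per prompt, score them with a rule-based reward, and normalize rewards within each group so no value network is needed. This is an illustration under those assumptions, not the repository's actual code; grpo_advantages and the tensor shapes are invented for the example.

```python
# Minimal GRPO-style update sketch (illustrative; NOT the repo's code).
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: each prompt's sampled completions are
    normalized by their own mean/std, removing the need for a critic."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-4)

# 2 prompts x 4 sampled completions, with rule-based 0/1 rewards.
rewards = torch.tensor([[1., 0., 0., 1.],
                        [0., 0., 1., 0.]])
advantages = grpo_advantages(rewards)          # shape (2, 4)

# Dummy per-completion log-probabilities (sum of token log-probs).
logps = torch.randn(2, 4, requires_grad=True)

# REINFORCE-style loss: push up log-probs of above-average completions.
loss = -(logps * advantages).mean()
loss.backward()
```

Because the rewards are verifiable (rule-based) rather than learned from human preferences, the entire loop fits comfortably in one file.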

Quick Start & Requirements

  • Install dependencies: pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124 and pip install -r requirements.txt.
  • Requires CUDA 12.4.
  • Training can be initiated by running nano_r1.ipynb or nano_r1_script.py.
  • Official model: McGill-NLP/nano-aha-moment-3b (a loading sketch follows this list).
  • Lecture Series: Part 1, Part 2
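If the released checkpoint follows the standard Hugging Face format (an assumption; check the model card), it can presumably be loaded like any causal LM. The prompt below is illustrative, not necessarily the exact format used in training:

```python
# Hedged sketch: load the released 3B model with transformers, assuming a
# standard Hugging Face checkpoint. The prompt format is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/nano-aha-moment-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Using the numbers [25, 7, 3], create an equation that equals 46."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```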

Highlighted Details

  • Single-file, single-GPU implementation of RL training for LLMs.
  • Full-parameter tuning with an efficient training run (<10 h).
  • Inspired by TinyZero and Mini-R1, with a focus on simplicity and clarity.
  • Achieves ~60% accuracy on the Countdown task with the trained 3B model (a sketch of a rule-based Countdown reward follows this list).
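The Countdown task asks the model to combine a set of given numbers with arithmetic operators to hit a target value, which makes the reward fully rule-checkable. Below is a hypothetical sketch of such a reward function; the <answer> tag format, function name, and signature are assumptions for illustration and may not match the repository's implementation:

```python
# Hypothetical rule-based Countdown reward (illustrative; may differ
# from the repository's actual reward code).
import re
from collections import Counter

def countdown_reward(completion: str, numbers: list, target: int) -> float:
    """Return 1.0 iff the equation inside <answer>...</answer> uses each
    given number exactly once and evaluates to the target."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    equation = match.group(1).strip()
    # Restrict to digits, whitespace, and arithmetic operators.
    if not re.fullmatch(r"[\d+\-*/() .]+", equation):
        return 0.0
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if Counter(used) != Counter(numbers):
        return 0.0
    try:
        value = eval(equation)  # safe: characters restricted above
    except Exception:
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0

# Example: 25 + 7 * 3 = 46 -> reward 1.0.
print(countdown_reward("<answer>25 + 7 * 3</answer>", [25, 7, 3], 46))
```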

Maintenance & Community

The project is associated with McGill University's NLP group. Further community engagement details (e.g., Discord/Slack) are not specified in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Without one, the code defaults to all rights reserved, which may restrict commercial use or integration into closed-source projects.

Limitations & Caveats

The README lists a full evaluation suite as a to-do item, so comprehensive benchmarking and validation may be incomplete. The absence of a specified license is a significant caveat for adoption in commercial or closed-source environments.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 77 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg
Top 0.1% · 806 stars
Pretraining code for depth-recurrent language model research
Created 5 months ago, updated 2 weeks ago
Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 5 more.

TinyZero by Jiayi-Pan
Top 0.2% · 12k stars
Minimal reproduction of DeepSeek R1-Zero for countdown/multiplication tasks
Created 6 months ago, updated 3 months ago