Minimal reproduction of DeepSeek R1 Zero for countdown/multiplication tasks
TinyZero aims to reproduce the reasoning capabilities of DeepSeek R1-Zero, specifically for countdown and multiplication tasks. It targets researchers and developers interested in enhancing large language models (LLMs) with self-verification and search abilities through reinforcement learning (RL), offering a cost-effective path to advanced reasoning.
How It Works
TinyZero builds upon the veRL framework, leveraging RL to imbue a base LLM with emergent self-verification and search skills. This approach allows the model to develop sophisticated reasoning without explicit programming, potentially leading to more robust and generalizable problem-solving capabilities.
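A key ingredient in this setup is a rule-based reward: the model only scores when its final answer actually solves the task, which is what lets self-verification emerge without explicit programming. The README does not show the reward code, so the following is a minimal sketch for the countdown task (the function name `countdown_reward` and the binary 0/1 scoring are illustrative assumptions, not TinyZero's actual implementation):

```python
import ast
import operator

# Allowed binary operators for countdown expressions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    # Recursively evaluate a restricted arithmetic AST: numbers and + - * / only.
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("disallowed expression")

def countdown_reward(expression: str, numbers: list, target: int) -> float:
    """Hypothetical rule-based reward: 1.0 if the expression uses exactly the
    given numbers (each once) and evaluates to the target, else 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = sorted(int(n.value) for n in ast.walk(tree)
                      if isinstance(n, ast.Constant))
        if used != sorted(numbers):
            return 0.0  # must use exactly the provided numbers
        return 1.0 if abs(_eval(tree.body) - target) < 1e-6 else 0.0
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0  # malformed or unsafe output earns nothing

print(countdown_reward("(25 - 5) * 3 + 4", [25, 5, 3, 4], 64))  # → 1.0
```

Because the reward checks only the final expression, the RL loop is free to reinforce whatever intermediate reasoning (search, backtracking, self-checks) leads to correct answers.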
Quick Start & Requirements
```shell
conda create -n zero python=3.9
pip install torch
pip3 install vllm==0.6.3
pip3 install ray verl
pip install -e .
pip3 install flash-attn --no-build-isolation
pip install wandb IPython matplotlib
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
./scripts/train_tiny_zero.sh
```

The training script is configured with environment variables for GPU count, model path, data directory, and experiment name. For larger models, gradient checkpointing can be enabled with `critic.model.enable_gradient_checkpointing=True` to reduce VRAM usage.
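The preprocessing script builds the countdown dataset. As an illustration of the task itself (not TinyZero's actual preprocessing code), here is a sketch of generating guaranteed-solvable instances; the function names and the `{"numbers", "target"}` record shape are assumptions for illustration:

```python
import itertools
import random

# Operators used to combine numbers in this sketch (+, -, *).
OPS = (lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b)

def reachable(nums):
    """All values obtainable by combining every number exactly once with + - *."""
    if len(nums) == 1:
        return {nums[0]}
    out = set()
    for i, j in itertools.combinations(range(len(nums)), 2):
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        for op in OPS:
            # Try both operand orders, since - is not commutative.
            out |= reachable(rest + [op(nums[i], nums[j])])
            out |= reachable(rest + [op(nums[j], nums[i])])
    return out

def make_countdown_instance(k=3, lo=1, hi=25, seed=0):
    """Sample k source numbers, then pick a positive reachable target,
    so every generated instance is solvable by construction."""
    rng = random.Random(seed)
    nums = [rng.randint(lo, hi) for _ in range(k)]
    target = rng.choice(sorted(t for t in reachable(nums) if t > 0))
    return {"numbers": nums, "target": target}

print(make_countdown_instance())
```

Checking the target against the reachable set is what makes a binary, rule-based reward workable: the model is never penalized for an unsolvable prompt.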
Maintenance & Community
The project is maintained by Jiayi Pan and collaborators. Further details on community channels or roadmaps are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README notes that Qwen2.5-0.5B base models may fail to learn reasoning. For larger models, multi-GPU setups are recommended, and gradient checkpointing might be necessary to manage VRAM. The project appears to be research-oriented, and stability for production use is not guaranteed.