TinyZero by Jiayi-Pan

Minimal reproduction of DeepSeek R1-Zero for countdown and multiplication tasks

created 6 months ago
12,073 stars

Top 4.2% on sourcepulse

Project Summary

TinyZero reproduces the DeepSeek R1-Zero approach on two narrow tasks: countdown and multiplication. It is aimed at researchers and developers who want to use reinforcement learning (RL) to give large language models (LLMs) self-verification and search abilities, and who need a cost-effective way to experiment with reasoning.

How It Works

TinyZero builds on the veRL framework and trains a base LLM with RL. Self-verification and search behaviours emerge during training rather than being explicitly programmed, which may lead to more robust and generalizable problem solving.

Quick Start & Requirements

  • Installation (run in order from the repository root):
      conda create -n zero python=3.9
      pip install torch
      pip3 install vllm==0.6.3
      pip3 install ray
      pip install -e .    (installs the veRL package bundled with the repository)
      pip3 install flash-attn --no-build-isolation
      pip install wandb IPython matplotlib
  • Prerequisites: Python 3.9, PyTorch (CUDA 12.1 recommended), vLLM (0.6.3 or compatible), Ray, veRL, Flash Attention 2, wandb, IPython, matplotlib.
  • Data Prep: python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}.
  • Training: Run bash ./scripts/train_tiny_zero.sh after exporting environment variables that set the GPU count, model path, data directory, and experiment name.
  • Resources: A single GPU is enough for models up to 1.5B parameters; two or more GPUs are recommended for models of 3B parameters and larger. If training runs out of VRAM, setting critic.model.enable_gradient_checkpointing=True can reduce memory use.
  • Links: Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655, Experiment log: https://wandb.ai/jiayipan/TinyZero.
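
The training step above can be sketched as a small launcher. This is a minimal sketch, not the project's documented procedure: the summary does not name the environment variables, so N_GPUS, BASE_MODEL, DATA_DIR, and EXPERIMENT_NAME are assumed names, and the paths and run name are placeholders. The upstream script may expect additional variables, so check scripts/train_tiny_zero.sh before relying on this.

```shell
#!/usr/bin/env bash
# Sketch of a single-GPU training launch for TinyZero.
# ASSUMPTION: the variable names below are illustrative; confirm them
# against scripts/train_tiny_zero.sh in the repository.

export N_GPUS=1                                          # GPU count (single GPU: models up to 1.5B)
export BASE_MODEL="${BASE_MODEL:-/path/to/base-model}"   # placeholder path to a Qwen2.5 base model
export DATA_DIR="${DATA_DIR:-/path/to/countdown-data}"   # directory passed as --local_dir during data prep
export EXPERIMENT_NAME="countdown-single-gpu"            # hypothetical run name

TRAIN_SCRIPT=./scripts/train_tiny_zero.sh

# Launch only when the script is present, i.e. when run from the repository root.
if [ -f "$TRAIN_SCRIPT" ]; then
    bash "$TRAIN_SCRIPT"
else
    echo "Not in the TinyZero repository root: $TRAIN_SCRIPT not found" >&2
fi
```

The `${VAR:-default}` form keeps any value already exported in the calling shell, so the model and data paths can be overridden without editing the file.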

Highlighted Details

  • Minimal reproduction of DeepSeek R1-Zero.
  • Enables self-verification and search abilities via RL.
  • Supports Qwen2.5 series base models.
  • Offers instructions for both base and instruct-tuned models.

Maintenance & Community

The project is maintained by Jiayi Pan and collaborators. Further details on community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that Qwen2.5-0.5B base models may fail to learn reasoning. For larger models, multi-GPU setups are recommended, and gradient checkpointing might be necessary to manage VRAM. The project appears to be research-oriented, and stability for production use is not guaranteed.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 442 stars in the last 90 days
