Jiayi-Pan
Minimal reproduction of DeepSeek R1 Zero for countdown/multiplication tasks
TinyZero aims to reproduce the reasoning capabilities of DeepSeek R1-Zero, specifically for countdown and multiplication tasks. It targets researchers and developers interested in enhancing large language models (LLMs) with self-verification and search abilities through reinforcement learning (RL), offering a cost-effective path to advanced reasoning.
How It Works
TinyZero builds on the veRL framework and uses RL to train a base LLM, from which self-verification and search skills emerge. The model develops these reasoning behaviors without being explicitly programmed for them, which may lead to more robust and generalizable problem solving.
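RL on a task like countdown needs a reward signal that can be checked automatically. The following is a minimal sketch of such a check, assuming the task is to combine a given list of integers with +, -, * and / (each number used exactly once) to reach a target. The function name countdown_reward and the 1.0/0.0 scoring are hypothetical choices for illustration and are not taken from the TinyZero or veRL code.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Binary operators the sketch accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed arithmetic expression, recording each integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)  # exact arithmetic, no float rounding
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Return 1.0 if `expression` uses every number exactly once and equals `target`.

    Hypothetical scoring: malformed, unsafe, or incorrect answers all get 0.0.
    """
    used = []
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


if __name__ == "__main__":
    print(countdown_reward("(25 - 5) * 3", [25, 5, 3], 60))
    print(countdown_reward("25 + 5 + 3", [25, 5, 3], 60))
```

Parsing with ast instead of calling eval keeps the check restricted to plain arithmetic, so a model output containing function calls or names is simply scored as a failure.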
Quick Start & Requirements
Install: conda create -n zero python=3.9, pip install torch, pip3 install vllm==0.6.3, pip3 install ray verl, pip install -e ., pip3 install flash-attn --no-build-isolation, pip install wandb IPython matplotlib
Data preparation: python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
Training: run ./scripts/train_tiny_zero.sh with environment variables for GPU count, model path, data directory, and experiment name.
If VRAM is limited: set critic.model.enable_gradient_checkpointing=True.
Highlighted Details
Maintenance & Community
The project is maintained by Jiayi Pan and collaborators. Further details on community channels or roadmaps are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README notes that Qwen2.5-0.5B base models may fail to learn reasoning. For larger models, multi-GPU setups are recommended, and gradient checkpointing might be necessary to manage VRAM. The project appears to be research-oriented, and stability for production use is not guaranteed.