Scalable pipeline for training general-purpose agents
L0 is a scalable, end-to-end training pipeline for general-purpose agents. It provides a framework for complex reinforcement learning environments and a "code-as-action" agent scaffold called NB-Agent, which acts through a REPL interface. Multi-turn training recipes are included, and pre-trained models are available at sizes up to 32B parameters.
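To make the "code-as-action" loop concrete, here is a minimal Python sketch of one NB-Agent-style turn, assuming a Jupyter-like REPL; the class, method names, and loop structure are illustrative assumptions, not L0's actual interfaces.

```python
from dataclasses import dataclass

# Hypothetical data shape for one "code-as-action" turn; names are
# illustrative only and do not mirror NB-Agent's actual classes.
@dataclass
class AgentTurn:
    think: str    # free-form reasoning text emitted by the model
    code: str     # Python cell executed in the sandboxed REPL
    output: str   # captured stdout/stderr, fed back as the next observation

def run_episode(policy, repl, task: str, max_turns: int = 8) -> list[AgentTurn]:
    """Sketch of the multi-turn loop: generate think+code, execute, observe."""
    history: list[AgentTurn] = []
    for _ in range(max_turns):
        think, code = policy.generate(task, history)  # assumed policy API
        output = repl.execute(code)                   # assumed REPL API
        history.append(AgentTurn(think, code, output))
        if repl.finished:                             # assumed termination flag
            break
    return history
```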
How It Works
L0 employs an Agentic Policy Gradient algorithm that treats an entire "think-code" sequence as a single action, optimized with a verifiable reward function covering correctness, format, and execution. Training is strictly on-policy, with KL-divergence penalties and a DAPO-inspired rejection sampling strategy. The infrastructure decouples CPU-bound agent workers from GPU inference servers, adds a flexible FastAPI-based orchestration layer, and uses lightweight Bubblewrap sandboxing to run agent environments securely in parallel.
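As an illustration of how a verifiable reward combining correctness, format, and execution signals could be composed, here is a minimal sketch; the weights and function signature are assumptions, not L0's actual reward definition.

```python
def verifiable_reward(
    answer: str,
    gold: str,
    format_ok: bool,
    exec_ok: bool,
) -> float:
    """Composite reward over one full think-code trajectory.

    Hypothetical weighting: correctness dominates, with small bonuses
    for well-formed output and error-free execution.
    """
    correctness = 1.0 if answer.strip() == gold.strip() else 0.0
    fmt = 0.1 if format_ok else -0.1        # penalize malformed think/code blocks
    execution = 0.1 if exec_ok else -0.1    # penalize cells that raise errors
    return correctness + fmt + execution
```

Under the Agentic Policy Gradient view, this single scalar scores the whole think-code sequence as one action, so credit assignment happens at the trajectory level rather than per REPL turn.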
Quick Start & Requirements
Installation involves cloning the repository and using Pixi for package management. A typical training run requires preparing datasets, starting the CPU-bound agent execution manager server, configuring remote server URLs, and setting API keys for external services (Jina, plus Exa, Firecrawl, or Serper). Multi-node training requires a Ray cluster. The project supports models from 0.6B to 32B parameters, with hardware requirements scaling from a single GPU for the 0.6B model to 64 GPUs across 8 nodes for the 32B model.
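Since the setup depends on several external services, a pre-flight check can catch missing configuration early. The snippet below is a hedged sketch only; the environment variable names are assumptions and should be verified against L0's actual configuration files.

```python
import os

# Hypothetical variable names; consult L0's docs for the real ones.
REQUIRED = {
    "JINA_API_KEY": "Jina reader/search",
    "SERPER_API_KEY": "Serper web search (or Exa/Firecrawl equivalent)",
    "AGENT_MANAGER_URL": "CPU-bound agent execution manager server",
}

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    details = ", ".join(f"{m} ({REQUIRED[m]})" for m in missing)
    raise SystemExit(f"Missing configuration: {details}")
print("All required settings present; safe to start training.")
```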
Maintenance & Community
The project acknowledges contributions from the verl, SGLang, Open-Reasoner-Zero, and DAPO communities. It also thanks the Pixi team. Links to Hugging Face models and a Zhihu article are provided.
Licensing & Compatibility
The repository's license is not explicitly stated in the README.
Limitations & Caveats
Potential issues include out-of-memory (OOM) errors during SGLang server CUDA graph capture, which may require launching a Ray cluster or adjusting the tensor parallel size. Training may also hang at the update_weight_from_tensors step, necessitating a process restart.
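As a hedged sketch of the kind of mitigation implied here, one might relaunch the SGLang server with a higher tensor-parallel degree or reduced static memory fraction. The flags below are standard SGLang launch_server arguments at the time of writing, and the model path is a placeholder; verify both against your installed version and L0's released checkpoints.

```python
import subprocess

# Hedged example: relaunch the SGLang server with settings that commonly
# relieve OOM during CUDA graph capture. Confirm flag names against your
# installed sglang version before use.
cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "Qwen/Qwen2.5-32B-Instruct",  # placeholder model path
    "--tp-size", "8",                # shard weights across more GPUs
    "--mem-fraction-static", "0.8",  # leave headroom for graph capture
    # "--disable-cuda-graph",        # last resort: skip capture entirely
]
subprocess.run(cmd, check=True)
```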