Scalable pipeline for training general-purpose agents
L0 is a scalable, end-to-end training pipeline for general-purpose agents. It provides a framework for complex reinforcement learning environments and a "code-as-action" agent scaffold called NB-Agent, which acts through a REPL interface. Multi-turn training recipes are included, and pre-trained models are available at sizes up to 32B parameters.
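To make the "code-as-action" loop concrete, here is a minimal Python sketch of one NB-Agent-style turn, assuming a Jupyter-like REPL; the class, method names, and loop structure are illustrative assumptions, not L0's actual interfaces.

```python
from dataclasses import dataclass

# Hypothetical data shape for one "code-as-action" turn; names are
# illustrative only and do not mirror NB-Agent's actual classes.
@dataclass
class AgentTurn:
    think: str    # free-form reasoning text emitted by the model
    code: str     # Python cell executed in the sandboxed REPL
    output: str   # captured stdout/stderr, fed back as the next observation

def run_episode(policy, repl, task: str, max_turns: int = 8) -> list[AgentTurn]:
    """Sketch of the multi-turn loop: generate think+code, execute, observe."""
    history: list[AgentTurn] = []
    for _ in range(max_turns):
        think, code = policy.generate(task, history)  # assumed policy API
        output = repl.execute(code)                   # assumed REPL API
        history.append(AgentTurn(think, code, output))
        if repl.finished:                             # assumed termination flag
            break
    return history
```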
How It Works
L0 employs an Agentic Policy Gradient algorithm that treats an entire "think-code" sequence as a single action, optimized with a verifiable reward function covering correctness, format, and execution. Training is strictly on-policy, with KL-divergence penalties and a DAPO-inspired rejection sampling strategy. The infrastructure decouples CPU-bound agent workers from GPU inference servers, adds a flexible FastAPI-based orchestration layer, and uses lightweight Bubblewrap sandboxing to run agent environments securely in parallel.
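As an illustration of how a verifiable reward combining correctness, format, and execution signals could be composed, here is a minimal sketch; the weights and function signature are assumptions, not L0's actual reward definition.

```python
def verifiable_reward(
    answer: str,
    gold: str,
    format_ok: bool,
    exec_ok: bool,
) -> float:
    """Composite reward over one full think-code trajectory.

    Hypothetical weighting: correctness dominates, with small bonuses
    for well-formed output and error-free execution.
    """
    correctness = 1.0 if answer.strip() == gold.strip() else 0.0
    fmt = 0.1 if format_ok else -0.1        # penalize malformed think/code blocks
    execution = 0.1 if exec_ok else -0.1    # penalize cells that raise errors
    return correctness + fmt + execution
```

Under the Agentic Policy Gradient view, this single scalar scores the whole think-code sequence as one action, so credit assignment happens at the trajectory level rather than per REPL turn.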
Quick Start & Requirements
Installation involves cloning the repository and using Pixi for package management. A typical training run requires preparing datasets, starting the CPU-bound agent execution manager server, configuring remote server URLs, and setting API keys for external services (Jina, plus Exa, Firecrawl, or Serper). Multi-node training requires a Ray cluster. The project supports models from 0.6B to 32B parameters, with hardware requirements scaling from a single GPU for the 0.6B model to 64 GPUs across 8 nodes for the 32B model.
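Since the setup depends on several external services, a pre-flight check can catch missing configuration early. The snippet below is a hedged sketch only; the environment variable names are assumptions and should be verified against L0's actual configuration files.

```python
import os

# Hypothetical variable names; consult L0's docs for the real ones.
REQUIRED = {
    "JINA_API_KEY": "Jina reader/search",
    "SERPER_API_KEY": "Serper web search (or Exa/Firecrawl equivalent)",
    "AGENT_MANAGER_URL": "CPU-bound agent execution manager server",
}

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    details = ", ".join(f"{m} ({REQUIRED[m]})" for m in missing)
    raise SystemExit(f"Missing configuration: {details}")
print("All required settings present; safe to start training.")
```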
Maintenance & Community
The project acknowledges contributions from the verl, SGLang, Open-Reasoner-Zero, and DAPO communities. It also thanks the Pixi team. Links to Hugging Face models and a Zhihu article are provided.
Licensing & Compatibility
The repository's license is not explicitly stated in the README.
Limitations & Caveats
Potential issues include out-of-memory (OOM) errors during SGLang server CUDA graph capture, which may require launching a Ray cluster or adjusting the tensor parallel size. Training may also hang at the update_weight_from_tensors step, necessitating a process restart.
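As a hedged sketch of the kind of mitigation implied here, one might relaunch the SGLang server with a higher tensor-parallel degree or reduced static memory fraction. The flags below are standard SGLang launch_server arguments at the time of writing, and the model path is a placeholder; verify both against your installed version and L0's released checkpoints.

```python
import subprocess

# Hedged example: relaunch the SGLang server with settings that commonly
# relieve OOM during CUDA graph capture. Confirm flag names against your
# installed sglang version before use.
cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "Qwen/Qwen2.5-32B-Instruct",  # placeholder model path
    "--tp-size", "8",                # shard weights across more GPUs
    "--mem-fraction-static", "0.8",  # leave headroom for graph capture
    # "--disable-cuda-graph",        # last resort: skip capture entirely
]
subprocess.run(cmd, check=True)
```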