L0 by cmriat

Scalable pipeline for training general-purpose agents

Created 2 months ago
359 stars

Top 77.9% on SourcePulse

Project Summary

L0 is a scalable, end-to-end training pipeline for general-purpose agents, offering a framework for complex reinforcement learning environments and a "code-as-action" agent scaffold called NB-Agent. It aims to enable agents to perform general tasks through a REPL interface and multi-turn training recipes, with pre-trained models available up to 32B parameters.
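
The code-as-action pattern can be pictured as a short loop: the model emits a code cell, the cell runs in a persistent namespace, and the captured output is appended to the context for the next turn. The sketch below is a generic illustration of that loop under those assumptions, with a hypothetical `query_model` callable; NB-Agent's actual interface lives in the repository and differs in detail.

```python
import contextlib
import io


def run_cell(code: str, namespace: dict) -> str:
    """Execute one code cell in a shared namespace and capture its stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


def agent_episode(task: str, query_model, max_turns: int = 8) -> str:
    """Drive a code-as-action loop: the model alternates thinking and code cells."""
    namespace: dict = {}
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        reply = query_model(transcript)       # hypothetical LLM call returning a dict
        if reply.get("final_answer"):         # the model decides it is done
            return reply["final_answer"]
        output = run_cell(reply["code"], namespace)
        transcript += f"\n>>> {reply['code']}\n{output}"  # REPL-style feedback
    return "no answer within turn budget"
```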

How It Works

L0 employs an Agentic Policy Gradient algorithm, treating entire "think-code" sequences as single actions and optimizing them with a verifiable reward function that scores correctness, format, and execution. It uses strict on-policy training with KL-divergence penalties and a DAPO-inspired rejection sampling strategy. The infrastructure features a decoupled architecture separating CPU agent workers from GPU inference servers, a flexible FastAPI-based orchestration layer, and lightweight sandboxing via Bubblewrap for secure, parallel agent environments.
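
As a rough illustration of that objective shape (not the repository's implementation), the sketch below combines a toy verifiable reward with a REINFORCE-style loss over whole think-code sequences plus a sampled KL penalty against a frozen reference policy; the reward weights and the KL coefficient are assumptions.

```python
import torch


def verifiable_reward(answer_correct: bool, format_ok: bool, executed_ok: bool) -> float:
    """Toy verifiable reward: correctness dominates, format and clean execution
    add small bonuses. The weights here are illustrative, not the repo's values."""
    return 1.0 * float(answer_correct) + 0.1 * float(format_ok) + 0.1 * float(executed_ok)


def agentic_pg_loss(logp_actions: torch.Tensor,    # log-prob of each whole think-code action
                    logp_reference: torch.Tensor,  # same actions under the reference policy
                    advantages: torch.Tensor,      # advantages derived from the rewards
                    kl_coef: float = 0.01) -> torch.Tensor:
    """REINFORCE-style loss over whole sequences plus a sampled KL-divergence penalty."""
    kl_estimate = (logp_actions - logp_reference).mean()
    pg_term = -(advantages.detach() * logp_actions).mean()
    return pg_term + kl_coef * kl_estimate
```

Under a DAPO-style rejection scheme, prompt groups whose rollouts are uniformly correct or uniformly wrong would be filtered out before advantages are computed, so the gradient comes only from informative groups.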

Quick Start & Requirements

Installation involves cloning the repository and using Pixi for package management. A typical training setup requires preparing datasets, starting an agent execution manager server (CPU-bound), configuring remote server URLs, and setting up API keys for external services (Jina, Exa/Firecrawl/Serper). Multi-node training necessitates a Ray cluster. The project supports training from 0.6B to 32B parameter models, with hardware requirements scaling from 1 GPU for 0.6B models to 64 GPUs across 8 nodes for 32B models.
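
A minimal sketch of that wiring, assuming hypothetical environment-variable names; the real keys and the agent-manager URL scheme are defined by the L0 configs, so check the README before relying on these.

```python
import os

# Search / scraping providers named in the quick start (use whichever you have keys for).
os.environ.setdefault("JINA_API_KEY", "<your-jina-key>")
os.environ.setdefault("SERPER_API_KEY", "<your-serper-key>")        # or Exa / Firecrawl

# CPU-bound agent execution manager, reachable from the GPU training nodes.
os.environ.setdefault("AGENT_MANAGER_URL", "http://cpu-host:8000")

# Multi-node runs additionally need a Ray cluster (one head node, workers joined
# to it) before the trainer is launched.
```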

Highlighted Details

  • Provides pre-trained models: L0-4B (Qwen3), L0-7B (Qwen2.5), and L0-32B (Qwen2.5).
  • NB-Agent can be used with existing models like Gemini and Claude without further training.
  • Leverages Pixi for cross-platform package management.
  • Adapts code from verl and SGLang projects.

Maintenance & Community

The project acknowledges contributions from the verl, SGLang, Open-Reasoner-Zero, and DAPO communities. It also thanks the Pixi team. Links to Hugging Face models and a Zhihu article are provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the README.

Limitations & Caveats

Potential issues include Out of Memory (OOM) errors during SGLang server CUDA graph capture, which may require launching a Ray cluster or adjusting tensor parallel size. Training may also hang at the update_weight_from_tensors step, necessitating process restarts.
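
If the OOM appears during CUDA graph capture, one mitigation to try is relaunching the inference server with a higher tensor-parallel degree, a smaller static memory fraction, or CUDA graphs disabled. This is a hedged sketch: the flags follow SGLang's launch_server and may differ between versions, and the model path is a placeholder.

```python
import subprocess

cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "<path-or-hub-id-of-your-L0-model>",
    "--tp-size", "8",                 # spread weights across more GPUs
    "--mem-fraction-static", "0.8",   # leave headroom for CUDA graph capture
    "--disable-cuda-graph",           # fall back if capture still OOMs
    "--port", "30000",
]
subprocess.run(cmd, check=True)
```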

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

7 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research) and Will Brown (Research Lead at Prime Intellect).

agent-lightning by microsoft

Train any AI agent with rollouts and feedback

  • Top 6.0% on SourcePulse
  • 2k stars
  • Created 3 months ago
  • Updated 2 days ago