l0 by cmriat

Scalable pipeline for training general-purpose agents

Created 6 months ago
363 stars

Top 77.4% on SourcePulse

Project Summary

L0 is a scalable, end-to-end training pipeline for general-purpose agents, offering a framework for complex reinforcement learning environments and a "code-as-action" agent scaffold called NB-Agent. It aims to enable agents to perform general tasks through a REPL interface and multi-turn training recipes, with pre-trained models available up to 32B parameters.

How It Works

L0 employs an Agentic Policy Gradient algorithm that treats each entire "think-code" sequence as a single action, optimized with a verifiable reward function covering correctness, format, and execution. Training is strictly on-policy, with KL-divergence penalties and a DAPO-inspired rejection sampling strategy. The infrastructure uses a decoupled architecture that separates CPU agent workers from GPU inference servers, a flexible FastAPI-based orchestration layer, and lightweight Bubblewrap sandboxing for secure, parallel agent environments.
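The composite verifiable reward and the DAPO-style rejection sampling described above can be sketched as follows. This is an illustrative outline only, not the project's actual API; all function names, weights, and the KL coefficient are hypothetical:

```python
def verifiable_reward(answer_correct: bool, format_ok: bool, executed_ok: bool,
                      w_correct: float = 1.0, w_format: float = 0.1,
                      w_exec: float = 0.1) -> float:
    """Score one full think-code sequence, treated as a single action.

    Combines the three verifiable signals named above: answer correctness,
    output format compliance, and successful code execution. Weights are
    made up for illustration.
    """
    return (w_correct * float(answer_correct)
            + w_format * float(format_ok)
            + w_exec * float(executed_ok))


def kl_penalized_reward(reward: float, kl_div: float, beta: float = 0.01) -> float:
    """Apply a KL-divergence penalty against the reference policy."""
    return reward - beta * kl_div


def keep_group(rewards: list[float]) -> bool:
    """DAPO-inspired rejection sampling: discard a prompt's rollout group
    when every rollout received the same reward, since a uniform group
    carries no policy-gradient learning signal."""
    return len(set(rewards)) > 1
```

For example, a group where every rollout fails (all rewards 0) would be rejected, while a mixed group is kept for the gradient update.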

Quick Start & Requirements

Installation involves cloning the repository and using Pixi for package management. A typical training setup requires preparing datasets, starting an agent execution manager server (CPU-bound), configuring remote server URLs, and setting up API keys for external services (Jina, Exa/Firecrawl/Serper). Multi-node training necessitates a Ray cluster. The project supports training from 0.6B to 32B parameter models, with hardware requirements scaling from 1 GPU for 0.6B models to 64 GPUs across 8 nodes for 32B models.
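A minimal version of the setup above might look like the following. The repository URL is inferred from the project and author names, and everything past `pixi install` is project-specific, so treat those lines as hypothetical placeholders rather than exact commands:

```shell
# Clone the repository (URL assumed from the project/author names)
git clone https://github.com/cmriat/l0.git
cd l0

# Install dependencies with Pixi (Pixi's standard install command)
pixi install

# Remaining steps are project-specific; consult the repository's README
# for exact commands: prepare datasets, start the CPU-bound agent
# execution manager server, configure remote server URLs, and export
# API keys for external services. The variable name below is a
# hypothetical placeholder.
export JINA_API_KEY=...
```

For multi-node training, a Ray cluster must also be running before the training job is launched.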

Highlighted Details

  • Provides pre-trained models: L0-4B (Qwen3), L0-7B (Qwen2.5), and L0-32B (Qwen2.5).
  • NB-Agent can be used with existing models like Gemini and Claude without further training.
  • Leverages Pixi for cross-platform package management.
  • Adapts code from verl and SGLang projects.

Maintenance & Community

The project acknowledges contributions from the verl, SGLang, Open-Reasoner-Zero, and DAPO communities. It also thanks the Pixi team. Links to Hugging Face models and a Zhihu article are provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the README.

Limitations & Caveats

Potential issues include Out of Memory (OOM) errors during SGLang server CUDA graph capture, which may require launching a Ray cluster or adjusting tensor parallel size. Training may also hang at the update_weight_from_tensors step, necessitating process restarts.
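When the CUDA graph capture OOM occurs, the usual workarounds are to spread the model across more GPUs, reserve less static memory, or skip graph capture entirely. A hedged example, assuming SGLang's standard `launch_server` flags (verify them against your installed SGLang version) and a placeholder model path:

```shell
# Assumed SGLang flags; check `python -m sglang.launch_server --help`:
#   --tp 2                  : tensor-parallel across 2 GPUs
#   --mem-fraction-static   : reserve less static memory, leaving
#                             headroom for CUDA graph capture
#   --disable-cuda-graph    : fallback, skip graph capture entirely
python -m sglang.launch_server \
  --model-path <your-model> \
  --tp 2 \
  --mem-fraction-static 0.8 \
  --disable-cuda-graph
```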

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
