Ziems/arbor: Framework for optimizing DSPy programs with RL
Summary
Arbor is a framework designed to optimize DSPy programs using Reinforcement Learning (RL). It targets developers seeking to enhance the performance and efficiency of their language model programs by automating the fine-tuning process. The primary benefit is achieving superior program outputs through advanced RL techniques.
How It Works
Arbor uses Group Relative Policy Optimization (GRPO) to fine-tune the language models underlying DSPy programs. It integrates with DSPy via a custom ArborProvider, allowing the RL loop to iteratively improve a program's prompts and parameters. A user-defined reward function guides the optimization, with the goal of discovering more effective program configurations than manual tuning or standard fine-tuning. Parameter-efficient fine-tuning via LoRA is supported.
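The reward-driven loop can be sketched roughly as follows. This is a hedged sketch, not Arbor's documented API: the GRPO import path, its keyword arguments, and the toy exact-match reward are assumptions based on DSPy's general optimizer conventions and may differ between versions.

```python
import dspy
# Assumption: DSPy exposes an experimental GRPO optimizer at this path;
# the import path and keyword arguments may differ between DSPy versions.
from dspy.teleprompt.grpo import GRPO

def exact_match_reward(example, prediction, trace=None):
    # Reward function guiding the RL optimization:
    # 1.0 for an exact answer match, 0.0 otherwise (toy example).
    return float(prediction.answer == example.answer)

# A simple DSPy program whose prompts/weights the optimizer will improve.
program = dspy.ChainOfThought("question -> answer")

trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
]

# Assumes dspy.configure(lm=...) has already been pointed at a running Arbor
# server (see Quick Start below).
optimizer = GRPO(metric=exact_match_reward)  # assumption: metric kwarg
optimized_program = optimizer.compile(program, trainset=trainset)
```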
Quick Start & Requirements
Install with uv pip install -U arbor-ai or pip install -U arbor-ai. For the latest DSPy features, install DSPy from source: uv pip install -U git+https://github.com/stanfordnlp/dspy.git@main. A CUDA toolkit providing nvcc must be installed. flash-attn can optionally be installed for accelerated inference, but its installation may take over 15 minutes.
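Once the package is installed and an Arbor server is running locally, DSPy can be pointed at it roughly as follows. This is a hedged sketch based on the DSPy/Arbor integration pattern; the ArborProvider import path, the "openai/arbor:" model prefix, the port, and the model name are assumptions and may differ between versions.

```python
import dspy
# Assumption: import path and model prefix follow DSPy's local Arbor integration.
from dspy.clients.lm_local_arbor import ArborProvider

local_lm = dspy.LM(
    model="openai/arbor:Qwen/Qwen2.5-7B-Instruct",  # hypothetical local model choice
    provider=ArborProvider(),
    api_base="http://localhost:7453/v1/",  # hypothetical port of a locally running Arbor server
    api_key="arbor",
    temperature=0.7,
)
dspy.configure(lm=local_lm)
```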
Highlighted Details
flash-attn for potential inference speedups.
Maintenance & Community
The project acknowledges contributions from Will Brown's Verifiers library and the Hugging Face TRL library. Community support is available via dedicated Discord servers for Arbor and DSPy. No specific maintainer information, sponsorship details, or roadmap links are provided in the README.
The following research papers are cited as foundational work:
@article{ziems2025multi,
title={Multi-module GRPO: Composing policy gradients and prompt optimization for language model programs},
author={Ziems, Noah and Soylu, Dilara and Agrawal, Lakshya A and Miller, Isaac and Lai, Liheng and Qian, Chen and Song, Kaiqiang and Jiang, Meng and Klein, Dan and Zaharia, Matei and others},
journal={arXiv preprint arXiv:2508.04660},
year={2025}
}
@article{agrawal2025gepa,
title={GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning},
author={Agrawal, Lakshya A and Tan, Shangyin and Soylu, Dilara and Ziems, Noah and Khare, Rishi and Opsahl-Ong, Krista and Singhvi, Arnav and Shandilya, Herumb and Ryan, Michael J and Jiang, Meng and others},
journal={arXiv preprint arXiv:2507.19457},
year={2025}
}
Licensing & Compatibility
The license type is not specified in the provided README content. Compatibility is primarily with DSPy and requires specific hardware (multi-GPU) and software (CUDA, nvcc) configurations.
Limitations & Caveats
Potential NCCL errors may require specific environment variable configurations (NCCL_P2P_DISABLE=1, NCCL_IB_DISABLE=1) for stability on certain GPU setups. The installation of optional dependencies like flash-attn can be time-consuming. Training is inherently resource-intensive due to its multi-GPU requirements.
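If those NCCL errors appear, the workaround can be applied in the environment of the process that launches Arbor. A minimal sketch is shown below; the variables are standard NCCL settings and can equivalently be exported in the shell before starting the server.

```python
import os

# Disable NCCL peer-to-peer and InfiniBand transports, which can hang or error
# on some multi-GPU setups; set these before any CUDA/NCCL initialization.
os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_IB_DISABLE"] = "1"
```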