ToolOrchestra by NVlabs

RL framework for training efficient agentic tool orchestrators

Created 1 month ago
528 stars

Top 59.8% on SourcePulse

Project Summary

ToolOrchestra provides an end-to-end RL training framework for orchestrating intelligent tools and specialized models, enabling efficient agentic workflows. It targets researchers and engineers building complex, multi-turn AI agents, offering a method to train small, highly capable orchestrator models that surpass larger, generalist LLMs in performance and efficiency. The framework allows agents to coordinate diverse tools and models, leading to state-of-the-art results on challenging benchmarks with significantly reduced computational cost.

How It Works

ToolOrchestra employs end-to-end reinforcement learning to train small orchestrator models (e.g., Orchestrator-8B) that dynamically coordinate tool usage and reasoning. The core approach involves an orchestrator agent alternating between planning and executing tool calls, interacting with a diverse set of resources including basic utilities (search, code interpreter), specialized LLMs (coding, math), and generalist LLMs. Optimization occurs via outcome, efficiency, and preference rewards, supported by a scalable pipeline for synthesizing training tasks. This method achieves superior performance and efficiency by leveraging specialized components rather than relying solely on monolithic models.
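
As a rough illustration of the plan-and-act loop described above, the Python sketch below alternates policy decisions with tool calls. Everything here is hypothetical: the tool set, the policy interface, and the Episode container are illustrative stand-ins, not the repository's actual API.

    # Minimal sketch of the plan/act loop. The tool names, the policy
    # interface, and the Episode container are hypothetical illustrations,
    # not the repository's actual API.
    from dataclasses import dataclass, field

    @dataclass
    class Episode:
        task: str
        history: list = field(default_factory=list)

    TOOLS = {
        "search": lambda q: f"search results for {q!r}",      # basic utility
        "code_interpreter": lambda src: f"executed: {src}",   # basic utility
        "math_llm": lambda p: f"math answer to {p!r}",        # specialized LLM
        "general_llm": lambda p: f"general answer to {p!r}",  # generalist LLM
    }

    def orchestrate(task: str, policy, max_turns: int = 8) -> Episode:
        """Alternate between planning (a policy decision) and executing
        tool calls until the policy emits a final answer."""
        ep = Episode(task)
        for _ in range(max_turns):
            # The policy reads the task plus the interaction history and
            # emits either a tool call or a final answer.
            action = policy(ep.task, ep.history)
            if action["type"] == "final":
                ep.history.append(("final", action["answer"]))
                break
            observation = TOOLS[action["tool"]](action["input"])
            ep.history.append((action["tool"], observation))
        return ep

During RL training, each completed episode would then be scored with the outcome, efficiency, and preference rewards described above.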

Quick Start & Requirements

  • Primary Install: Clone the repository, navigate to the toolorchestra directory, and set up Conda environments (toolorchestra, retriever, vllm1) with Python 3.12. Install dependencies via pip install -r requirements.txt, then add the packages specific to each environment.
  • Prerequisites: Python 3.12, Conda, PyTorch 2.4.0 with CUDA 12.1 (for the retriever), FAISS-GPU (for the retriever), a Tavily API key, and Hugging Face datasets/checkpoints. The environment variables INDEX_DIR, CHECKPOINT_PATH, HF_HOME, REPO_PATH, and CKPT_DIR must be set (a preflight check sketch follows this list).
  • Resource Footprint: Setup involves multiple Conda environments and large checkpoint downloads. The training entry point (resume_h100.py) implies high-end GPU requirements (e.g., NVIDIA H100).
  • Links: Repository: https://gitlab-master.nvidia.com/dler/toolorchestra. Index Data: https://huggingface.co/datasets/multi-train/index. Checkpoints: https://huggingface.co/multi-train/ToolOrchestrator. Tavily API: https://app.tavily.com/home.
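
Because several of the steps above depend on environment variables being set correctly, a small preflight check can catch misconfiguration early. The sketch below is illustrative: the first five variable names mirror the list above, while TAVILY_API_KEY is an assumed name for the Tavily key, not confirmed by the README.

    # Preflight check: verify required environment variables before launch.
    # The first five names come from the README; TAVILY_API_KEY is an
    # assumed variable name for the Tavily key.
    import os
    import sys

    REQUIRED = [
        "INDEX_DIR", "CHECKPOINT_PATH", "HF_HOME", "REPO_PATH", "CKPT_DIR",
        "TAVILY_API_KEY",  # assumed variable name
    ]

    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        sys.exit(f"Missing environment variables: {', '.join(missing)}")
    print("All required environment variables are set.")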

Highlighted Details

  • Orchestrator-8B achieves 37.1% on HLE, outperforming GPT-5 (35.1%) while being 2.5x more efficient.
  • Significantly surpasses GPT-5 on τ2-Bench and FRAMES benchmarks at approximately 30% of the cost.
  • Utilizes end-to-end RL with outcome, efficiency, and preference rewards for joint optimization (a reward-combination sketch follows this list).
  • Features an automatic pipeline for large-scale synthesis of environment and tool-call tasks to aid RL training.
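
To make the joint-optimization point concrete, the sketch below blends the three reward signals with a simple weighted sum. This is an assumption for illustration only; the actual reward shaping and weights used by ToolOrchestra are not specified here.

    # Hypothetical linear blend of the three reward signals; the real
    # shaping and weights are not documented here, so treat this as an
    # assumption for illustration.
    def combined_reward(outcome: float, efficiency: float, preference: float,
                        w_outcome: float = 1.0, w_eff: float = 0.3,
                        w_pref: float = 0.2) -> float:
        """Collapse task success, compute-cost efficiency, and preference
        signals into one scalar for the RL optimizer."""
        return w_outcome * outcome + w_eff * efficiency + w_pref * preference

In a blend like this, raising the efficiency weight pushes the orchestrator toward cheaper tools, which is how an efficiency reward can trade accuracy against cost.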

Maintenance & Community

The project lists authors from NVIDIA and The University of Hong Kong. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The code is licensed under the Apache 2.0 license. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

The setup process is complex, requiring the creation and management of multiple distinct Conda environments pinned to specific library versions (e.g., PyTorch, CUDA). Extensive use of environment variables for paths and API keys adds configuration overhead. Evaluation, particularly for HLE, may require running components in separate processes, suggesting inter-service dependencies (e.g., the retriever and vLLM servers) that need careful management.
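
For the separate-process evaluation noted above, a launcher along these lines could manage the components; every script name below is a placeholder, not one of the repository's actual entry points.

    # Hypothetical launcher for multi-process evaluation; all script names
    # are placeholders rather than real entry points in the repository.
    import subprocess

    procs = [
        subprocess.Popen(["python", "serve_retriever.py"]),  # placeholder
        subprocess.Popen(["python", "serve_vllm.py"]),       # placeholder
    ]
    try:
        subprocess.run(["python", "run_hle_eval.py"], check=True)  # placeholder
    finally:
        for p in procs:
            p.terminate()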

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 169 stars in the last 30 days
