multiagent-coaching by ltjed

Framework for training multi-agent systems with AI feedback

Created 1 month ago
252 stars

Top 99.6% on SourcePulse

Project Summary

MAPPA (Multi-Agent Systems with Per-Action Process Rewards) addresses critical challenges in training multi-agent systems end-to-end: credit assignment and sample efficiency. It enables AI coaches to score every agent action, providing dense feedback for efficient learning and accurate blame attribution. This framework is designed for researchers and engineers developing sophisticated multi-agent AI.

How It Works

The core innovation is an LLM-based AI coach that evaluates each agent's action in real-time, assigning a process reward (0-10) based on the agent's inputs, outputs, and tool feedback. This per-action reward mechanism bypasses complex counterfactual reasoning for credit assignment and provides dense signals crucial for sample efficiency in RL training. The system orchestrates sequential agent workflows and integrates secure code execution via SandboxFusion.
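As a rough illustration of this mechanism, the sketch below shows how an LLM coach's free-text reply might be turned into a bounded 0-10 process reward. The prompt wording, the `parse_reward` helper, and the `llm` callable are illustrative assumptions, not the project's actual API.

```python
import re

def parse_reward(reply: str, lo: int = 0, hi: int = 10) -> int:
    """Extract the first integer in the coach's reply and clamp it to [lo, hi]."""
    m = re.search(r"-?\d+", reply)
    if m is None:
        raise ValueError(f"no score found in coach reply: {reply!r}")
    return max(lo, min(hi, int(m.group())))

def coach_score(agent_input, agent_output, tool_feedback, llm) -> int:
    """Ask the coach LLM to rate one agent action given its full local context."""
    prompt = (
        "You are a coach evaluating one agent action.\n"
        f"Agent input: {agent_input}\n"
        f"Agent output: {agent_output}\n"
        f"Tool feedback: {tool_feedback}\n"
        "Reply with a single integer score from 0 to 10."
    )
    return parse_reward(llm(prompt))

# Example with a stubbed LLM standing in for a Gemini call:
score = coach_score("solve x+1=2", "x = 1", "check passed", lambda p: "Score: 9")
```

Because each action is scored from its own inputs, outputs, and tool feedback, the reward is local to that step, which is what sidesteps counterfactual credit assignment across agents.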

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python 3.11 virtual environment with uv, and installing dependencies. Secure code execution requires separate setup of SandboxFusion, involving conda environments. LLM coach credentials (Vertex AI or standard Gemini API key) must be configured.

  • Prerequisites: Python 3.11+, CUDA-compatible GPUs (minimum 2x 80GB A100/H100 recommended).
  • Setup: git clone, cd multiagent-coaching, uv venv --python 3.11, source .venv/bin/activate, uv pip install -r requirements_uv.txt. SandboxFusion setup is a separate multi-step process.
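Collected into a script, the setup steps above look roughly as follows. The repository URL is inferred from the project page (verify against the README), and SandboxFusion plus LLM credentials still need their separate configuration afterward.

```shell
# Clone the repository (URL inferred from the project page)
git clone https://github.com/ltjed/multiagent-coaching.git
cd multiagent-coaching

# Create and activate a Python 3.11 virtual environment with uv
uv venv --python 3.11
source .venv/bin/activate

# Install the project's pinned dependencies
uv pip install -r requirements_uv.txt
```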

Highlighted Details

  • Per-Action Coaching: Utilizes Gemini for real-time action evaluation and process rewards.
  • Multi-Agent Orchestration: Supports sequential agent workflows where each agent builds upon previous outputs.
  • Code Execution: Integrates secure Python code execution via SandboxFusion.
  • Distributed RL Training: Employs REINFORCE++ with DeepSpeed and Ray for multi-GPU training.
  • Pipelines: Offers pre-configured pipelines for MathChat (math problem-solving) and DSBench (data science tasks).
  • Context Handling: Supports large input contexts (24K tokens) and generation lengths (4K-16K tokens).
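The sequential orchestration and per-action coaching described above can be sketched as a simple loop: each agent consumes the previous agent's output, and the coach scores every step. The `run_pipeline` function, toy agents, and coach signature here are hypothetical stand-ins, not the framework's real interfaces.

```python
def run_pipeline(task, agents, coach):
    """Run agents in sequence; each agent builds on the previous output.
    The coach scores every action, yielding dense per-step rewards
    instead of a single end-of-episode signal."""
    context, rewards = task, []
    for agent in agents:
        output = agent(context)
        rewards.append(coach(context, output))  # 0-10 process reward per action
        context = output  # the next agent builds on this output
    return context, rewards

# Toy usage: a "planner" then a "solver", with a constant stand-in coach.
planner = lambda text: text + " -> plan"
solver = lambda text: text + " -> answer"
final, rewards = run_pipeline("task", [planner, solver],
                              coach=lambda ctx, out: 7)
```

In the real system the per-step rewards would feed a REINFORCE++-style policy update; the dense reward list is what gives each agent an individual learning signal.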

Maintenance & Community

The project outlines a standard contribution process via pull requests. Support is primarily handled through GitHub issues. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the documentation.

Licensing & Compatibility

The project is licensed under the MIT License. This permissive license generally allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

Significant hardware investment is required, with a minimum of two 80GB GPUs for single-agent training and more recommended for multi-agent scenarios. The setup process, particularly for SandboxFusion, is complex and involves managing multiple environments. Reliance on external LLM APIs for coaching necessitates API key management and incurs potential usage costs. The project appears to be relatively new, with its primary citation dated 2026.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 253 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Shizhe Diao (author of LMFlow; research scientist at NVIDIA), and 4 more.

agents by aiwaves-cn
0.1% · 6k stars
Open-source framework for self-evolving, data-centric autonomous language agents
Created 2 years ago · Updated 1 year ago