multiagent-coaching by ltjed

Framework for training multi-agent systems with AI feedback

Created 1 month ago
252 stars

Top 99.6% on SourcePulse

Project Summary

MAPPA (Multi-Agent Systems with Per-Action Process Rewards) addresses critical challenges in training multi-agent systems end-to-end: credit assignment and sample efficiency. It enables AI coaches to score every agent action, providing dense feedback for efficient learning and accurate blame attribution. This framework is designed for researchers and engineers developing sophisticated multi-agent AI.

How It Works

The core innovation is an LLM-based AI coach that evaluates each agent's action in real-time, assigning a process reward (0-10) based on the agent's inputs, outputs, and tool feedback. This per-action reward mechanism bypasses complex counterfactual reasoning for credit assignment and provides dense signals crucial for sample efficiency in RL training. The system orchestrates sequential agent workflows and integrates secure code execution via SandboxFusion.
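As a rough illustration of this mechanism, the sketch below shows how an LLM coach's free-text reply might be turned into a bounded 0-10 process reward. The prompt wording, the `parse_reward` helper, and the `llm` callable are illustrative assumptions, not the project's actual API.

```python
import re

def parse_reward(reply: str, lo: int = 0, hi: int = 10) -> int:
    """Extract the first integer in the coach's reply and clamp it to [lo, hi]."""
    m = re.search(r"-?\d+", reply)
    if m is None:
        raise ValueError(f"no score found in coach reply: {reply!r}")
    return max(lo, min(hi, int(m.group())))

def coach_score(agent_input, agent_output, tool_feedback, llm) -> int:
    """Ask the coach LLM to rate one agent action given its full local context."""
    prompt = (
        "You are a coach evaluating one agent action.\n"
        f"Agent input: {agent_input}\n"
        f"Agent output: {agent_output}\n"
        f"Tool feedback: {tool_feedback}\n"
        "Reply with a single integer score from 0 to 10."
    )
    return parse_reward(llm(prompt))

# Example with a stubbed LLM standing in for a Gemini call:
score = coach_score("solve x+1=2", "x = 1", "check passed", lambda p: "Score: 9")
```

Because each action is scored from its own inputs, outputs, and tool feedback, the reward is local to that step, which is what sidesteps counterfactual credit assignment across agents.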

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python 3.11 virtual environment with uv, and installing dependencies. Secure code execution requires separate setup of SandboxFusion, involving conda environments. LLM coach credentials (Vertex AI or standard Gemini API key) must be configured.

  • Prerequisites: Python 3.11+, CUDA-compatible GPUs (minimum 2x 80GB A100/H100 recommended).
  • Setup: git clone, cd multiagent-coaching, uv venv --python 3.11, source .venv/bin/activate, uv pip install -r requirements_uv.txt. SandboxFusion setup is a separate multi-step process.
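Collected into a script, the setup steps above look roughly as follows. The repository URL is inferred from the project page (verify against the README), and SandboxFusion plus LLM credentials still need their separate configuration afterward.

```shell
# Clone the repository (URL inferred from the project page)
git clone https://github.com/ltjed/multiagent-coaching.git
cd multiagent-coaching

# Create and activate a Python 3.11 virtual environment with uv
uv venv --python 3.11
source .venv/bin/activate

# Install the project's pinned dependencies
uv pip install -r requirements_uv.txt
```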

Highlighted Details

  • Per-Action Coaching: Utilizes Gemini for real-time action evaluation and process rewards.
  • Multi-Agent Orchestration: Supports sequential agent workflows where each agent builds upon previous outputs.
  • Code Execution: Integrates secure Python code execution via SandboxFusion.
  • Distributed RL Training: Employs REINFORCE++ with DeepSpeed and Ray for multi-GPU training.
  • Pipelines: Offers pre-configured pipelines for MathChat (math problem-solving) and DSBench (data science tasks).
  • Context Handling: Supports large input contexts (24K tokens) and generation lengths (4K-16K tokens).
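The sequential orchestration and per-action coaching described above can be sketched as a simple loop: each agent consumes the previous agent's output, and the coach scores every step. The `run_pipeline` function, toy agents, and coach signature here are hypothetical stand-ins, not the framework's real interfaces.

```python
def run_pipeline(task, agents, coach):
    """Run agents in sequence; each agent builds on the previous output.
    The coach scores every action, yielding dense per-step rewards
    instead of a single end-of-episode signal."""
    context, rewards = task, []
    for agent in agents:
        output = agent(context)
        rewards.append(coach(context, output))  # 0-10 process reward per action
        context = output  # the next agent builds on this output
    return context, rewards

# Toy usage: a "planner" then a "solver", with a constant stand-in coach.
planner = lambda text: text + " -> plan"
solver = lambda text: text + " -> answer"
final, rewards = run_pipeline("task", [planner, solver],
                              coach=lambda ctx, out: 7)
```

In the real system the per-step rewards would feed a REINFORCE++-style policy update; the dense reward list is what gives each agent an individual learning signal.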

Maintenance & Community

The project outlines a standard contribution process via pull requests. Support is primarily handled through GitHub issues. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the documentation.

Licensing & Compatibility

The project is licensed under the MIT License. This permissive license generally allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

Significant hardware investment is required, with a minimum of two 80GB GPUs for single-agent training and more recommended for multi-agent scenarios. The setup process, particularly for SandboxFusion, is complex and involves managing multiple environments. Reliance on external LLM APIs for coaching necessitates API key management and incurs potential usage costs. The project appears to be relatively new, with its primary citation dated 2026.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 253 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Shizhe Diao (author of LMFlow; research scientist at NVIDIA), and 4 more.

agents by aiwaves-cn
0.1% · 6k stars
Open-source framework for self-evolving, data-centric autonomous language agents
Created 2 years ago · Updated 1 year ago