ltjed: Framework for training multi-agent systems with AI feedback
Summary
MAPPA (Multi-Agent Systems with Per-Action Process Rewards) addresses two central challenges in training multi-agent systems end-to-end: credit assignment and sample efficiency. It lets AI coaches score every agent action, which provides dense feedback for efficient learning and attributes blame to the specific agent that made a mistake. The framework is aimed at researchers and engineers building multi-agent AI systems.
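To make the credit-assignment point concrete, the sketch below contrasts a sparse, outcome-only reward with per-action rewards in a hypothetical three-agent episode. The agent names, the scores, and the rescaling of 0-10 coach scores to [0, 1] are illustrative assumptions, not MAPPA's actual reward pipeline.

```python
from dataclasses import dataclass


@dataclass
class Action:
    agent: str    # which agent produced this action
    score: float  # hypothetical coach score on the 0-10 scale


def outcome_credit(actions: list[Action], outcome: float) -> dict[str, list[float]]:
    """Sparse baseline: every action inherits the same end-of-episode reward."""
    credit: dict[str, list[float]] = {}
    for a in actions:
        credit.setdefault(a.agent, []).append(outcome)
    return credit


def per_action_credit(actions: list[Action]) -> dict[str, list[float]]:
    """Dense alternative: each action keeps its own coach score, rescaled to [0, 1]."""
    credit: dict[str, list[float]] = {}
    for a in actions:
        credit.setdefault(a.agent, []).append(a.score / 10.0)
    return credit


if __name__ == "__main__":
    # Hypothetical episode: the planner and verifier do well, the coder errs,
    # and the task fails overall (outcome reward 0.0).
    episode = [
        Action("planner", 9.0),
        Action("coder", 2.0),
        Action("verifier", 8.0),
    ]
    print(outcome_credit(episode, outcome=0.0))
    print(per_action_credit(episode))
```

With the outcome-only view all three agents receive the same zero reward, so the planner and verifier are penalized for the coder's mistake; the per-action view keeps high scores for them and isolates the low score on the coder.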
How It Works
The core innovation is an LLM-based AI coach that evaluates each agent action as it happens, assigning a process reward on a 0-10 scale based on the agent's inputs, outputs, and tool feedback. This per-action reward avoids the complex counterfactual reasoning that credit assignment would otherwise require, and it supplies the dense signal that makes RL training sample-efficient. The system orchestrates sequential agent workflows and integrates secure code execution via SandboxFusion.
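A minimal sketch of the per-action scoring step, assuming the coach receives a text prompt built from the action's inputs, output, and tool feedback and replies in free text ending with a score line. The prompt wording, the `SCORE:` reply format, and the names `ActionRecord`, `build_coach_prompt`, `parse_score`, and `score_action` are all hypothetical; the `coach` argument stands in for whatever LLM call the project actually makes.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionRecord:
    agent: str
    inputs: str
    output: str
    tool_feedback: str  # e.g. stdout/stderr from a sandboxed code run


def build_coach_prompt(rec: ActionRecord) -> str:
    """Assemble everything the coach sees for one action (hypothetical format)."""
    return (
        f"You are coaching agent '{rec.agent}'.\n"
        f"Inputs:\n{rec.inputs}\n\n"
        f"Output:\n{rec.output}\n\n"
        f"Tool feedback:\n{rec.tool_feedback}\n\n"
        "Rate this action from 0 to 10. End with a line 'SCORE: <number>'."
    )


# Matches 'SCORE: 7', 'SCORE: 5.5', or 'SCORE: -1'.
_SCORE_RE = re.compile(r"SCORE:\s*(-?\d+(?:\.\d+)?)")


def parse_score(coach_reply: str, default: float = 0.0) -> float:
    """Use the last 'SCORE: x' in the reply, clamped to the 0-10 range."""
    matches = _SCORE_RE.findall(coach_reply)
    if not matches:
        return default  # unparseable reply: fall back to a neutral-low score
    return min(10.0, max(0.0, float(matches[-1])))


def score_action(rec: ActionRecord, coach: Callable[[str], str]) -> float:
    """Send one action to the coach and return its process reward."""
    return parse_score(coach(build_coach_prompt(rec)))
```

In practice `coach` would wrap an LLM client; for testing it can be any function, e.g. `score_action(rec, lambda prompt: "SCORE: 4")`. Clamping and taking the last score line are defensive choices against out-of-range or self-revising replies, not documented behavior of MAPPA.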
Quick Start & Requirements
Installation involves cloning the repository, setting up a Python 3.11 virtual environment with uv, and installing dependencies. Secure code execution requires separate setup of SandboxFusion, involving conda environments. LLM coach credentials (Vertex AI or standard Gemini API key) must be configured.
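The coach backend choice (Vertex AI or a Gemini API key) can be resolved from environment variables along the lines below. This is a sketch only: the variable names `GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`, and `GEMINI_API_KEY` follow common Google Gen AI SDK conventions and are assumptions here, as are `CoachBackend` and `resolve_coach_backend`; check the repository's own configuration for the names it actually reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CoachBackend:
    kind: str                      # "vertex" or "gemini_api"
    project: Optional[str] = None  # only used for Vertex AI
    location: Optional[str] = None # only used for Vertex AI
    api_key: Optional[str] = None  # only used for the Gemini API


def resolve_coach_backend(env: Mapping[str, str] = os.environ) -> CoachBackend:
    """Pick a coach backend from (assumed) environment variable names."""
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in {"1", "true", "yes"}
    if use_vertex:
        project = env.get("GOOGLE_CLOUD_PROJECT")
        if not project:
            raise ValueError("Vertex AI selected but GOOGLE_CLOUD_PROJECT is not set")
        return CoachBackend("vertex", project=project,
                            location=env.get("GOOGLE_CLOUD_LOCATION"))
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("Set GEMINI_API_KEY, or enable Vertex AI via GOOGLE_GENAI_USE_VERTEXAI")
    return CoachBackend("gemini_api", api_key=api_key)
```

Failing fast with a clear error before training starts is cheaper than discovering a missing credential after GPUs have been allocated.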
Setup steps: clone the repository with git clone, then run:
cd multiagent-coaching
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements_uv.txt
SandboxFusion setup is a separate multi-step process.
Maintenance & Community
The project outlines a standard contribution process via pull requests. Support is primarily handled through GitHub issues. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the documentation.
Licensing & Compatibility
The project is licensed under the MIT License. This permissive license allows commercial use and integration into closed-source projects, provided the copyright and license notice is retained.
Limitations & Caveats
Significant hardware investment is required, with a minimum of two 80GB GPUs for single-agent training and more recommended for multi-agent scenarios. The setup process, particularly for SandboxFusion, is complex and involves managing multiple environments. Reliance on external LLM APIs for coaching necessitates API key management and incurs potential usage costs. The project appears to be relatively new, with its primary citation dated 2026.