MAST by multi-agent-systems-failure-taxonomy

Taxonomy for multi-agent system failures

Created 7 months ago
291 stars

Top 90.6% on SourcePulse

Project Summary

This repository provides the code and data for a study of failures in Multi-Agent Systems (MAS) and introduces the MAST taxonomy. It targets researchers and practitioners in AI and MAS who need to understand and mitigate common failure modes in complex agent interactions. By offering a structured way to analyze MAS failures, the project aims to support more robust system design.

How It Works

The project introduces the Multi-Agent Systems Failure Taxonomy (MAST), a framework for categorizing and analyzing failures in MAS. It is grounded in a dataset of MAS traces annotated both by human annotators and by an LLM-as-a-Judge, which is used to systematically identify and classify failure patterns. This data-driven approach is meant to help pinpoint the root causes of MAS malfunctions.
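To make the idea of classifying annotated traces concrete, here is a minimal sketch that tallies failure modes by category. The codes and category names (`FM-A1`, `category A`, and so on), the `TAXONOMY` mapping, and the "trace id to set of failure-mode codes" input format are hypothetical placeholders, not MAST's actual labels or the dataset's real schema.

```python
from collections import Counter

# Hypothetical taxonomy: failure-mode codes mapped to coarse categories.
# These labels are placeholders for illustration, not MAST's real definitions.
TAXONOMY = {
    "FM-A1": "category A",
    "FM-A2": "category A",
    "FM-B1": "category B",
    "FM-C1": "category C",
}


def tally_by_category(annotations):
    """Count, per category, how many traces exhibit at least one failure mode.

    `annotations` maps a trace id to the set of failure-mode codes that an
    annotator (human or LLM judge) assigned to that trace. Codes missing from
    TAXONOMY are ignored.
    """
    counts = Counter()
    for modes in annotations.values():
        # Collapse codes to categories first so each trace counts once per category.
        categories = {TAXONOMY[m] for m in modes if m in TAXONOMY}
        counts.update(categories)
    return counts


if __name__ == "__main__":
    example = {
        "trace-1": {"FM-A1", "FM-B1"},
        "trace-2": {"FM-A1", "FM-A2"},
        "trace-3": set(),
    }
    print(tally_by_category(example))
```

Counting at the category level (rather than summing raw codes) avoids double-counting a trace that shows two modes from the same category.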

Quick Start & Requirements

  • Install required libraries: pip install huggingface_hub pandas
  • Download dataset: Use the Python snippets provided in the README to download it from the Hugging Face Hub (mcemri/MAD).
  • Prerequisites: Python 3.x, Hugging Face Hub access.
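The README's own download snippets are not reproduced here. The following is a minimal sketch of the same step, assuming `mcemri/MAD` is a Hugging Face *dataset* repository and that its files are JSON, JSONL, CSV or Parquet. The file layout inside the repo is unknown, so `load_tables` (a hypothetical helper) simply loads every file with one of those extensions; inspect the snapshot and adapt it.

```python
from pathlib import Path

import pandas as pd

REPO_ID = "mcemri/MAD"  # repo id named in the README; repo_type="dataset" is an assumption


def download_dataset(local_dir="MAD"):
    """Fetch a snapshot of the dataset repo and return its local path (needs network)."""
    from huggingface_hub import snapshot_download  # imported lazily so loading works offline

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


def load_tables(root):
    """Load every JSON, JSONL, CSV or Parquet file under `root` into a DataFrame.

    Returns a dict keyed by each file's path relative to `root`. The assumed
    formats may not match the real repo; a .json file that is not tabular
    will make pd.read_json raise.
    """
    base = Path(root)
    frames = {}
    for path in sorted(base.rglob("*")):
        key = str(path.relative_to(base))
        suffix = path.suffix.lower()
        if suffix == ".jsonl":
            frames[key] = pd.read_json(path, lines=True)
        elif suffix == ".json":
            frames[key] = pd.read_json(path)
        elif suffix == ".csv":
            frames[key] = pd.read_csv(path)
        elif suffix == ".parquet":
            frames[key] = pd.read_parquet(path)  # requires pyarrow or fastparquet
    return frames
```

Typical use is `frames = load_tables(download_dataset())`, then printing each key with `df.shape` to see what the snapshot actually contains.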

Highlighted Details

  • Presents what the authors describe as the first comprehensive study and taxonomy (MAST) of MAS challenges.
  • Offers a dataset with over 1,000 annotated MAS traces.
  • Includes traces annotated by both LLM-as-a-Judge and human annotators.
  • Provides a BibTeX entry for citing the associated paper, "Why Do Multi-Agent LLM Systems Fail?".
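Because the dataset contains both LLM-as-a-Judge and human annotations, a natural sanity check is how well the two agree. The sketch below computes Cohen's kappa for one failure mode, assuming each annotator gives a binary "present / absent" label per trace. Cohen's kappa is a standard agreement statistic chosen here for illustration; this is not code from the repository, and the example labels are made up.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' binary (0/1 or bool) labels on the same traces.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's marginals.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    pa = sum(labels_a) / n  # fraction of traces annotator A flags
    pb = sum(labels_b) / n  # fraction of traces annotator B flags
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1:
        # Both annotators gave the same constant label: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for a single hypothetical failure mode over eight traces.
human = [1, 1, 0, 0, 1, 0, 1, 0]
llm_judge = [1, 1, 0, 0, 0, 0, 1, 1]
print(cohens_kappa(human, llm_judge))
```

Kappa corrects raw agreement for chance: here the two annotators agree on 6 of 8 traces, but since each flags half the traces, half of that agreement is expected by chance alone.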

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The README does not specify a license. The code and data are presented for research purposes, and citation is requested.

Limitations & Caveats

The repository focuses on failure analysis and does not provide tools for MAS development or simulation. Annotations produced by LLM-as-a-Judge, in particular, may inherit biases or inaccuracies from the underlying models.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 13 stars in the last 30 days

Explore Similar Projects

Starred by Andrew Ng (Founder of DeepLearning.AI; Cofounder of Coursera; Professor at Stanford), Thomas Wolf (Cofounder of Hugging Face), and 4 more.

ag2 by ag2ai

1.1%
4k
AgentOS for building AI agents and facilitating multi-agent cooperation
Created 11 months ago
Updated 16 hours ago