Agents_Failure_Attribution by mingyin1

Automated failure attribution for LLM multi-agent systems

created 3 months ago
292 stars

Top 91.4% on sourcepulse

View on GitHub
Project Summary

This repository provides an implementation for automated failure attribution in LLM-based multi-agent systems, addressing the challenge of identifying which agent and at which step a task failed. It is targeted at researchers and developers working with complex agentic systems, offering a benchmark and dataset to reduce manual debugging and accelerate development cycles.

How It Works

The project introduces automated failure attribution methods to pinpoint the root cause of failures in multi-agent systems. It supports various judging strategies, including "all-at-once," "step-by-step," and "binary search," to analyze task execution logs and identify the responsible agent and error step. This approach aims to provide fine-grained insights for debugging and agent self-improvement.
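
The binary-search strategy in particular can be sketched as follows. This is a minimal illustration of the idea, not the repository's implementation; `judge_half` is a hypothetical stand-in for an LLM call that says whether the decisive error falls in the first or second half of the log segment it is shown.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # one log entry: (agent_name, message)


def binary_search_attribution(
    log: List[Step],
    judge_half: Callable[[List[Step], int], str],
) -> Tuple[str, int]:
    """Locate the responsible agent and decisive error step by halving the log.

    judge_half(segment, split) is expected to ask an LLM judge whether the
    decisive error lies in segment[:split + 1] ("first") or in
    segment[split + 1:] ("second").
    """
    lo, hi = 0, len(log) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # Ask the judge about the current window log[lo..hi], split at mid.
        answer = judge_half(log[lo : hi + 1], mid - lo)
        if answer == "first":
            hi = mid
        else:
            lo = mid + 1
    agent, _ = log[lo]
    return agent, lo  # responsible agent and 0-indexed error step
```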

Quick Start & Requirements

  • Install requirements: pip install -r requirements.txt
  • Supported models include GPT-4o, GPT-4, GPT-4o-mini, Llama-3.1 variants, and Qwen2.5 variants.
  • Inference command: python inference.py --method <METHOD> --model <MODEL> --is_handcrafted <DATA> --directory_path <PATH>
  • Evaluation command: python evaluate.py --data_path <DATA_PATH> --eval_file <EVAL_FILE>
  • Dataset available on Hugging Face (see the loading sketch after this list).
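
As a minimal sketch of pulling the benchmark with the `datasets` library (the hub ID and inspected fields below are placeholders, not identifiers confirmed by the repository):

```python
from datasets import load_dataset

# Placeholder hub ID; substitute the dataset ID given in the repository's README.
ds = load_dataset("<HF_DATASET_ID>", split="train")

for example in ds.select(range(3)):
    # Each task is expected to bundle the multi-agent log with its failure
    # annotation (responsible agent, decisive error step, explanation).
    print(example)
```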

Highlighted Details

  • ICML 2025 Spotlight paper (Top 2.6% acceptance rate).
  • Features a benchmark dataset of 184 annotated failure tasks from algorithm-generated and hand-crafted agentic systems.
  • Annotations include the failing agent, the decisive error step, and a natural-language explanation (see the toy scoring example after this list).
  • Covers diverse multi-agent scenarios from GAIA and AssistantBench.
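
As a toy illustration of how a prediction might be scored against such an annotation (this is not the repository's evaluate.py; field names and values are made up for the sketch):

```python
# Illustrative annotation record and a naive correctness check.
annotation = {
    "failing_agent": "WebSurfer",
    "error_step": 7,
    "explanation": "The agent extracted the wrong date from the page.",
}
prediction = {"failing_agent": "WebSurfer", "error_step": 5}

agent_correct = prediction["failing_agent"] == annotation["failing_agent"]
# One possible convention: step-level credit also requires the agent to match.
step_correct = agent_correct and prediction["error_step"] == annotation["error_step"]
print(f"agent-level correct: {agent_correct}, step-level correct: {step_correct}")
```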

Maintenance & Community

  • The project is associated with the ICML 2025 paper "Which Agent Causes Task Failures and When?".
  • The authors ask users to star the repository on GitHub to motivate further improvements.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The project is a research artifact accompanying the ICML 2025 paper, so ongoing development and changes should be expected.
  • No specific hardware requirements (e.g., GPU, CUDA) are mentioned, but model support suggests significant computational resources may be needed for inference.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 1

Star History

  • 266 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Simon Willison (co-creator of Django), and 1 more.

tau-bench by sierra-research

Benchmark for tool-agent-user interaction research

created 1 year ago · updated 3 weeks ago
709 stars

Top 2.4% on sourcepulse