Agents_Failure_Attribution by mingyin1

Automated failure attribution for LLM multi-agent systems

created 3 months ago
292 stars

Top 91.4% on sourcepulse

View on GitHub
Project Summary

This repository provides an implementation for automated failure attribution in LLM-based multi-agent systems, addressing the challenge of identifying which agent and at which step a task failed. It is targeted at researchers and developers working with complex agentic systems, offering a benchmark and dataset to reduce manual debugging and accelerate development cycles.

How It Works

The project introduces automated failure attribution methods to pinpoint the root cause of failures in multi-agent systems. It supports various judging strategies, including "all-at-once," "step-by-step," and "binary search," to analyze task execution logs and identify the responsible agent and error step. This approach aims to provide fine-grained insights for debugging and agent self-improvement.
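
The binary-search strategy in particular can be sketched as follows. This is a minimal illustration of the idea, not the repository's implementation; `judge_half` is a hypothetical stand-in for an LLM call that says whether the decisive error falls in the first or second half of the log segment it is shown.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # one log entry: (agent_name, message)


def binary_search_attribution(
    log: List[Step],
    judge_half: Callable[[List[Step], int], str],
) -> Tuple[str, int]:
    """Locate the responsible agent and decisive error step by halving the log.

    judge_half(segment, split) is expected to ask an LLM judge whether the
    decisive error lies in segment[:split + 1] ("first") or in
    segment[split + 1:] ("second").
    """
    lo, hi = 0, len(log) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # Ask the judge about the current window log[lo..hi], split at mid.
        answer = judge_half(log[lo : hi + 1], mid - lo)
        if answer == "first":
            hi = mid
        else:
            lo = mid + 1
    agent, _ = log[lo]
    return agent, lo  # responsible agent and 0-indexed error step
```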

Quick Start & Requirements

  • Install requirements: pip install -r requirements.txt
  • Supported models include GPT-4o, GPT-4, GPT-4o-mini, Llama-3.1 variants, and Qwen2.5 variants.
  • Inference command: python inference.py --method <METHOD> --model <MODEL> --is_handcrafted <DATA> --directory_path <PATH>
  • Evaluation command: python evaluate.py --data_path <DATA_PATH> --eval_file <EVAL_FILE>
  • Dataset available on Hugging Face (see the loading sketch after this list).
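
As a minimal sketch of pulling the benchmark with the `datasets` library (the hub ID and inspected fields below are placeholders, not identifiers confirmed by the repository):

```python
from datasets import load_dataset

# Placeholder hub ID; substitute the dataset ID given in the repository's README.
ds = load_dataset("<HF_DATASET_ID>", split="train")

for example in ds.select(range(3)):
    # Each task is expected to bundle the multi-agent log with its failure
    # annotation (responsible agent, decisive error step, explanation).
    print(example)
```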

Highlighted Details

  • ICML 2025 Spotlight paper (Top 2.6% acceptance rate).
  • Features a benchmark dataset of 184 annotated failure tasks from algorithm-generated and hand-crafted agentic systems.
  • Annotations include the failing agent, the decisive error step, and a natural-language explanation (see the toy scoring example after this list).
  • Covers diverse multi-agent scenarios from GAIA and AssistantBench.
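
As a toy illustration of how a prediction might be scored against such an annotation (this is not the repository's evaluate.py; field names and values are made up for the sketch):

```python
# Illustrative annotation record and a naive correctness check.
annotation = {
    "failing_agent": "WebSurfer",
    "error_step": 7,
    "explanation": "The agent extracted the wrong date from the page.",
}
prediction = {"failing_agent": "WebSurfer", "error_step": 5}

agent_correct = prediction["failing_agent"] == annotation["failing_agent"]
# One possible convention: step-level credit also requires the agent to match.
step_correct = agent_correct and prediction["error_step"] == annotation["error_step"]
print(f"agent-level correct: {agent_correct}, step-level correct: {step_correct}")
```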

Maintenance & Community

  • The project is associated with the ICML 2025 paper "Which Agent Causes Task Failures and When?".
  • The authors ask users to star the repository on GitHub to motivate further improvements.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The project is a research artifact accompanying the ICML 2025 paper, so ongoing development and changes should be expected.
  • No specific hardware requirements (e.g., GPU, CUDA) are mentioned, but model support suggests significant computational resources may be needed for inference.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 1

Star History

  • 266 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Simon Willison (co-creator of Django), and 1 more.

tau-bench by sierra-research

Benchmark for tool-agent-user interaction research

created 1 year ago · updated 3 weeks ago
709 stars

Top 2.4% on sourcepulse