Multi-Agents-Debate  by Skytliang

Framework for LLM multi-agent debate

created 2 years ago
415 stars

Top 71.7% on sourcepulse

Project Summary

This repository introduces the Multi-Agent Debate (MAD) framework, designed to enhance the reasoning and problem-solving capabilities of Large Language Models (LLMs). It addresses the "Degeneration of Thoughts" (DoT) issue observed in single-agent self-reflection by leveraging adversarial debate between two LLM agents to correct biases and improve accuracy.

How It Works

MAD pairs a "devil" (affirmative) agent with an "angel" (negative) agent. The devil proposes an initial answer or line of reasoning, and the angel critiques it, pointing out errors or biases. This iterative "tit-for-tat" exchange gives each agent external feedback, so the agents can correct each other's distorted perceptions and overcome rigidity. The project presents this as producing more robust and accurate outcomes than solitary reflection.
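The exchange described above can be illustrated with a minimal sketch. This is not the repository's implementation: the `Agent` type, the `run_debate` function, the `AGREE` stopping convention, and the stub agents are all hypothetical names introduced here, and the stubs return canned strings instead of calling an LLM.

```python
from typing import Callable, List, Tuple

# Hypothetical type: an agent maps the topic and the transcript so far
# to its next statement. A real agent would call an LLM here.
Agent = Callable[[str, List[Tuple[str, str]]], str]


def run_debate(topic: str, devil: Agent, angel: Agent,
               max_rounds: int = 3) -> List[Tuple[str, str]]:
    """Alternate devil (affirmative) and angel (negative) turns."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_rounds):
        transcript.append(("devil", devil(topic, transcript)))
        reply = angel(topic, transcript)
        transcript.append(("angel", reply))
        # Assumed stopping rule: the angel signals agreement explicitly.
        if reply.startswith("AGREE"):
            break
    return transcript


def stub_devil(topic: str, transcript: List[Tuple[str, str]]) -> str:
    # Proposes a flawed answer first, then revises once it has been critiqued.
    if not transcript:
        return "Answer: 5"
    return "Revised answer: 4"


def stub_angel(topic: str, transcript: List[Tuple[str, str]]) -> str:
    # Critiques the most recent statement in the transcript.
    _, last_statement = transcript[-1]
    if "4" in last_statement:
        return "AGREE: 4 is correct."
    return "DISAGREE: 2 + 2 is not 5; recheck the arithmetic."


if __name__ == "__main__":
    for speaker, statement in run_debate("What is 2 + 2?", stub_devil, stub_angel):
        print(f"{speaker}: {statement}")
```

With these stubs the debate ends in the second round, once the devil revises its answer and the angel agrees. Capping the loop with `max_rounds` is a design choice of this sketch so that a debate between agents that never converge still terminates.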

Quick Start & Requirements

  • Install dependencies: pip3 install -r requirements.txt
  • Set OpenAI API key in debate4tran.sh and interactive.py.
  • Run debate: sh debate4tran.sh
  • Run interactive demo: python3 interactive.py

Highlighted Details

  • Reports improvements over single-agent self-reflection on Counterintuitive QA and Commonsense Machine Translation tasks.
  • Provides detailed examples of debate processes for both QA and translation scenarios.
  • Framework is designed to mitigate common LLM reasoning pitfalls like bias and rigidity.

Maintenance & Community

The project is associated with authors from multiple institutions, indicating potential academic backing. Further community engagement channels (e.g., Discord, Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Users should verify licensing terms for commercial or closed-source integration.

Limitations & Caveats

The framework relies heavily on OpenAI's API, making it dependent on their service availability and pricing. The effectiveness may vary based on the specific LLM used and the complexity of the task.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 47 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Calvin French-Owen (Cofounder of Segment), and 2 more.

ReAct by ysymyth

0.7%
3k
GPT-3 prompting code for ReAct research paper
created 2 years ago
updated 1 year ago