MDAgents by mitmedialab

Adaptive LLM collaboration for medical decision-making

Created 2 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

MDAgents is a framework for adaptive LLM collaboration in medical decision-making. It addresses the difficulty of deploying foundation models on complex healthcare tasks by automatically assigning a tailored collaboration structure to a team of LLMs, mirroring how real-world medical decision-making adapts to the case at hand. The reported result is improved performance on medical knowledge and diagnostic benchmarks, which makes the project relevant to researchers and practitioners working on LLMs in medicine.

How It Works

MDAgents first estimates the complexity of a medical task, then assigns a collaboration structure, either solo or group, to a team of LLMs. Matching the agent configuration to the task's complexity is meant to balance accuracy against computational cost. The framework was evaluated with current LLMs on a suite of ten medical benchmarks.
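The routing idea described above can be sketched in a few lines of Python. This is an illustrative sketch, not the repository's code: the names `choose_structure` and `answer`, the numeric complexity score, the 0.5 threshold, and the use of a majority vote as a stand-in for group consensus are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Tuple

# An "agent" here is any callable that maps a query to an answer string.
# In practice this would wrap an LLM call; that wrapper is not shown.
Agent = Callable[[str], str]


def choose_structure(complexity: float, threshold: float = 0.5) -> str:
    """Map a complexity score in [0, 1] to 'solo' or 'group'.

    The score and the threshold are hypothetical; the summary only says
    that complexity decides between a solo and a group structure.
    """
    return "group" if complexity >= threshold else "solo"


def answer(
    query: str,
    assess: Callable[[str], float],
    solo_agent: Agent,
    group_agents: List[Agent],
) -> Tuple[str, str]:
    """Route a query to one agent or to a team, based on assessed complexity."""
    structure = choose_structure(assess(query))
    if structure == "solo":
        # Simple case: one agent answers, so only one model call is spent.
        return structure, solo_agent(query)
    # Complex case: every team member answers; a majority vote stands in
    # for whatever consensus procedure the real framework uses.
    votes = Counter(agent(query) for agent in group_agents)
    return structure, votes.most_common(1)[0][0]
```

The point of the design is visible in the branch: low-complexity queries cost a single call, while the more expensive team path is reserved for queries assessed as complex.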

Quick Start & Requirements

Setup consists of creating a virtual environment with Python >= 3.9 (e.g., with Conda), installing dependencies with pip install -r requirements.txt, and setting environment variables for the OpenAI and GenAI API keys. Data files (JSON format) go in the ./data directory. Supported models include GPT-3.5, GPT-4, GPT-4V, GPT-4o, Gemini-Pro, and Gemini-Pro-Vision. The system is tested against ten medical datasets, including MedQA, PubMedQA, and MIMIC-CXR. Run inference with python3 main.py --model {model_name} --dataset {dataset_name}. The README links to the NeurIPS'24 paper and the project page.
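A small pre-flight helper can catch a missing API key before any inference starts. The sketch below is hypothetical and not part of the repository: the helper names are invented, and the environment variable names `OPENAI_API_KEY` and `GENAI_API_KEY` are assumptions, since the summary does not give the exact names the code reads. Only the command shape (python3 main.py --model ... --dataset ...) comes from the text above.

```python
import os
from typing import List, Mapping, Sequence


def missing_keys(required: Sequence[str], env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def build_command(model_name: str, dataset_name: str) -> List[str]:
    """Build the inference command in the form given in the quick start."""
    return ["python3", "main.py", "--model", model_name, "--dataset", dataset_name]


if __name__ == "__main__":
    # Assumed variable names; check the repository for the ones it reads.
    absent = missing_keys(["OPENAI_API_KEY", "GENAI_API_KEY"], os.environ)
    if absent:
        print("Missing environment variables:", ", ".join(absent))
    else:
        # The placeholders are left as-is; substitute a supported model
        # and dataset name before running the command.
        print(" ".join(build_command("{model_name}", "{dataset_name}")))
```

Returning the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting concerns.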

Highlighted Details

  • Achieved the best performance on seven of ten medical benchmarks, particularly those requiring deep medical knowledge and complex multi-modal reasoning.
  • Improved performance by up to 11.8% over prior multi-agent settings (p < 0.05).
  • Ablation studies support the complexity-aware adaptive mechanism as the source of gains in LLM team efficiency and accuracy across diverse medical tasks.
  • The research also examines how group consensus emerges within collaborative agent teams.

Maintenance & Community

  • Direct contact is available via Yubin Kim (ybkim95@mit.edu). The README does not specify community channels (e.g., Discord, Slack) or a public roadmap.

Licensing & Compatibility

  • The provided README does not state a license. Confirm the license before commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not describe limitations, platform support, or known bugs. The main caveat is the unstated license, which is an adoption risk until clarified. The framework targets medical decision-making tasks only.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 30 days
