MDAgents by mitmedialab

Adaptive LLM collaboration for medical decision-making

Created 2 years ago

285 stars

Top 91.7% on SourcePulse

Project Summary

MDAgents is a framework for adaptive LLM collaboration in medical decision-making. It addresses the challenge of deploying foundation models effectively on complex healthcare tasks by automatically assigning a tailored collaboration structure to a team of LLMs, mirroring real-world adaptive processes. The authors report improved performance on medical knowledge and diagnostic benchmarks, which makes the framework relevant to researchers and practitioners working on LLM-based clinical reasoning.

How It Works

MDAgents first determines the complexity of a medical task, then assigns a collaboration structure, either solo or group, to a team of LLMs. Matching the agent configuration to the task is meant to balance accuracy against computational cost. The framework builds on current general-purpose LLMs and was evaluated on a suite of medical benchmarks, where the adaptive multi-agent approach outperformed prior settings on most of them.
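
The routing idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the repository's implementation: the names `route`, `assess_complexity`, `majority_vote`, and `Decision` are hypothetical, agents are stand-in callables where a real system would call an LLM, and majority voting is an assumed aggregation rule that the summary does not confirm.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

# An "agent" is any callable mapping a question to an answer string.
# In a real system this would wrap an LLM API call (not shown here).
Agent = Callable[[str], str]


@dataclass
class Decision:
    structure: str  # "solo" or "group"
    answer: str


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer; ties go to the earliest answer given."""
    if not answers:
        raise ValueError("no answers to aggregate")
    counts = Counter(answers)
    best = max(counts.values())
    return next(a for a in answers if counts[a] == best)


def route(
    question: str,
    assess_complexity: Callable[[str], str],  # hypothetical: returns "low" or "high"
    agents: List[Agent],
) -> Decision:
    """Pick a collaboration structure from the assessed task complexity."""
    if not agents:
        raise ValueError("at least one agent is required")
    if assess_complexity(question) == "low":
        # Simple case: one agent answers alone, saving API calls.
        return Decision("solo", agents[0](question))
    # Complex case: every agent answers and the answers are aggregated.
    # Majority voting is an assumption made for this sketch.
    return Decision("group", majority_vote([agent(question) for agent in agents]))


# Toy stand-ins so the sketch runs without any API access.
def toy_complexity(question: str) -> str:
    return "low" if len(question.split()) < 12 else "high"


toy_agents: List[Agent] = [lambda q: "A", lambda q: "B", lambda q: "B"]
```

With these stand-ins, a short question is answered by the first agent alone, while a longer one is sent to all three and resolved by vote. Solo routing costs one call and group routing costs one call per agent, which is the efficiency trade-off the adaptive assignment is meant to manage.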

Quick Start & Requirements

Setup consists of three steps: create a virtual environment with Python >= 3.9 (e.g., with Conda), install dependencies with pip install -r requirements.txt, and set environment variables for the OpenAI and GenAI API keys. Data files in JSON format go in the ./data directory. Inference is started with python3 main.py --model {model_name} --dataset {dataset_name}. Supported models include GPT-3.5, GPT-4, GPT-4V, GPT-4o, Gemini-Pro, and Gemini-Pro-Vision. The framework was tested on ten medical datasets, including MedQA, PubMedQA, and MIMIC-CXR. Links to the NeurIPS'24 paper and the project page are provided.
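
Before the first run, a small pre-flight script can confirm that these prerequisites are in place. The sketch below is not part of the repository: `preflight` and `build_command` are hypothetical helpers, and the environment variable names are passed in by the caller because this summary does not state the exact names (check the README for them).

```python
import os
import sys
from pathlib import Path
from typing import List, Mapping, Sequence


def preflight(
    required_env: Sequence[str],
    data_dir: str = "./data",
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []
    if sys.version_info < (3, 9):
        problems.append("Python >= 3.9 is required")
    for name in required_env:
        if not environ.get(name):
            problems.append(f"environment variable {name} is not set")
    data_path = Path(data_dir)
    if not data_path.is_dir():
        problems.append(f"data directory {data_dir} does not exist")
    elif not any(data_path.rglob("*.json")):
        problems.append(f"no JSON data files found under {data_dir}")
    return problems


def build_command(model: str, dataset: str) -> List[str]:
    """Assemble the inference command quoted in the setup notes."""
    return ["python3", "main.py", "--model", model, "--dataset", dataset]


if __name__ == "__main__":
    # The variable names and the model/dataset values below are placeholders.
    issues = preflight(["OPENAI_KEY_VAR_NAME", "GENAI_KEY_VAR_NAME"])
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("ready:", " ".join(build_command("model_name", "dataset_name")))
```

Returning a list of problems instead of raising on the first one lets a new user fix every missing prerequisite in a single pass.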

Highlighted Details

Achieved the best performance on seven of ten medical benchmarks, particularly those requiring deep medical knowledge and complex multi-modal reasoning.
Reported performance improvements of up to 11.8% over prior multi-agent settings (p < 0.05).
Ablation studies support the effectiveness of the complexity-aware adaptive mechanism for improving LLM team efficiency and accuracy across diverse medical tasks.
The research also examines how group consensus emerges within collaborative agent teams, with possible relevance to clinical team behavior.

Maintenance & Community

Direct contact is available via Yubin Kim (ybkim95@mit.edu). The README does not specify community channels (e.g., Discord, Slack) or a public roadmap.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. This absence requires further investigation for commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not detail specific limitations, platform support, or known bugs. The primary caveat is the unstated license, which poses an adoption risk until it is clarified. The framework is scoped to medical decision-making tasks.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

11 stars in the last 30 days