MedRAG toolkit for medical RAG research
MedRAG is a comprehensive toolkit for building and evaluating Retrieval-Augmented Generation (RAG) systems for medical question answering. It provides a modular framework for researchers and practitioners to experiment with various corpora, retrieval methods, and large language models (LLMs) to improve accuracy and reduce hallucinations in medical AI applications.
How It Works
MedRAG structures RAG systems into three core components: Corpora, Retrievers, and LLMs. It supports diverse medical and general knowledge sources, including PubMed, StatPearls, medical textbooks, and Wikipedia, each chunked into snippets. Retrieval is handled by a selection of lexical (BM25) and semantic models (Contriever, SPECTER, MedCPT), with options for accelerated indexing using HNSW. The toolkit integrates with a wide range of LLMs, from commercial APIs like GPT-4 to open-source models like Llama 3.1 and domain-specific models like MEDITRON.
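The corpus → retriever → LLM split described above can be illustrated with a minimal, self-contained sketch. Everything below is a toy stand-in, not MedRAG's actual classes or API: a crude lexical scorer plays the role of BM25/MedCPT, and a prompt-building function stands in for the configured LLM backend.

```python
# Toy sketch of MedRAG's three-part structure (corpus, retriever, LLM).
# All names and logic here are illustrative only, not the toolkit's real API.

def tokenize(text):
    return text.lower().split()

# "Corpus": knowledge sources pre-chunked into snippets.
corpus = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Aspirin irreversibly inhibits cyclooxygenase.",
    "The BM25 algorithm ranks documents by lexical overlap.",
]

# "Retriever": a crude term-overlap scorer standing in for BM25 or a
# semantic model such as Contriever, SPECTER, or MedCPT.
def retrieve(question, snippets, k=2):
    q_terms = set(tokenize(question))
    scored = [(len(q_terms & set(tokenize(s))), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]

# "LLM": the assembled prompt would be sent to whichever backend is
# configured (GPT-4, Llama 3.1, MEDITRON, ...).
def build_prompt(question, context):
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

question = "What is a first-line therapy for type 2 diabetes?"
context = retrieve(question, corpus)
prompt = build_prompt(question, context)
```

Grounding generation in retrieved snippets rather than the model's parametric memory is what the toolkit relies on to reduce hallucinations.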
Quick Start & Requirements
Install PyTorch (the CUDA 12.1 wheel is shown here), then the remaining dependencies:
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
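After installation, usage centers on a single entry-point class configured with an LLM, a retriever, and a corpus. The runnable stand-in below mirrors that call pattern; the constructor arguments and the `answer()` return shape are modeled on the upstream README, but every internal detail here is a stub written for illustration, not MedRAG's actual implementation.

```python
# Hypothetical stand-in mirroring MedRAG's documented call pattern.
# The internals are stubs, NOT the toolkit's real implementation.

class ToyMedRAG:
    def __init__(self, llm_name, rag=True, retriever_name="MedCPT",
                 corpus_name="Textbooks"):
        self.llm_name = llm_name
        self.rag = rag
        self.retriever_name = retriever_name
        self.corpus_name = corpus_name
        # Stub corpus; the real toolkit loads pre-chunked snippets.
        self.snippets = ["Snippet about facial nerve anatomy."]

    def answer(self, question, options=None, k=1):
        # Stub retrieval: the real system scores snippets with the
        # configured retriever (BM25, Contriever, SPECTER, MedCPT).
        snippets = self.snippets[:k] if self.rag else []
        scores = [1.0] * len(snippets)
        # Stub generation: the real system prompts the configured LLM.
        answer = f"[{self.llm_name} answer to: {question}]"
        return answer, snippets, scores

medrag = ToyMedRAG(llm_name="OpenAI/gpt-3.5-turbo-16k",
                   rag=True, retriever_name="MedCPT",
                   corpus_name="Textbooks")
answer, snippets, scores = medrag.answer(
    question="Which nerve exits at the stylomastoid foramen?")
```

Returning the supporting snippets and their scores alongside the answer lets callers audit what evidence the model was shown.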
Highlighted Details
Offers corpus caching (corpus_cache=True) and HNSW indexing (HNSW=True) for accelerated retrieval.
Maintenance & Community
The project is actively maintained, with recent updates including support for Llama 3.1/3.2, OpenAI API v1.0.0+, and Gemini models. The primary contributors are listed as Teddy-XiongGZ, Qiao Jin, Zhiyong Lu, and Aidong Zhang.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility is confirmed for various LLMs including GPT-4, GPT-3.5, Gemini-1.0-pro, Llama 3.1/3.2, Mixtral, MEDITRON, and PMC-LLaMA.
Limitations & Caveats
Embeddings for the StatPearls corpus are not provided due to frequent updates by the source. The license is not specified, which may impact commercial use or closed-source integration.