This repository provides the official code and models for "Towards Building Multilingual Language Model for Medicine," a project focused on creating open-source, multilingual LLMs for the medical domain. It offers a large multilingual medical corpus (MMedC), a medical question-answering benchmark (MMedBench), and several pre-trained and fine-tuned models, including MMed-Llama3.1-70B, which rivals GPT-4 performance across multiple languages.
How It Works
The project constructs a large multilingual medical corpus (MMedC) of 25.5 billion tokens across six languages and uses it for auto-regressive pre-training of general-purpose LLMs. It also introduces MMedBench, a multilingual medical multiple-choice QA benchmark with rationales, to evaluate and track model progress. Models further trained or fine-tuned on these resources show significant gains over existing open-source medical LLMs and are competitive with proprietary models such as GPT-4.
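In concrete terms, auto-regressive pre-training here means continuing next-token prediction on the new corpus. The snippet below is a minimal illustration of that objective only, not the project's distributed training code; the base-model ID and the sample text are placeholder assumptions.

```python
# Minimal illustration of the causal-LM (next-token prediction) objective used
# for continued pre-training. Placeholder base model and toy text; this is NOT
# the project's multi-node setup, which requires 8x A100 80GB GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"   # assumed base model (gated; access approval may be needed)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A single toy sample standing in for a shard of the multilingual corpus.
batch = tokenizer(["El paciente presenta fiebre y tos persistente."],
                  return_tensors="pt")
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])   # labels = inputs -> shifted next-token loss
outputs.loss.backward()
optimizer.step()
```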
Quick Start & Requirements
- Installation: Code is provided in separate folders for pre-training, fine-tuning, and inference. Key dependencies are PyTorch 1.13 and Transformers 4.37; LoRA fine-tuning additionally requires the PEFT library (see the sketches after this list).
- Hardware: Auto-regressive training on MMedC requires at least 8 A100 80GB GPUs and training runs exceeding a month. Inference and fine-tuning can be adapted to a single machine by removing the Slurm commands, as in the inference sketch below.
- Resources: The project offers models of various sizes (1.8B, 7B, 8B, 70B parameters).
- Links: Paper (arXiv): https://arxiv.org/abs/2402.13963; Leaderboard: https://github.com/MAGIC-AI4Med/MMedLM/blob/main/leaderboard.md
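For single-machine use, a minimal inference script with Transformers looks roughly like the following. The Hugging Face Hub model ID is an assumption; check the repository's model list for the exact identifiers of the released checkpoints.

```python
# Hedged single-GPU inference sketch (Transformers >= 4.37). The Hub ID below
# is an assumption; consult the repository's model zoo for the exact name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Henrychur/MMed-Llama-3-8B"      # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 8B model fits on one GPU
    device_map="auto",          # requires the `accelerate` package
)

prompt = "Question: What is the first-line pharmacologic treatment for type 2 diabetes?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the LoRA route, a minimal PEFT setup is sketched below; the rank, target modules, and model ID are illustrative assumptions rather than the project's published fine-tuning recipe.

```python
# Hedged LoRA sketch with PEFT. Hyperparameters and target modules are
# illustrative defaults, not the project's published configuration.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Henrychur/MMed-Llama-3-8B",            # assumed Hub ID
    torch_dtype=torch.bfloat16,
)
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only adapter weights are trainable
# ...then fine-tune on the MMedBench trainset with a standard Trainer loop.
```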
Highlighted Details
- MMed-Llama3.1-70B achieves 80.51 on MMedBench, outperforming GPT-4 (74.27) and supporting 8 languages.
- MMedLM 2 (7B) rivals GPT-4 on MMedBench.
- MMed-Llama 3 (8B) outperforms Llama 3 on English benchmarks such as MedQA (65.4 vs. 60.9) and on MMedBench (79.25 vs. 63.86).
- The project releases the data collection pipeline, including filtering and OCR code.
Maintenance & Community
- The associated paper is published in Nature Communications, and the repository has active releases, including recent models such as MMed-Llama3.1-70B.
- Contact: qiupengcheng@pjlab.org.cn.
Licensing & Compatibility
- The repository is released under the Apache 2.0 license.
- The Apache 2.0 license is permissive, so commercial use is generally allowed.
Limitations & Caveats
- Full auto-regressive training on the MMedC corpus is computationally intensive, requiring significant GPU resources and time.
- Open-source models are fine-tuned on the MMedBench trainset before evaluation, whereas proprietary models such as GPT-3.5/4 and Gemini are evaluated zero-shot via API, so the comparison settings differ.