Medical LLM for advanced reasoning
Top 33.8% on sourcepulse
HuatuoGPT-o1 is a suite of large language models specifically designed for complex medical reasoning, targeting medical professionals and researchers. It aims to improve LLM accuracy in medical contexts by enabling models to identify errors, explore alternative diagnostic or treatment strategies, and refine their responses through a structured reasoning process.
How It Works
HuatuoGPT-o1 employs a two-stage training process. Stage 1 involves supervised fine-tuning (SFT) on a dataset of verifiable medical problems and complex chains of thought, generated using GPT-4o. Stage 2 utilizes reinforcement learning (RL) with Proximal Policy Optimization (PPO), where a specialized medical verifier model provides rewards to further enhance the LLM's reasoning capabilities. This approach allows the model to learn a "thinks-before-it-answers" methodology, outputting its reasoning process before the final response.
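The verifier-driven reward in Stage 2 can be sketched as follows. This is a minimal illustration only: the delimiter names (`## Thinking`, `## Final Response`) and the `extract_final_answer` helper are assumptions for the sketch, and the exact-match check stands in for the trained medical verifier model the project actually uses.

```python
# Sketch of a verifier-based reward for PPO-style RL on verifiable
# medical problems. Assumption: the model emits its reasoning first,
# then its answer after a "## Final Response" marker (hypothetical
# delimiter). The real project scores answers with a trained medical
# verifier model, not the exact-match comparison shown here.

def extract_final_answer(completion: str, marker: str = "## Final Response") -> str:
    """Return the text after the final-answer marker, or the whole completion."""
    _, sep, answer = completion.partition(marker)
    return answer.strip() if sep else completion.strip()

def verifier_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted answer matches the ground truth, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer.lower() == ground_truth.lower() else 0.0

completion = (
    "## Thinking\nThe symptoms suggest iron deficiency.\n"
    "## Final Response\nIron deficiency anemia"
)
print(verifier_reward(completion, "iron deficiency anemia"))  # 1.0
```

In PPO, this scalar would be attached to each sampled completion; the structured output format is what lets the verifier judge only the final answer while the reasoning trace remains free-form.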
Quick Start & Requirements
Models are released on Hugging Face (e.g., FreedomIntelligence/HuatuoGPT-o1-8B) and are based on the Llama-3.1 and Qwen2.5 architectures; they can be loaded with `device_map="auto"` for automatic device placement. Key dependencies include:

- `transformers`
- `torch`
- `accelerate`
- `deepspeed`
- `trl`
- `sglang` (for evaluation)
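A minimal inference sketch using `transformers`, assuming the model name and `device_map="auto"` hint above; the chat-template usage and generation parameters are illustrative, not taken from the project's own scripts.

```python
# Hedged quick-start sketch: load a HuatuoGPT-o1 checkpoint and
# generate a response. Heavy imports and model loading are wrapped in
# a function so the file can be imported without downloading weights.

def build_chat(question: str) -> list:
    """Format a single-turn medical question as a chat message list."""
    return [{"role": "user", "content": question}]

def generate(question: str,
             model_name: str = "FreedomIntelligence/HuatuoGPT-o1-8B") -> str:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_chat(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=2048)
    # Decode only the newly generated tokens (reasoning + final answer).
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("How should iron deficiency anemia be managed?"))
```

Because the model thinks before it answers, expect the decoded output to contain a reasoning section followed by the final response, and budget `max_new_tokens` accordingly.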
Highlighted Details
Maintenance & Community
The project is associated with FreedomIntelligence. Further community engagement details (e.g., Discord/Slack) are not explicitly provided in the README.
Licensing & Compatibility
The README does not specify a license. Models are based on Llama-3.1 and Qwen2.5, which have their own respective licenses that may impose restrictions on commercial use or redistribution.
Limitations & Caveats
The project is presented as research-oriented. The effectiveness of the "medical verifier" and the robustness of the RL training for real-world medical applications require further validation. The data construction scripts require API keys for GPT-4o, implying potential costs and dependency on OpenAI services.