HuatuoGPT-o1 by FreedomIntelligence

Medical LLM for advanced reasoning

created 7 months ago
1,173 stars

Top 33.8% on sourcepulse

Project Summary

HuatuoGPT-o1 is a suite of large language models specifically designed for complex medical reasoning, targeting medical professionals and researchers. It aims to improve LLM accuracy in medical contexts by enabling models to identify errors, explore alternative diagnostic or treatment strategies, and refine their responses through a structured reasoning process.

How It Works

HuatuoGPT-o1 employs a two-stage training process. Stage 1 involves supervised fine-tuning (SFT) on a dataset of verifiable medical problems and complex chains of thought, generated using GPT-4o. Stage 2 utilizes reinforcement learning (RL) with Proximal Policy Optimization (PPO), where a specialized medical verifier model provides rewards to further enhance the LLM's reasoning capabilities. This approach allows the model to learn a "thinks-before-it-answers" methodology, outputting its reasoning process before the final response.
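
The reward signal in Stage 2 is conceptually simple: a verifier judges whether the model's final answer to a verifiable problem matches the reference answer, and that judgment becomes the PPO reward. Below is a minimal Python sketch of this idea only; the function names, the "## Final Response" marker, and the substring judge are illustrative assumptions rather than code from the repository, where the judging role is played by the medical verifier model.

    def extract_final_answer(generation: str, marker: str = "## Final Response") -> str:
        """Return the text after the final-answer marker (marker string is an assumption)."""
        idx = generation.rfind(marker)
        return generation[idx + len(marker):].strip() if idx != -1 else generation.strip()


    def verifier_reward(generation: str, reference_answer: str, judge) -> float:
        """Score a generation 1.0/0.0 by asking `judge` whether the extracted answer
        matches the reference; in the project this role is filled by the medical verifier."""
        predicted = extract_final_answer(generation)
        return 1.0 if judge(predicted, reference_answer) else 0.0


    # Toy usage with a trivial substring judge standing in for the verifier:
    sample = "## Thinking\nConsider PE vs. pneumonia...\n## Final Response\nPulmonary embolism"
    print(verifier_reward(sample, "Pulmonary embolism", lambda p, r: r.lower() in p.lower()))  # -> 1.0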

Quick Start & Requirements

  • Install/Run: Inference can be performed with Hugging Face Transformers; the README provides an example for FreedomIntelligence/HuatuoGPT-o1-8B (a sketch follows this list).
  • Prerequisites: Python, transformers, torch, accelerate, deepspeed, trl, and sglang (for evaluation). Models are based on the Llama-3.1 and Qwen2.5 architectures.
  • Resources: Training requires an 8-GPU setup (e.g., A100s). Inference can be done with device_map="auto".
  • Links: Paper, Models, Data
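
A minimal inference sketch with Hugging Face Transformers is shown below. The model ID follows the README; the prompt, dtype, and generation settings are illustrative assumptions, not the project's reference configuration.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "FreedomIntelligence/HuatuoGPT-o1-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"  # device_map="auto" as noted above
    )

    # Build a chat-formatted prompt; the question is an illustrative example.
    messages = [{"role": "user", "content": "How should hypokalemia be managed in a patient taking digoxin?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # The model emits its reasoning ("thinks-before-it-answers") followed by the final response.
    outputs = model.generate(inputs, max_new_tokens=1024)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))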

Highlighted Details

  • Offers models in 7B, 8B, 70B, and 72B parameter sizes.
  • Supports both English and Chinese languages for Qwen-based models.
  • Utilizes a "thinks-before-it-answers" output format for transparency.
  • Provides scripts for data construction and training stages.

Maintenance & Community

The project is associated with FreedomIntelligence. Further community engagement details (e.g., Discord/Slack) are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Models are based on Llama-3.1 and Qwen2.5, which have their own respective licenses that may impose restrictions on commercial use or redistribution.

Limitations & Caveats

The project is presented as research-oriented. The effectiveness of the "medical verifier" and the robustness of the RL training for real-world medical applications require further validation. The data construction scripts require API keys for GPT-4o, implying potential costs and dependency on OpenAI services.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 97 stars in the last 90 days
