FreedomIntelligence
Large Chinese medical QA dataset with 26M question-answer pairs
Top 89.0% on SourcePulse
Huatuo-26M is a Chinese medical question-answer dataset of 26 million pairs, built for AI research in healthcare. It supports NLP applications, machine learning models for medical tasks, and intelligent medical systems, and is aimed at researchers and developers working in medical AI.
How It Works
The dataset aggregates Q&A pairs from diverse sources, including online medical encyclopedias, knowledge bases, and consultation records. A refined version, Huatuo-Lite, offers enhanced data quality and additional fields like hospital departments and related diseases. This multi-source approach ensures broad coverage of medical topics, from diseases and symptoms to treatments and drug information.
Quick Start & Requirements
Load with datasets: datasets.load_dataset('FreedomIntelligence/huatuo_knowledge_graph_qa') (and similarly for other sub-datasets).
Requires the datasets library.
Highlighted Details
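The load_dataset call from the quick start can be wrapped in a small helper, sketched below. The helper names (repo_id, load_subset) and the KNOWN_SUBSETS tuple are illustrative and not part of the project; only the two sub-dataset names mentioned on this page are listed, and some versions of the datasets library may need extra arguments that are not shown here.

```python
OWNER = "FreedomIntelligence"  # Hugging Face namespace from the quick start

# Sub-dataset names that appear on this page; others may exist.
KNOWN_SUBSETS = ("huatuo_knowledge_graph_qa", "huatuo_consultation_qa")


def repo_id(subset: str) -> str:
    """Build the Hub repository id for a sub-dataset name (hypothetical helper)."""
    return f"{OWNER}/{subset}"


def load_subset(subset: str):
    """Load one sub-dataset. Needs `pip install datasets` and network access."""
    # Imported lazily so repo_id() stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id(subset))


# Example (downloads data, so it is left commented out):
# ds = load_subset("huatuo_knowledge_graph_qa")
# print(ds)  # shows the available splits and column names
```

Printing the loaded object is a quick way to check the actual split and column names before writing any processing code against them.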
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The huatuo_consultation_qa dataset primarily contains URLs as answers, requiring further processing to extract actionable information. The dataset is primarily focused on Chinese medical information.
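As a first processing step for the URL caveat above, records whose answers are bare URLs can be separated from records with textual answers. The following is a minimal sketch: the field name "answers" is an assumption about the schema (check the real column names after loading), and the sample records are synthetic, not taken from the dataset.

```python
import re

# Matches a string that consists of a single http(s) URL and nothing else.
URL_ONLY = re.compile(r"^\s*https?://\S+\s*$")


def answer_texts(record, answer_field="answers"):
    """Return the answers of a record as a list of strings.

    `answer_field` is an assumed column name; a single string is wrapped in a list.
    """
    value = record.get(answer_field) or []
    return [value] if isinstance(value, str) else list(value)


def is_url_only(record, answer_field="answers"):
    """True when the record has at least one answer and every answer is a bare URL."""
    texts = answer_texts(record, answer_field)
    return bool(texts) and all(URL_ONLY.match(t) for t in texts)


def split_by_answer_type(records, answer_field="answers"):
    """Split records into (url_only, textual) lists, preserving order."""
    url_only, textual = [], []
    for record in records:
        (url_only if is_url_only(record, answer_field) else textual).append(record)
    return url_only, textual


# Synthetic records for illustration only.
sample = [
    {"questions": ["q1"], "answers": ["https://example.com/consult/1"]},
    {"questions": ["q2"], "answers": ["Rest and drink fluids."]},
    {"questions": ["q3"], "answers": []},
]
url_only, textual = split_by_answer_type(sample)
```

Fetching and parsing the linked pages is a separate step that depends on the target site and is not sketched here.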