chatopera
Chinese psychological counseling Q&A corpus for AI
This repository provides the Emotional First Aid Dataset (efaqa-corpus-zh), a large-scale, annotated Chinese corpus for psychological counseling Q&A. It addresses the need for AI-driven mental health support and LLM fine-tuning by offering rich, multi-turn dialogue data. The dataset is primarily for researchers and developers in the AI and psychology domains.
How It Works
The dataset comprises 20,000 manually annotated multi-turn dialogue entries from psychological counseling sessions. Each entry is labeled along three dimensions: distress type (s1), psychological disorder (s2), and emergency level (s3). A larger, raw (unannotated) dataset is also available for unsupervised LLM training. Annotation averaged over one minute per entry, which reflects the effort spent on capturing conversational context and assigning classifications.
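To make the three label dimensions concrete, here is a minimal sketch of tallying and filtering entries by label. The record layout (a `label` dictionary with `s1`, `s2`, `s3` keys next to a `title`), the sample records, and the label values are all assumptions made for illustration; check the actual field names in the data you load.

```python
from collections import Counter

# Hypothetical records: the field names and label values are invented
# for illustration and may not match the real corpus.
sample_records = [
    {"title": "example 1", "label": {"s1": "A", "s2": "X", "s3": "low"}},
    {"title": "example 2", "label": {"s1": "B", "s2": "X", "s3": "high"}},
    {"title": "example 3", "label": {"s1": "A", "s2": "Y", "s3": "low"}},
]


def count_labels(records, dimension):
    """Count how often each value of one label dimension (e.g. "s3") occurs."""
    counts = Counter()
    for record in records:
        label = record.get("label") or {}  # assumed location of the labels
        value = label.get(dimension)
        if value is not None:
            counts[value] += 1
    return counts


def filter_by_label(records, dimension, value):
    """Return the records whose label for `dimension` equals `value`."""
    return [
        record
        for record in records
        if (record.get("label") or {}).get(dimension) == value
    ]


emergency_counts = count_labels(sample_records, "s3")
urgent = filter_by_label(sample_records, "s3", "high")
```

Records that lack a label are skipped rather than raising an error, which may be convenient if the raw and annotated data are mixed.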
Quick Start & Requirements
Install: pip install -U efaqa-corpus-zh
License: set the EFAQA_DL_LICENSE environment variable to your certificate identifier.
Usage: import efaqa_corpus_zh, then records = list(efaqa_corpus_zh.load()).
Highlighted Details
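The Quick Start steps can be combined into a small helper, sketched here under a few assumptions. The function name `load_records` and its `license_id` parameter are hypothetical; only the `EFAQA_DL_LICENSE` variable and the `efaqa_corpus_zh.load()` call come from the steps listed in this summary. The package is imported lazily so that a missing license is reported before any download is attempted.

```python
import os


def load_records(license_id=None):
    """Load the corpus as a list (hypothetical helper, not part of the package)."""
    if license_id:
        # Make the certificate identifier visible to the package.
        os.environ["EFAQA_DL_LICENSE"] = license_id
    if not os.environ.get("EFAQA_DL_LICENSE"):
        raise RuntimeError(
            "Set EFAQA_DL_LICENSE to your certificate identifier before loading."
        )
    # Imported here so the check above runs even if the package is not installed.
    import efaqa_corpus_zh

    return list(efaqa_corpus_zh.load())
```

A call such as `records = load_records()` would then work once the package is installed and the variable is set in the environment.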
Maintenance & Community
The project is a collaboration involving academic institutions and Chatopera Inc. Support and issue reporting are handled via GitHub issues: https://github.com/chatopera/docs/issues. Volunteer contributors from multiple countries participated in data annotation.
Licensing & Compatibility
The dataset is distributed under the "春松许可证,v1.0" (ChunSong License, v1.0). The data is for research purposes only. Commercial use is prohibited, and violations will be pursued legally.
Limitations & Caveats
The corpus is subjectively annotated and cannot be guaranteed 100% accurate; the team disclaims liability for consequences arising from data content. Extremely complex psychological disorders are not covered due to annotation difficulty. A significant adoption barrier is the requirement to purchase a license to download and use the data, and its strict non-commercial use restriction makes it incompatible with commercial applications.