Framework for chain-of-thought reasoning data and tools
ThoughtSource provides a centralized, open resource for chain-of-thought (CoT) reasoning data and tools, aiming to foster trustworthy AI for scientific research and medical practice. It targets researchers and developers working with large language models (LLMs) to improve their reasoning capabilities.
How It Works
The framework standardizes CoT data using the Hugging Face 🤗 Datasets format, enabling access to diverse datasets like CommonsenseQA, StrategyQA, QED, WorldTree, and medical/math-specific QA sets. It supports both human-generated and AI-generated reasoning chains, offering post-processing for coherence. The library includes modules for data loading, CoT generation using various LLMs (OpenAI, Hugging Face Hub), and performance evaluation.
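The standardization and evaluation steps described above can be illustrated with a minimal, library-independent sketch. The record layout below (`CotExample` and its field names), the `normalize` helper, and the sample questions are assumptions made up for illustration. They are not the schema or API that ThoughtSource itself uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CotExample:
    """Hypothetical record layout; not the library's actual schema."""
    question: str
    choices: List[str]
    answer: str                                   # gold answer
    cot: List[str] = field(default_factory=list)  # reasoning steps
    cot_source: str = "human"                     # "human" or "ai"
    prediction: Optional[str] = None              # answer extracted from a chain


def normalize(text: str) -> str:
    """Lowercase and drop surrounding whitespace and trailing periods."""
    return text.strip().rstrip(".").strip().lower()


def accuracy(examples: List[CotExample]) -> float:
    """Share of examples with a prediction whose prediction matches the gold answer."""
    scored = [ex for ex in examples if ex.prediction is not None]
    if not scored:
        return 0.0
    correct = sum(normalize(ex.prediction) == normalize(ex.answer) for ex in scored)
    return correct / len(scored)


# Illustrative data only: one human-written and one AI-generated chain.
examples = [
    CotExample(
        question="Which object is attracted to a magnet?",
        choices=["iron nail", "glass marble"],
        answer="iron nail",
        cot=["Magnets attract ferromagnetic metals.", "Iron is ferromagnetic."],
        cot_source="ai",
        prediction="Iron nail.",
    ),
    CotExample(
        question="Which is heavier, 1 kg of feathers or 2 kg of steel?",
        choices=["1 kg of feathers", "2 kg of steel"],
        answer="2 kg of steel",
        cot=["Mass is given directly.", "2 kg is more than 1 kg."],
        cot_source="human",
        prediction="1 kg of feathers",
    ),
]

print(accuracy(examples))  # 0.5
```

Keeping the reasoning chain, its origin, and the extracted prediction in one record is what lets a single evaluation function score human-written and AI-generated chains the same way.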
Quick Start & Requirements
Run
pip install -e ./libs/cot[api]
after cloning the repository and setting up a Python virtual environment.
Highlighted Details
Includes ThoughtSource_33 for efficient evaluation.
Maintenance & Community
The project is developed by the Samwald research group. Updates are tracked in the versioning section. Community contributions and dataset suggestions are welcomed.
Licensing & Compatibility
Licenses vary by dataset, including MIT, CC BY-SA 3.0, Apache 2.0, CC BY 4.0, AI2 Mercury, and CC BY-NC 4.0. Some AI-generated data licenses are listed as "Unknown." Compatibility for commercial use depends on the specific dataset licenses.
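A per-dataset license check could be sketched as follows. The dataset names (`dataset_a` and so on) are placeholders, not real ThoughtSource dataset identifiers, and the policy table reflects only a general reading of each license's terms. Anything not listed, including AI2 Mercury and "Unknown", is deliberately routed to manual review instead of being guessed.

```python
from enum import Enum
from typing import Dict, List


class CommercialUse(Enum):
    ALLOWED = "allowed"
    NOT_ALLOWED = "not allowed"
    NEEDS_REVIEW = "needs review"


# General reading of each license; not legal advice.
LICENSE_POLICY: Dict[str, CommercialUse] = {
    "MIT": CommercialUse.ALLOWED,
    "Apache 2.0": CommercialUse.ALLOWED,
    "CC BY 4.0": CommercialUse.ALLOWED,          # attribution required
    "CC BY-SA 3.0": CommercialUse.ALLOWED,       # attribution and share-alike
    "CC BY-NC 4.0": CommercialUse.NOT_ALLOWED,   # non-commercial only
}


def commercial_status(license_name: str) -> CommercialUse:
    """Look up a license; anything unlisted falls back to manual review."""
    return LICENSE_POLICY.get(license_name, CommercialUse.NEEDS_REVIEW)


def usable_commercially(datasets: Dict[str, str]) -> List[str]:
    """Return the sorted names of datasets whose license allows commercial use."""
    return sorted(
        name
        for name, license_name in datasets.items()
        if commercial_status(license_name) is CommercialUse.ALLOWED
    )


# Placeholder dataset names mapped to licenses named in the text above.
datasets = {
    "dataset_a": "MIT",
    "dataset_b": "CC BY-NC 4.0",
    "dataset_c": "Unknown",
    "dataset_d": "AI2 Mercury",
}

print(usable_commercially(datasets))  # ['dataset_a']
```

Defaulting to review keeps an unrecognized or missing license from being treated as permissive by accident.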
Limitations & Caveats
Some AI-generated reasoning chains have unknown licenses. The project is actively developed, with ongoing efforts to improve dataset quality and expand coverage.