open-thoughts  by open-thoughts

Open dataset for training reasoning models

created 6 months ago
2,020 stars

Top 22.4% on sourcepulse

Project Summary

This repository hosts a fully open data curation effort for reasoning models, aimed at researchers and developers who want to train state-of-the-art small reasoning models. It offers curated datasets and models that are claimed to surpass existing results on math and code reasoning benchmarks.

How It Works

The project focuses on generating and curating high-quality reasoning datasets, such as OpenThoughts2-1M and OpenThoughts-114k. The datasets are built by running systematic ablation studies over question generation methodologies and then sampling questions from the highest-performing ones. This data-centric approach aims to improve the reasoning capabilities of language models.
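The "ablate, then sample from the best" idea can be illustrated with a minimal sketch. It is not the project's pipeline: the strategy names, scores, question pools, and the helper functions `select_top_strategies` and `build_dataset` are all hypothetical, and the real ablations would score each strategy by training and evaluating a model rather than reading a number from a dictionary.

```python
import random


def select_top_strategies(ablation_scores, keep=2):
    """Return the `keep` strategy names with the highest ablation score."""
    ranked = sorted(ablation_scores, key=ablation_scores.get, reverse=True)
    return ranked[:keep]


def build_dataset(pools, ablation_scores, n_samples, keep=2, seed=0):
    """Sample (strategy, question) pairs only from the top-scoring strategies."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    top = select_top_strategies(ablation_scores, keep)
    candidates = [(name, question) for name in top for question in pools[name]]
    # Never ask for more samples than there are candidates.
    return rng.sample(candidates, min(n_samples, len(candidates)))


# Hypothetical ablation results: downstream score per question generation strategy.
ablation_scores = {"strategy_a": 0.41, "strategy_b": 0.57, "strategy_c": 0.49}

# Hypothetical question pools produced by each strategy.
pools = {
    "strategy_a": ["a1", "a2", "a3"],
    "strategy_b": ["b1", "b2", "b3"],
    "strategy_c": ["c1", "c2", "c3"],
}

dataset = build_dataset(pools, ablation_scores, n_samples=4)
```

Here only `strategy_b` and `strategy_c` contribute questions, since they have the two highest hypothetical scores; `strategy_a` is dropped entirely.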

Quick Start & Requirements

Highlighted Details

  • OpenThinker2-32B model achieves state-of-the-art performance on reasoning benchmarks like AIME, AMC, and MATH500.
  • OpenThoughts2-1M dataset is the #1 trending dataset on Hugging Face.
  • Models are available on Ollama for easy local inference.
  • Full open-source commitment: model weights, datasets, data generation, evaluation, and training code are public.

Maintenance & Community

The project is a collaboration led by Bespoke Labs and the DataComp community, with contributions from researchers at multiple universities and institutions. It has a growing community with over 190 public models on Hugging Face trained using its datasets.

Licensing & Compatibility

The repository states it is "fully open-source" and lists model weights, datasets, and code as publicly available. The README does not state specific license terms, so suitability for commercial use or closed-source linking should be confirmed against the license of each dataset, model, and code component.

Limitations & Caveats

Training and evaluation code are listed as "coming soon," indicating that the full pipeline for reproducing or extending the results may not yet be available.

Health Check

Last commit: 1 week ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 3

Star History
292 stars in the last 90 days
