Raspberry by daveshap

Open-source dataset for finetuning LLMs with reasoning

created 10 months ago
417 stars

Top 71.3% on sourcepulse

Project Summary

Raspberry aims to create an open-source toy dataset for finetuning Large Language Models (LLMs) with enhanced reasoning abilities. Targeting researchers and developers focused on improving LLM reasoning, it offers a structured approach to generating complex queries and corresponding Chain-of-Thought (CoT) and self-critique data.

How It Works

The project synthesizes 500 distinct, complex user queries across various domains requiring math, coding, logic, and planning skills. These queries are then used to generate CoT and self-critique data via automated prompting strategies, leveraging LLMs' inherent reasoning capabilities. The generated samples undergo cleaning and rectification using rubrics and grading techniques to ensure coherence and suitability for single-shot reasoning datasets.
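The three-stage pipeline described above (synthesize queries, generate CoT plus self-critique, then grade against a rubric) can be sketched roughly as follows. This is a hypothetical illustration, not code from the repository: `call_llm` is a placeholder for any CoT-capable model API (e.g. Claude), and the domain list and pass/fail rubric check are assumptions.

```python
# Hypothetical sketch of the Raspberry-style pipeline:
# 1) synthesize complex queries across domains,
# 2) generate chain-of-thought and self-critique per query,
# 3) grade each sample against a rubric, keeping only passing ones.

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real API call to a CoT-capable model.
    return f"[model response to: {prompt[:40]}...]"

DOMAINS = ["math", "coding", "logic", "planning"]  # assumed domain split

def synthesize_queries(n_per_domain: int) -> list[dict]:
    """Stage 1: generate distinct, complex user queries per domain."""
    queries = []
    for domain in DOMAINS:
        for i in range(n_per_domain):
            q = call_llm(f"Write one complex {domain} question (variant {i}).")
            queries.append({"domain": domain, "query": q})
    return queries

def generate_sample(query: dict) -> dict:
    """Stage 2: elicit step-by-step reasoning, then a self-critique of it."""
    cot = call_llm(f"Think step by step, then answer: {query['query']}")
    critique = call_llm(f"Critique this reasoning for errors: {cot}")
    return {**query, "chain_of_thought": cot, "self_critique": critique}

def passes_rubric(sample: dict) -> bool:
    """Stage 3: placeholder grading step; a real rubric would score
    coherence, correctness, and single-shot suitability."""
    grade = call_llm(f"Grade PASS or FAIL by rubric: {sample['chain_of_thought']}")
    return "PASS" in grade

dataset = []
for q in synthesize_queries(2):  # toy scale; the project targets 500 queries
    sample = generate_sample(q)
    if passes_rubric(sample):
        dataset.append(sample)
```

With a real model behind `call_llm`, the cleaning stage would discard incoherent samples rather than passing everything through, as the stub does here.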

Quick Start & Requirements

  • Install: The README provides no installation commands; the project is focused on dataset generation rather than distributable software.
  • Prerequisites: Access to LLMs capable of CoT reasoning (e.g., Claude) is implied for data synthesis.
  • Resources: Data synthesis and cleaning will require computational resources for running LLMs and processing text.

Highlighted Details

  • Focus on synthesizing complex user queries across diverse domains.
  • Generation of Chain-of-Thought (CoT) and self-critique data for LLM finetuning.
  • Goal to demonstrate near-State-of-the-Art (SOTA) performance on reasoning benchmarks.
  • Potential to release an open-source RL-trained model.

Maintenance & Community

The project is initiated by daveshap. Further community engagement and scaling plans are mentioned, including seeking funding via Manifund.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is described as a "toy dataset" and a "pilot" for proof of concept. Achieving near-SOTA performance is an ambitious goal for a small, toy dataset. The initial dataset size is 500 queries, which may be insufficient for robust finetuning across all targeted reasoning abilities.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

