AutoDidact by dCaples

Research paper on autonomous LLM training via self-verification

created 4 months ago
649 stars

Top 52.4% on sourcepulse

Project Summary

This project provides a framework for autonomously training research-capable Large Language Models (LLMs) on custom datasets. It targets researchers and developers looking to enhance LLM reasoning and information retrieval through self-supervised learning and reinforcement. The primary benefit is enabling LLMs to improve their own research and answer-generation abilities locally.

How It Works

AutoDidact uses a self-verification loop in which a small LLM (Llama-8B) autonomously generates question-answer pairs from the provided documents. It then uses Group Relative Policy Optimization (GRPO) to train the model to search the document corpus and answer these self-generated questions. The model also learns to verify its own answers, creating a continuous self-improvement cycle. Because both data generation and training are automated, LLM agents can be developed efficiently on local hardware.
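The self-verification reward and the GRPO advantage can be sketched in a few lines. This is a minimal sketch, not AutoDidact's code. `verify_answer` is a hypothetical stand-in that uses normalized exact match where the project has the model judge its own answers. `group_relative_advantages` shows the group-normalized advantage at the core of GRPO: each sampled answer's reward minus the group mean, divided by the group standard deviation. The policy-gradient step that consumes these advantages, which the project takes from Unsloth's GRPO implementation, is omitted.

```python
import statistics


def _normalize(text: str) -> str:
    # Lowercase, trim, drop a trailing period, and collapse whitespace.
    return " ".join(text.lower().strip().rstrip(".").split())


def verify_answer(predicted: str, reference: str) -> float:
    """Hypothetical verifier: reward 1.0 for a match with the reference answer, else 0.0.

    Stands in for AutoDidact's self-verification step, where the model
    itself judges the answer; exact match keeps the sketch self-contained.
    """
    return 1.0 if _normalize(predicted) == _normalize(reference) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# One self-generated question with G = 4 answers sampled from the current policy.
reference = "Kennedy Space Center"
samples = ["Kennedy Space Center.", "Cape Canaveral", "kennedy space center", "Houston"]
rewards = [verify_answer(s, reference) for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

Because advantages are relative within a group, a question the model always answers correctly (or always misses) yields all-zero advantages and contributes no learning signal; the informative questions are those where the sampled answers disagree.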

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Generate data and embeddings: python generate_data.py
  • Run training: open and run autodidact.ipynb
  • Prerequisites: a single RTX 4090 GPU is recommended.
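`generate_data.py` builds the embeddings that the model later searches. The summary does not name the embedding model or the chunking scheme, so the sketch below assumes both: it splits a document into fixed-size word chunks and uses a bag-of-words vector with cosine similarity as a stand-in for a learned embedding. `chunk_text`, `embed`, and `ChunkIndex` are hypothetical names, not the script's actual functions.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks (hypothetical chunking scheme)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Bag-of-words term counts, standing in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ChunkIndex:
    """Embeds every chunk once, then ranks chunks by cosine similarity to a query."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        self.vectors = [embed(c) for c in chunks]

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(range(len(self.chunks)),
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.chunks[i] for i in ranked[:k]]


doc = ("The oxygen tank exploded during the mission. "
       "The crew used the lunar module as a lifeboat. "
       "Splashdown occurred in the Pacific Ocean.")
index = ChunkIndex(chunk_text(doc, chunk_size=8))
print(index.search("where did splashdown happen", k=1))
# ['Splashdown occurred in the Pacific Ocean.']
```

Precomputing chunk vectors once mirrors the split in the quick start: embeddings are built ahead of time, and training only issues queries against them.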

Highlighted Details

  • More than doubled accuracy (23% to 59%) in 1 hour of training on a single RTX 4090.
  • Demonstrates adaptive search-trajectory learning, with improved tool usage and query refinement.
  • Fully autonomous pipeline: question generation, research, verification, embedding, and RL run locally.
  • Supports function calling and agentic loops, built on Unsloth's Efficient GRPO.
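The agentic loop behind the function-calling support can be sketched as a turn-limited cycle: the model emits either a search call or a final answer, and each search result is appended to the transcript before the next turn. The `search("...")` call syntax, `run_agent`, and the scripted policy and toy corpus below are hypothetical illustrations, not the project's actual prompt format or tools; during training, the policy would be the LLM being optimized with GRPO.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical tool-call syntax: the model emits search("<query>") to call the tool.
SEARCH_CALL = re.compile(r'^search\("(.+)"\)$')


def run_agent(policy: Callable[[List[str]], str],
              search: Callable[[str], str],
              question: str,
              max_turns: int = 4) -> Tuple[Optional[str], List[str]]:
    """Alternate model turns and tool results until a final answer or max_turns."""
    transcript = [f"Question: {question}"]
    for _ in range(max_turns):
        action = policy(transcript).strip()
        transcript.append(action)
        match = SEARCH_CALL.match(action)
        if match is None:
            return action, transcript  # not a tool call, so treat it as the final answer
        transcript.append(f"Result: {search(match.group(1))}")
    return None, transcript  # turn budget exhausted without an answer


# Toy corpus and scripted policy standing in for the retriever and the LLM.
CORPUS = {"oxygen tank": "An oxygen tank in the service module exploded."}


def toy_search(query: str) -> str:
    return next((text for key, text in CORPUS.items() if key in query.lower()), "No results.")


def scripted_policy(transcript: List[str]) -> str:
    searches = sum(1 for line in transcript if line.startswith("search("))
    if searches == 0:
        return 'search("Apollo 13 failure")'  # first, broad query
    if searches == 1:
        return 'search("oxygen tank")'  # refined query after an empty result
    return "An oxygen tank exploded."


answer, transcript = run_agent(scripted_policy, toy_search, "What went wrong on Apollo 13?")
print(answer)  # An oxygen tank exploded.
for line in transcript:
    print(line)
```

In this run the first query returns nothing useful and the scripted policy refines it, which is the kind of query-refinement behavior the training is meant to reinforce.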

Maintenance & Community

No specific community links or maintenance details are provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research exploration, and the demonstrated results are based on a specific dataset (Apollo 13 mission report) and hardware configuration. The effectiveness on diverse or larger datasets may vary.

Health Check
  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 40 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Daniel Han (Cofounder of Unsloth), and 1 more.

synthetic-data-kit by meta-llama

1.6%
1k
Synthetic data CLI tool for LLM fine-tuning
created 4 months ago
updated 1 week ago
Starred by Jason Liu (Author of Instructor) and Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code).

Search-R1 by PeterGriffinJin

1.3%
3k
RL framework for training LLMs to use search engines
created 5 months ago
updated 3 weeks ago