AutoDidact by dCaples

Research paper on autonomous LLM training via self-verification

Created 10 months ago

680 stars

Top 49.9% on SourcePulse

View on GitHub

2 Experts Love This Project

Project Summary

This project provides a framework for autonomously training research-capable Large Language Models (LLMs) on custom datasets. It targets researchers and developers looking to enhance LLM reasoning and information retrieval through self-supervised learning and reinforcement. The primary benefit is enabling LLMs to improve their own research and answer-generation abilities locally.

How It Works

AutoDidact leverages a self-verification loop where a small LLM (Llama-8B) autonomously generates question-answer pairs from provided documents. It then uses Group Relative Policy Optimization (GRPO) to train the LLM to effectively search a document corpus and answer these self-generated questions. The model also learns to verify its own answers, creating a continuous self-improvement cycle. This approach is advantageous as it automates the data generation and training process, allowing for efficient, localized LLM agent development.

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt
Generate data and embeddings: python generate_data.py
Run training: autodidact.ipynb
Prerequisites: Single RTX 4090 GPU recommended.

Highlighted Details

Achieved over 2x accuracy improvement (23% to 59%) in 1 hour on a single RTX 4090.
Demonstrates adaptive search trajectory learning, improving tool usage and query refinement.
Fully autonomous pipeline: question generation, research, verification, embedding, and RL run locally.
Supports function calling and agentic loops, built on Unsloth's Efficient GRPO.

Maintenance & Community

No specific community links or maintenance details are provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research exploration, and the demonstrated results are based on a specific dataset (Apollo 13 mission report) and hardware configuration. The effectiveness on diverse or larger datasets may vary.

Health Check

Last Commit

9 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

8 stars in the last 30 days