AutoDidact by dCaples

Research paper on autonomous LLM training via self-verification

Created 6 months ago
667 stars

Top 50.5% on SourcePulse

Project Summary

AutoDidact is a framework for autonomously training research-capable Large Language Models (LLMs) on custom datasets. It targets researchers and developers who want to improve LLM reasoning and information retrieval through self-supervised data generation and reinforcement learning. Its main benefit is that an LLM can improve its own research and answer-generation abilities entirely on local hardware.

How It Works

AutoDidact runs a self-verification loop in which a small LLM (Llama-8B) autonomously generates question-answer pairs from the provided documents. Group Relative Policy Optimization (GRPO) then trains the LLM to search the document corpus and answer these self-generated questions. The model also learns to verify its own answers, which creates a continuous self-improvement cycle. Because data generation and training are both automated, LLM agents can be developed efficiently and locally.
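
The group-relative step that gives GRPO its name can be sketched in a few lines. This is a minimal illustration, not AutoDidact's training code: the function name `group_relative_advantages`, the binary reward values, and the use of the population standard deviation are all assumptions made for the example. The idea is that several answers are sampled for the same question, each is scored, and every score is normalized against the mean and spread of its own group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples in its group.

    Assumption: population standard deviation is used here; real
    implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one self-generated question,
# scored 1.0 when a verification step accepts the answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Answers scored above the group mean receive a positive advantage and are reinforced, while below-average answers are pushed down. A group in which every answer gets the same reward yields zero advantages and therefore no learning signal for that question.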

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Generate data and embeddings: python generate_data.py
  • Run training: open autodidact.ipynb and run the notebook
  • Prerequisites: Single RTX 4090 GPU recommended.

Highlighted Details

  • Accuracy rose from 23% to 59% (about 2.6x) after 1 hour of training on a single RTX 4090.
  • The model learns to adapt its search trajectory, improving its tool usage and query refinement.
  • Fully autonomous pipeline: question generation, research, verification, embedding, and RL run locally.
  • Supports function calling and agentic loops, built on Unsloth's Efficient GRPO.
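
The agentic loop mentioned above (search via function calls, then answer, then verify) can be illustrated with a toy sketch. Everything in it is hypothetical: `search`, `run_agent`, `scripted_policy`, and `verify` are names invented for this example, the retriever ranks by simple word overlap rather than embeddings, the policy is a scripted stand-in for the LLM, and the two corpus sentences are made up. It only shows the control flow, assuming the model alternates between tool calls and a final answer.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(corpus, query, k=1):
    """Toy retriever: rank chunks by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda chunk: len(q & tokenize(chunk)), reverse=True)
    return ranked[:k]


def run_agent(policy, corpus, question, max_steps=4):
    """Alternate between policy actions and tool results until an answer arrives."""
    transcript = [("question", question)]
    for _ in range(max_steps):
        action = policy(transcript)
        if action["type"] == "search":
            transcript.append(("search_results", search(corpus, action["query"])))
        else:
            return action["text"], transcript
    return None, transcript  # step budget exhausted without an answer


def scripted_policy(transcript):
    """Stand-in for the LLM: search once, then answer with the top result."""
    kind, payload = transcript[-1]
    if kind == "question":
        return {"type": "search", "query": payload}
    return {"type": "answer", "text": payload[0]}


def verify(answer, reference):
    """Stand-in verifier: reward 1.0 if the reference string appears in the answer."""
    return 1.0 if answer and reference.lower() in answer.lower() else 0.0


# Made-up toy corpus, not taken from the project's dataset.
corpus = [
    "The backup pump is stored in bay 3.",
    "The primary valve opens at 40 psi.",
]
answer, transcript = run_agent(scripted_policy, corpus, "Where is the backup pump stored?")
reward = verify(answer, "bay 3")
print(answer, reward)
```

In a real pipeline the scripted policy would be replaced by the model being trained, and the reward from the verifier would feed the reinforcement learning update.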

Maintenance & Community

No specific community links or maintenance details are provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research exploration, and the demonstrated results are based on a specific dataset (Apollo 13 mission report) and hardware configuration. The effectiveness on diverse or larger datasets may vary.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 30 days
