minimind by jingyaogong

Minimal LLM training from scratch, for roughly 3 RMB (under 1 USD) and in about 2 hours

created 1 year ago
23,575 stars

Top 1.8% on sourcepulse

View on GitHub
Project Summary

This project provides a comprehensive, from-scratch implementation of the entire LLM training pipeline, targeting AI enthusiasts and developers who want to understand and replicate LLM training. It enables users to train a 26M parameter GPT model in approximately 2 hours with minimal cost, demystifying the LLM development process.

How It Works

MiniMind reconstructs core LLM algorithms (pretraining, supervised fine-tuning (SFT), LoRA, DPO, and distillation) from scratch in native PyTorch, avoiding high-level training abstractions. It adopts a "lean and deep" model architecture in the style of Llama3 and DeepSeek-V2, built on RMSNorm, SwiGLU feed-forward layers, and Rotary Positional Embeddings (RoPE). The project also includes custom tokenizer training and a curated set of high-quality datasets for efficient training.
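
As a point of reference, a pre-norm decoder block in this style can be sketched in plain PyTorch as below. This is an illustrative sketch, not MiniMind's actual code; the hidden size (512), head count (8), and FFN width (1408) are assumed values chosen only for the example.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RMSNorm(nn.Module):
        # Root-mean-square normalization: no mean subtraction, no bias term.
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x):
            return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

    class SwiGLU(nn.Module):
        # SiLU-gated feed-forward layer (the Llama-style FFN).
        def __init__(self, dim, hidden_dim):
            super().__init__()
            self.gate = nn.Linear(dim, hidden_dim, bias=False)
            self.up = nn.Linear(dim, hidden_dim, bias=False)
            self.down = nn.Linear(hidden_dim, dim, bias=False)

        def forward(self, x):
            return self.down(F.silu(self.gate(x)) * self.up(x))

    def apply_rope(x, base=10000.0):
        # Rotary positional embedding on a (batch, seq, heads, head_dim) tensor,
        # using the "split into halves" convention.
        _, seq_len, _, head_dim = x.shape
        half = head_dim // 2
        inv_freq = 1.0 / (base ** (torch.arange(half, device=x.device, dtype=torch.float32) / half))
        angles = torch.outer(torch.arange(seq_len, device=x.device, dtype=torch.float32), inv_freq)
        cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
        x1, x2 = x[..., :half], x[..., half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    class DecoderBlock(nn.Module):
        # One pre-norm block: RMSNorm -> causal RoPE attention -> RMSNorm -> SwiGLU FFN.
        def __init__(self, dim=512, n_heads=8, ffn_dim=1408):
            super().__init__()
            self.n_heads, self.head_dim = n_heads, dim // n_heads
            self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)
            self.qkv = nn.Linear(dim, 3 * dim, bias=False)
            self.proj = nn.Linear(dim, dim, bias=False)
            self.ffn = SwiGLU(dim, ffn_dim)

        def forward(self, x):
            b, t, _ = x.shape
            q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
            q = apply_rope(q.view(b, t, self.n_heads, self.head_dim)).transpose(1, 2)
            k = apply_rope(k.view(b, t, self.n_heads, self.head_dim)).transpose(1, 2)
            v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            x = x + self.proj(attn.transpose(1, 2).reshape(b, t, -1))
            return x + self.ffn(self.ffn_norm(x))

    block = DecoderBlock()
    print(block(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])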

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Test existing models: python eval_model.py --load 1 --model_mode 2
  • Train from scratch: Navigate to ./trainer and run python train_pretrain.py followed by python train_full_sft.py.
  • Requires Python 3.10+ and PyTorch with CUDA support.
  • Official quick-start and detailed documentation are available in the README; a minimal transformers-based inference sketch follows this list.
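
Once a model is trained (or a released checkpoint is downloaded), it can also be driven through the Hugging Face transformers API rather than eval_model.py. The sketch below is a non-authoritative example: the repository id jingyaogong/MiniMind2, the presence of a chat template, and the generation settings are assumptions, so consult the README for the exact, current usage.

    # Minimal sketch of chatting with a MiniMind checkpoint via transformers.
    # trust_remote_code is needed because the model class lives in the repo,
    # not in the transformers library itself.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "jingyaogong/MiniMind2"  # assumed Hugging Face repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
    # Assumes the checkpoint ships a chat template for formatting the conversation.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")

    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))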

Highlighted Details

  • Achieves functional chatbot capabilities with a 26M parameter model trained in ~2 hours for ~3 RMB.
  • Offers a full training pipeline including Pretrain, SFT, LoRA, DPO, and Knowledge Distillation.
  • Supports integration with popular inference engines such as llama.cpp, vLLM, and Ollama.
  • Includes a minimal OpenAI-compatible API server for easy integration with chat UIs (see the client sketch after this list).
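
Because the bundled server speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can talk to it once the server script is running. The sketch below is a hypothetical client call: the port 8998, the path /v1, and the model name "minimind" are assumptions rather than documented values, so substitute whatever the server script actually reports on startup.

    # Minimal sketch of querying MiniMind's OpenAI-compatible server with the
    # official openai client. base_url and model name are assumed values.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8998/v1", api_key="none")  # key unused locally

    response = client.chat.completions.create(
        model="minimind",
        messages=[{"role": "user", "content": "What can a 26M-parameter model do?"}],
        temperature=0.7,
        stream=False,
    )
    print(response.choices[0].message.content)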

Maintenance & Community

The project is actively maintained with recent updates in April 2025. Community engagement is encouraged via GitHub Issues and Pull Requests. Links to community resources are not explicitly provided in the README.

Licensing & Compatibility

Licensed under Apache-2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The project explicitly states that models at such small parameter counts (e.g., 26M) may not develop significant reasoning capability from cold-start SFT + GRPO. The README also notes that recent updates may break compatibility with older model weights.

Health Check
  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 23

Star History

3,545 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Alex Cheema (cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse · 806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago · updated 2 weeks ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago