minimind by jingyaogong

Minimal LLM training from scratch, for roughly 3 RMB (under 1 USD) and in about 2 hours

created 1 year ago
23,575 stars

Top 1.8% on sourcepulse

View on GitHub
Project Summary

This project provides a comprehensive, from-scratch implementation of the entire LLM training pipeline, targeting AI enthusiasts and developers who want to understand and replicate LLM training. It enables users to train a 26M parameter GPT model in approximately 2 hours with minimal cost, demystifying the LLM development process.

How It Works

MiniMind reconstructs core LLM algorithms (pretraining, supervised fine-tuning (SFT), LoRA, DPO, and distillation) from scratch in native PyTorch, avoiding high-level training abstractions. It adopts a "lean and deep" model architecture in the style of Llama3 and DeepSeek-V2, built on RMSNorm, SwiGLU feed-forward layers, and Rotary Positional Embeddings (RoPE). The project also includes custom tokenizer training and a curated set of high-quality datasets for efficient training.
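
As a point of reference, a pre-norm decoder block in this style can be sketched in plain PyTorch as below. This is an illustrative sketch, not MiniMind's actual code; the hidden size (512), head count (8), and FFN width (1408) are assumed values chosen only for the example.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RMSNorm(nn.Module):
        # Root-mean-square normalization: no mean subtraction, no bias term.
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x):
            return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

    class SwiGLU(nn.Module):
        # SiLU-gated feed-forward layer (the Llama-style FFN).
        def __init__(self, dim, hidden_dim):
            super().__init__()
            self.gate = nn.Linear(dim, hidden_dim, bias=False)
            self.up = nn.Linear(dim, hidden_dim, bias=False)
            self.down = nn.Linear(hidden_dim, dim, bias=False)

        def forward(self, x):
            return self.down(F.silu(self.gate(x)) * self.up(x))

    def apply_rope(x, base=10000.0):
        # Rotary positional embedding on a (batch, seq, heads, head_dim) tensor,
        # using the "split into halves" convention.
        _, seq_len, _, head_dim = x.shape
        half = head_dim // 2
        inv_freq = 1.0 / (base ** (torch.arange(half, device=x.device, dtype=torch.float32) / half))
        angles = torch.outer(torch.arange(seq_len, device=x.device, dtype=torch.float32), inv_freq)
        cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
        x1, x2 = x[..., :half], x[..., half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    class DecoderBlock(nn.Module):
        # One pre-norm block: RMSNorm -> causal RoPE attention -> RMSNorm -> SwiGLU FFN.
        def __init__(self, dim=512, n_heads=8, ffn_dim=1408):
            super().__init__()
            self.n_heads, self.head_dim = n_heads, dim // n_heads
            self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)
            self.qkv = nn.Linear(dim, 3 * dim, bias=False)
            self.proj = nn.Linear(dim, dim, bias=False)
            self.ffn = SwiGLU(dim, ffn_dim)

        def forward(self, x):
            b, t, _ = x.shape
            q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
            q = apply_rope(q.view(b, t, self.n_heads, self.head_dim)).transpose(1, 2)
            k = apply_rope(k.view(b, t, self.n_heads, self.head_dim)).transpose(1, 2)
            v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            x = x + self.proj(attn.transpose(1, 2).reshape(b, t, -1))
            return x + self.ffn(self.ffn_norm(x))

    block = DecoderBlock()
    print(block(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])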

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Test existing models: python eval_model.py --load 1 --model_mode 2
  • Train from scratch: Navigate to ./trainer and run python train_pretrain.py followed by python train_full_sft.py.
  • Requires Python 3.10+ and PyTorch with CUDA support.
  • Official quick-start and detailed documentation are available in the README; a minimal transformers-based inference sketch follows this list.
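
Once a model is trained (or a released checkpoint is downloaded), it can also be driven through the Hugging Face transformers API rather than eval_model.py. The sketch below is a non-authoritative example: the repository id jingyaogong/MiniMind2, the presence of a chat template, and the generation settings are assumptions, so consult the README for the exact, current usage.

    # Minimal sketch of chatting with a MiniMind checkpoint via transformers.
    # trust_remote_code is needed because the model class lives in the repo,
    # not in the transformers library itself.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "jingyaogong/MiniMind2"  # assumed Hugging Face repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
    # Assumes the checkpoint ships a chat template for formatting the conversation.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")

    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))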

Highlighted Details

  • Achieves functional chatbot capabilities with a 26M parameter model trained in ~2 hours for ~3 RMB.
  • Offers a full training pipeline including Pretrain, SFT, LoRA, DPO, and Knowledge Distillation.
  • Supports integration with popular inference engines such as llama.cpp, vLLM, and Ollama.
  • Includes a minimal OpenAI-compatible API server for easy integration with chat UIs (see the client sketch after this list).
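
Because the bundled server speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can talk to it once the server script is running. The sketch below is a hypothetical client call: the port 8998, the path /v1, and the model name "minimind" are assumptions rather than documented values, so substitute whatever the server script actually reports on startup.

    # Minimal sketch of querying MiniMind's OpenAI-compatible server with the
    # official openai client. base_url and model name are assumed values.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8998/v1", api_key="none")  # key unused locally

    response = client.chat.completions.create(
        model="minimind",
        messages=[{"role": "user", "content": "What can a 26M-parameter model do?"}],
        temperature=0.7,
        stream=False,
    )
    print(response.choices[0].message.content)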

Maintenance & Community

The project is actively maintained with recent updates in April 2025. Community engagement is encouraged via GitHub Issues and Pull Requests. Links to community resources are not explicitly provided in the README.

Licensing & Compatibility

Licensed under Apache-2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The project explicitly states that models at such small parameter counts (e.g., 26M) may not develop significant reasoning capability from cold-start SFT + GRPO. The README also notes that recent updates may break compatibility with older model weights.

Health Check
  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 23

Star History

3,545 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Alex Cheema (cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse · 806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago · updated 2 weeks ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago