Minimal LLM training from scratch, under 3 USD and in 2 hours
Top 1.8% on sourcepulse
This project provides a comprehensive, from-scratch implementation of the entire LLM training pipeline, targeting AI enthusiasts and developers who want to understand and replicate LLM training. It enables users to train a 26M parameter GPT model in approximately 2 hours with minimal cost, demystifying the LLM development process.
How It Works
MiniMind reconstructs core LLM algorithms (pretraining, SFT, LoRA, DPO, distillation) from scratch using native PyTorch, avoiding high-level abstractions. It emphasizes a "lean and deep" model architecture, similar to Llama3 and DeepSeek-V2, utilizing RMSNorm, SwiGLU, and Rotary Positional Embeddings (RoPE). The project also includes custom tokenizer training and a curated set of high-quality datasets for efficient training.
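As a minimal, illustrative sketch of two of those building blocks, the PyTorch snippet below implements an RMSNorm layer and a SwiGLU feed-forward block; the dimensions are placeholders, RoPE attention is omitted for brevity, and this is not MiniMind's actual module code.

```python
# Illustrative sketch of RMSNorm and a SwiGLU feed-forward block
# (sizes are placeholders, not MiniMind's real hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by the RMS of the hidden vector."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUFeedForward(nn.Module):
    """Gated feed-forward: silu(W1 x) * (W3 x), projected back by W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

x = torch.randn(2, 16, 512)                       # (batch, sequence, hidden)
y = SwiGLUFeedForward(512, 1408)(RMSNorm(512)(x))
print(y.shape)                                    # torch.Size([2, 16, 512])
```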
Quick Start & Requirements
Install dependencies with `pip install -r requirements.txt`. Evaluate a trained model with `python eval_model.py --load 1 --model_mode 2`. To train from scratch, enter the `./trainer` directory and run `python train_pretrain.py` followed by `python train_full_sft.py`.
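For orientation, the sketch below shows the kind of next-token-prediction step a pretraining script like `train_pretrain.py` performs; the `model`, `batch`, and `optimizer` objects are placeholders, not the script's actual interface.

```python
# Sketch of a causal-LM pretraining step (placeholder interface, not
# MiniMind's actual train_pretrain.py code).
import torch.nn.functional as F

def pretrain_step(model, batch, optimizer):
    """One next-token-prediction step: predict token t+1 from tokens 0..t."""
    input_ids = batch[:, :-1]          # (batch, seq_len - 1)
    targets = batch[:, 1:]             # same tokens shifted by one position
    logits = model(input_ids)          # (batch, seq_len - 1, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```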
Highlighted Details
Trained models are compatible with third-party inference frameworks such as llama.cpp, vllm, and ollama.
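As one hedged illustration, assuming a model has been exported to a Hugging Face-compatible directory (the `./MiniMind2` path below is hypothetical), it could be served through vllm's offline Python API:

```python
# Hypothetical example: loading an exported model with vllm's offline API.
from vllm import LLM, SamplingParams

llm = LLM(model="./MiniMind2", trust_remote_code=True)  # hypothetical export path
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Introduce yourself."], params)
print(outputs[0].outputs[0].text)
```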
Maintenance & Community
The project is actively maintained with recent updates in April 2025. Community engagement is encouraged via GitHub Issues and Pull Requests. Links to community resources are not explicitly provided in the README.
Licensing & Compatibility
Licensed under Apache-2.0, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
The project explicitly states that models trained with smaller parameter counts (e.g., 26M) may not achieve significant reasoning capabilities through cold-start SFT+GRPO. The README also notes that recent updates may break compatibility with older model weights.