minimind by jingyaogong

Minimal LLM training from scratch, under 3 USD and in 2 hours

Created 1 year ago
26,225 stars

Top 1.5% on SourcePulse

View on GitHub
Project Summary

This project provides a comprehensive, from-scratch implementation of the entire LLM training pipeline, targeting AI enthusiasts and developers who want to understand and replicate LLM training. It enables users to train a 26M parameter GPT model in approximately 2 hours with minimal cost, demystifying the LLM development process.

How It Works

MiniMind reconstructs core LLM algorithms (pretraining, SFT, LoRA, DPO, distillation) from scratch using native PyTorch, avoiding high-level abstractions. It emphasizes a "lean and deep" model architecture, similar to Llama3 and DeepSeek-V2, utilizing RMSNorm, SwiGLU, and Rotary Positional Embeddings (RoPE). The project also includes custom tokenizer training and a curated set of high-quality datasets for efficient training.
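
For readers unfamiliar with these components, the sketch below shows what RMSNorm and a SwiGLU feed-forward block look like in plain PyTorch. It is an illustrative approximation, not MiniMind's actual code, and the dimensions in the smoke test are made up.

    # Illustrative sketch of RMSNorm and a SwiGLU feed-forward block in native
    # PyTorch. Not MiniMind's actual implementation; all sizes are placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RMSNorm(nn.Module):
        """Scale each vector by 1/RMS(x), then apply a learned per-channel gain."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x * inv_rms)

    class SwiGLUFeedForward(nn.Module):
        """Gated MLP: silu(W1 x) * (W3 x), projected back down by W2."""
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
            self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
            self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.w2(F.silu(self.w1(x)) * self.w3(x))

    # Smoke test with made-up sizes: batch 2, sequence 8, hidden size 512.
    x = torch.randn(2, 8, 512)
    y = SwiGLUFeedForward(512, 1408)(RMSNorm(512)(x))
    print(y.shape)  # torch.Size([2, 8, 512])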

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Test existing models: python eval_model.py --load 1 --model_mode 2
  • Train from scratch: Navigate to ./trainer and run python train_pretrain.py followed by python train_full_sft.py (a conceptual sketch of a pretraining step follows this list).
  • Requires Python 3.10+ and PyTorch with CUDA support.
  • Official quick-start and detailed documentation are available in the README.
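
As a rough orientation, the snippet below sketches what a single next-token pretraining step does conceptually. It is not the code inside train_pretrain.py; the toy model, vocabulary size, and batch are stand-ins chosen only for illustration.

    # Conceptual sketch of a single next-token pretraining step; this is not the
    # code inside train_pretrain.py, and the model, sizes, and data are stand-ins.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, dim = 6400, 512                            # made-up sizes
    model = nn.Sequential(nn.Embedding(vocab_size, dim),   # toy "LM": embed tokens ...
                          nn.Linear(dim, vocab_size))      # ... then project to vocab
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    tokens = torch.randint(0, vocab_size, (4, 128))        # a fake batch of token ids
    inputs, targets = tokens[:, :-1], tokens[:, 1:]        # predict the next token

    logits = model(inputs)                                 # (batch, seq-1, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {loss.item():.3f}")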

Highlighted Details

  • Achieves functional chatbot capabilities with a 26M parameter model trained in ~2 hours for ~3 RMB.
  • Offers a full training pipeline including Pretrain, SFT, LoRA, DPO, and Knowledge Distillation.
  • Supports integration with popular inference engines such as llama.cpp, vLLM, and Ollama.
  • Includes a minimal OpenAI-compatible API server for easy integration with UIs (a client sketch follows this list).
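
As an illustration of how an OpenAI-compatible server is typically called, a minimal client sketch follows; the host, port, and model name are placeholder assumptions for this example, not values taken from the project's server script.

    # Hypothetical client for an OpenAI-compatible /v1/chat/completions endpoint.
    # The host, port, and model name below are assumptions for illustration only.
    import requests

    response = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "minimind",
            "messages": [{"role": "user", "content": "Hello, who are you?"}],
            "temperature": 0.7,
        },
        timeout=60,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])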

Maintenance & Community

The project is actively maintained with recent updates in April 2025. Community engagement is encouraged via GitHub Issues and Pull Requests. Links to community resources are not explicitly provided in the README.

Licensing & Compatibility

Licensed under Apache-2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The project explicitly states that models trained with smaller parameter counts (e.g., 26M) may not achieve significant reasoning capabilities through cold-start SFT+GRPO. The README also notes that recent updates may break compatibility with older model weights.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 7
  • Star History: 1,566 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Lewis Tunstall (Research Engineer at Hugging Face), and 15 more.

torchtune by pytorch

Top 0.2% on SourcePulse
5k stars
PyTorch library for LLM post-training and experimentation
Created 1 year ago
Updated 1 day ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 12 more.

LLMs-from-scratch by rasbt

Top 3.0% on SourcePulse
72k stars
Educational resource for LLM construction in PyTorch
Created 2 years ago
Updated 1 day ago