Mini-LLM by WKQ9411

Mini LLMs for efficient AI research and development

Created 1 year ago
266 stars

Top 96.2% on SourcePulse

Summary

Mini-LLM replicates mainstream LLM architectures at a scale that fits limited compute, producing "mini models" of 100-200M parameters. It targets engineers and researchers who want to study LLM designs on constrained hardware. The main benefit is fast, hands-on learning and replication of these architectures while staying compatible with the HuggingFace ecosystem.
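To give a feel for what 100-200M parameters means, here is a rough parameter count for a bias-free, Llama-style decoder (RMSNorm, grouped-query attention, SwiGLU MLP). Every hyperparameter below is an illustrative assumption chosen to land near 100M; none of them are taken from the Mini-LLM repository.

```python
from dataclasses import dataclass


@dataclass
class DecoderConfig:
    # All values are illustrative assumptions, not Mini-LLM's real hyperparameters.
    vocab_size: int = 32_000
    hidden_size: int = 768
    num_layers: int = 12
    num_heads: int = 12
    num_kv_heads: int = 4          # grouped-query attention: fewer K/V heads than Q heads
    intermediate_size: int = 2_048
    tie_embeddings: bool = True    # share the input embedding with the output projection


def count_parameters(cfg: DecoderConfig) -> int:
    """Count the weights of a bias-free, Llama-style decoder."""
    head_dim = cfg.hidden_size // cfg.num_heads
    kv_dim = cfg.num_kv_heads * head_dim

    embedding = cfg.vocab_size * cfg.hidden_size
    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim.
    attention = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size  # gate, up, and down projections
    norms = 2 * cfg.hidden_size                        # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    total = embedding + cfg.num_layers * per_layer + cfg.hidden_size  # plus the final norm
    if not cfg.tie_embeddings:
        total += cfg.vocab_size * cfg.hidden_size      # separate output projection
    return total


if __name__ == "__main__":
    print(f"{count_parameters(DecoderConfig()):,}")
```

With these assumed settings the count comes to roughly 100M. About a quarter of that is the embedding table, which is why vocabulary size matters so much at this scale.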

How It Works

Mini-LLM reimplements popular LLM architectures (e.g., Llama3, Deepseek, Qwen) as small, parameter-efficient versions. The models are refactored to integrate with HuggingFace transformers, so the standard loading and generation methods work directly. Standalone training and inference code is also provided for readers who want to study the underlying principles and implementation.
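The "standard loading/generation methods" might look like the sketch below, which uses the generic transformers `AutoTokenizer` and `AutoModelForCausalLM` classes. The helper name `generate_reply` is hypothetical, the model identifier is left to the caller because the summary does not name the published checkpoints, and `trust_remote_code=True` is an assumption that the custom architectures ship their modeling code alongside the weights.

```python
def generate_reply(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load a causal LM by Hub id or local path and return a generated completion.

    `model_id` is supplied by the caller; no real Mini-LLM checkpoint name is assumed here.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the checkpoints need trust_remote_code=True for their custom model classes.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call would pass whichever checkpoint path or Hub id you actually downloaded, for example `generate_reply("<path-or-hub-id>", "Hello")`.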

Quick Start & Requirements

  • Installation: Clone the repo (git clone https://github.com/WKQ9411/Mini-LLM.git), cd Mini-LLM, then run the setup script (./scripts/setup.sh on Linux/macOS or .\scripts\setup.ps1 on Windows PowerShell).
  • Prerequisites: CUDA is required for training; inference can run on CPU. The transformers dependency is pinned to v4.56.1.
  • Data Preparation: Download the datasets with the provided scripts (./scripts/download_data.sh or .\scripts\download_data.ps1).
  • Training & Inference: Scripts are provided for pretraining (pretrain.py), SFT (sft.py), YaRN, DPO, and GRPO. Inference runs from the terminal (test_terminal.py) or through an API (test_api.py).
  • Links: Repo: https://github.com/WKQ9411/Mini-LLM.git. Models are published on HuggingFace.
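Because the project is pinned to transformers v4.56.1, it can help to check the installed version before training. The sketch below is a standalone check and not part of the repository; the function names and status labels are hypothetical, and it uses only the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

PINNED_VERSION = "4.56.1"  # the version the summary says the project is fixed to


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


def classify_version(installed: Optional[str], pinned: str = PINNED_VERSION) -> str:
    """Compare an installed version string against the pinned one."""
    if installed is None:
        return "missing"
    if installed == pinned:
        return "ok"
    # A different major version (e.g. 5.x vs 4.x) is the riskiest mismatch.
    if installed.split(".")[0] != pinned.split(".")[0]:
        return "major-mismatch"
    return "differs"


if __name__ == "__main__":
    print(classify_version(installed_transformers_version()))
```

Running the setup script should normally make this check redundant. It is mainly useful when reusing an existing environment.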

Highlighted Details

  • Supports training for mini_llama3, mini_deepseekv3, mini_qwen3_next, mini_deepseekv4.
  • Integrates YaRN (long-context extension), DPO (preference tuning), and GRPO (reinforcement learning).
  • Includes a Triton Flash Attention implementation that speeds up the forward pass at inference.
  • Maintains HuggingFace transformers v4.x compatibility.
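For readers new to DPO and GRPO, the core quantities can be sketched in a few lines of plain Python. This follows the commonly published formulations, not the repository's code, which may differ in detail. The function names, the `beta` default, and the use of population standard deviation are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    # How much more the policy prefers the chosen answer than the reference model does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) computed as softplus(-x), split by sign for numerical stability.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Policy and reference agree, so the loss sits at log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Four sampled answers to one prompt, two of them rewarded.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

DPO needs only log-probabilities from the policy and a frozen reference model. GRPO replaces a learned value baseline with the mean reward of a group of sampled answers, which keeps memory use low for small-scale experiments.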

Licensing & Compatibility

No license is explicitly stated for the project. Compatibility is confirmed with HuggingFace transformers v4.56.1; possible issues with v5.x are noted.

Limitations & Caveats

Small models (100-200M parameters) tend to memorize patterns in the training data rather than develop deeper understanding, which can lead to hallucinations on reasoning tasks. Packed SFT is not supported for mini_qwen3_next and mini_deepseekv4 because of implementation complexity. Future transformers versions may introduce compatibility issues.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 83 stars in the last 30 days
