WKQ9411 / Mini-LLM: Mini LLMs for efficient AI research and development
Top 96.2% on SourcePulse
Summary
This project replicates mainstream LLM architectures on limited computational resources, offering "mini models" with 100-200M parameters. It targets engineers and researchers who want to study LLM designs on constrained hardware. Its main benefit is that architectures can be learned and replicated quickly while staying compatible with the HuggingFace ecosystem.
How It Works
Mini-LLM rebuilds popular LLM architectures (e.g., Llama 3, DeepSeek, Qwen) as parameter-efficient mini versions. Because the code is refactored to integrate with HuggingFace transformers, the models work with the standard loading and generation methods. Separate, standalone training and inference codebases are also provided for readers who want a deeper understanding of the underlying principles and implementation.
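As a rough illustration of how a Llama-style decoder lands in the 100-200M range, the sketch below counts the weights of a generic decoder with grouped-query attention, a SwiGLU MLP, RMSNorm and no linear biases. The function name and the configuration values are hypothetical and are not taken from Mini-LLM's actual model configs.

```python
def llama_style_param_count(vocab_size: int, hidden: int, layers: int,
                            intermediate: int, n_heads: int, n_kv_heads: int,
                            tie_embeddings: bool = True) -> int:
    """Count weights in a generic Llama-style decoder (hypothetical helper)."""
    if hidden % n_heads != 0:
        raise ValueError("hidden must be divisible by n_heads")
    head_dim = hidden // n_heads
    # Attention: Q and O projections are hidden x hidden;
    # K and V project to n_kv_heads * head_dim (grouped-query attention).
    attn = 2 * hidden * hidden + 2 * hidden * n_kv_heads * head_dim
    # SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden).
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer (pre-attention and pre-MLP).
    norms = 2 * hidden
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden
    final_norm = hidden
    # An untied LM head adds a second vocab x hidden matrix.
    lm_head = 0 if tie_embeddings else vocab_size * hidden
    return embed + layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    # Hypothetical small configuration, chosen only for illustration.
    n = llama_style_param_count(vocab_size=32000, hidden=768, layers=12,
                                intermediate=2048, n_heads=12, n_kv_heads=4)
    print(f"{n / 1e6:.1f}M parameters")  # → 100.1M parameters
```

With these assumed settings the embedding table alone is about a quarter of the total, which is why tied versus untied output embeddings noticeably changes the size of a model this small.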
Quick Start & Requirements
- Installation: clone the repository (git clone https://github.com/WKQ9411/Mini-LLM.git), cd Mini-LLM, and run the setup script (./scripts/setup.sh or .\scripts\setup.ps1).
- Tested with HuggingFace transformers v4.56.1.
- Data: download with ./scripts/download_data.sh or .\scripts\download_data.ps1.
- Training: pretraining (pretrain.py), SFT (sft.py), YaRN, DPO, GRPO.
- Inference: via terminal (test_terminal.py) or API (test_api.py).
- Code: https://github.com/WKQ9411/Mini-LLM.git. Models are hosted on HuggingFace.
Highlighted Details
- Models: mini_llama3, mini_deepseekv3, mini_qwen3_next, mini_deepseekv4.
- Compatible with HuggingFace transformers v4.x.
Licensing & Compatibility
The project's license is not explicitly stated. Compatibility is confirmed with HuggingFace transformers v4.56.1; potential issues with v5.x are noted.
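Since compatibility is only confirmed for transformers v4.56.1, a script can check the installed version before loading a model. The sketch below is a hypothetical guard, not part of Mini-LLM; it assumes that any 4.x release is acceptable (per the v4.x note above) and only warns for other major versions. It uses the standard-library `importlib.metadata.version`.

```python
import importlib.metadata
import warnings

# Version the project summary reports as confirmed working.
TESTED_TRANSFORMERS = "4.56.1"


def check_transformers_version(version: str) -> bool:
    """Return True for a transformers 4.x release; warn and return False otherwise."""
    major = int(version.split(".")[0])
    if major == 4:
        return True
    warnings.warn(
        f"transformers {version} is outside the 4.x line; "
        f"Mini-LLM was confirmed with {TESTED_TRANSFORMERS}",
        RuntimeWarning,
    )
    return False


if __name__ == "__main__":
    try:
        installed = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(installed, check_transformers_version(installed))
```

Pinning the exact tested release at install time is the stricter alternative; this guard only catches a major-version jump such as v5.x.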
Limitations & Caveats
Small models (100-200M parameters) may memorize surface patterns in the training data rather than develop deeper understanding, which leads to hallucinations on reasoning tasks. Packing for SFT is not supported for mini_qwen3_next and mini_deepseekv4 because of its implementation complexity. Future transformers releases may introduce compatibility issues.
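Packing concatenates several short SFT samples into one fixed-length row to reduce padding. The summary does not describe how Mini-LLM implements it, so the following is only a minimal sketch of the generic technique with hypothetical function names: samples are packed greedily (first fit), position ids restart at each sample, and a block-diagonal causal mask keeps tokens from attending across sample boundaries.

```python
from typing import Dict, List


def pack_samples(samples: List[List[int]], max_len: int) -> List[Dict[str, List[int]]]:
    """Greedy first-fit packing of tokenized samples into rows of at most max_len tokens."""
    rows: List[Dict[str, List[int]]] = []
    for tokens in samples:
        tokens = tokens[:max_len]  # truncate over-long samples
        for row in rows:
            if len(row["input_ids"]) + len(tokens) <= max_len:
                break  # first existing row with enough room
        else:
            row = {"input_ids": [], "position_ids": [], "seq_lens": []}
            rows.append(row)
        row["input_ids"].extend(tokens)
        row["position_ids"].extend(range(len(tokens)))  # positions restart per sample
        row["seq_lens"].append(len(tokens))
    return rows


def block_causal_mask(seq_lens: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (same sample, j <= i)."""
    n = sum(seq_lens)
    mask = [[False] * n for _ in range(n)]
    start = 0
    for length in seq_lens:
        for i in range(start, start + length):
            for j in range(start, i + 1):
                mask[i][j] = True
        start += length
    return mask


if __name__ == "__main__":
    rows = pack_samples([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_len=5)
    for row in rows:
        print(row["input_ids"], row["position_ids"], row["seq_lens"])
```

Here four samples fit into two rows of five tokens instead of four padded rows. Every layer type must respect the per-sample boundaries that `seq_lens` records, which is where packing gets harder for some architectures.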