Small Chinese LLMs built from scratch for learning large language model development
This project implements small-parameter Chinese Large Language Models (LLMs) from scratch, aimed at engineers and researchers who want to learn LLM concepts quickly. It provides a complete, open-source pipeline, from tokenization to deployment, with code and data released, so that LLM development can be followed end to end.
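To make the tokenization stage concrete, here is a minimal sketch of training a byte-level BPE tokenizer with the Hugging Face tokenizers library. The vocabulary size, special tokens, and corpus path are illustrative assumptions, not the project's actual configuration.

```python
# Illustrative only: the project's real tokenizer setup may differ.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Byte-level BPE tokenizer, a common choice for Chinese/mixed corpora.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32000,                                # assumed vocabulary size
    special_tokens=["<unk>", "<s>", "</s>", "<pad>"],
)
tokenizer.train(files=["corpus/zh_pretrain.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("tokenizer.json")
```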
How It Works
The project follows a standard LLM architecture, incorporating components such as RMSNorm and RoPE. Training proceeds in two stages: pre-training (PTM) followed by instruction fine-tuning (SFT), with optional human-alignment steps (RLHF, DPO). The implementation builds on the Hugging Face Transformers library and uses DeepSpeed for multi-GPU/multi-node training, supporting several model sizes as well as an optional Mixture-of-Experts (MoE) variant.
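For reference, here is a minimal PyTorch sketch of the two components named above, RMSNorm and rotary position embeddings. It is not the project's exact code; shapes and the pair-splitting convention are assumptions.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescale by the RMS of the hidden vector."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the last dimension.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


def apply_rope(q: torch.Tensor, k: torch.Tensor, base: float = 10000.0):
    """Apply rotary position embeddings to query/key tensors of shape
    (batch, seq_len, n_heads, head_dim)."""
    _, seq_len, _, head_dim = q.shape
    # One rotation frequency per dimension pair.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=q.device).float() / head_dim))
    pos = torch.arange(seq_len, device=q.device).float()
    angles = torch.outer(pos, inv_freq)       # (seq_len, head_dim/2)
    cos = angles.cos()[None, :, None, :]      # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]

    def rotate(x):
        # Split the head dimension into two halves and rotate each pair
        # (x1[i], x2[i]) by its position-dependent angle.
        x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    return rotate(q), rotate(k)
```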
Quick Start & Requirements
pip install -r requirements.txt
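Once dependencies are installed, a trained checkpoint can be loaded through the standard Transformers API. The model ID and prompt below are hypothetical placeholders; substitute the checkpoint actually released by the project.

```python
# Minimal inference sketch using Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wdndev/tiny_llm_sft"  # hypothetical Hub model ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "介绍一下中国的首都。"  # "Introduce the capital of China."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```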
Highlighted Details
Notable features include multiple model sizes, an optional MoE variant, DeepSpeed-based multi-GPU/multi-node training, and a modified llama.cpp path for deployment.
Maintenance & Community
The project is maintained by wdndev. Further community or roadmap details are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project prioritizes demonstrating the full LLM pipeline over achieving state-of-the-art performance, resulting in lower evaluation scores and occasional generation errors. The llama.cpp deployment is a modified version and is recommended for Linux environments.