Small Chinese chat model (0.2B) for dialogue generation
Top 27.2% on sourcepulse
ChatLM-mini-Chinese is a 0.2B parameter Chinese conversational language model trained from scratch, offering a complete pipeline from data cleaning and tokenizer training to pre-training, SFT, and DPO optimization. It targets users needing a lightweight, efficient Chinese LLM for research or deployment on resource-constrained hardware, enabling custom fine-tuning for downstream tasks.
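As a rough illustration of the tokenizer-training stage mentioned above, here is a minimal sketch using the Hugging Face tokenizers library. The corpus path, vocabulary size, and special tokens are placeholders; the project's own tokenizer training script and settings may differ.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

# Byte-level BPE tokenizer trained from scratch on a Chinese corpus.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32000,  # placeholder; the project's actual vocabulary size may differ
    special_tokens=["[PAD]", "[UNK]", "[BOS]", "[EOS]"],
)

# corpus.txt: one cleaned Chinese sentence per line (placeholder path).
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("chatlm_tokenizer.json")
```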
How It Works
The model is based on the T5 architecture, adapted for text-to-text dialogue generation. It uses a custom tokenizer trained on a large Chinese corpus, and the training pipeline emphasizes efficiency: it supports streaming chat output and publishes the full code for the data processing, tokenizer training, and model optimization stages. The project also provides a custom trainer for fine-grained control over training and supports PEFT for parameter-efficient fine-tuning.
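Because the project supports PEFT, a parameter-efficient fine-tuning setup could look roughly like the sketch below using the peft library. The checkpoint id and the LoRA target module names are assumptions and should be verified against the model's actual module names.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

# Hypothetical checkpoint id; substitute the checkpoint you want to fine-tune.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "charent/ChatLM-mini-Chinese", trust_remote_code=True
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q", "v"],  # typical T5 attention projections; verify for this model
)

# Wrap the base model so that only the LoRA adapter weights are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```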
Quick Start & Requirements
pip install -r requirements.txt
or conda install --file requirements.txt
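Once the dependencies are installed, inference might look roughly like the following sketch with the transformers library. The Hugging Face model id charent/ChatLM-mini-Chinese, the example prompt, and the generation settings are assumptions; check the project README for the exact loading code, since the project may also expose its own streaming generation helper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed published checkpoint id; verify against the project README.
model_id = "charent/ChatLM-mini-Chinese"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True).to(device)

prompt = "请介绍一下大语言模型。"  # "Please introduce large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Standard seq2seq generation; adjust max_new_tokens to taste.
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```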
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The model's small size (0.2B parameters) and limited pre-training corpus (roughly 9M samples) can produce irrelevant responses or hallucinations. Its C-Eval scores reflect baseline performance, so it is unlikely to excel on demanding evaluation benchmarks without further task-specific fine-tuning.