Chinese ChatGPT implementation, training/eval tools
This repository provides a comprehensive pipeline for training and evaluating Large Language Models (LLMs), with a focus on Chinese language capabilities. It supports LLM pre-training, zero-shot and few-shot evaluation, and a full ChatGPT-style training process consisting of Supervised Fine-Tuning (SFT), Reward Modeling, and Reinforcement Learning from Human Feedback (RLHF), or its more memory-efficient alternative, Direct Preference Optimization (DPO). The project is targeted at researchers and developers working with LLMs, particularly those interested in Chinese NLP tasks.
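To illustrate the reward-modeling stage, the pairwise ranking objective commonly used in this style of pipeline (as in "Learning to Summarize from Human Feedback") can be sketched as below. This is a minimal sketch, not code from this repository; the function and tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style ranking loss: the reward assigned to the
    human-preferred (chosen) response should exceed that of the rejected one.

    Both inputs are scalar rewards per example, shape (batch,).
    """
    # -log(sigmoid(r_chosen - r_rejected)), averaged over the batch
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Illustrative usage with random scores standing in for reward-model outputs
chosen = torch.randn(4, requires_grad=True)
rejected = torch.randn(4)
loss = pairwise_reward_loss(chosen, rejected)
loss.backward()
```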
How It Works
The project leverages DeepSpeed for efficient distributed training, enabling the handling of large models and datasets. It implements standard LLM architectures such as LLaMA and GPT (decoder-only) as well as GLM (autoregressive blank infilling). The RLHF pipeline follows the "Learning to Summarize from Human Feedback" paper, offering options for joint or separate optimization of the policy and reward models. DPO is integrated as a memory-saving alternative to the traditional RLHF approach.
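DPO is memory-saving because it drops the separate reward model and PPO loop in favor of a single classification-style loss over preference pairs, against a frozen reference model. The sketch below shows the standard DPO objective under that assumption; the names and the beta value are placeholders rather than values taken from this project.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer the chosen response
    more strongly than the frozen reference model does, scaled by beta.

    All inputs are per-example sequence log-probabilities, shape (batch,).
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(beta * (chosen margin - rejected margin)))
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```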
Quick Start & Requirements
Training relies on DeepSpeed and Apex, both of which must be compiled and installed manually. A custom-modified jieba library is also necessary so that the model's special tokens are tokenized correctly.
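Stock jieba tends to split markup-style special tokens into pieces, which is why the custom modification matters. A quick sanity check like the following (the token strings here are hypothetical; the actual special tokens depend on the model configuration) can confirm whether the tokenizer in use keeps them intact.

```python
import jieba

# Hypothetical special tokens; substitute the tokens your model actually uses.
SPECIAL_TOKENS = ["<sep>", "<pad>", "<eot>"]

def check_special_tokens(text: str) -> None:
    """Segment text with jieba and report whether each special token
    survives as a single piece instead of being split (e.g. '<', 'sep', '>')."""
    pieces = jieba.lcut(text)
    for tok in SPECIAL_TOKENS:
        status = "kept intact" if tok in pieces else "split apart"
        print(f"{tok}: {status}")

check_special_tokens("你好<sep>世界<pad><eot>")
```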
Highlighted Details
Maintenance & Community
The repository is maintained by sunzeyeah. No specific community channels (Discord/Slack) or roadmap links are provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Because the project depends on DeepSpeed and Apex, their respective licenses also apply. Commercial use would require careful review of any underlying model licenses as well as the project's own licensing terms once clarified.
Limitations & Caveats
Setting up DeepSpeed and Apex involves manual compilation, which can be complex. The custom jieba modification is critical for correct tokenization of special tokens. Details of the RLHF stage are marked "To be updated" in the README. Benchmarking results are reported only for specific hardware configurations (V100, A100) and may not translate directly to other setups.