RLHF by sunzeyeah

Chinese ChatGPT implementation, training/eval tools

created 2 years ago
285 stars

Top 92.8% on sourcepulse

Project Summary

This repository provides a comprehensive pipeline for training and evaluating Large Language Models (LLMs), with a focus on Chinese-language capabilities. It supports LLM pre-training, zero-shot and few-shot evaluation, and a full ChatGPT-style training process: Supervised Fine-Tuning (SFT), Reward Modeling, and Reinforcement Learning from Human Feedback (RLHF), or the more memory-efficient alternative, Direct Preference Optimization (DPO). The project targets researchers and developers working with LLMs, particularly those interested in Chinese NLP tasks.
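
As an illustration of the reward-modeling stage, below is a minimal sketch of the pairwise ranking loss used in "Learning to Summarize from Human Feedback"-style reward models (PyTorch; the function and argument names are illustrative, not this repository's API):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: pushes the reward of the human-preferred
    response above the reward of the rejected one.

    chosen_rewards / rejected_rewards: shape (batch,), one scalar per response.
    """
    # -log(sigmoid(r_chosen - r_rejected)), averaged over the batch
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```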

How It Works

The project leverages DeepSpeed for efficient distributed training, enabling it to handle large models and datasets. It implements standard decoder-only architectures such as LLaMA and GPT, as well as GLM, which uses autoregressive blank infilling rather than a purely decoder-only design. The RLHF pipeline follows the "Learning to Summarize from Human Feedback" paper and offers joint or separate optimization of the policy and reward models. DPO is integrated as a memory-saving alternative to the traditional RLHF approach.
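
The DPO alternative collapses reward modeling and policy optimization into a single classification-style loss over preference pairs, computed against a frozen reference model. A minimal sketch, assuming per-sequence log-probabilities have already been summed over response tokens (names are illustrative, not the repository's API):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is a (batch,) tensor of summed per-sequence log-probs
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi(y|x) / pi_ref(y|x))
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(beta * margin)) drives the policy toward chosen responses
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```

Because DPO needs only forward passes through a frozen reference model, rather than a separate reward model plus PPO rollouts, its peak memory footprint is lower, which is what makes it the memory-saving option described above.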

Quick Start & Requirements

  • Installation: Requires cloning and building DeepSpeed (and potentially Apex) from source, with the target CUDA architectures specified at build time. A modified jieba library is also required so that the project's special tokens are tokenized correctly.
  • Prerequisites: An NVIDIA GPU with CUDA, and Python. The target CUDA compute capability may need to be set explicitly when compiling DeepSpeed/Apex (see the sketch after this list).
  • Resources: Significant GPU memory and storage are required for downloading models (up to 25.6GB) and datasets (up to 5GB). Training large models can take days.
  • Links: DeepSpeed, Apex, jieba.
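
When compiling DeepSpeed or Apex from source, the target compute capability is commonly supplied via the TORCH_CUDA_ARCH_LIST environment variable. A minimal sketch for discovering the right value on the local machine, using standard PyTorch calls (the exact build flags depend on the DeepSpeed/Apex versions in use):

```python
import torch

# Query the compute capability of each visible GPU, e.g. (7, 0) for V100
# or (8, 0) for A100, and format it the way TORCH_CUDA_ARCH_LIST expects.
assert torch.cuda.is_available(), "CUDA GPU required for DeepSpeed/Apex builds"
caps = {torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())}
arch_list = ";".join(sorted(f"{major}.{minor}" for major, minor in caps))
print(f"TORCH_CUDA_ARCH_LIST={arch_list}")  # export this before compiling
```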

Highlighted Details

  • Supports pre-training for LLaMA, GPT, GLM, and Pangu architectures.
  • Offers evaluation benchmarks for C-Eval, MMLU, and the CLUE Benchmark (a few-shot prompt sketch follows this list).
  • Includes DPO as a memory-efficient alternative to RLHF.
  • Provides pre-trained models and datasets for Chinese LLMs.
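
To illustrate the few-shot evaluation style used by multiple-choice benchmarks such as C-Eval and MMLU, here is a minimal sketch of prompt construction (the record layout and field names are illustrative, not the repository's actual data schema):

```python
# Hypothetical solved example; real C-Eval/MMLU items have the same shape
# (question, four options, answer letter) but their own field names.
FEW_SHOT_EXAMPLES = [
    {"question": "Which planet is closest to the Sun?",
     "choices": ["Venus", "Mercury", "Earth", "Mars"], "answer": "B"},
]

def build_few_shot_prompt(examples: list[dict], test_question: str,
                          test_choices: list[str]) -> str:
    """Concatenate solved examples, then the unsolved test question."""
    parts = []
    for ex in examples:
        opts = "\n".join(f"{k}. {c}" for k, c in zip("ABCD", ex["choices"]))
        parts.append(f"Question: {ex['question']}\n{opts}\nAnswer: {ex['answer']}")
    opts = "\n".join(f"{k}. {c}" for k, c in zip("ABCD", test_choices))
    parts.append(f"Question: {test_question}\n{opts}\nAnswer:")
    return "\n\n".join(parts)
```

Zero-shot evaluation is the same construction with an empty example list.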

Maintenance & Community

The repository is maintained by sunzeyeah. No specific community channels (Discord/Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Users must still comply with the licenses of DeepSpeed and Apex, and commercial use would require careful review of the underlying model licenses as well as the project's own licensing terms once clarified.

Limitations & Caveats

The setup instructions for DeepSpeed and Apex are detailed and require manual compilation, which can be complex. The custom jieba modification is critical for correct tokenization of special tokens. RLHF stage details are marked "To be updated." Benchmarking results are provided for specific hardware configurations (V100, A100), which may not directly translate to other setups.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
