RLHF by sunzeyeah

Chinese ChatGPT implementation, training/eval tools

created 2 years ago
285 stars

Top 92.8% on sourcepulse

Project Summary

This repository provides a comprehensive pipeline for training and evaluating Large Language Models (LLMs), with a focus on Chinese-language capabilities. It supports LLM pre-training, zero-shot and few-shot evaluation, and a full ChatGPT-style training process: Supervised Fine-Tuning (SFT), Reward Modeling, and Reinforcement Learning from Human Feedback (RLHF), or the more memory-efficient alternative, Direct Preference Optimization (DPO). The project targets researchers and developers working with LLMs, particularly those interested in Chinese NLP tasks.
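
As an illustration of the reward-modeling stage, below is a minimal sketch of the pairwise ranking loss used in "Learning to Summarize from Human Feedback"-style reward models (PyTorch; the function and argument names are illustrative, not this repository's API):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: pushes the reward of the human-preferred
    response above the reward of the rejected one.

    chosen_rewards / rejected_rewards: shape (batch,), one scalar per response.
    """
    # -log(sigmoid(r_chosen - r_rejected)), averaged over the batch
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```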

How It Works

The project leverages DeepSpeed for efficient distributed training, enabling it to handle large models and datasets. It implements standard decoder-only architectures such as LLaMA and GPT, as well as GLM, which uses autoregressive blank infilling rather than a purely decoder-only design. The RLHF pipeline follows the "Learning to Summarize from Human Feedback" paper and offers joint or separate optimization of the policy and reward models. DPO is integrated as a memory-saving alternative to the traditional RLHF approach.
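
The DPO alternative collapses reward modeling and policy optimization into a single classification-style loss over preference pairs, computed against a frozen reference model. A minimal sketch, assuming per-sequence log-probabilities have already been summed over response tokens (names are illustrative, not the repository's API):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is a (batch,) tensor of summed per-sequence log-probs
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi(y|x) / pi_ref(y|x))
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(beta * margin)) drives the policy toward chosen responses
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```

Because DPO needs only forward passes through a frozen reference model, rather than a separate reward model plus PPO rollouts, its peak memory footprint is lower, which is what makes it the memory-saving option described above.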

Quick Start & Requirements

  • Installation: Requires cloning and building DeepSpeed (and potentially Apex) from source, with the target CUDA architectures specified at build time. A modified jieba library is also required so that the project's special tokens are tokenized correctly.
  • Prerequisites: An NVIDIA GPU with CUDA, and Python. The target CUDA compute capability may need to be set explicitly when compiling DeepSpeed/Apex (see the sketch after this list).
  • Resources: Significant GPU memory and storage are required for downloading models (up to 25.6GB) and datasets (up to 5GB). Training large models can take days.
  • Links: DeepSpeed, Apex, jieba.
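
When compiling DeepSpeed or Apex from source, the target compute capability is commonly supplied via the TORCH_CUDA_ARCH_LIST environment variable. A minimal sketch for discovering the right value on the local machine, using standard PyTorch calls (the exact build flags depend on the DeepSpeed/Apex versions in use):

```python
import torch

# Query the compute capability of each visible GPU, e.g. (7, 0) for V100
# or (8, 0) for A100, and format it the way TORCH_CUDA_ARCH_LIST expects.
assert torch.cuda.is_available(), "CUDA GPU required for DeepSpeed/Apex builds"
caps = {torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())}
arch_list = ";".join(sorted(f"{major}.{minor}" for major, minor in caps))
print(f"TORCH_CUDA_ARCH_LIST={arch_list}")  # export this before compiling
```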

Highlighted Details

  • Supports pre-training for LLaMA, GPT, GLM, and Pangu architectures.
  • Offers evaluation benchmarks for C-Eval, MMLU, and the CLUE Benchmark (a few-shot prompt sketch follows this list).
  • Includes DPO as a memory-efficient alternative to RLHF.
  • Provides pre-trained models and datasets for Chinese LLMs.
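
To illustrate the few-shot evaluation style used by multiple-choice benchmarks such as C-Eval and MMLU, here is a minimal sketch of prompt construction (the record layout and field names are illustrative, not the repository's actual data schema):

```python
# Hypothetical solved example; real C-Eval/MMLU items have the same shape
# (question, four options, answer letter) but their own field names.
FEW_SHOT_EXAMPLES = [
    {"question": "Which planet is closest to the Sun?",
     "choices": ["Venus", "Mercury", "Earth", "Mars"], "answer": "B"},
]

def build_few_shot_prompt(examples: list[dict], test_question: str,
                          test_choices: list[str]) -> str:
    """Concatenate solved examples, then the unsolved test question."""
    parts = []
    for ex in examples:
        opts = "\n".join(f"{k}. {c}" for k, c in zip("ABCD", ex["choices"]))
        parts.append(f"Question: {ex['question']}\n{opts}\nAnswer: {ex['answer']}")
    opts = "\n".join(f"{k}. {c}" for k, c in zip("ABCD", test_choices))
    parts.append(f"Question: {test_question}\n{opts}\nAnswer:")
    return "\n\n".join(parts)
```

Zero-shot evaluation is the same construction with an empty example list.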

Maintenance & Community

The repository is maintained by sunzeyeah. No specific community channels (Discord/Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Users must still comply with the licenses of DeepSpeed and Apex, and commercial use would require careful review of the underlying model licenses as well as the project's own licensing terms once clarified.

Limitations & Caveats

The setup instructions for DeepSpeed and Apex are detailed and require manual compilation, which can be complex. The custom jieba modification is critical for correct tokenization of special tokens. RLHF stage details are marked "To be updated." Benchmarking results are provided for specific hardware configurations (V100, A100), which may not directly translate to other setups.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
