GPT-4 data for instruction-tuning LLMs via supervised/RL
This repository provides datasets generated by GPT-4 for instruction-following Large Language Models (LLMs). It targets researchers aiming to improve LLM capabilities through supervised and reinforcement learning, offering a valuable resource for building more capable and aligned AI assistants.
How It Works
The project leverages GPT-4 to create diverse instruction-following datasets, including English and Chinese instruction-output pairs, and comparative data for training reward models. This approach aims to transfer GPT-4's advanced instruction-following abilities to other LLMs, as demonstrated by human evaluations showing that LLaMA models fine-tuned on this data perform comparably to GPT-4 on key criteria.
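For orientation, the sketch below shows one way to inspect the English instruction data; it assumes the Alpaca-style instruction/input/output schema that alpaca_gpt4_data.json (referenced in the training command below) is expected to follow, so verify the fields against the actual file.

```python
import json

# Load the English instruction-following data shipped with the repository.
# Assumes the Alpaca-style schema ("instruction", "input", "output").
with open("data/alpaca_gpt4_data.json") as f:
    examples = json.load(f)

print(len(examples), "examples")
sample = examples[0]
print(sample["instruction"])   # task description
print(sample["input"])         # optional context (may be empty)
print(sample["output"])        # GPT-4's response, used as the training target
```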
Quick Start & Requirements
Fine-tuning uses torchrun and DeepSpeed for distributed training; an example launch command:
```bash
torchrun --nproc_per_node=16 --master_port=12345 train.py \
    --model_name_or_path PATH/TO/LLaMA \
    --data_path ./data/alpaca_gpt4_data.json \
    --output_dir PATH/TO/SAVE \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --deepspeed configs/ds_config.json
```
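After training, the checkpoint written to PATH/TO/SAVE can be smoke-tested with Hugging Face transformers. The snippet below is a minimal sketch, not part of the repository; the prompt template mirrors the Alpaca-style format the data is assumed to use, and the example instruction is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "PATH/TO/SAVE" is the --output_dir from the training command above.
model_dir = "PATH/TO/SAVE"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style prompt (assumed format); replace the instruction as needed.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```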
A notebook (plots/main_plots.ipynb) is provided to reproduce figures from the paper.

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The dataset, and any models trained on it, are strictly limited to non-commercial research use under the CC BY NC 4.0 license.
Last updated 2 years ago; the repository appears inactive.