GPT-4-LLM by Instruction-Tuning-with-GPT-4

GPT-4 data for instruction-tuning LLMs via supervised/RL

created 2 years ago
4,320 stars

Top 11.5% on sourcepulse

View on GitHub
Project Summary

This repository provides datasets generated by GPT-4 for instruction-following Large Language Models (LLMs). It targets researchers aiming to improve LLM capabilities through supervised and reinforcement learning, offering a valuable resource for building more capable and aligned AI assistants.

How It Works

The project leverages GPT-4 to create diverse instruction-following datasets, including English and Chinese instruction-output pairs as well as comparison data for training reward models. The goal is to transfer GPT-4's instruction-following abilities to other LLMs: in the paper's human evaluation, LLaMA models fine-tuned on this data perform comparably to GPT-4 on its three criteria (Helpfulness, Honesty, and Harmlessness).
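
For a quick look at what the released files contain, here is a minimal sketch that loads the English split with the standard-library json module. It assumes data/alpaca_gpt4_data.json is a JSON list of Alpaca-style records with "instruction", "input", and "output" fields; verify the field names against the actual file.

    import json

    # Hedged sketch: inspect the GPT-4-generated English instruction data.
    # Assumes data/alpaca_gpt4_data.json is a JSON list of records with
    # "instruction", "input", and "output" fields (Alpaca-style schema).
    with open("data/alpaca_gpt4_data.json", "r", encoding="utf-8") as f:
        records = json.load(f)

    print(f"{len(records)} instruction-following examples")
    first = records[0]
    print("Instruction:", first["instruction"])
    print("Input:", first.get("input", ""))
    print("Output:", first["output"][:200])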

Quick Start & Requirements

  • Fine-tuning: Requires a LLaMA model and Hugging Face training code. The provided command uses torchrun and DeepSpeed for distributed training; a minimal inference sketch for the resulting checkpoint follows this list.
    torchrun --nproc_per_node=16 --master_port=12345 train.py \
        --model_name_or_path PATH/TO/LLaMA \
        --data_path ./data/alpaca_gpt4_data.json \
        --output_dir PATH/TO/SAVE \
        --num_train_epochs 3 \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 1 \
        --gradient_accumulation_steps 4 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 200 \
        --save_total_limit 1 \
        --learning_rate 2e-5 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --lr_scheduler_type "cosine" \
        --logging_steps 1 \
        --deepspeed configs/ds_config.json
    
  • Prerequisites: LLaMA model weights, DeepSpeed configuration, Hugging Face libraries.
  • Evaluation: The authors recommend using Vicuna's serving and evaluation pipelines.
  • Plotting: An IPython notebook (plots/main_plots.ipynb) is provided to reproduce figures from the paper.
  • Docs: Project Page
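
As noted in the fine-tuning item above, the sketch below prompts a checkpoint produced by that command. It assumes the checkpoint at PATH/TO/SAVE (a placeholder from the command) loads as a standard Hugging Face causal LM and that the Stanford Alpaca prompt template applies, since the training code builds on Alpaca; treat both as assumptions to verify against the repository.

    # Hedged sketch: prompt a fine-tuned checkpoint saved by the command above.
    # Assumptions: the checkpoint is a standard Hugging Face causal LM and the
    # Alpaca-style prompt template applies; PATH/TO/SAVE is a placeholder path.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "PATH/TO/SAVE"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, torch_dtype=torch.float16, device_map="auto"
    )

    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\nExplain instruction tuning in one sentence.\n\n"
        "### Response:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ))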

Highlighted Details

  • 52K English and 52K Chinese instruction-following data points generated by GPT-4.
  • 9K "Unnatural Instructions" data points from GPT-4.
  • Comparative data with GPT-4 evaluations for reward model training.
  • Human evaluation indicates LLaMA-GPT-4 performance is similar to GPT-4 on Helpfulness, Honesty, and Harmlessness.
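
The comparison data carries GPT-4 ratings of model responses, and the usual way to train a reward model from such ratings is a pairwise ranking loss over higher- vs. lower-rated responses. The sketch below is a generic Bradley-Terry-style objective in PyTorch, not code from this repository; the tensor names and toy scores are illustrative only.

    # Hedged sketch: a generic pairwise ranking loss for reward-model training
    # on comparison data where one response is rated higher than another.
    # This is a standard Bradley-Terry-style objective, not this repo's code.
    import torch
    import torch.nn.functional as F

    def pairwise_ranking_loss(chosen_rewards: torch.Tensor,
                              rejected_rewards: torch.Tensor) -> torch.Tensor:
        # Scalar reward-model outputs for the higher- and lower-rated
        # response in each comparison pair.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with illustrative scores (real values would come from a
    # reward model scoring the responses in the comparison data).
    chosen = torch.tensor([1.2, 0.7, 2.0])
    rejected = torch.tensor([0.3, 0.9, 1.1])
    print(pairwise_ranking_loss(chosen, rejected))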

Maintenance & Community

  • The project is associated with the LLaVA (Visual Instruction Tuning with GPT-4) release.
  • Benefits from and acknowledges LLaMA, Alpaca, and Vicuna projects.

Licensing & Compatibility

  • License: CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
  • Restrictions: Data is intended and licensed for research use only. Models trained using this dataset should not be used outside of research purposes. Commercial use is prohibited.

Limitations & Caveats

The dataset and models trained on it are strictly limited to non-commercial, research purposes due to the CC BY-NC 4.0 license.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 30 stars in the last 90 days

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), John Yang (author of SWE-bench, SWE-agent), and 13 more.

Explore Similar Projects

  • stanford_alpaca by tatsu-lab (Top 0.1%, 30k stars): Instruction-following LLaMA model training and data generation. Created 2 years ago, updated 1 year ago.