GPT-4 data for instruction-tuning LLMs via supervised/RL
This repository provides datasets generated by GPT-4 for instruction-following Large Language Models (LLMs). It targets researchers aiming to improve LLM capabilities through supervised and reinforcement learning, offering a valuable resource for building more capable and aligned AI assistants.
How It Works
The project leverages GPT-4 to create diverse instruction-following datasets, including English and Chinese instruction-output pairs, and comparative data for training reward models. This approach aims to transfer GPT-4's advanced instruction-following abilities to other LLMs, as demonstrated by human evaluations showing that LLaMA models fine-tuned on this data perform comparably to GPT-4 on key criteria.
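For orientation, the sketch below shows one way to inspect the English instruction data; it assumes the Alpaca-style instruction/input/output schema that alpaca_gpt4_data.json (referenced in the training command below) is expected to follow, so verify the fields against the actual file.

```python
import json

# Load the English instruction-following data shipped with the repository.
# Assumes the Alpaca-style schema ("instruction", "input", "output").
with open("data/alpaca_gpt4_data.json") as f:
    examples = json.load(f)

print(len(examples), "examples")
sample = examples[0]
print(sample["instruction"])   # task description
print(sample["input"])         # optional context (may be empty)
print(sample["output"])        # GPT-4's response, used as the training target
```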
Quick Start & Requirements
Fine-tuning uses torchrun and DeepSpeed for distributed training; an example launch command:
```bash
torchrun --nproc_per_node=16 --master_port=12345 train.py \
    --model_name_or_path PATH/TO/LLaMA \
    --data_path ./data/alpaca_gpt4_data.json \
    --output_dir PATH/TO/SAVE \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --deepspeed configs/ds_config.json
```
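After training, the checkpoint written to PATH/TO/SAVE can be smoke-tested with Hugging Face transformers. The snippet below is a minimal sketch, not part of the repository; the prompt template mirrors the Alpaca-style format the data is assumed to use, and the example instruction is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "PATH/TO/SAVE" is the --output_dir from the training command above.
model_dir = "PATH/TO/SAVE"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style prompt (assumed format); replace the instruction as needed.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```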
A notebook (plots/main_plots.ipynb) is provided to reproduce figures from the paper.

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The dataset, and any models trained on it, are strictly limited to non-commercial research use under the CC BY NC 4.0 license.
Last updated 2 years ago; the repository appears inactive.