Reflection_Tuning by tianyi-lab

Research code for LLM instruction tuning via data recycling

created 1 year ago
359 stars

Top 79.1% on sourcepulse

Project Summary

This repository provides implementations for Reflection-Tuning (V1) and Selective Reflection-Tuning (V2), methods designed to enhance the quality of instruction-tuning datasets for Large Language Models (LLMs). It addresses the challenge of generating high-quality, student-model-compatible data by using an oracle model for refinement and incorporating student model feedback for selection, aiming to improve LLM performance with significantly less data.

How It Works

Reflection-Tuning (V1) uses an oracle model (like ChatGPT) to refine instruction-response pairs, employing specific criteria and "chain-of-thought" responses to generate improved data. Selective Reflection-Tuning (V2) introduces an interactive pipeline where the student model evaluates and selects refined data based on its own learning needs, using metrics like Instruction-Following Difficulty (IFD) and its reversed version (r-IFD) to ensure data compatibility and criticality.
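The IFD metric described above can be sketched as a ratio of perplexities: how hard the response is to predict with the instruction versus without it. The following toy implementation assumes per-token cross-entropy losses have already been computed by the student model; the function and variable names are illustrative, not the repository's actual API:

```python
import math

def perplexity(token_losses):
    """Perplexity = exp(mean per-token cross-entropy loss)."""
    return math.exp(sum(token_losses) / len(token_losses))

def ifd_score(response_losses_given_instruction, response_losses_alone):
    """Instruction-Following Difficulty:
    PPL(response | instruction) / PPL(response).

    Values near 1 mean the instruction barely helps the student
    predict the response; low values mean the instruction makes
    prediction much easier.
    """
    return (perplexity(response_losses_given_instruction)
            / perplexity(response_losses_alone))

# Toy losses: conditioning on the instruction lowers per-token loss,
# so the IFD score comes out below 1.
with_instruction = [1.2, 0.9, 1.1]
without_instruction = [2.0, 1.8, 2.1]
print(ifd_score(with_instruction, without_instruction))
```

The reversed metric r-IFD follows the same pattern with the roles swapped: PPL(instruction | response) / PPL(instruction), measuring how informative a response is about its instruction.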

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires an OpenAI API key for reflection steps.
  • Example scripts use meta-llama/Llama-2-7b-hf for IFD score calculation.
  • Quick-start instructions and detailed code for the reflection and selection steps are provided.

Highlighted Details

  • Achieves strong performance on AlpacaEval and the Open LLM Leaderboard with significantly less data (e.g., <1k samples).
  • Introduces a nuanced evaluation schema, r-IFD, to quantify the relevance of instruction-response pairs.
  • V2 method allows student models to select data, improving coherence and model-specific compatibility.
  • Provides pre-generated datasets and model weights for Alpaca and WizardLM.

Maintenance & Community

The project has multiple publications at ACL'24 and NeurIPS'23 workshops. Contact information for Ming Li is provided for questions. Related works on data selection and augmentation are also linked.

Licensing & Compatibility

The repository does not explicitly state a license. The provided code and data are for research purposes, and commercial use would require clarification.

Limitations & Caveats

The reflection process relies on an OpenAI API key, implying costs and potential rate limits. The extraction of reflection results uses regular expressions, which may not be perfect, and raw outputs are planned for future release.
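The failure mode with regex extraction is that the oracle's output is free-form text, so a pattern match can miss or truncate the refined data. A minimal sketch of such an extractor is below; the delimiter ("### Refined Response:") is a hypothetical marker for illustration, and the repository's actual patterns differ:

```python
import re

def extract_reflection(raw: str):
    """Pull the refined answer out of a raw oracle response.

    The "### Refined Response:" marker is illustrative only.
    Returns None when the marker is absent, which is exactly the
    imperfect-extraction case noted above.
    """
    match = re.search(r"### Refined Response:\s*(.+)", raw, re.DOTALL)
    return match.group(1).strip() if match else None

raw = "Some reasoning...\n### Refined Response:\nUse a list comprehension."
print(extract_reflection(raw))
```

In practice a pipeline like this would log or re-query the cases where extraction returns None, which is presumably why the authors plan to release the raw outputs.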

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai (0.2%, 3k stars)
Training codebase for instruction-following language models
created 2 years ago, updated 23 hours ago