Reflection_Tuning by tianyi-lab

Research code for LLM instruction tuning via data recycling

created 1 year ago
359 stars

Top 79.1% on sourcepulse

Project Summary

This repository provides implementations for Reflection-Tuning (V1) and Selective Reflection-Tuning (V2), methods designed to enhance the quality of instruction-tuning datasets for Large Language Models (LLMs). It addresses the challenge of generating high-quality, student-model-compatible data by using an oracle model for refinement and incorporating student model feedback for selection, aiming to improve LLM performance with significantly less data.

How It Works

Reflection-Tuning (V1) uses an oracle model (like ChatGPT) to refine instruction-response pairs, employing specific criteria and "chain-of-thought" responses to generate improved data. Selective Reflection-Tuning (V2) introduces an interactive pipeline where the student model evaluates and selects refined data based on its own learning needs, using metrics like Instruction-Following Difficulty (IFD) and its reversed version (r-IFD) to ensure data compatibility and criticality.
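The IFD metric described above can be sketched as a ratio of perplexities: how hard the response is to predict with the instruction versus without it. The following toy implementation assumes per-token cross-entropy losses have already been computed by the student model; the function and variable names are illustrative, not the repository's actual API:

```python
import math

def perplexity(token_losses):
    """Perplexity = exp(mean per-token cross-entropy loss)."""
    return math.exp(sum(token_losses) / len(token_losses))

def ifd_score(response_losses_given_instruction, response_losses_alone):
    """Instruction-Following Difficulty:
    PPL(response | instruction) / PPL(response).

    Values near 1 mean the instruction barely helps the student
    predict the response; low values mean the instruction makes
    prediction much easier.
    """
    return (perplexity(response_losses_given_instruction)
            / perplexity(response_losses_alone))

# Toy losses: conditioning on the instruction lowers per-token loss,
# so the IFD score comes out below 1.
with_instruction = [1.2, 0.9, 1.1]
without_instruction = [2.0, 1.8, 2.1]
print(ifd_score(with_instruction, without_instruction))
```

The reversed metric r-IFD follows the same pattern with the roles swapped: PPL(instruction | response) / PPL(instruction), measuring how informative a response is about its instruction.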

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires an OpenAI API key for reflection steps.
  • Example scripts use meta-llama/Llama-2-7b-hf for IFD score calculation.
  • Quick-start instructions and detailed code for the reflection and selection steps are provided.

Highlighted Details

  • Achieves strong performance on AlpacaEval and the Open LLM Leaderboard with significantly less data (e.g., <1k samples).
  • Introduces a nuanced evaluation schema, r-IFD, to quantify the relevance of instruction-response pairs.
  • V2 method allows student models to select data, improving coherence and model-specific compatibility.
  • Provides pre-generated datasets and model weights for Alpaca and WizardLM.

Maintenance & Community

The project has multiple publications at ACL'24 and NeurIPS'23 workshops. Contact information for Ming Li is provided for questions. Related works on data selection and augmentation are also linked.

Licensing & Compatibility

The repository does not explicitly state a license. The provided code and data are for research purposes, and commercial use would require clarification.

Limitations & Caveats

The reflection process relies on an OpenAI API key, implying costs and potential rate limits. The extraction of reflection results uses regular expressions, which may not be perfect, and raw outputs are planned for future release.
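The failure mode with regex extraction is that the oracle's output is free-form text, so a pattern match can miss or truncate the refined data. A minimal sketch of such an extractor is below; the delimiter ("### Refined Response:") is a hypothetical marker for illustration, and the repository's actual patterns differ:

```python
import re

def extract_reflection(raw: str):
    """Pull the refined answer out of a raw oracle response.

    The "### Refined Response:" marker is illustrative only.
    Returns None when the marker is absent, which is exactly the
    imperfect-extraction case noted above.
    """
    match = re.search(r"### Refined Response:\s*(.+)", raw, re.DOTALL)
    return match.group(1).strip() if match else None

raw = "Some reasoning...\n### Refined Response:\nUse a list comprehension."
print(extract_reflection(raw))
```

In practice a pipeline like this would log or re-query the cases where extraction returns None, which is presumably why the authors plan to release the raw outputs.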

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai (0.2%, 3k stars)
Training codebase for instruction-following language models
created 2 years ago, updated 23 hours ago