Reflection-Tuning: research code and data for LLM instruction tuning via data recycling
This repository provides implementations of Reflection-Tuning (V1) and Selective Reflection-Tuning (V2), methods designed to improve the quality of instruction-tuning datasets for Large Language Models (LLMs). They address the challenge of generating high-quality data that is compatible with the student model by using an oracle model for refinement and student-model feedback for selection, aiming to improve LLM performance with significantly less data.
How It Works
Reflection-Tuning (V1) uses an oracle model (like ChatGPT) to refine instruction-response pairs, employing specific criteria and "chain-of-thought" responses to generate improved data. Selective Reflection-Tuning (V2) introduces an interactive pipeline where the student model evaluates and selects refined data based on its own learning needs, using metrics like Instruction-Following Difficulty (IFD) and its reversed version (r-IFD) to ensure data compatibility and criticality.
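To make the interactive selection step concrete, below is a minimal sketch of one plausible filtering loop. It assumes a simplified, single-pass rule (keep the oracle-refined pair only when the student's IFD rises and its r-IFD does not); `oracle_refine`, `ifd`, and `r_ifd` are hypothetical helpers, not the repository's actual API, and the real pipeline treats instruction and response reflection as separate phases.

```python
# Simplified sketch of Selective Reflection-Tuning (V2), under assumed selection rules.
from typing import Callable, Dict, List

Pair = Dict[str, str]  # {"instruction": ..., "response": ...}

def selective_reflection(
    data: List[Pair],
    oracle_refine: Callable[[Pair], Pair],  # e.g. a ChatGPT-based reflection call (hypothetical)
    ifd: Callable[[Pair], float],           # student-computed Instruction-Following Difficulty
    r_ifd: Callable[[Pair], float],         # student-computed reversed IFD
) -> List[Pair]:
    selected = []
    for pair in data:
        refined = oracle_refine(pair)  # oracle reflects on and rewrites the pair
        # Keep the refined pair only if the student finds it more challenging (higher IFD)
        # while the instruction stays recoverable from the response (r-IFD not worse).
        if ifd(refined) > ifd(pair) and r_ifd(refined) <= r_ifd(pair):
            selected.append(refined)
        else:
            selected.append(pair)
    return selected
```

The point of the sketch is the division of labor: the oracle proposes refinements, while the student's own difficulty and feasibility scores decide which version enters the training set.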
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. The pre-trained base model meta-llama/Llama-2-7b-hf is used for IFD score calculation.
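For orientation, here is a rough sketch of computing an IFD score with the base model above, following the usual definition of IFD as a perplexity ratio: the perplexity of the response conditioned on the instruction divided by the perplexity of the response alone. The helper names are illustrative and may not match the repository's scripts.

```python
# Rough sketch of the IFD score with Hugging Face transformers (illustrative helpers).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def mean_nll(prefix: str, target: str) -> float:
    """Average negative log-likelihood of `target` tokens, given an optional `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids if prefix else None
    target_ids = tokenizer(target, return_tensors="pt", add_special_tokens=False).input_ids
    if prefix_ids is not None:
        input_ids = torch.cat([prefix_ids, target_ids], dim=1)
        n_prefix = prefix_ids.shape[1]
    else:
        input_ids = target_ids
        n_prefix = 0
    labels = input_ids.clone()
    labels[:, :n_prefix] = -100  # score only the target span
    with torch.no_grad():
        out = model(input_ids.to(model.device), labels=labels.to(model.device))
    return out.loss.item()

def ifd_score(instruction: str, response: str) -> float:
    # IFD = PPL(response | instruction) / PPL(response); values near or above 1 suggest
    # the instruction gives the model little help, i.e. a harder, more informative sample.
    return math.exp(mean_nll(instruction, response)) / math.exp(mean_nll("", response))
```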
Highlighted Details
Maintenance & Community
The project has publications at ACL'24 and at NeurIPS'23 workshops. Contact information for Ming Li is provided for questions, and related works on data selection and augmentation are linked.
Licensing & Compatibility
The repository does not explicitly state a license. The code and data are provided for research purposes; commercial use would require clarification from the authors.
Limitations & Caveats
The reflection process relies on an OpenAI API key, implying costs and potential rate limits. The extraction of reflection results uses regular expressions, which may not be perfect, and raw outputs are planned for future release.