AutoIF by QwenLM

Research paper for improving LLM instruction-following via self-play with execution feedback

Created 1 year ago
307 stars

Top 87.3% on SourcePulse

View on GitHub
Project Summary

This repository provides AutoIF, a method for automatically generating and verifying instruction-following data for large language models using code execution feedback. It is designed for researchers and developers aiming to improve LLM instruction-following capabilities through scalable, self-play data synthesis.

How It Works

AutoIF synthesizes data in stages, starting with seed instructions and progressing through verification function generation, quality cross-validation, and back-translation. It then augments queries, verifies responses against generated functions, and filters for high-quality instruction-response pairs. This approach leverages code execution to provide objective feedback, ensuring the generated data is reliable and effective for training.
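
To make the execution-feedback step concrete, here is a minimal sketch of how a generated verification function might be executed against a candidate response, keeping only the pairs that pass. This is an illustration, not the repository's actual code: the `evaluate(response) -> bool` entry point and the `run_verifier` helper are assumed conventions.

```python
# Illustrative sketch of execution-feedback filtering (not AutoIF's actual code).
# Assumption: each LLM-generated verification function is Python source defining
# `evaluate(response: str) -> bool` for one instruction.

func_source = '''
def evaluate(response: str) -> bool:
    # Example generated checker: "answer must be all lowercase"
    return response == response.lower()
'''

def run_verifier(source: str, response: str) -> bool:
    """Exec the generated function in an isolated namespace and apply it."""
    namespace = {}
    try:
        exec(source, namespace)   # code execution supplies the feedback signal
        return bool(namespace["evaluate"](response))
    except Exception:
        return False              # a crashing verifier counts as a failure

pairs = [
    ("write the answer in lowercase", "all good here"),
    ("write the answer in lowercase", "Not Compliant"),
]
kept = [(ins, resp) for ins, resp in pairs if run_verifier(func_source, resp)]
print(kept)  # only the compliant instruction-response pair survives
```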

Quick Start & Requirements

  • Install: pip install -r requirements.txt within the ./AutoIF/ directory.
  • Prerequisites: Python 3.9, PyTorch 2.1.2+cu121, Transformers 4.41.2 (a version sanity-check sketch follows this list).
  • Setup: Requires running a series of Python scripts for data synthesis and verification.
  • Docs: https://github.com/QwenLM/AutoIF
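
Before running the pipeline, the pinned versions above can be checked in a few lines of Python. This is a convenience sketch, not a script shipped with AutoIF:

```python
# Minimal sanity check for the pinned prerequisites listed above
# (a convenience sketch; not part of the AutoIF repository).
import sys
import torch
import transformers

assert sys.version_info[:2] == (3, 9), f"expected Python 3.9, got {sys.version}"
assert torch.__version__.startswith("2.1.2"), torch.__version__
assert transformers.__version__ == "4.41.2", transformers.__version__
print("Environment matches the pinned AutoIF requirements.")
```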

Highlighted Details

  • Automates instruction-following data generation and quality verification.
  • Utilizes code execution feedback for reliable data quality assessment.
  • Supports both Strong-to-Weak Distillation and Self-Alignment training setups.
  • Integrates with LLaMA-Factory for SFT and DPO training (see the preference-pair sketch after this list).
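
Because execution feedback yields a pass/fail signal per response, it maps naturally onto DPO-style preference pairs: for a given instruction, a passing response can serve as "chosen" and a failing one as "rejected". The sketch below illustrates that mapping; the `build_dpo_pairs` helper and the JSON field names are assumptions modeled on a common LLaMA-Factory-style preference format, not the repository's exact schema.

```python
# Sketch: turning pass/fail execution feedback into DPO preference pairs.
# Field names are assumed, modeled on a common LLaMA-Factory-style preference
# format; AutoIF's exact schema may differ.
import json

def build_dpo_pairs(instruction, scored_responses):
    """scored_responses: list of (response, passed_verification) tuples."""
    passed = [r for r, ok in scored_responses if ok]
    failed = [r for r, ok in scored_responses if not ok]
    return [
        {"instruction": instruction, "chosen": c, "rejected": r}
        for c in passed for r in failed
    ]

pairs = build_dpo_pairs(
    "Answer in exactly three words.",
    [("Yes I can.", True), ("Sure, here is a much longer answer.", False)],
)
print(json.dumps(pairs, indent=2))
```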

Maintenance & Community

The project is associated with the Qwen team at Alibaba Inc. Further community or roadmap details are not provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README indicates that Transformers versions lower than 4.41.2 are unlikely to work. Specific implementation details for training the 7B and 70B models are deferred to the associated paper.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Ross Taylor (Cofounder of General Reasoning; Co-creator of Papers with Code), and 11 more.

open-instruct by allenai

Top 0.7% on SourcePulse · 3k stars
Training codebase for instruction-following language models
Created 2 years ago · Updated 17 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), John Yang (Co-author of SWE-bench, SWE-agent), and 28 more.

stanford_alpaca by tatsu-lab

Top 0.1% on SourcePulse · 30k stars
Instruction-following LLaMA model training and data generation
Created 2 years ago · Updated 1 year ago