AutoIF by QwenLM

Research paper for improving LLM instruction-following via self-play with execution feedback

created 1 year ago
298 stars

Top 90.1% on sourcepulse

Project Summary

This repository provides AutoIF, a method for automatically generating and verifying instruction-following data for large language models using code execution feedback. It is designed for researchers and developers aiming to improve LLM instruction-following capabilities through scalable, self-play data synthesis.

How It Works

AutoIF synthesizes data in stages, starting with seed instructions and progressing through verification function generation, quality cross-validation, and back-translation. It then augments queries, verifies responses against generated functions, and filters for high-quality instruction-response pairs. This approach leverages code execution to provide objective feedback, ensuring the generated data is reliable and effective for training.
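The verification step described above can be sketched in a few lines of Python. This is a hypothetical illustration of execution-feedback filtering, not code from the AutoIF repository; the example instruction ("answer in exactly three words"), the checker source, and the candidate responses are all made up for the sketch:

```python
# Minimal sketch of execution-feedback verification (hypothetical example,
# not code from the AutoIF repository).

# A verification function an LLM might generate for the instruction
# "answer in exactly three words".
verify_fn_source = """
def evaluate(response):
    return len(response.split()) == 3
"""

def passes_verification(response, fn_source):
    """Execute a generated verification function and apply it to a response."""
    namespace = {}
    try:
        exec(fn_source, namespace)           # compile the generated checker
        return bool(namespace["evaluate"](response))
    except Exception:
        return False                         # broken checkers reject the pair

candidates = ["I like cats", "I really like cats a lot"]
kept = [r for r in candidates if passes_verification(r, verify_fn_source)]
# Only responses that satisfy the checker survive filtering.
```

Because the checker is ordinary code, pass/fail is objective and cheap to compute at scale, which is what makes the self-play data loop practical.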

Quick Start & Requirements

  • Install: pip install -r requirements.txt within the ./AutoIF/ directory.
  • Prerequisites: Python 3.9, PyTorch 2.1.2+cu121, Transformers 4.41.2.
  • Setup: Requires running a series of Python scripts for data synthesis and verification.
  • Docs: https://github.com/QwenLM/AutoIF

Highlighted Details

  • Automates instruction-following data generation and quality verification.
  • Utilizes code execution feedback for reliable data quality assessment.
  • Supports both Strong-to-Weak Distillation and Self-Alignment training setups.
  • Integrates with LLaMA-Factory for SFT and DPO training.
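The quality cross-validation highlighted above can be illustrated with a hedged sketch: candidate verification functions are scored against labeled test cases, and only functions that classify every case correctly are kept. The test cases, candidate functions, and the exact acceptance threshold here are illustrative assumptions, not the repository's actual implementation:

```python
# Hypothetical sketch of cross-validating generated verification functions
# against labeled test cases; names and thresholds are illustrative only.

# Labeled cases for the instruction "answer in exactly three words".
test_cases = [
    ("one two three", True),
    ("one two", False),
]

candidate_fns = [
    "def evaluate(r):\n    return len(r.split()) == 3",  # correct checker
    "def evaluate(r):\n    return 'one' in r",           # spurious checker
]

def accuracy(fn_source, cases):
    """Fraction of labeled cases a candidate checker classifies correctly."""
    namespace = {}
    exec(fn_source, namespace)
    hits = sum(namespace["evaluate"](text) == label for text, label in cases)
    return hits / len(cases)

# Keep only functions that agree with every labeled case.
kept_fns = [f for f in candidate_fns if accuracy(f, test_cases) == 1.0]
```

The spurious checker matches a word that also appears in the negative case, so it scores 0.5 and is discarded, while the length-based checker survives.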

Maintenance & Community

The project is associated with Qwen, Alibaba Inc. Further community or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that Transformers versions below 4.41.2 are unlikely to work. Implementation details for training the 7B and 70B models are deferred to the associated paper.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
Star History
20 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

Training codebase for instruction-following language models
0.2% · 3k stars · created 2 years ago · updated 17 hours ago