Research paper for improving LLM instruction-following via self-play with execution feedback
This repository provides AutoIF, a method for automatically generating and verifying instruction-following data for large language models using code execution feedback. It is designed for researchers and developers aiming to improve LLM instruction-following capabilities through scalable, self-play data synthesis.
How It Works
AutoIF synthesizes data in stages, starting with seed instructions and progressing through verification function generation, quality cross-validation, and back-translation. It then augments queries, verifies responses against generated functions, and filters for high-quality instruction-response pairs. This approach leverages code execution to provide objective feedback, ensuring the generated data is reliable and effective for training.
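The verification step above can be sketched as follows. This is a hypothetical illustration, not AutoIF's actual implementation: the function names (`verify_response`, `evaluate`) and the convention that each generated verifier defines an `evaluate(response)` returning a boolean are assumptions for the sketch.

```python
# Hypothetical sketch of execution-based response verification.
# Assumption: each generated verifier is Python source defining
# evaluate(response) -> bool.

def verify_response(response: str, verifier_code: str) -> bool:
    """Execute a generated verification function against a model response."""
    namespace: dict = {}
    try:
        exec(verifier_code, namespace)  # defines evaluate() in the namespace
        return bool(namespace["evaluate"](response))
    except Exception:
        # Any execution error is treated as a failed check,
        # so malformed verifiers filter out their pairs.
        return False

# Example: a verifier generated for the instruction "answer in all lowercase"
verifier = "def evaluate(response):\n    return response == response.lower()"
print(verify_response("hello world", verifier))  # True
print(verify_response("Hello World", verifier))  # False
```

In the full pipeline, a response would typically be checked against several cross-validated verifiers and kept only if enough of them pass.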
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt from within the ./AutoIF/ directory.
Maintenance & Community
The project is associated with Qwen, Alibaba Inc. Further community or roadmap details are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README indicates that Transformers versions below 4.41.2 are unlikely to work. Implementation details for training the 7B and 70B models are deferred to the associated paper.