automatic_prompt_engineer by keirp

Automatic prompt engineer for LLM instruction generation

created 2 years ago
1,286 stars

Project Summary

This repository provides the Automatic Prompt Engineer (APE) framework, designed to automate the creation and selection of effective prompts for Large Language Models (LLMs). It targets researchers and practitioners seeking to improve LLM performance across various NLP tasks by replacing manual prompt engineering with an LLM-driven, search-based approach. APE aims to achieve human-level or superior prompt quality with reduced human effort.

How It Works

APE treats instruction generation as natural-language program synthesis, with the instruction playing the role of the program. An LLM proposes candidate prompts from a prompt-generation template and a set of input-output demonstrations. Each candidate is then scored by having an LLM (which can differ from the proposal model) perform the task on a dataset through an evaluation template. The framework searches for the prompt that maximizes this score function, and can use Upper Confidence Bound (UCB) bandit evaluation to spend fewer queries on weak candidates.
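The generate-then-select loop can be sketched as follows. This is a minimal illustration, not APE's actual API: `propose` and `target` are hypothetical callables standing in for the proposal LLM and the LLM that executes the task, the template wording and the `[DEMOS]`/`[PROMPT]`/`[INPUT]` placeholders are invented for this sketch, and exact-match accuracy is assumed as the score function. Toy stand-ins replace real API calls so the example runs offline.

```python
from typing import Callable, Sequence

Pair = tuple[str, str]

# Hypothetical templates; [DEMOS], [PROMPT] and [INPUT] are placeholders invented for this sketch.
GEN_TEMPLATE = "Here are input-output examples:\n[DEMOS]\nThe instruction was:"
EVAL_TEMPLATE = "Instruction: [PROMPT]\nInput: [INPUT]\nOutput:"


def format_demos(demos: Sequence[Pair]) -> str:
    # Render demonstrations as "Input/Output" text blocks.
    return "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)


def generate_candidates(propose: Callable[[str], list[str]], demos: Sequence[Pair]) -> list[str]:
    # One query to the proposal model; it returns several candidate instructions.
    return propose(GEN_TEMPLATE.replace("[DEMOS]", format_demos(demos)))


def score(prompt: str, target: Callable[[str], str], dataset: Sequence[Pair]) -> float:
    # Exact-match accuracy of the target model when it follows `prompt`.
    hits = 0
    for x, y in dataset:
        query = EVAL_TEMPLATE.replace("[PROMPT]", prompt).replace("[INPUT]", x)
        hits += target(query).strip() == y
    return hits / len(dataset)


def ape_select(propose, target, demos, dataset) -> tuple[str, float]:
    # Generate candidates, score each one, keep the highest-scoring prompt.
    scored = [(p, score(p, target, dataset)) for p in generate_candidates(propose, demos)]
    return max(scored, key=lambda ps: ps[1])


# --- Toy stand-ins so the sketch runs offline (not real LLM calls) ---
ANTONYMS = {"hot": "cold", "up": "down", "big": "small"}


def fake_propose(query: str) -> list[str]:
    return ["Repeat the word.", "Write the antonym of the word."]


def fake_target(query: str) -> str:
    # Obeys instructions mentioning "antonym"; otherwise echoes the input word.
    instruction = query.split("\n")[0]
    word = query.split("Input: ")[1].split("\n")[0]
    return ANTONYMS.get(word, word) if "antonym" in instruction else word


best, acc = ape_select(
    fake_propose, fake_target, demos=[("big", "small")], dataset=[("hot", "cold"), ("up", "down")]
)
print(best, acc)  # Write the antonym of the word. 1.0
```

Swapping the fake callables for real model calls turns this into a brute-force search; its cost grows with the number of candidates times the number of evaluation examples.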

Quick Start & Requirements

  • Install: pip install -e .
  • Authentication: export OPENAI_API_KEY=YOUR_KEY
  • Dependencies: Requires an OpenAI API key, since prompt generation and evaluation run through OpenAI's API.
  • Resources: Cost estimation tools are provided.
  • More Info: Project Page, Colab
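Before running the cost estimation tools, a rough budget can be worked out by hand. With exhaustive evaluation, every candidate prompt is run on every evaluation example, so evaluation tokens dominate. The sketch below is a back-of-envelope model, not the repository's estimator: `estimate_tokens` and `estimate_cost_usd` are hypothetical helpers, the token counts are assumed, and the price is a placeholder rather than OpenAI's actual rate.

```python
def estimate_tokens(
    num_prompts: int,
    num_eval_examples: int,
    eval_tokens_per_query: int,
    num_gen_queries: int,
    gen_tokens_per_query: int,
) -> int:
    """Back-of-envelope token count for one run (hypothetical helper)."""
    # Generation: a handful of queries that propose candidate prompts.
    gen_tokens = num_gen_queries * gen_tokens_per_query
    # Evaluation: every candidate is run on every evaluation example.
    eval_tokens = num_prompts * num_eval_examples * eval_tokens_per_query
    return gen_tokens + eval_tokens


def estimate_cost_usd(total_tokens: int, usd_per_1k_tokens: float) -> float:
    # Linear pricing per 1,000 tokens.
    return total_tokens / 1000 * usd_per_1k_tokens


tokens = estimate_tokens(
    num_prompts=50,
    num_eval_examples=100,
    eval_tokens_per_query=60,
    num_gen_queries=5,
    gen_tokens_per_query=400,
)
print(tokens)  # 302000
# PLACEHOLDER price: substitute the current rate for whichever model you use.
print(f"${estimate_cost_usd(tokens, usd_per_1k_tokens=0.02):.2f}")
```

Here generation contributes only 2,000 of the 302,000 tokens, which is why cutting evaluation queries matters more than cutting proposals.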

Highlighted Details

  • Outperforms prior LLM baselines and matches or exceeds instructions written by human annotators on 21/24 NLP tasks.
  • Supports prompt optimization for both zero-shot and few-shot settings.
  • Offers flexible templating for evaluation, prompt generation, and demonstrations.
  • Includes configuration options for efficient prompt evaluation (e.g., UCB).
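UCB-based evaluation can be pictured as a multi-armed bandit over candidate prompts: each pull scores one candidate on a small random minibatch, and the next pull goes to the candidate whose mean score plus exploration bonus is highest. The following is a minimal UCB1 sketch; `ucb_select`, `score_fn`, the minibatch size, and the constant `c` are illustrative assumptions, not the repository's configuration keys, and the scorer is a toy.

```python
import math
import random
from typing import Callable, Sequence


def ucb_select(
    prompts: Sequence[str],
    score_fn: Callable[[str, list], float],
    dataset: list,
    rounds: int,
    batch_size: int = 5,
    c: float = 1.0,
    seed: int = 0,
) -> tuple[str, dict[str, int]]:
    """Pick a prompt with UCB1, scoring each pull on a random minibatch."""
    rng = random.Random(seed)
    pulls = {p: 0 for p in prompts}
    totals = {p: 0.0 for p in prompts}

    def pull(p: str) -> None:
        batch = rng.sample(dataset, batch_size)
        totals[p] += score_fn(p, batch)
        pulls[p] += 1

    # Initialise: evaluate every candidate once.
    for p in prompts:
        pull(p)

    # Each round, spend the next minibatch on the candidate with the highest upper bound.
    for _ in range(rounds):
        t = sum(pulls.values())
        chosen = max(
            prompts,
            key=lambda p: totals[p] / pulls[p] + c * math.sqrt(math.log(t) / pulls[p]),
        )
        pull(chosen)

    # Return the best empirical mean (ties go to the earlier candidate).
    best = max(prompts, key=lambda p: totals[p] / pulls[p])
    return best, pulls


# Toy scorer: each prompt is "correct" on a fixed subset of example ids.
dataset = list(range(20))
correct = {
    "Give the antonym.": set(range(20)),  # always right
    "Give a synonym.": set(range(10)),    # right on half the examples
    "Repeat the word.": set(),            # never right
}


def toy_score(prompt: str, batch: list) -> float:
    return sum(x in correct[prompt] for x in batch) / len(batch)


best, pulls = ucb_select(list(correct), toy_score, dataset, rounds=30)
print(best, pulls)
```

Because a weak candidate stops receiving minibatches once its upper bound falls below the leader's, most of the evaluation budget goes to promising prompts instead of being split evenly across all of them.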

Maintenance & Community

  • Based on the "Instruction Induction" codebase.
  • Links to a Colab notebook and a GUI demo are provided.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Primarily designed for OpenAI models.

Limitations & Caveats

Running APE can be expensive, because every candidate prompt is scored through LLM API calls; cost estimation tools are included to help budget a run. The framework depends on OpenAI's API, and results reported with specific model versions (e.g., text-davinci-002) may not carry over to other models.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 38 stars in the last 90 days
