promptbase by Microsoft

Prompt engineering resources for eliciting top performance from foundation models

created 1 year ago
5,653 stars

Top 9.2% on sourcepulse

View on GitHub
Project Summary

This repository provides a collection of resources, best practices, and example scripts for advanced prompt engineering, specifically targeting foundation models like GPT-4. It aims to help researchers and practitioners achieve state-of-the-art performance on various benchmarks, particularly in complex reasoning and domain-specific tasks, by offering structured methodologies and extensible frameworks.

How It Works

The core of the project is the "Medprompt" methodology, which combines dynamic few-shot selection, self-generated chain-of-thought (CoT), and choice-shuffle ensembling. Dynamic few-shot selection uses semantic similarity (via text-embedding-ada-002) to retrieve relevant examples for each query. Self-generated CoT automates the creation of reasoning steps, and ensembling with choice shuffling enhances robustness. Medprompt+ extends this by incorporating a portfolio approach, dynamically selecting between direct few-shot prompts and CoT-based prompts based on GPT-4's assessment of task complexity.
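
The repository's benchmark scripts implement these steps against Azure OpenAI. The sketch below is a minimal, illustrative reconstruction of the first and third steps only (dynamic few-shot selection and choice-shuffle ensembling); the embed and ask_model functions are stubs standing in for text-embedding-ada-002 and GPT-4, and self-generated CoT is omitted. This is not the repo's code.

```python
import math
import random
from collections import Counter

def embed(text: str) -> list[float]:
    # Stub: the real pipeline calls text-embedding-ada-002.
    random.seed(hash(text) % (2**32))
    return [random.random() for _ in range(8)]

def ask_model(prompt: str) -> str:
    # Stub: a real implementation would call GPT-4 via the chat
    # completions API and parse the answer letter from the response.
    return random.choice("ABCD")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# 1. Dynamic few-shot selection: retrieve the k training examples
#    most similar to the test question in embedding space.
def select_few_shots(question: str, train_set: list[dict], k: int = 5) -> list[dict]:
    q_vec = embed(question)
    ranked = sorted(train_set,
                    key=lambda ex: cosine(q_vec, embed(ex["question"])),
                    reverse=True)
    return ranked[:k]

# 3. Choice-shuffle ensembling: shuffle the answer options across runs
#    and majority-vote over the answers after mapping letters back to
#    the underlying option text. Assumes at most four options here.
def choice_shuffle_ensemble(question: str, options: list[str],
                            shots: list[dict], n_runs: int = 5) -> str:
    votes: Counter = Counter()
    for _ in range(n_runs):
        order = list(options)
        random.shuffle(order)
        labels = "ABCD"[: len(order)]
        prompt = "\n".join(f"Q: {ex['question']}\nA: {ex['answer']}" for ex in shots)
        prompt += f"\nQ: {question}\n" + "\n".join(
            f"{label}. {opt}" for label, opt in zip(labels, order))
        raw = ask_model(prompt)
        if raw in labels:  # map the shuffled letter back to option text
            votes[order[labels.index(raw)]] += 1
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    train = [{"question": "2 + 2?", "answer": "4"},
             {"question": "3 * 3?", "answer": "9"}]
    shots = select_few_shots("5 + 5?", train, k=2)
    print(choice_shuffle_ensemble("5 + 5?", ["8", "9", "10", "11"], shots))
```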

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository and navigating to the src directory.
  • Requires Azure OpenAI API keys and endpoints (AZURE_OPENAI_API_KEY, AZURE_OPENAI_CHAT_API_KEY, AZURE_OPENAI_CHAT_ENDPOINT_URL, AZURE_OPENAI_EMBEDDINGS_URL); a pre-flight check sketch follows this list.
  • Datasets for benchmarks (MMLU, HumanEval, DROP, GSM8K, MATH, Big-Bench-Hard) must be downloaded separately and placed in src/promptbase/datasets/.
  • Example run command: python -m promptbase mmlu --subject <SUBJECT>
  • Links: Medprompt Blog, Medprompt Research Paper
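
As referenced above, here is a hypothetical pre-flight check that verifies the environment variables the scripts expect before launching a benchmark run. The variable names come from the README; the check itself is our own sketch, not part of the repo.

```python
import os
import sys

# Environment variables the promptbase scripts expect (per the README).
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_CHAT_API_KEY",
    "AZURE_OPENAI_CHAT_ENDPOINT_URL",
    "AZURE_OPENAI_EMBEDDINGS_URL",
]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
print("Azure OpenAI environment looks configured; run e.g.:")
print("  python -m promptbase mmlu --subject <SUBJECT>")
```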

Highlighted Details

  • Achieved 90.10% on MMLU with GPT-4 using Medprompt+.
  • Shows that generalist foundation models can outperform specialized models through prompting alone.
  • Includes detailed methodology for dynamic few-shot selection, self-generated CoT, and ensembling.
  • Benchmarks provided for GPT-4 and Gemini Ultra across MMLU, GSM8K, MATH, HumanEval, BIG-Bench-Hard, DROP, and HellaSwag.

Maintenance & Community

  • Developed by Microsoft Research.
  • Future plans include more case studies, interviews, and specialized tooling deep dives.

Licensing & Compatibility

  • The repository itself appears to be MIT licensed, but the underlying methodologies and data usage are tied to Microsoft's AI services and research.

Limitations & Caveats

Some scripts are provided for reference and may not run directly against public APIs. Medprompt+ relies on access to GPT-4 logprobs, which were not publicly available via the API when the README was written but were expected to be enabled.
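
To make the logprob dependency concrete, here is a minimal sketch of the kind of confidence-based routing the portfolio approach implies: keep the direct answer when the model's probability on it is high, otherwise fall back to the CoT-ensemble answer. The threshold and function names are illustrative assumptions, not promptbase's implementation.

```python
import math

# Assumed confidence cutoff for keeping the direct few-shot answer;
# the repo does not publish this value.
CONFIDENCE_THRESHOLD = 0.85

def route_answer(direct_answer: str, direct_logprob: float, cot_answer: str) -> str:
    # Convert the answer token's logprob into a probability and route.
    confidence = math.exp(direct_logprob)
    return direct_answer if confidence >= CONFIDENCE_THRESHOLD else cot_answer

print(route_answer("B", -0.05, "C"))  # ~95% confident -> keeps "B"
print(route_answer("B", -1.20, "C"))  # ~30% confident -> falls back to "C"
```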

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 74 stars in the last 90 days
