promptbase by microsoft

Prompt engineering resources for eliciting top performance from foundation models

Created 1 year ago
5,678 stars

Top 9.0% on SourcePulse

View on GitHub
Project Summary

This repository provides a collection of resources, best practices, and example scripts for advanced prompt engineering, specifically targeting foundation models like GPT-4. It aims to help researchers and practitioners achieve state-of-the-art performance on various benchmarks, particularly in complex reasoning and domain-specific tasks, by offering structured methodologies and extensible frameworks.

How It Works

The core of the project is the "Medprompt" methodology, which combines dynamic few-shot selection, self-generated chain-of-thought (CoT), and choice-shuffle ensembling. Dynamic few-shot selection uses semantic similarity (via text-embedding-ada-002) to retrieve relevant examples for each query. Self-generated CoT automates the creation of reasoning steps, and ensembling with choice shuffling enhances robustness. Medprompt+ extends this by incorporating a portfolio approach, dynamically selecting between direct few-shot prompts and CoT-based prompts based on GPT-4's assessment of task complexity.
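For concreteness, here is a minimal sketch of the two mechanical pieces described above: dynamic few-shot selection by embedding similarity, and choice-shuffle ensembling with a majority vote. The function names, data layout, and the ask_model callback are illustrative assumptions, not the repository's actual API.

    # Illustrative sketch of dynamic few-shot selection and choice-shuffle
    # ensembling; names and shapes are hypothetical, not promptbase's API.
    from collections import Counter
    import numpy as np

    def select_few_shot(query_emb: np.ndarray,
                        train_embs: np.ndarray,
                        train_examples: list[str],
                        k: int = 5) -> list[str]:
        """Pick the k training examples whose embeddings (e.g. from
        text-embedding-ada-002) are most similar to the query embedding."""
        sims = train_embs @ query_emb / (
            np.linalg.norm(train_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
        top = np.argsort(-sims)[:k]
        return [train_examples[i] for i in top]

    def choice_shuffle_ensemble(ask_model, question: str, choices: list[str],
                                n_votes: int = 5, seed: int = 0) -> str:
        """Reorder the answer options on each call and majority-vote over the
        model's answers (mapped back to the original order) to reduce position bias."""
        rng = np.random.default_rng(seed)
        votes = []
        for _ in range(n_votes):
            order = rng.permutation(len(choices))
            shuffled = [choices[i] for i in order]
            picked = ask_model(question, shuffled)  # index into `shuffled`
            votes.append(order[picked])             # map back to the original index
        return choices[Counter(votes).most_common(1)[0][0]]

In the full pipeline, the retrieved examples (with their self-generated CoT rationales) are formatted into the prompt that ask_model sends, and the ensemble vote is taken over several such calls.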

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository and navigating to the src directory.
  • Requires Azure OpenAI API keys and endpoints (AZURE_OPENAI_API_KEY, AZURE_OPENAI_CHAT_API_KEY, AZURE_OPENAI_CHAT_ENDPOINT_URL, AZURE_OPENAI_EMBEDDINGS_URL); a minimal configuration sketch follows this list.
  • Datasets for benchmarks (MMLU, HumanEval, DROP, GSM8K, MATH, Big-Bench-Hard) must be downloaded separately and placed in src/promptbase/datasets/.
  • Example run command: python -m promptbase mmlu --subject <SUBJECT>
  • Links: Medprompt Blog, Medprompt Research Paper
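For orientation, a minimal sketch of wiring those environment variables into a client and issuing a test call. It assumes the openai>=1.x Python SDK; the api_version, the deployment name, and the exact way the repository's scripts consume these variables are placeholders, not values specified by the repo.

    import os
    from openai import AzureOpenAI  # assumes the openai>=1.x Python SDK

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_CHAT_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_CHAT_ENDPOINT_URL"],
        api_version="2024-02-01",  # placeholder; match your Azure deployment
    )

    response = client.chat.completions.create(
        model="gpt-4",  # the Azure deployment name, not necessarily "gpt-4"
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(response.choices[0].message.content)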

Highlighted Details

  • Achieved 90.10% on MMLU with GPT-4 using Medprompt+.
  • Demonstrates outperforming specialized models with generalist foundation models via prompting.
  • Includes detailed methodology for dynamic few-shot selection, self-generated CoT, and ensembling.
  • Benchmarks provided for GPT-4 and Gemini Ultra across MMLU, GSM8K, MATH, HumanEval, BIG-Bench-Hard, DROP, and HellaSwag.

Maintenance & Community

  • Developed by Microsoft Research.
  • Future plans include more case studies, interviews, and specialized tooling deep dives.

Licensing & Compatibility

  • The repository itself appears to be MIT licensed, but the underlying methodologies and data usage are tied to Microsoft's AI services and research.

Limitations & Caveats

Some scripts are for reference and may not be immediately executable against public APIs. Medprompt+ relies on access to logprobs from GPT-4, which were not publicly available via the API at the time of the README's writing but were expected to be enabled.
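For context, here is a sketch of how per-token log-probabilities can be requested through the chat completions API as it exists today, assuming the openai>=1.x Python SDK. Parameter support depends on the model and API version, and this is not necessarily how the repository's scripts consume logprobs.

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_CHAT_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_CHAT_ENDPOINT_URL"],
        api_version="2024-02-01",  # placeholder
    )

    # Ask for the top alternatives of the single answer token (e.g. A/B/C/D).
    response = client.chat.completions.create(
        model="gpt-4",  # Azure deployment name
        messages=[{"role": "user",
                   "content": "Question: ...\nAnswer with a single letter: A, B, C, or D."}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    for alt in response.choices[0].logprobs.content[0].top_logprobs:
        print(alt.token, alt.logprob)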

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

20 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Edward Sun (Research Scientist at Meta Superintelligence Lab).

AGIEval by ruixiangcui

Benchmark for evaluating foundation models on human-centric tasks

Top 0.1% on SourcePulse
763 stars
Created 2 years ago
Updated 1 year ago