gepa by gepa-ai

AI-driven evolution for system text components

Created 5 months ago

2,068 stars

Top 21.4% on SourcePulse


3 Experts Love This Project

Will Brown

Research Lead at Prime Intellect

Malte Pietsch

Cofounder of deepset

Abhishek Thakur

World's First 4x Kaggle GrandMaster

Project Summary

GEPA is a framework for optimizing the text components of any system (AI prompts, code, specifications) using an evolutionary search driven by LLM reflection. It targets developers and researchers who want to improve system performance by iteratively refining those components against defined evaluation metrics, with the aim of reaching robust, high-performing variants within a limited evaluation budget.

How It Works

GEPA employs a "Reflective Text Evolution" strategy. It uses Large Language Models (LLMs) to analyze feedback from system execution and evaluation traces, reflecting on performance to generate targeted mutations for text components. Candidates are iteratively mutated, evaluated, and selected using a Pareto-aware approach, allowing for the co-evolution of multiple components within modular systems to achieve domain-specific improvements.
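The mutate, evaluate, and select loop described above can be illustrated with a small, self-contained sketch. This is a simplified illustration and not GEPA's implementation: the function names (pareto_weights, select_parent, evolve) are hypothetical, the "reflection" step is replaced by a plain rule-based toy_mutate instead of an LLM call, and the selection rule assumed here is that a candidate is sampled in proportion to the number of tasks on which it ties for the best score.

```python
import random
from collections import Counter


def pareto_weights(scores):
    """Count, per candidate, the tasks on which it ties for the best score.

    `scores` maps candidate -> list of per-task scores (equal lengths).
    Candidates that are best on no task are left out of the result.
    """
    n_tasks = len(next(iter(scores.values())))
    wins = Counter()
    for task in range(n_tasks):
        best = max(task_scores[task] for task_scores in scores.values())
        for name, task_scores in scores.items():
            if task_scores[task] == best:
                wins[name] += 1
    return dict(wins)


def select_parent(scores, rng):
    """Sample a parent, weighted by how many tasks it is best on (assumed rule)."""
    weights = pareto_weights(scores)
    names = sorted(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def evolve(seed, evaluate, mutate, rounds, rng):
    """Repeat: select a parent, mutate it, evaluate the child, keep it in the pool."""
    pool = {seed: evaluate(seed)}
    for _ in range(rounds):
        parent = select_parent(pool, rng)
        child = mutate(parent, pool[parent])
        if child not in pool:
            pool[child] = evaluate(child)
    return pool


# Toy stand-ins (hypothetical): each "task" checks for one keyword in the prompt.
KEYWORDS = ["units", "steps", "answer"]


def toy_evaluate(prompt):
    return [1.0 if keyword in prompt else 0.0 for keyword in KEYWORDS]


def toy_mutate(prompt, task_scores):
    # Stand-in for LLM reflection: address the first failing task, if any.
    for keyword, score in zip(KEYWORDS, task_scores):
        if score == 0.0:
            return prompt + " Mention " + keyword + "."
    return prompt
```

Calling evolve("Solve the problem.", toy_evaluate, toy_mutate, rounds=20, rng=random.Random(0)) returns the pool of all candidates tried with their per-task scores. Keeping per-task scores rather than a single average is what lets a candidate that is strong on only a few tasks remain eligible as a parent.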

Quick Start & Requirements

Installation: pip install gepa for the published package, or pip install git+https://github.com/gepa-ai/gepa.git to install directly from the GitHub repository.
Prerequisites: Requires access to an LLM. The provided examples use OpenAI models (e.g., openai/gpt-4.1-mini, openai/gpt-5) and need the OPENAI_API_KEY environment variable to be set.
Resources: No hardware requirements are stated. The examples call hosted LLM APIs, so cost depends on the number of model calls made during optimization.
Links:
- DSPy Integration Tutorials: dspy.GEPA Tutorials
- Paper: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
- Reproduction Artifact: GEPA Artifact Repository
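Because the examples depend on OPENAI_API_KEY, a pre-flight check can catch a missing key before any optimization budget is spent. The helper below is a hypothetical convenience written for this summary, not part of GEPA; it uses only the Python standard library.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env(required, env=None):
    """Raise a clear error if any required variable is missing."""
    missing = missing_env_vars(required, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Running require_env(["OPENAI_API_KEY"]) at the top of a script fails fast with a readable message instead of an authentication error partway through a run.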

Highlighted Details

DSPy Integration: Offers a direct dspy.GEPA API for seamless integration with the DSPy framework, simplifying prompt optimization tasks.
Performance Gains: Reported improvements include raising GPT-4.1 Mini's score on the AIME benchmark from 46.6% to 56.6% (a 10-point gain) and evolving DSPy programs from 67% to 93% accuracy on the MATH benchmark (a 26-point gain).
Adapter Abstraction: Features a flexible GEPAAdapter interface, enabling GEPA to plug into diverse systems, including single-turn LLM interactions, multi-turn agents (e.g., terminal-bench), and full program evolution.
Broad Applicability: Capable of optimizing various text components, from system prompts and code snippets to complex program logic and control flow.
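The adapter idea can be pictured as a thin interface between the optimizer and the system being tuned: the optimizer hands over a candidate (named text components) and a batch of examples, and gets back scores plus traces to reflect on. The sketch below illustrates that pattern only. TextSystemAdapter, EvalResult, and GreetingAdapter are hypothetical names, and the method signature is an assumption, not the actual GEPAAdapter interface.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class EvalResult:
    scores: list[float]  # one score per example
    traces: list[str]    # textual feedback an LLM could reflect on


class TextSystemAdapter(Protocol):
    """Hypothetical interface: run a candidate on a batch and report results."""

    def evaluate(self, candidate: dict[str, str], batch: list[dict]) -> EvalResult: ...


class GreetingAdapter:
    """Toy single-turn system with one text component, 'greeting'."""

    def evaluate(self, candidate, batch):
        scores, traces = [], []
        for example in batch:
            output = f"{candidate['greeting']}, {example['name']}!"
            correct = output == example["expected"]
            scores.append(1.0 if correct else 0.0)
            traces.append(
                f"name={example['name']!r} output={output!r} "
                f"expected={example['expected']!r}"
            )
        return EvalResult(scores, traces)


def mean_score(adapter: TextSystemAdapter, candidate, batch):
    """Average score of a candidate; works with any conforming adapter."""
    result = adapter.evaluate(candidate, batch)
    return sum(result.scores) / len(result.scores)
```

Because mean_score depends only on the interface, the same optimizer-side code could in principle drive a single-turn prompt, a multi-turn agent, or a whole program, which is the flexibility the adapter abstraction is meant to provide.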

Maintenance & Community

The project is associated with authors of the GEPA paper, including Lakshya A Agrawal and Matei Zaharia. Support and feature requests are handled through GitHub issues, and discussion takes place on Discord. Updates and announcements are posted on X (formerly Twitter) by @LakshyAAAgrawal and @lateinteraction.

Licensing & Compatibility

The provided README does not explicitly state the software license. Users should verify licensing terms before adoption, especially concerning commercial use or integration into closed-source projects.

Limitations & Caveats

Practical application requires access to and configuration of specific LLMs, often necessitating API keys. The optimization process itself can be computationally intensive and may require careful tuning of parameters like max_metric_calls to balance performance gains with evaluation budgets.
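The trade-off behind a setting like max_metric_calls can be made concrete with a wrapper that counts metric evaluations and stops once a fixed budget is used up. This is a sketch of the general budgeting idea under stated assumptions: BudgetedMetric and BudgetExhausted are hypothetical names, and how GEPA itself counts calls is not described in the source.

```python
class BudgetExhausted(RuntimeError):
    """Raised when the metric is called after the budget is spent."""


class BudgetedMetric:
    """Wrap a metric function and allow at most `max_calls` evaluations."""

    def __init__(self, metric, max_calls):
        self.metric = metric
        self.max_calls = max_calls
        self.calls = 0

    @property
    def remaining(self):
        return self.max_calls - self.calls

    def __call__(self, *args, **kwargs):
        if self.calls >= self.max_calls:
            raise BudgetExhausted(f"budget of {self.max_calls} metric calls used up")
        self.calls += 1
        return self.metric(*args, **kwargs)


def exact_match(prediction, expected):
    """Toy metric: 1.0 for an exact string match, else 0.0."""
    return 1.0 if prediction == expected else 0.0
```

A search loop can check remaining before evaluating each new candidate, so a smaller budget ends the run earlier at lower cost, while a larger one allows more candidates to be tried.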

Health Check

Last Commit: 6 days ago
Responsiveness: Inactive
Star History: 286 stars in the last 30 days