promptfoo by promptfoo

CLI tool for LLM prompt/agent/RAG testing and red-teaming

Created 2 years ago

9,822 stars

Top 5.1% on SourcePulse

Project Summary

Promptfoo is a developer-focused tool that runs locally and streamlines the testing, evaluation, and security of Large Language Model (LLM) applications. It replaces trial-and-error with automated testing, red teaming, and side-by-side model comparisons, with the goal of shipping more secure and reliable AI applications.

How It Works

Promptfoo is driven by a declarative configuration in which users define prompts, test cases, and evaluation metrics. It supports a wide range of LLM providers, including OpenAI, Anthropic, Azure, Bedrock, and Ollama, so the same tests can be run against several models and the results compared directly. The tool takes a developer-first approach, with live reload and caching for rapid iteration, and it runs entirely locally to keep prompts private.
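The declarative idea can be illustrated with a small, self-contained sketch. This is not Promptfoo's implementation: the two providers are hypothetical stand-in functions rather than real LLM calls, and the helpers render, check, and run_eval are invented for illustration. The config keys (prompts, providers, tests, vars, assert) and the {{text}} placeholder are meant to resemble the general shape of such a config, and should be checked against the official documentation before use.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-ins for real LLM providers: each maps a prompt to a reply.
def echo_provider(prompt: str) -> str:
    return f"Echo: {prompt}"

def shouty_provider(prompt: str) -> str:
    return prompt.upper()

PROVIDERS: Dict[str, Callable[[str], str]] = {
    "echo": echo_provider,
    "shouty": shouty_provider,
}

# Declarative description of the evaluation (illustrative key names).
CONFIG = {
    "prompts": ["Translate to French: {{text}}"],
    "providers": ["echo", "shouty"],
    "tests": [
        {
            "vars": {"text": "hello"},
            "assert": [{"type": "contains", "value": "hello"}],
        },
    ],
}

def render(template: str, variables: Dict[str, str]) -> str:
    """Replace {{name}} placeholders with values from `variables`."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: variables[m.group(1)], template)

def check(assertion: dict, output: str) -> bool:
    """Evaluate one assertion against a provider's output."""
    if assertion["type"] == "contains":
        return assertion["value"] in output
    if assertion["type"] == "equals":
        return output == assertion["value"]
    raise ValueError(f"unsupported assertion type: {assertion['type']}")

def run_eval(config: dict, providers: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Run every prompt x test x provider combination and record pass/fail."""
    rows = []
    for template in config["prompts"]:
        for test in config["tests"]:
            prompt = render(template, test["vars"])
            for name in config["providers"]:
                output = providers[name](prompt)
                passed = all(check(a, output) for a in test["assert"])
                rows.append({"provider": name, "output": output, "pass": passed})
    return rows
```

Calling run_eval(CONFIG, PROVIDERS) yields one row per provider for the same test case, which is the side-by-side comparison in miniature: the echo stand-in passes the case-sensitive contains check, while the uppercasing stand-in fails it.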

Quick Start & Requirements

Primary install / run commands: npx promptfoo@latest init, followed by npx promptfoo eval.
Prerequisites: Node.js.
Links: Website, Getting Started, Red Teaming, Documentation

Highlighted Details

Facilitates red teaming and vulnerability scanning for LLMs.
Supports comparison of multiple LLM providers (OpenAI, Anthropic, Azure, Bedrock, Ollama, etc.).
Integrates with CI/CD pipelines for automated checks.
Offers features like live reload and caching for developer efficiency.
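As a rough sketch of what an automated check in a CI/CD pipeline might look like, the snippet below turns a list of per-test results into a process exit code. Everything here is an assumption for illustration: the record shape (a list of dicts with a boolean "pass" field), the function names pass_rate and gate, and the threshold parameter are hypothetical and do not describe Promptfoo's actual output format or exit-code behavior.

```python
from typing import Iterable

def pass_rate(results: Iterable[dict]) -> float:
    """Fraction of records whose (assumed) "pass" field is truthy."""
    records = list(results)
    if not records:
        # An empty run is treated as a failure rather than a vacuous success.
        return 0.0
    passed = sum(1 for record in records if record.get("pass"))
    return passed / len(records)

def gate(results: Iterable[dict], threshold: float = 1.0) -> int:
    """Return 0 if the pass rate meets the threshold, otherwise 1."""
    return 0 if pass_rate(results) >= threshold else 1
```

A CI job could load the evaluation results, call sys.exit(gate(records)), and let a non-zero exit code fail the build. Treating an empty result set as a failure is a deliberate choice in this sketch, so that a misconfigured run does not pass silently.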

Maintenance & Community

Active community on Discord.
Open source with a contributing guide.

Licensing & Compatibility

MIT License.
Permissive license suitable for commercial use and closed-source linking.

Limitations & Caveats

The tool is primarily command-line driven and focused on local execution. Although it supports many LLM providers, models that are self-hosted or not exposed through a standard API may require custom configuration.
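One form such custom configuration can take is a small script that wraps the model. The sketch below assumes a Python-provider convention in which the tool calls a function named call_api(prompt, options, context) and expects a dict with an "output" key (or an "error" key on failure); confirm the exact interface against the current provider documentation. The local_model function is a hypothetical placeholder for whatever call reaches the self-hosted model.

```python
from typing import Any, Dict

def local_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a self-hosted model."""
    return f"[local-model reply to: {prompt}]"

def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the model call in the dict shape the evaluator is assumed to expect."""
    try:
        return {"output": local_model(prompt)}
    except Exception as exc:
        # Report the failure as data instead of crashing the whole evaluation.
        return {"error": str(exc)}
```

Returning errors as values keeps a single failing request from aborting the rest of the run; options and context are accepted but unused here.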

Health Check

Last Commit: 11 hours ago
Responsiveness: 1 day
Pull Requests (30d): 369
Issues (30d):
Star History: 447 stars in the last 30 days