promptfoo by promptfoo

CLI tool for LLM prompt/agent/RAG testing and red-teaming

Created 2 years ago
8,400 stars

Top 6.1% on SourcePulse

Project Summary

Promptfoo is a developer-focused tool that runs locally to test, evaluate, and secure Large Language Model (LLM) applications. It replaces trial-and-error prompt tuning with automated tests, red teaming, and side-by-side model comparisons, with the aim of helping teams ship more secure and reliable AI applications.

How It Works

Promptfoo operates via a declarative configuration system, allowing users to define test cases, prompts, and evaluation metrics. It supports a wide array of LLM providers, including OpenAI, Anthropic, Azure, Bedrock, and Ollama, facilitating direct comparison of model performance. The tool emphasizes a developer-first approach with features like live reload and caching for rapid iteration, and it runs entirely locally to ensure prompt privacy.
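The declarative approach described above can be illustrated with a small sketch. This is not promptfoo's API or configuration format. The provider functions, the `EvalCase` type, the `CONFIG` layout, and `run_eval` are all hypothetical names chosen for illustration. The sketch only shows the general idea: prompts, providers, and test cases are declared as data, and a runner evaluates every combination, caching outputs so that repeated inputs are not recomputed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for model providers; a real run would call an LLM API.
def echo_provider(prompt: str) -> str:
    return prompt


def shout_provider(prompt: str) -> str:
    return prompt.upper()


@dataclass
class EvalCase:
    variables: Dict[str, str]  # values substituted into the prompt template
    must_contain: str          # a simple "contains" check on the output


# Declarative configuration: data only, no control flow.
CONFIG = {
    "prompts": ["Summarize: {text}"],
    "providers": {"echo": echo_provider, "shout": shout_provider},
    "tests": [EvalCase({"text": "hello"}, must_contain="hello")],
}


def run_eval(config: dict) -> List[dict]:
    """Evaluate every prompt x provider x test combination."""
    cache: Dict[Tuple[str, str], str] = {}
    results = []
    for template in config["prompts"]:
        for name, provider in config["providers"].items():
            for case in config["tests"]:
                prompt = template.format(**case.variables)
                key = (name, prompt)
                if key not in cache:  # call each provider once per distinct prompt
                    cache[key] = provider(prompt)
                output = cache[key]
                results.append({
                    "provider": name,
                    "prompt": prompt,
                    "passed": case.must_contain in output,
                })
    return results


if __name__ == "__main__":
    for row in run_eval(CONFIG):
        print(row["provider"], "PASS" if row["passed"] else "FAIL")
```

Because the two stub providers receive identical prompts and checks, their rows can be compared directly, which is the side-by-side comparison the section describes.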

Quick Start & Requirements

Highlighted Details

  • Facilitates red teaming and vulnerability scanning for LLMs.
  • Supports comparison of multiple LLM providers (OpenAI, Anthropic, Azure, Bedrock, Ollama, etc.).
  • Integrates with CI/CD pipelines for automated checks.
  • Offers features like live reload and caching for developer efficiency.
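
An automated check in a CI/CD pipeline generally comes down to turning evaluation results into a process exit code. The sketch below assumes a hypothetical list of pass/fail booleans and an illustrative threshold parameter. It is not promptfoo's own CI integration, and `gate_exit_code` is an invented name.

```python
from typing import Sequence


def gate_exit_code(passed: Sequence[bool], min_pass_rate: float = 1.0) -> int:
    """Return 0 when the pass rate meets the threshold, 1 otherwise."""
    if not passed:
        return 1  # treat an empty run as a failure rather than a silent pass
    rate = sum(passed) / len(passed)
    return 0 if rate >= min_pass_rate else 1


if __name__ == "__main__":
    import sys

    # Hypothetical results: two of three checks passed, threshold is 100%.
    sys.exit(gate_exit_code([True, True, False]))
```

A nonzero exit code is what most CI systems use to mark a step as failed, so a gate like this blocks a merge when the pass rate drops below the chosen threshold.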

Maintenance & Community

  • Active community on Discord.
  • Open source with a contributing guide.

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and closed-source linking.

Limitations & Caveats

The tool is primarily command-line driven, with a focus on local execution. While it supports numerous LLM providers, integration with specific or self-hosted models not exposed via standard APIs might require custom configurations.

Health Check

  • Last commit: 13 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 368
  • Issues (30d): 26
  • Star history: 379 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Meng Zhang (cofounder of TabbyML), and 3 more.

qodo-cover by qodo-ai

0.2%
5k
CLI tool for AI-powered test generation and code coverage enhancement
Created 1 year ago
Updated 2 months ago
Starred by Luis Capelo (cofounder of Lightning AI), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 6 more.

opik by comet-ml

1.7%
14k
Open-source LLM evaluation framework for RAG, agents, and more
Created 2 years ago
Updated 12 hours ago