deepeval by confident-ai

LLM evaluation framework for unit testing LLM outputs

Created 2 years ago
10,836 stars

Top 4.6% on SourcePulse

Project Summary

DeepEval is an open-source LLM evaluation framework for developers and researchers building LLM applications. It provides a Pytest-like experience for unit testing LLM outputs, with research-backed metrics for hallucination, answer relevancy, RAG performance, and more, so teams can iterate on and deploy LLM systems with confidence.

How It Works

DeepEval uses a modular design: users pick from a wide array of pre-built metrics or define their own, and each metric can be powered by an LLM, a statistical method, or a local NLP model. The framework supports both Pytest integration for CI/CD pipelines and standalone evaluation for notebook environments, so LLM responses can be tested systematically against defined criteria.
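A minimal sketch of the Pytest-style workflow, closely following the examples in DeepEval's docs (the prompt, output, and threshold are illustrative, and the API may shift between versions):

```python
# test_chatbot.py -- run with: deepeval test run test_chatbot.py
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # An LLMTestCase pairs the prompt with your application's actual output
    # (and, for RAG metrics, the retrieved context).
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra cost.",
        retrieval_context=[
            "All customers are eligible for a 30-day full refund at no extra cost."
        ],
    )
    # LLM-as-judge metric; the test fails if the score falls below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

Because these are ordinary Pytest functions, the same file drops into a CI pipeline unchanged; deepeval test run wraps Pytest and adds metric execution and result reporting.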

Quick Start & Requirements

  • Install via pip: pip install -U deepeval
  • Requires an OpenAI API key (or a custom model setup) for LLM-powered metrics; see the sketch after this list.
  • Optional: an account on Confident AI, DeepEval's cloud platform, for cloud reporting.
  • See: Getting Started
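For quick experiments outside Pytest (e.g. in a notebook), the same metrics run through evaluate. A minimal sketch, assuming OPENAI_API_KEY is set and the default OpenAI judge model is used:

```python
import os

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Most built-in metrics call an LLM judge; by default this is OpenAI,
# so the API key must be available before evaluating.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; export in your shell instead

test_case = LLMTestCase(
    input="Who wrote 'Pride and Prejudice'?",
    actual_output="Jane Austen wrote 'Pride and Prejudice', published in 1813.",
)

# Standalone evaluation: prints per-metric scores and reasons
# instead of passing or failing a test.
evaluate(test_cases=[test_case], metrics=[AnswerRelevancyMetric(threshold=0.7)])
```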

Highlighted Details

  • Supports red-teaming of LLM applications across 40+ safety vulnerability categories.
  • Integrates with LlamaIndex and Hugging Face for RAG and fine-tuning evaluations.
  • Offers evaluation on popular LLM benchmarks such as MMLU and HumanEval.
  • Enables custom metric creation (see the GEval sketch below) and synthetic dataset generation.
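On the custom-metric point above, DeepEval's GEval builds an LLM-judged metric from a natural-language criterion. A sketch, with an illustrative criterion and test case:

```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# GEval turns a plain-English criterion into an LLM-scored metric,
# evaluated over the selected test-case fields.
correctness = GEval(
    name="Correctness",
    criteria=(
        "Determine whether the actual output is factually consistent "
        "with the expected output."
    ),
    evaluation_params=[
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
)

test_case = LLMTestCase(
    input="What is the boiling point of water at sea level?",
    actual_output="Water boils at 100 degrees Celsius at sea level.",
    expected_output="100 °C (212 °F) at standard atmospheric pressure.",
)

# Metrics can also be run one-off, outside evaluate() or assert_test().
correctness.measure(test_case)
print(correctness.score, correctness.reason)
```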

Maintenance & Community

  • Developed by the founders of Confident AI.
  • Community support available via Discord.
  • Roadmap includes DAG custom metrics and Guardrails.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Many metrics rely on external LLM APIs (such as OpenAI's), which may incur costs and require managing API keys. Some advanced features, such as DAG custom metrics, are still under development.

Health Check

  • Last commit: 17 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 97
  • Issues (30d): 31
  • Star history: 644 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft

LLM evaluation framework

Top 0.1% on SourcePulse
3k stars
Created 2 years ago
Updated 1 month ago

Starred by Luis Capelo (Cofounder of Lightning AI), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 6 more.

opik by comet-ml

Open-source LLM evaluation framework for RAG, agents, and more

Top 1.7% on SourcePulse
14k stars
Created 2 years ago
Updated 12 hours ago