deepeval by confident-ai

LLM evaluation framework for unit testing LLM outputs

created 2 years ago
9,656 stars

Top 5.3% on sourcepulse

Project Summary

DeepEval is an open-source LLM evaluation framework for developers and researchers building LLM applications. It provides a Pytest-like experience for unit testing LLM outputs, using research-backed metrics to assess qualities such as hallucination, answer relevancy, and RAG performance, so teams can iterate on and deploy LLM systems with confidence.

How It Works

DeepEval leverages a modular design allowing users to select from a wide array of pre-built metrics or create custom ones. These metrics can be powered by various LLMs, statistical methods, or local NLP models. The framework supports both Pytest integration for CI/CD pipelines and standalone evaluation for notebook environments, facilitating systematic testing of LLM responses against defined criteria.
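The modular flow described above can be sketched in plain Python. All names below are hypothetical stand-ins for illustration; deepeval's actual API lives in the `deepeval` package (e.g. test cases, metrics, and an assert-style entry point) and differs in detail, and real metrics are typically backed by LLMs or NLP models rather than keyword overlap:

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    """A single LLM interaction to evaluate (hypothetical stand-in)."""
    input: str
    actual_output: str


class KeywordRelevancy:
    """Toy metric: fraction of input words echoed in the output.

    Real frameworks back metrics like this with LLM judges or
    statistical/NLP models; this is only a stand-in to show the shape.
    """

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.score = 0.0

    def measure(self, case: TestCase) -> float:
        query = set(case.input.lower().split())
        answer = set(case.actual_output.lower().split())
        self.score = len(query & answer) / max(len(query), 1)
        return self.score


def assert_test(case: TestCase, metrics: list) -> None:
    """Pytest-style assertion: fail if any metric scores below its threshold."""
    for metric in metrics:
        score = metric.measure(case)
        assert score >= metric.threshold, (
            f"{type(metric).__name__} scored {score:.2f} "
            f"< threshold {metric.threshold}"
        )


# Passes: every query keyword appears in the answer.
case = TestCase(
    input="refund policy shoes",
    actual_output="Our refund policy lets you return shoes within 30 days.",
)
assert_test(case, [KeywordRelevancy(threshold=0.5)])
```

Because the check is an ordinary `assert`, a file of such tests slots directly into a Pytest run, which is what makes the pattern CI/CD-friendly.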

Quick Start & Requirements

  • Install via pip: pip install -U deepeval
  • Requires an OpenAI API key or a custom model setup.
  • Optional: DeepEval platform account for cloud reporting.
  • See: Getting Started
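A typical setup following the bullets above, as a sketch (the login step is only needed for the optional cloud platform; `test_chatbot.py` is a hypothetical file name, and the API key value is a placeholder):

```shell
pip install -U deepeval
export OPENAI_API_KEY="<your-key>"   # or configure a custom model instead
deepeval login                       # optional: cloud reporting account
deepeval test run test_chatbot.py    # run a Pytest-style test file
```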

Highlighted Details

  • Supports red-teaming LLMs against 40+ safety vulnerabilities.
  • Integrates with LlamaIndex and Hugging Face for RAG and fine-tuning evaluations.
  • Offers benchmarking against popular LLM benchmarks like MMLU and HumanEval.
  • Enables custom metric creation and synthetic dataset generation.

Maintenance & Community

  • Developed by the founders of Confident AI.
  • Community support available via Discord.
  • Roadmap includes DAG custom metrics and Guardrails.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The framework relies on external LLM APIs (like OpenAI) for many metrics, which may incur costs or require API key management. Some advanced features like DAG custom metrics are still under development.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 week
  • Pull requests (30d): 102
  • Issues (30d): 35
  • Star history: 3,646 stars in the last 90 days

Explore Similar Projects

ToolBench by OpenBMB

  • Open platform for LLM tool learning (ICLR'24 spotlight)
  • Top 0.1% · 5k stars · created 2 years ago · updated 2 months ago
  • Starred by Ying Sheng (Author of SGLang), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

evidently by evidentlyai

  • Open-source framework for ML/LLM observability
  • Top 0.4% · 6k stars · created 4 years ago · updated 1 day ago
  • Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Didier Lopes (Founder of OpenBB), and 4 more.