evidently by evidentlyai

Open-source framework for ML/LLM observability

created 4 years ago
6,454 stars

Top 8.1% on sourcepulse

View on GitHub
Project Summary

Evidently is an open-source Python framework for evaluating, testing, and monitoring AI and ML systems, including LLMs. It supports both tabular and text data, offering over 100 built-in metrics for tasks ranging from data drift detection to RAG pipeline quality. The framework is designed for flexibility, allowing users to perform one-off evaluations or host a full monitoring service, making it suitable for researchers, data scientists, and ML engineers.

How It Works

Evidently operates through modular components: Reports and Test Suites for offline analysis and validation, and a Monitoring UI for visualizing results over time. Reports generate interactive visualizations and summaries of various quality evaluations, which can be customized with presets or individual metrics. Test Suites build upon Reports by adding pass/fail conditions, enabling automated checks for CI/CD pipelines. The framework supports custom metrics and offers an open architecture for integration with existing tools.
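
As a concrete illustration, here is a minimal sketch of such a CI-style gate. It assumes the classic (pre-0.7) TestSuite API and an arbitrarily chosen DataStabilityTestPreset; module paths moved in newer releases, so check the documentation for your installed version.

    import pandas as pd

    from evidently.test_suite import TestSuite
    from evidently.test_preset import DataStabilityTestPreset

    # Toy slices standing in for reference data vs. newly arriving data.
    reference = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]})
    current = pd.DataFrame({"value": [1.0, 2.0, 3.0, 400.0, 5.0]})

    # The preset bundles individual tests with default pass/fail conditions.
    suite = TestSuite(tests=[DataStabilityTestPreset()])
    suite.run(reference_data=reference, current_data=current)

    # as_dict() returns a machine-readable summary a pipeline can gate on.
    if not suite.as_dict()["summary"]["all_passed"]:
        raise SystemExit("data stability checks failed")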

Quick Start & Requirements
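
Evidently installs from PyPI (pip install evidently) and runs on recent Python versions. The snippet below is a minimal sketch of a one-off drift evaluation using the classic (pre-0.7) Report API; imports and entry points moved in newer releases, so treat the exact paths as illustrative.

    import numpy as np
    import pandas as pd

    from evidently.report import Report
    from evidently.metric_preset import DataDriftPreset

    # Two synthetic slices: "current" is intentionally shifted.
    rng = np.random.default_rng(0)
    reference = pd.DataFrame({"feature": rng.normal(0.0, 1.0, 500)})
    current = pd.DataFrame({"feature": rng.normal(0.5, 1.0, 500)})

    # Compare the slices and write an interactive HTML report.
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    report.save_html("drift_report.html")

In a notebook, the same Report object renders inline; report.json() exposes the results programmatically.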

Highlighted Details

  • Supports over 100 built-in metrics for data drift, data quality, classification, regression, LLM outputs, and RAG.
  • Offers both offline evaluation (Reports, Test Suites) and live monitoring capabilities.
  • Includes a self-hostable open-source monitoring UI and a cloud offering with additional features (see the workspace sketch after this list).
  • Allows custom metric creation and integration with existing MLOps tools.
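
As a sketch of how one-off evaluations feed the self-hosted monitoring UI, the snippet below snapshots a report into a local workspace directory that the open-source dashboard can serve. It assumes the 0.4-era Workspace API (Workspace.create, create_project, add_report); names may differ in your release.

    import numpy as np
    import pandas as pd

    from evidently.report import Report
    from evidently.metric_preset import DataDriftPreset
    from evidently.ui.workspace import Workspace

    # A small drift report, as in the quick start above.
    rng = np.random.default_rng(1)
    reference = pd.DataFrame({"feature": rng.normal(0.0, 1.0, 200)})
    current = pd.DataFrame({"feature": rng.normal(0.3, 1.0, 200)})
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)

    # Snapshot the run into a local workspace; pointing the Evidently UI
    # at this directory visualizes accumulated runs over time.
    ws = Workspace.create("evidently_workspace")
    project = ws.create_project("Demo project")
    ws.add_report(project.id, report)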

Maintenance & Community

  • Active community with a Discord server for discussion and support.
  • Regular updates and contributions are welcomed via a contribution guide.
  • Blog and Twitter accounts provide project updates and insights.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under active development; LLM evaluation capabilities in particular are evolving quickly, so consult the documentation for the currently supported metrics and features.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 32
  • Issues (30d): 4

Star History

376 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Joe Walnes (Head of Experimental Projects at Stripe), and 2 more.

prompttools by hegelai

Top 0.3%
3k stars
Open-source tools for prompt testing and experimentation
created 2 years ago
updated 11 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Sourabh Bajaj (Cofounder of Uplimit), and 4 more.

opik by comet-ml

Top 2.5%
12k stars
Open-source LLM evaluation framework for RAG, agents, and more
created 2 years ago
updated 19 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jerry Liu (Cofounder of LlamaIndex).

deepeval by confident-ai

Top 2.0%
10k stars
LLM evaluation framework for unit testing LLM outputs
created 2 years ago
updated 11 hours ago