evidently by evidentlyai

Open-source framework for ML/LLM observability

created 4 years ago
6,454 stars

Top 8.1% on sourcepulse

View on GitHub
Project Summary

Evidently is an open-source Python framework for evaluating, testing, and monitoring AI and ML systems, including LLMs. It supports both tabular and text data, offering over 100 built-in metrics for tasks ranging from data drift detection to RAG pipeline quality. The framework is designed for flexibility, allowing users to perform one-off evaluations or host a full monitoring service, making it suitable for researchers, data scientists, and ML engineers.

How It Works

Evidently operates through modular components: Reports and Test Suites for offline analysis and validation, and a Monitoring UI for visualizing results over time. Reports generate interactive visualizations and summaries of various quality evaluations, which can be customized with presets or individual metrics. Test Suites build upon Reports by adding pass/fail conditions, enabling automated checks for CI/CD pipelines. The framework supports custom metrics and offers an open architecture for integration with existing tools.
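
As a concrete illustration, here is a minimal sketch of such a CI-style gate. It assumes the classic (pre-0.7) TestSuite API and an arbitrarily chosen DataStabilityTestPreset; module paths moved in newer releases, so check the documentation for your installed version.

    import pandas as pd

    from evidently.test_suite import TestSuite
    from evidently.test_preset import DataStabilityTestPreset

    # Toy slices standing in for reference data vs. newly arriving data.
    reference = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]})
    current = pd.DataFrame({"value": [1.0, 2.0, 3.0, 400.0, 5.0]})

    # The preset bundles individual tests with default pass/fail conditions.
    suite = TestSuite(tests=[DataStabilityTestPreset()])
    suite.run(reference_data=reference, current_data=current)

    # as_dict() returns a machine-readable summary a pipeline can gate on.
    if not suite.as_dict()["summary"]["all_passed"]:
        raise SystemExit("data stability checks failed")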

Quick Start & Requirements
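
Evidently installs from PyPI (pip install evidently) and runs on recent Python versions. The snippet below is a minimal sketch of a one-off drift evaluation using the classic (pre-0.7) Report API; imports and entry points moved in newer releases, so treat the exact paths as illustrative.

    import numpy as np
    import pandas as pd

    from evidently.report import Report
    from evidently.metric_preset import DataDriftPreset

    # Two synthetic slices: "current" is intentionally shifted.
    rng = np.random.default_rng(0)
    reference = pd.DataFrame({"feature": rng.normal(0.0, 1.0, 500)})
    current = pd.DataFrame({"feature": rng.normal(0.5, 1.0, 500)})

    # Compare the slices and write an interactive HTML report.
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    report.save_html("drift_report.html")

In a notebook, the same Report object renders inline; report.json() exposes the results programmatically.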

Highlighted Details

  • Supports over 100 built-in metrics for data drift, data quality, classification, regression, LLM outputs, and RAG.
  • Offers both offline evaluation (Reports, Test Suites) and live monitoring capabilities.
  • Includes a self-hostable open-source monitoring UI and a cloud offering with additional features (see the workspace sketch after this list).
  • Allows custom metric creation and integration with existing MLOps tools.
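
As a sketch of how one-off evaluations feed the self-hosted monitoring UI, the snippet below snapshots a report into a local workspace directory that the open-source dashboard can serve. It assumes the 0.4-era Workspace API (Workspace.create, create_project, add_report); names may differ in your release.

    import numpy as np
    import pandas as pd

    from evidently.report import Report
    from evidently.metric_preset import DataDriftPreset
    from evidently.ui.workspace import Workspace

    # A small drift report, as in the quick start above.
    rng = np.random.default_rng(1)
    reference = pd.DataFrame({"feature": rng.normal(0.0, 1.0, 200)})
    current = pd.DataFrame({"feature": rng.normal(0.3, 1.0, 200)})
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)

    # Snapshot the run into a local workspace; pointing the Evidently UI
    # at this directory visualizes accumulated runs over time.
    ws = Workspace.create("evidently_workspace")
    project = ws.create_project("Demo project")
    ws.add_report(project.id, report)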

Maintenance & Community

  • Active community with a Discord server for discussion and support.
  • Regular updates and contributions are welcomed via a contribution guide.
  • Blog and Twitter accounts provide project updates and insights.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under active development; LLM evaluation capabilities in particular are evolving quickly, so consult the documentation for the currently supported metrics and features.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 32
  • Issues (30d): 4

Star History

376 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Joe Walnes (Head of Experimental Projects at Stripe), and 2 more.

prompttools by hegelai

Top 0.3%
3k stars
Open-source tools for prompt testing and experimentation
created 2 years ago
updated 11 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Sourabh Bajaj (Cofounder of Uplimit), and 4 more.

opik by comet-ml

Top 2.5%
12k stars
Open-source LLM evaluation framework for RAG, agents, and more
created 2 years ago
updated 19 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jerry Liu (Cofounder of LlamaIndex).

deepeval by confident-ai

Top 2.0%
10k stars
LLM evaluation framework for unit testing LLM outputs
created 2 years ago
updated 11 hours ago