opik  by comet-ml

Open-source LLM evaluation framework for RAG, agents, and more

created 2 years ago
11,933 stars

Top 4.3% on sourcepulse

Project Summary

Opik is an open-source platform designed to help developers debug, evaluate, and monitor Large Language Model (LLM) applications, including RAG systems and agentic workflows. It offers comprehensive tracing, automated evaluations using "LLM as a judge" metrics, and production-ready dashboards, aiming to improve the performance, speed, and cost-efficiency of LLM-based systems.

How It Works

Opik provides a Python SDK and a local or hosted platform for logging and analyzing LLM interactions. It captures detailed traces of LLM calls, user feedback, and prompt variations. Its core advantage is an integrated evaluation suite: pre-built "LLM as a judge" metrics for complex tasks such as hallucination detection and relevance scoring, heuristic metrics, and support for custom metrics. Together these enable automated quality assessment and CI/CD integration.
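To make the idea of a "trace" concrete, the sketch below records the inputs, output, and duration of a function call in memory and then attaches a feedback score to it. It is plain Python written for illustration, not Opik's implementation or API: `TraceRecord`, `TRACE_LOG`, `record_trace`, and `generate_answer` are all hypothetical names.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TraceRecord:
    """One logged call: what went in, what came out, and how long it took."""
    name: str
    inputs: dict[str, Any]
    output: Any
    duration_s: float
    feedback: dict[str, float] = field(default_factory=dict)


# Stand-in for a tracing backend (hypothetical; a real platform stores this remotely).
TRACE_LOG: list[TraceRecord] = []


def record_trace(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator that appends a TraceRecord for every call."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        output = func(*args, **kwargs)
        duration = time.perf_counter() - start
        TRACE_LOG.append(
            TraceRecord(
                name=func.__name__,
                inputs={"args": args, "kwargs": kwargs},
                output=output,
                duration_s=duration,
            )
        )
        return output
    return wrapper


@record_trace
def generate_answer(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return prompt.upper()


generate_answer("hello")
# Attach user feedback to the most recent trace as a named score.
TRACE_LOG[-1].feedback["thumbs_up"] = 1.0
```

Keeping feedback on the same record as the call is what lets later evaluation join quality scores back to the exact inputs and outputs that produced them.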

Quick Start & Requirements

  • Install SDK: pip install opik
  • Configure: opik configure (or opik.configure(use_local=True) in code)
  • Local Deployment: Clone repo, run ./opik.sh (Linux/Mac) or .\opik.ps1 (Windows). Access at localhost:5173.
  • Integrations: Supports OpenAI, LiteLLM, LangChain, Haystack, Anthropic, Bedrock, CrewAI, and more.
  • Documentation: Website, Documentation
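After installation, tracing a function might look like the minimal sketch below. It relies only on the `opik.track` decorator and the `opik.configure(use_local=True)` call named on this page; `answer_question` is a hypothetical placeholder for a real LLM call, and the no-op fallback decorator is an assumption added so the snippet still runs where the SDK is not installed.

```python
try:
    import opik  # installed via `pip install opik`
    track = opik.track
except ImportError:
    # Assumption for illustration: fall back to a no-op decorator
    # so the sketch still runs without the SDK.
    def track(func):
        return func

# For a local deployment, opik.configure(use_local=True) would be called once
# before tracing. It is left out here because it expects a reachable Opik instance.


@track
def answer_question(question: str) -> str:
    # Hypothetical placeholder for a real LLM call.
    return f"Echo: {question}"


print(answer_question("What is Opik?"))
```

With the SDK installed and configured, each call to the decorated function would be expected to appear as a trace in the Opik UI.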

Highlighted Details

  • Comprehensive tracing for LLM calls across various frameworks.
  • "LLM as a judge" metrics for automated evaluation of relevance, hallucination, etc.
  • CI/CD integration via PyTest for automated testing.
  • High-volume trace ingestion capability for production monitoring.
  • Local or Comet.com hosted deployment options.
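The combination of heuristic metrics and PyTest-based CI checks can be sketched as a test that fails the build when an aggregate score drops below a threshold. The example below is plain Python and does not use Opik's metrics or its PyTest integration; `EvalCase`, `contains_keyword`, `mean_score`, the sample cases, and the 0.75 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    model_output: str
    expected_keyword: str


def contains_keyword(output: str, keyword: str) -> float:
    """Heuristic metric: 1.0 if the keyword appears (case-insensitive), else 0.0."""
    return 1.0 if keyword.lower() in output.lower() else 0.0


def mean_score(cases: list[EvalCase]) -> float:
    """Average the heuristic score over all evaluation cases."""
    scores = [contains_keyword(c.model_output, c.expected_keyword) for c in cases]
    return sum(scores) / len(scores)


# Hypothetical recorded outputs; in practice these would come from the application.
CASES = [
    EvalCase("What is the capital of France?", "The capital of France is Paris.", "Paris"),
    EvalCase("Who wrote Hamlet?", "Hamlet was written by William Shakespeare.", "shakespeare"),
    EvalCase("What is 2 + 2?", "The answer is 4.", "4"),
    EvalCase("What color is the sky?", "It depends on the weather.", "blue"),
]


def test_answers_mention_expected_keyword() -> None:
    # pytest collects functions named test_*; a failed assert fails the CI run.
    assert mean_score(CASES) >= 0.75
```

A heuristic metric like this is deterministic and cheap, which makes it suitable as a CI gate; an "LLM as a judge" metric would replace `contains_keyword` with a model call at the cost of latency and some nondeterminism.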

Maintenance & Community

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The platform is actively evolving: the project flags significant changes in version 1.7.0 and directs users to the changelog before upgrading. Many integrations are listed, but users of unlisted frameworks may need to implement custom tracking with the @opik.track decorator.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 180
  • Issues (30d): 36

Star History

4,964 stars in the last 90 days
