trulens by truera

Evaluation and tracking tool for LLM experiments and AI agents

Created 5 years ago

3,434 stars

Top 13.7% on SourcePulse

9 Experts Love This Project

eugeneyan

AI Scientist at AWS

transitive-bullshit

Founder of Agentic

atroyn

Anton Troynikov

Cofounder of Chroma

gregpr07

Cofounder of Browser Use

and 5 more!

Project Summary

TruLens provides systematic evaluation and tracking for Large Language Model (LLM) applications and AI agents, enabling developers to understand and improve performance. It targets developers building LLM-powered applications, offering fine-grained, stack-agnostic instrumentation and comprehensive evaluations to identify failure modes.

How It Works

TruLens instruments LLM applications to log prompts, models, retrievers, and knowledge sources. It allows users to define custom feedback functions and evaluations that run alongside the application, facilitating systematic iteration and comparison of different app versions through a user interface.
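The idea of instrumenting an app and scoring it with a feedback function can be illustrated with a minimal, library-free sketch. This is not the TruLens API: `instrument`, `RECORDS`, `retrieve`, `answer`, and `context_overlap` are hypothetical names made up for illustration, and the "LLM" is a stand-in that echoes the retrieved text.

```python
import functools
import time
from typing import Any

# Hypothetical in-memory log; a real tool would persist records somewhere durable.
RECORDS: list[dict[str, Any]] = []


def instrument(fn):
    """Record each call's inputs, output, and latency (illustrative only)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        RECORDS.append({
            "step": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
        return output
    return wrapper


@instrument
def retrieve(query: str) -> list[str]:
    # Stand-in retriever: a fixed two-document corpus instead of a vector store.
    corpus = ["TruLens evaluates LLM apps.", "Paris is the capital of France."]
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().rstrip(".").split())]


@instrument
def answer(query: str) -> str:
    context = retrieve(query)
    # Stand-in for an LLM call: echo the first retrieved document.
    return context[0] if context else "I don't know."


def context_overlap(question: str, response: str) -> float:
    """Toy feedback function: fraction of question words found in the response."""
    q = set(question.lower().split())
    r = set(response.lower().rstrip(".").split())
    return len(q & r) / len(q) if q else 0.0


response = answer("capital of France")
score = context_overlap("capital of France", response)
```

After the call, `RECORDS` holds one entry per instrumented step (retriever first, then the top-level answer), and `score` is the feedback value for that single interaction. A real evaluation would typically use a model-based or embedding-based scorer rather than word overlap.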

Quick Start & Requirements

Primary install: pip install trulens
Prerequisites: Python. No specific hardware or GPU requirements are mentioned for basic installation.
Links: Contributing Guide, Discourse Community

Highlighted Details

Stack-agnostic instrumentation for LLM applications.
Supports evaluation of RAG (Retrieval-Augmented Generation) systems.
Enables definition of custom feedback functions and evaluations.
Provides a user interface for comparing app versions.
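Comparing app versions amounts to aggregating feedback scores per version and ranking the results. The sketch below shows that aggregation in plain Python; it does not use the TruLens API, and the version names, metric names, and score values are made-up illustrative data.

```python
from statistics import mean

# Hypothetical feedback scores (0 to 1) collected for two app versions.
scores_by_version = {
    "v1-baseline": {"relevance": [0.6, 0.7, 0.5], "groundedness": [0.8, 0.6, 0.7]},
    "v2-reranker": {"relevance": [0.8, 0.9, 0.7], "groundedness": [0.8, 0.7, 0.9]},
}


def leaderboard(scores):
    """Average each metric per version and sort by overall mean, best first."""
    rows = []
    for version, metrics in scores.items():
        averages = {name: mean(values) for name, values in metrics.items()}
        rows.append({"version": version, **averages, "overall": mean(averages.values())})
    return sorted(rows, key=lambda row: row["overall"], reverse=True)


for row in leaderboard(scores_by_version):
    print(row["version"], round(row["overall"], 3))
```

Averaging the per-metric means weights every metric equally; a team that cares more about one metric could swap in a weighted mean instead.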

Maintenance & Community

The project encourages community contributions and provides a Discourse forum for discussion. A GitHub star is requested as a form of support.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README focuses on core functionality and does not detail limitations, unsupported platforms, or potential caveats regarding stability or advanced features.

Health Check

Last Commit: 1 day ago
Responsiveness: 1 week
Pull Requests (30d): 50
Issues (30d): 3

Star History

62 stars in the last 30 days

Explore Similar Projects

evalyn by shihongDev

GenAI application evaluation framework

Created 7 months ago

Updated 1 month ago

ToolSandbox by apple

An interactive benchmark for evaluating LLM tool use

Created 1 year ago

Updated 8 months ago

Starred by

Elie Bursztein (Cybersecurity Lead at Google DeepMind).

bench by arthur-ai

LLM evaluation tool for production use cases

Created 3 years ago

Updated 3 months ago

Starred by

Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Michael Chiang (Cofounder of Ollama), and 7 more.

openbench by groq

Provider-agnostic LLM evaluation infrastructure

Created 11 months ago

Updated 2 weeks ago

llmops-python-package by callmesora

LLMOps package for flexible, robust LLM workflows

Created 1 year ago

Updated 1 year ago

Starred by

Ankush Gola (Cofounder of LangChain).

langsmith-cookbook by langchain-ai

Cookbook for LangSmith, a tool to debug, evaluate, test, and improve LLM apps

Created 2 years ago

Updated 7 months ago

Starred by

Luis Capelo (Cofounder of Lightning AI) and Xiaofan Luan (VP Engineering at Zilliz).

coze-loop by coze-dev

AI agent development and operations platform

Created 1 year ago

Updated 1 day ago

Starred by

Michael Han (Cofounder of Unsloth), Simon Willison (Coauthor of Django), and 9 more.

phoenix by Arize-ai

AI observability platform for experimentation, evaluation, and troubleshooting

Created 3 years ago

Updated 14 hours ago

Starred by

Yaowei Zheng (Author of LLaMA-Factory), Luis Capelo (Cofounder of Lightning AI), and 7 more.

opik by comet-ml

Open-source LLM evaluation framework for RAG, agents, and more

Created 3 years ago

Updated 1 day ago

Starred by

Michael Chiang (Cofounder of Ollama), Magnus Müller (Cofounder of Browser Use), and 9 more.

deepeval by confident-ai

LLM evaluation framework for unit testing LLM outputs

Created 2 years ago

Updated 1 day ago

Starred by

Alexey Milovidov (Cofounder of Clickhouse), Marc Klingen (Cofounder of Langfuse), and 20 more.

langfuse by langfuse

Open source LLM engineering platform for observability and evals

Created 3 years ago

Updated 13 hours ago

Starred by

Jeff Hammerbacher (Cofounder of Cloudera), Gregor Zunic (Cofounder of Browser Use), and 15 more.

openai-agents-python by openai

Python SDK for multi-agent workflows

Created 1 year ago

Updated 20 hours ago
