uptrain by uptrain-ai

Open-source platform to evaluate and improve GenAI apps

Created 3 years ago

2,354 stars

Top 18.7% on SourcePulse

10 Experts Love This Project

winglian

Founder of Axolotl AI

transitive-bullshit

Founder of Agentic

swyxio

Editor of Latent Space

calebpeffer

Cofounder of Firecrawl

and 6 more!

Project Summary

UpTrain is an open-source platform designed to evaluate and enhance Generative AI applications. It offers over 20 pre-configured checks for language, code, and embeddings, performs root cause analysis on failures, and provides actionable insights for improvement, targeting developers and researchers working with LLMs.

How It Works

UpTrain utilizes an LLM-as-a-judge approach for evaluations, allowing customization of evaluation methods, few-shot examples, and scenario descriptions. It supports various LLM providers (OpenAI, Anthropic, Mistral, Azure, Anyscale) and embedding models, running analyses locally for data privacy. Root cause analysis helps pinpoint issues within the LLM pipeline based on negative feedback or low evaluation scores.
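To make the LLM-as-a-judge idea concrete, here is a minimal sketch, not UpTrain's implementation. The names `JudgeConfig`, `build_prompt`, and `judge`, and the A/B/C grading scale, are all hypothetical and chosen for illustration. The sketch assumes the judge model is any callable that takes a prompt string and returns text, so a real provider client or a stub can be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class JudgeConfig:
    """Hypothetical container for the customizable parts of a judge prompt."""
    scenario: str
    few_shot: List[Dict[str, str]] = field(default_factory=list)
    # Map each allowed letter grade to a numeric score (assumed scale).
    choices: Dict[str, float] = field(
        default_factory=lambda: {"A": 1.0, "B": 0.5, "C": 0.0}
    )


def build_prompt(cfg: JudgeConfig, question: str, response: str) -> str:
    """Assemble the scenario, the few-shot examples, and the case to grade."""
    parts = [
        f"Scenario: {cfg.scenario}",
        "Grade the response with one letter: " + ", ".join(cfg.choices),
    ]
    for ex in cfg.few_shot:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Response: {ex['response']}\n"
            f"Grade: {ex['grade']}"
        )
    parts.append(f"Question: {question}\nResponse: {response}\nGrade:")
    return "\n\n".join(parts)


def judge(cfg: JudgeConfig, question: str, response: str,
          llm: Callable[[str], str]) -> float:
    """Ask the judge model for a letter grade and convert it to a score."""
    reply = llm(build_prompt(cfg, question, response)).strip().upper()
    grade = reply[:1]
    if grade not in cfg.choices:
        raise ValueError(f"Unexpected grade from judge: {reply!r}")
    return cfg.choices[grade]
```

Swapping the scenario text or the few-shot list changes how the judge grades without touching the scoring code, which is the kind of customization the paragraph above describes.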

Quick Start & Requirements

Dashboard: Clone the repository and run bash run_uptrain.sh. Requires Docker.
Package: pip install uptrain.
Evaluations: Requires an OpenAI API key for model grading checks.
Resources: Local dashboard runs on your machine; no code required for dashboard use.
Documentation: How to evaluate your LLM application
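After installing the package, a first evaluation might look like the sketch below. It follows the usage pattern shown in the project's README (`EvalLLM`, `Evals`, and `evaluate(data=..., checks=...)`), but treat those names as assumptions and verify them against the current documentation. The helper functions `build_eval_data` and `run_checks` and the sample row are hypothetical.

```python
import os
from typing import Dict, List


def build_eval_data() -> List[Dict[str, str]]:
    """Return one illustrative row with question, context, and response."""
    return [{
        "question": "Which is the most popular global sport?",
        "context": "Football has the largest worldwide following of any sport.",
        "response": "Football is the most popular sport worldwide.",
    }]


def run_checks(data: List[Dict[str, str]], api_key: str):
    """Run two model-graded checks on the rows (requires uptrain and a key)."""
    # Imported lazily so this file can be loaded without uptrain installed.
    from uptrain import EvalLLM, Evals

    eval_llm = EvalLLM(openai_api_key=api_key)
    return eval_llm.evaluate(
        data=data,
        checks=[Evals.FACTUAL_ACCURACY, Evals.RESPONSE_COMPLETENESS],
    )


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        print(run_checks(build_eval_data(), key))
    else:
        print("Set OPENAI_API_KEY to run the model-graded checks.")
```

Reading the key from the environment keeps it out of source control; the model-graded checks will not run without it.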

Highlighted Details

Supports 20+ pre-configured evaluations including factual accuracy, response completeness, and prompt injection detection.
Evaluations run locally for data privacy; only the calls to the LLM provider leave your environment.
Offers programmatic integration via a Python package and a local, code-free dashboard interface.
Enables customization of evaluation methods, few-shot examples, and custom evaluator creation.
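Low scores on checks like the ones listed above can be turned into a rough diagnosis of where a pipeline is failing. The sketch below is not UpTrain's root cause analysis. It is a hypothetical rule table, with assumed check names, stage labels, and a 0.5 threshold, that reports the earliest pipeline stage whose check falls below the threshold.

```python
from typing import Dict, Optional

# Hypothetical mapping from a check name to the pipeline stage it implicates.
# Ordered upstream to downstream, so the earliest failing stage is reported.
STAGE_BY_CHECK = {
    "context_relevance": "retrieval",
    "factual_accuracy": "grounding",
    "response_completeness": "generation",
}


def likely_root_cause(scores: Dict[str, float],
                      threshold: float = 0.5) -> Optional[str]:
    """Return the first stage whose check scored below the threshold."""
    for check, stage in STAGE_BY_CHECK.items():
        score = scores.get(check)
        if score is not None and score < threshold:
            return stage
    return None  # No check fell below the threshold.
```

Checking upstream stages first reflects a simple assumption: if retrieval returned irrelevant context, downstream scores are likely to suffer as a consequence rather than being independent faults.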

Maintenance & Community

The roadmap lists planned features such as team collaboration and visualization, although the Health Check below reports the last commit as 1 year ago.
Community support available via a Slack channel.
A call with the maintainers can be booked directly.

Licensing & Compatibility

Published under the Apache 2.0 license.
Compatible with commercial use and closed-source linking.

Limitations & Caveats

The UpTrain Dashboard is currently in Beta. Future features include embedding visualization, pattern recognition, and prompt improvement suggestions.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

6 stars in the last 30 days

Explore Similar Projects

evalyn by shihongDev

GenAI application evaluation framework

Created 7 months ago

Updated 1 month ago

Starred by Elvis Saravia (Founder of DAIR.AI), Travis Fischer (Founder of Agentic), and 1 more.

zeno-build by zeno-ml

Examples for evaluating generative AI models

Created 3 years ago

Updated 2 years ago

Starred by Ishaan Jaffer (Cofounder of LiteLLM), Maxime Labonne (Head of Post-Training at Liquid AI), and 2 more.

can-ai-code by the-crypt-keeper

AI coding model evaluation framework

Created 3 years ago

Updated 1 year ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera).

Evaluator by NVIDIA-NeMo

Open-source library for scalable, reproducible AI model and benchmark evaluation

Created 1 year ago

Updated 2 days ago

prompt-forge by insaaniManav

AI prompt engineering workbench

Created 1 year ago

Updated 1 year ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Michael Chiang (Cofounder of Ollama), and 7 more.

openbench by groq

Provider-agnostic LLM evaluation infrastructure

Created 11 months ago

Updated 2 weeks ago

bugbug by mozilla

ML platform for software engineering tasks

Created 8 years ago

Updated 1 day ago

Starred by Tim J. Baek (Founder of Open WebUI), Daniel Han (Cofounder of Unsloth), and 6 more.

giskard-oss by Giskard-AI

Open-source testing framework for AI & LLM systems

Created 4 years ago

Updated 1 day ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 3 more.

opencompass by open-compass

LLM evaluation platform for assessing model capabilities across diverse datasets

Created 3 years ago

Updated 1 day ago

Starred by Marc Klingen (Cofounder of Langfuse), Shyamal Anadkat (Research Scientist at OpenAI), and 6 more.

Awesome-LLMOps by tensorchord

Curated list of LLMOps tools for developers

Created 4 years ago

Updated 1 month ago

Starred by Lilian Weng (Cofounder of Thinking Machines Lab), Anton Osika (Cofounder of Lovable), and 20 more.

gpt-engineer by AntonOsika

CLI platform for code generation experimentation

Created 3 years ago

Updated 1 year ago

Starred by Omar Sanseviero (DevRel at Google DeepMind), Dan Guido (Cofounder of Trail of Bits), and 4 more.

generative-ai-for-beginners by microsoft

Course for learning generative AI application development

Created 3 years ago

Updated 2 days ago
