Evaluation tool for LLM QA chains
This project provides a lightweight, Streamlit-based evaluation tool for question-answering (QA) chains built with Langchain. It automates the generation of question-answer pairs from user-provided documents, constructs QA chains with configurable parameters, generates responses, and uses an LLM to score these responses against ground truth answers, enabling exploration of different chain configurations.
How It Works
The tool leverages Langchain to process user-provided documents. It splits text into chunks, generates question-answer pairs using an LLM (GPT-3.5-turbo), and constructs QA chains with user-selectable configurations for embeddings, retrieval, and LLM models. Responses are generated and then evaluated by another LLM (GPT-3.5-turbo) using a specified grading prompt.
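A minimal sketch of this pipeline, assuming the classic langchain package with OpenAI models and a FAISS vector store; the chunk size, chain type, and file path below are illustrative choices, not the tool's exact defaults:

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import QAGenerationChain, RetrievalQA
from langchain.evaluation.qa import QAEvalChain

# Load a user-provided document (hypothetical path).
doc_text = open("my_document.txt").read()

# 1. Split the document into chunks for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(doc_text)

# 2. Generate question-answer pairs from the text with an LLM.
qa_gen = QAGenerationChain.from_llm(ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
eval_set = qa_gen.run(doc_text[:3000])  # list of {"question": ..., "answer": ...}

# 3. Build a QA chain from a configurable embedding model and retriever.
vectorstore = FAISS.from_texts(chunks, OpenAIEmbeddings())
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

# 4. Generate a response for each generated question.
predictions = [{"question": ex["question"], "result": qa_chain.run(ex["question"])}
               for ex in eval_set]

# 5. Grade the responses against the ground-truth answers with an LLM judge.
grader = QAEvalChain.from_llm(ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
grades = grader.evaluate(eval_set, predictions,
                         question_key="question", answer_key="answer",
                         prediction_key="result")
for ex, grade in zip(eval_set, grades):
    print(ex["question"], "->", grade)

In the Streamlit app, the embedding model, retriever settings, chunking parameters, and LLMs used in steps 2-5 are exposed as user-selectable options rather than hard-coded as above.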
Quick Start & Requirements
pip install -r requirements.txt
streamlit run auto-evaluator.py
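API keys for the model providers must be available before launch; the lines below assume they are read from the standard OpenAI and Anthropic environment variables, though the app may instead prompt for them in its sidebar.

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...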
Maintenance & Community
The project is associated with Langchain. The code for the hosted app is also open source at langchain-ai/auto-evaluator.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Full functionality and default model settings require both OpenAI (GPT-4) and Anthropic API keys, which may incur costs. While other models can be added, the default setup is dependent on these proprietary APIs.