auto-evaluator  by rlancemartin

Evaluation tool for LLM QA chains

created 2 years ago
1,082 stars

Top 35.7% on sourcepulse

Project Summary

This project provides a lightweight, Streamlit-based evaluation tool for question-answering (QA) chains built with LangChain. It generates question-answer pairs from user-provided documents, builds QA chains with configurable parameters, generates responses, and uses an LLM to score those responses against the ground-truth answers, so that different chain configurations can be compared.

How It Works

The tool uses LangChain to process user-provided documents. It splits the text into chunks, generates question-answer pairs with an LLM (GPT-3.5-turbo), and builds QA chains with user-selectable embeddings, retrievers, and LLMs. Each chain's responses are then graded by a second LLM call (GPT-3.5-turbo) using a specified grading prompt.
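
The flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: every name here (split_text, retrieve, evaluate, QAPair) is hypothetical, the LLM steps are passed in as ordinary callables so the sketch runs without API keys, and retrieval is a simple word-overlap ranking standing in for whatever embedding-based retriever a real chain would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QAPair:
    """One generated evaluation item: a question and its ground-truth answer."""
    question: str
    answer: str


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def retrieve(question: str, chunks: List[str], k: int) -> List[str]:
    """Rank chunks by shared words with the question (a stand-in for embedding search)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def evaluate(
    text: str,
    make_qa: Callable[[str], QAPair],          # hypothetical: LLM call that writes a QA pair
    answer: Callable[[str, List[str]], str],   # hypothetical: the QA chain under test
    grade: Callable[[QAPair, str], bool],      # hypothetical: LLM call that grades a response
    chunk_size: int = 200,
    overlap: int = 20,
    k: int = 2,
) -> List[Dict[str, object]]:
    """Run split -> generate QA pairs -> retrieve -> answer -> grade."""
    chunks = split_text(text, chunk_size, overlap)
    eval_set = [make_qa(chunk) for chunk in chunks]
    results = []
    for pair in eval_set:
        context = retrieve(pair.question, chunks, k)
        prediction = answer(pair.question, context)
        results.append({
            "question": pair.question,
            "expected": pair.answer,
            "predicted": prediction,
            "correct": grade(pair, prediction),
        })
    return results
```

Keeping the three model-backed steps as parameters mirrors the idea that the generator, the chain, and the grader can each be swapped independently when comparing configurations.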

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Run: streamlit run auto-evaluator.py
  • Prerequisites: OpenAI API key (GPT-4 recommended for full features), Anthropic API key (optional, for default dashboard models).
  • Documentation: HuggingFace Space, Hosted App
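
Before launching the app, it can help to confirm which API keys are available. The sketch below assumes the keys are supplied through the conventional OPENAI_API_KEY and ANTHROPIC_API_KEY environment variables; the summary above does not say how the app actually reads its keys, and the helper name missing_keys is hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional, Sequence


def missing_keys(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = ("OPENAI_API_KEY",),      # assumed variable name
    optional: Sequence[str] = ("ANTHROPIC_API_KEY",),   # assumed variable name
) -> Dict[str, List[str]]:
    """Report which required and optional key variables are unset or empty."""
    env = os.environ if env is None else env
    return {
        "required": [name for name in required if not env.get(name)],
        "optional": [name for name in optional if not env.get(name)],
    }


if __name__ == "__main__":
    report = missing_keys()
    if report["required"]:
        print("Missing required keys:", ", ".join(report["required"]))
    if report["optional"]:
        print("Missing optional keys:", ", ".join(report["optional"]))
```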

Highlighted Details

  • Automates LLM QA chain evaluation.
  • Supports configurable text splitting, embeddings, retrieval, and LLMs.
  • Utilizes LLMs for both question generation and response grading.
  • Offers a Streamlit UI for interactive exploration.
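
Comparing chain configurations amounts to running the same evaluation over each combination of settings. As a rough sketch, the combinations can be enumerated with itertools.product; the function name config_grid and the option names and values in the usage example are placeholders, not the settings the app exposes.

```python
from itertools import product
from typing import Dict, List, Sequence


def config_grid(options: Dict[str, Sequence[object]]) -> List[Dict[str, object]]:
    """Return one dict per combination of the supplied option values."""
    keys = list(options)
    return [
        dict(zip(keys, combo))
        for combo in product(*(options[key] for key in keys))
    ]


if __name__ == "__main__":
    # Placeholder option names and values, for illustration only.
    grid = config_grid({
        "chunk_size": [500, 1000],
        "retriever": ["retriever-a", "retriever-b"],
        "llm": ["model-x"],
    })
    for config in grid:
        print(config)
```

The grid grows multiplicatively with each added option, and every configuration triggers its own set of LLM calls, so a small grid keeps API costs predictable.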

Maintenance & Community

The project is associated with LangChain. The code for the hosted app is also open source at langchain-ai/auto-evaluator.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Full functionality and the default model settings require both an OpenAI API key (GPT-4) and an Anthropic API key, and calls to either API may incur costs. Other models can be added, but the default setup depends on these proprietary APIs.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 12 stars in the last 90 days
