Evaluation tool for LLM QA chains
This project provides a lightweight, Streamlit-based evaluation tool for question-answering (QA) chains built with Langchain. It automates the generation of question-answer pairs from user-provided documents, constructs QA chains with configurable parameters, generates responses, and uses an LLM to score these responses against ground truth answers, enabling exploration of different chain configurations.
How It Works
The tool leverages Langchain to process user-provided documents. It splits text into chunks, generates question-answer pairs using an LLM (GPT-3.5-turbo), and constructs QA chains with user-selectable configurations for embeddings, retrieval, and LLM models. Responses are generated and then evaluated by another LLM (GPT-3.5-turbo) using a specified grading prompt.
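A minimal sketch of this pipeline, assuming the classic langchain package with OpenAI models and a FAISS vector store; the chunk size, chain type, and file path below are illustrative choices, not the tool's exact defaults:

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import QAGenerationChain, RetrievalQA
from langchain.evaluation.qa import QAEvalChain

# Load a user-provided document (hypothetical path).
doc_text = open("my_document.txt").read()

# 1. Split the document into chunks for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(doc_text)

# 2. Generate question-answer pairs from the text with an LLM.
qa_gen = QAGenerationChain.from_llm(ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
eval_set = qa_gen.run(doc_text[:3000])  # list of {"question": ..., "answer": ...}

# 3. Build a QA chain from a configurable embedding model and retriever.
vectorstore = FAISS.from_texts(chunks, OpenAIEmbeddings())
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

# 4. Generate a response for each generated question.
predictions = [{"question": ex["question"], "result": qa_chain.run(ex["question"])}
               for ex in eval_set]

# 5. Grade the responses against the ground-truth answers with an LLM judge.
grader = QAEvalChain.from_llm(ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
grades = grader.evaluate(eval_set, predictions,
                         question_key="question", answer_key="answer",
                         prediction_key="result")
for ex, grade in zip(eval_set, grades):
    print(ex["question"], "->", grade)

In the Streamlit app, the embedding model, retriever settings, chunking parameters, and LLMs used in steps 2-5 are exposed as user-selectable options rather than hard-coded as above.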
Quick Start & Requirements
pip install -r requirements.txt
streamlit run auto-evaluator.py
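API keys for the model providers must be available before launch; the lines below assume they are read from the standard OpenAI and Anthropic environment variables, though the app may instead prompt for them in its sidebar.

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...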
Maintenance & Community
The project is associated with Langchain. The code for the hosted app is also open source at langchain-ai/auto-evaluator.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Full functionality and default model settings require both OpenAI (GPT-4) and Anthropic API keys, which may incur costs. While other models can be added, the default setup is dependent on these proprietary APIs.