ai-evals-course
AI evaluation course and toolkit for building and assessing chatbots
Top 96.8% on SourcePulse
This repository provides a structured, 5-assignment course for learning practical AI evaluation techniques using a Recipe Chatbot as a case study. It targets engineers and researchers seeking hands-on experience with systematic AI system improvement, offering a progressive learning path from basic prompt engineering to advanced RAG and agent analysis. The benefit is acquiring industry-standard evaluation skills applicable to real-world AI systems.
How It Works
The project employs a FastAPI backend integrated with LiteLLM for flexible, multi-provider LLM support, a simple HTML/CSS/JS frontend, and a FastHTML-based annotation tool. Core functionalities include a BM25-based retrieval system for recipes, LLM-powered query rewriting for optimization, and automated evaluation scripts. The approach emphasizes systematic, practical evaluation over theoretical concepts, building complexity incrementally across five distinct homework assignments.
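To make the retrieval step concrete, here is a minimal from-scratch sketch of BM25 ranking over a few recipe descriptions. It is not the repository's implementation, which may well rely on a library. The class name `BM25Index`, the sample recipes, and the parameter defaults `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory BM25 index (illustrative, not the course's code)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # document-length normalization strength (assumed default)
        self.docs = [tokenize(d) for d in documents]
        self.term_counts = [Counter(d) for d in self.docs]
        self.avg_len = sum(len(d) for d in self.docs) / len(self.docs)

        # Document frequency: in how many documents each term appears.
        doc_freq = Counter()
        for tokens in self.docs:
            doc_freq.update(set(tokens))

        n = len(self.docs)
        # Smoothed IDF variant that stays positive for every term.
        self.idf = {
            term: math.log(1 + (n - freq + 0.5) / (freq + 0.5))
            for term, freq in doc_freq.items()
        }

    def score(self, query, doc_id):
        """Sum the BM25 contribution of each query term found in the document."""
        counts = self.term_counts[doc_id]
        length = len(self.docs[doc_id])
        total = 0.0
        for term in tokenize(query):
            if term not in counts:
                continue
            tf = counts[term]
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        """Return up to top_k (doc_id, score) pairs with a positive score."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(i, s) for s, i in scored[:top_k] if s > 0]


# Made-up recipe descriptions, used only to exercise the index.
recipes = [
    "Spicy chicken curry with coconut milk and rice",
    "Classic tomato basil pasta with garlic",
    "Vegan lentil curry with spinach",
]
index = BM25Index(recipes)
results = index.search("chicken curry")
for doc_id, score in results:
    print(f"{score:.3f}  {recipes[doc_id]}")
```

In this sketch the chicken curry recipe ranks first because it matches both query terms, the lentil curry matches only one, and the pasta recipe is dropped for having no overlap. An LLM-rewritten query would simply be passed to `search` in place of the raw user text.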
Quick Start & Requirements
git clone https://github.com/ai-evals-course/recipe-chatbot.git
cd recipe-chatbot
uv sync
source .venv/bin/activate
Copy env.example to .env and populate it with the necessary LLM API keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) and desired model names (MODEL_NAME, MODEL_NAME_JUDGE).
uv run uvicorn backend.main:app --reload
Access the chatbot at http://127.0.0.1:8000.
Prerequisites: uv and API access to the specified LLM providers. Setup requires configuring API keys. Setup time is minimal and depends mainly on the local environment and API key availability.
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (like Discord/Slack), or project roadmap are provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README. This lack of clear licensing information may pose a barrier to commercial use or integration into closed-source projects.
Limitations & Caveats
This project is primarily structured as an educational course, focusing on learning evaluation techniques rather than serving as a production-ready, standalone application. Running the system requires obtaining and configuring API keys for various LLM providers, which may incur costs.
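Because the application depends on keys and model names configured in .env, a small pre-flight check can catch missing values before the server starts. The sketch below uses only the standard library. The helper names `parse_env_file` and `check_config` are hypothetical, and the rule that at least one provider key plus both model variables must be set is an assumption, not something the README states.

```python
# Variable names come from the setup notes; which ones are truly
# required is an assumption made for this sketch.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
MODEL_VARS = ("MODEL_NAME", "MODEL_NAME_JUDGE")


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config(values):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not any(values.get(key) for key in PROVIDER_KEYS):
        problems.append(
            "no provider API key set (expected one of: "
            + ", ".join(PROVIDER_KEYS) + ")"
        )
    for name in MODEL_VARS:
        if not values.get(name):
            problems.append(f"{name} is not set")
    return problems


# Hypothetical file contents with placeholder values, for demonstration only.
sample = """
# example configuration
OPENAI_API_KEY=sk-placeholder
MODEL_NAME=
"""
problems = check_config(parse_env_file(sample))
for problem in problems:
    print("config problem:", problem)
```

To check a real configuration, read the file first, for example `check_config(parse_env_file(open(".env").read()))`, and start the server only when the returned list is empty.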