recipe-chatbot by ai-evals-course

AI evaluation course and toolkit for building and assessing chatbots

Created 7 months ago
264 stars

Top 96.8% on SourcePulse

Project Summary

This repository provides a structured, 5-assignment course for learning practical AI evaluation techniques using a Recipe Chatbot as a case study. It targets engineers and researchers seeking hands-on experience with systematic AI system improvement, offering a progressive learning path from basic prompt engineering to advanced RAG and agent analysis. Completing the course builds industry-standard evaluation skills applicable to real-world AI systems.

How It Works

The project employs a FastAPI backend integrated with LiteLLM for flexible, multi-provider LLM support, a simple HTML/CSS/JS frontend, and a FastHTML-based annotation tool. Core functionalities include a BM25-based retrieval system for recipes, LLM-powered query rewriting for optimization, and automated evaluation scripts. The approach emphasizes systematic, practical evaluation over theoretical concepts, building complexity incrementally across five distinct homework assignments.
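The BM25 retrieval mentioned above can be illustrated with a minimal, self-contained scorer. This is a sketch of the standard Okapi BM25 formula over a tiny hypothetical recipe corpus, not the repository's actual implementation:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    q_terms = query.lower().split()
    # Document frequency per query term
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

# Hypothetical recipe corpus
recipes = [
    "spicy chicken curry with coconut milk",
    "classic margherita pizza with fresh basil",
    "slow cooker beef stew with root vegetables",
]
scores = bm25_scores("chicken curry", recipes)
```

In practice a library such as `rank_bm25` would be used instead of hand-rolling the formula; the sketch only shows how query terms rank recipe documents.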

Quick Start & Requirements

  1. Install:
    git clone https://github.com/ai-evals-course/recipe-chatbot.git
    cd recipe-chatbot
    uv sync
    source .venv/bin/activate
    
  2. Configure: Copy env.example to .env and populate it with necessary LLM API keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) and desired model names (MODEL_NAME, MODEL_NAME_JUDGE).
  3. Run Chatbot:
    uv run uvicorn backend.main:app --reload
    
    Access the chatbot at http://127.0.0.1:8000.
  4. Prerequisites: a Python environment managed by uv and API access to the configured LLM providers. Setup time is minimal beyond creating the environment and obtaining API keys.
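The `.env` from step 2 might look like the following. The variable names `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `MODEL_NAME`, and `MODEL_NAME_JUDGE` come from the setup instructions; the values and the `provider/model` naming (LiteLLM's convention) are illustrative placeholders:

```shell
# .env — API keys and model selection (placeholder values)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
MODEL_NAME=openai/gpt-4o-mini
MODEL_NAME_JUDGE=anthropic/claude-3-5-sonnet-20240620
```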

Highlighted Details

  • Features 5 progressive homework assignments covering prompt engineering, error analysis, LLM-as-Judge evaluation, RAG/retrieval evaluation, and agent failure analysis.
  • Supports multiple LLM providers via LiteLLM, allowing flexibility in model selection.
  • Includes interactive walkthroughs (Jupyter notebooks, Marimo scripts) for homework solutions.
  • Provides a FastHTML-based annotation tool for manual evaluation tasks.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or project roadmap are provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. This lack of clear licensing information may pose a barrier to commercial use or integration into closed-source projects.

Limitations & Caveats

This project is primarily structured as an educational course, focusing on learning evaluation techniques rather than serving as a production-ready, standalone application. Running the system requires obtaining and configuring API keys for various LLM providers, which may incur costs.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Notable Stargazers

Starred by Elvis Saravia (Founder of DAIR.AI), Tom Moor (Head of Engineering at Linear; Founder of Outline), and 6 more.
