recipe-chatbot by ai-evals-course

AI evaluation course and toolkit for building and assessing chatbots

Created 7 months ago
264 stars

Top 96.8% on SourcePulse

Project Summary

This repository provides a structured, 5-assignment course for learning practical AI evaluation techniques using a Recipe Chatbot as a case study. It targets engineers and researchers seeking hands-on experience with systematic AI system improvement, offering a progressive learning path from basic prompt engineering to advanced RAG and agent analysis. Completing the course builds industry-standard evaluation skills applicable to real-world AI systems.

How It Works

The project employs a FastAPI backend integrated with LiteLLM for flexible, multi-provider LLM support, a simple HTML/CSS/JS frontend, and a FastHTML-based annotation tool. Core functionalities include a BM25-based retrieval system for recipes, LLM-powered query rewriting for optimization, and automated evaluation scripts. The approach emphasizes systematic, practical evaluation over theoretical concepts, building complexity incrementally across five distinct homework assignments.
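The BM25 retrieval mentioned above can be illustrated with a minimal, self-contained scorer. This is a sketch of the standard Okapi BM25 formula over a tiny hypothetical recipe corpus, not the repository's actual implementation:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    q_terms = query.lower().split()
    # Document frequency per query term
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

# Hypothetical recipe corpus
recipes = [
    "spicy chicken curry with coconut milk",
    "classic margherita pizza with fresh basil",
    "slow cooker beef stew with root vegetables",
]
scores = bm25_scores("chicken curry", recipes)
```

In practice a library such as `rank_bm25` would be used instead of hand-rolling the formula; the sketch only shows how query terms rank recipe documents.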

Quick Start & Requirements

  1. Install:
    git clone https://github.com/ai-evals-course/recipe-chatbot.git
    cd recipe-chatbot
    uv sync
    source .venv/bin/activate
    
  2. Configure: Copy env.example to .env and populate it with necessary LLM API keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) and desired model names (MODEL_NAME, MODEL_NAME_JUDGE).
  3. Run Chatbot:
    uv run uvicorn backend.main:app --reload
    
    Access the chatbot at http://127.0.0.1:8000.
  4. Prerequisites: a Python environment managed by uv and API access to the configured LLM providers. Setup time is minimal beyond creating the environment and obtaining API keys.
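The `.env` from step 2 might look like the following. The variable names `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `MODEL_NAME`, and `MODEL_NAME_JUDGE` come from the setup instructions; the values and the `provider/model` naming (LiteLLM's convention) are illustrative placeholders:

```shell
# .env — API keys and model selection (placeholder values)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
MODEL_NAME=openai/gpt-4o-mini
MODEL_NAME_JUDGE=anthropic/claude-3-5-sonnet-20240620
```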

Highlighted Details

  • Features 5 progressive homework assignments covering prompt engineering, error analysis, LLM-as-Judge evaluation, RAG/retrieval evaluation, and agent failure analysis.
  • Supports multiple LLM providers via LiteLLM, allowing flexibility in model selection.
  • Includes interactive walkthroughs (Jupyter notebooks, Marimo scripts) for homework solutions.
  • Provides a FastHTML-based annotation tool for manual evaluation tasks.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or project roadmap are provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. This lack of clear licensing information may pose a barrier to commercial use or integration into closed-source projects.

Limitations & Caveats

This project is primarily structured as an educational course, focusing on learning evaluation techniques rather than serving as a production-ready, standalone application. Running the system requires obtaining and configuring API keys for various LLM providers, which may incur costs.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Notable Stargazers

Starred by Elvis Saravia (Founder of DAIR.AI), Tom Moor (Head of Engineering at Linear; Founder of Outline), and 6 more.
