llm-zoomcamp by DataTalksClub

Free online course for real-life LLM applications

created 1 year ago
4,182 stars

Top 11.9% on sourcepulse

Project Summary

This repository provides a free, 10-week online course focused on building AI systems for knowledge base question answering using Large Language Models (LLMs). It targets data engineers, ML engineers, and researchers seeking practical, hands-on experience with LLMs, Retrieval-Augmented Generation (RAG), vector databases, and system evaluation. The course offers a structured curriculum with video lectures, homework, and a capstone project to build real-world AI applications.

How It Works

The course follows a modular approach, starting with LLM fundamentals and RAG, then progressing to open-source LLMs (Ollama, GPU deployment), vector databases (embeddings, indexing), evaluation, monitoring, and orchestration (Mage). It emphasizes practical implementation, covering topics like hybrid search and document reranking, culminating in an end-to-end project. The curriculum is designed to equip learners with the skills to build and deploy functional AI-powered Q&A systems.
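The retrieve-then-generate flow behind a RAG question answering system can be sketched roughly as follows. This is a minimal illustration, not the course's code: the toy documents, the token-overlap scoring, and the names `retrieve`, `build_prompt`, and `rag` are all assumptions made for the example, and the LLM is passed in as a plain callable so no particular API is implied.

```python
import re
from collections import Counter

# Toy knowledge base; the entries are invented for illustration only.
DOCS = [
    {"id": "d1", "text": "You can join the course after the start date and still submit homework."},
    {"id": "d2", "text": "A GPU is useful for running open-source models locally."},
    {"id": "d3", "text": "The capstone project is an end-to-end question answering system."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, docs, k=2):
    """Rank documents by shared query tokens (a stand-in for a real search engine)."""
    query_counts = Counter(tokenize(query))
    scored = []
    for doc in docs:
        doc_counts = Counter(tokenize(doc["text"]))
        overlap = sum(min(query_counts[t], doc_counts[t]) for t in query_counts)
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents and drop those with no overlap at all.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context_docs):
    """Put the retrieved passages and the question into a single prompt string."""
    context = "\n".join(f"- {d['text']}" for d in context_docs)
    return (
        "Answer the question using only the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def rag(query, llm, docs=DOCS):
    """Retrieve context, build the prompt, and hand it to the supplied LLM callable."""
    hits = retrieve(query, docs)
    return llm(build_prompt(query, hits))
```

Calling `rag("Do I need a GPU to run models locally?", llm=my_client)` with any callable that maps a prompt string to an answer string would complete the loop. In a real system the retrieval step would typically be backed by a search engine or vector database rather than token overlap.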

Quick Start & Requirements

  • Self-Paced Learning: Watch videos, complete homework, and work on a project.
  • Prerequisites: Basic understanding of data science and Python. Specific module requirements (e.g., GPU for certain deployments) may apply.
  • Resources: Links to course videos, homework, and a community Slack channel are provided.

Highlighted Details

  • Comprehensive 10-week curriculum covering LLMs, RAG, vector search, evaluation, and monitoring.
  • Hands-on experience with tools like OpenAI API, Elasticsearch, Ollama, and vector databases.
  • Focus on building end-to-end AI systems for knowledge base question answering.
  • Includes bonus modules on advanced techniques like hybrid search and document reranking.
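One common way to combine keyword and vector results in a hybrid search is reciprocal rank fusion (RRF). The sketch below shows the general technique under stated assumptions: the document IDs and both result lists are made up, the function name is hypothetical, and the course itself may fuse results differently.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in;
    k=60 is a conventional default, used here as an assumption.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a keyword search and a vector search for one query.
keyword_hits = ["d2", "d1", "d3"]
vector_hits = ["d3", "d2", "d4"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why RRF needs no score normalization between the two retrievers. A reranking step, if used, would then reorder the top of this fused list with a more expensive model.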

Maintenance & Community

  • Active community support via a dedicated Slack channel (#course-llm-zoomcamp).
  • Organized by DataTalks.Club, a global online community for data enthusiasts.
  • Links to Telegram announcements, course playlist, and community guidelines are available.

Licensing & Compatibility

  • The course content and associated code are generally provided for educational purposes. Specific licensing for code snippets or datasets should be verified within the repository.

Limitations & Caveats

  • The course runs in dated cohorts (e.g., June 2, 2025), but self-paced learning is supported.
  • Some advanced topics may require specific hardware (e.g., GPUs) to run well locally.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 10
  • Issues (30d): 1

Star History

  • 918 stars in the last 90 days
