LLMs-Zero-to-Hero by bbruceyuan

Tutorial for building LLMs from scratch

created 6 months ago
1,544 stars

Top 27.4% on sourcepulse

View on GitHub
Project Summary

This repository provides a comprehensive, hands-on guide to mastering Large Language Models (LLMs) from scratch. It targets engineers and researchers aiming to understand and implement LLM training, fine-tuning, and deployment, offering a structured learning path with accompanying video tutorials.

How It Works

The project emphasizes a "from scratch" implementation approach, mirroring Andrej Karpathy's educational style. It covers foundational LLM concepts, dense models, Mixture-of-Experts (MoE) architectures, and various fine-tuning techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). The code is designed to be educational, with explanations integrated into the development process.
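Among the fine-tuning techniques listed above, DPO has a particularly compact core: it trains the policy to widen its log-probability margin between a preferred and a rejected response relative to a frozen reference model. A minimal sketch of that loss in plain Python (the function name and scalar-input form are illustrative assumptions, not the repository's actual code, which works on batched tensors):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the trainable policy and the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# With no preference shift relative to the reference, the loss is log(2);
# it drops below log(2) once the policy favors the chosen response more.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # ≈ 0.6931 (= log 2)
```

In practice the log-probabilities come from per-token logits summed over the response, and the loss is averaged over a batch; the scalar form above just isolates the formula.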

Quick Start & Requirements

  • Installation: Code is provided in the src/ directory, organized by chapter; notebooks can be executed directly.
  • Prerequisites: A GPU is required for training; an NVIDIA RTX 3090 or 4090 is the recommended minimum.
  • Resources: The project offers GPU discount coupons via an AIStackDC registration link.
  • Documentation: Accompanying video lectures are available on Bilibili, linked within the chapter descriptions.

Highlighted Details

  • End-to-end LLM training and fine-tuning from scratch.
  • Detailed explanations of MoE architectures, including DeepSeek's MLA algorithm.
  • Coverage of activation function evolution and inference optimization techniques.
  • Dedicated sections for code-LLM development and LLM deployment.
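The MoE material above centers on a routing idea that can be shown in a few lines: a router scores each expert, keeps only the top-k scores, and renormalizes them so the selected experts' weights sum to one. A minimal sketch in plain Python (the function name and list-based interface are illustrative assumptions; real MoE layers operate on batched tensors and add load-balancing losses):

```python
import math

def top_k_gate(logits, k=2):
    """Top-k gating for a Mixture-of-Experts layer: keep the k largest
    router logits, softmax over them, and zero out all other experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps.get(i, 0.0) / z for i in range(len(logits))]

# Experts 0 and 2 have the largest logits, so only they get nonzero
# weights, and those weights sum to 1.
weights = top_k_gate([2.0, 0.5, 1.0, -1.0], k=2)
```

The layer's output is then the weight-averaged sum of the selected experts' outputs, which is what keeps per-token compute roughly constant as the expert count grows.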

Maintenance & Community

The project is actively developed by bbruceyuan, with community engagement encouraged via WeChat, a personal blog, and a public WeChat account.

Licensing & Compatibility

The repository's licensing is not explicitly stated in the provided README.

Limitations & Caveats

Some sections, such as the nanoGPT implementation and activation function optimization, are marked as "todo," indicating incomplete content. The project is presented as an ongoing learning resource.

Health Check
Last commit

3 months ago

Responsiveness

1+ week

Pull Requests (30d)
0
Issues (30d)
0
Star History
337 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Nathan Lambert (AI Researcher at AI2), and 4 more.

large_language_model_training_playbook by huggingface

0%
478
Tips for training large language models
created 2 years ago
updated 2 years ago
Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

cookbook by EleutherAI

0.1%
809
Deep learning resource for practical model work
created 1 year ago
updated 4 days ago
Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Charlie Holtz (Founder of Melty), and 6 more.

LLM101n by karpathy

0.1%
34k
Educational resource for building a Storyteller AI LLM
created 1 year ago
updated 1 year ago
Starred by Peter Norvig (Author of Artificial Intelligence: A Modern Approach; Research Director at Google), Bojan Tunguz (AI Scientist; Formerly at NVIDIA), and 4 more.

LLMs-from-scratch by rasbt

1.4%
61k
Educational resource for LLM construction in PyTorch
created 2 years ago
updated 22 hours ago