LLMs-Zero-to-Hero by bbruceyuan

Tutorial for building LLMs from scratch

Created 8 months ago
1,659 stars

Top 25.5% on SourcePulse

Project Summary

This repository provides a comprehensive, hands-on guide to building and understanding Large Language Models (LLMs) from scratch. It targets engineers and researchers who want to implement LLM training, fine-tuning, and deployment themselves, offering a structured learning path with accompanying video tutorials.

How It Works

The project emphasizes a "from scratch" implementation approach, mirroring Andrej Karpathy's educational style. It covers foundational LLM concepts, dense models, Mixture-of-Experts (MoE) architectures, and various fine-tuning techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). The code is designed to be educational, with explanations integrated into the development process.
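To make one of these techniques concrete, below is a minimal sketch of the standard DPO objective in PyTorch. This is an illustration of the published DPO loss, not the repository's own code; the function and variable names are hypothetical, and the inputs are per-sequence log-probabilities under the trainable policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of the trainable policy against the frozen reference
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between preferred and rejected responses
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

Unlike RLHF, which trains a separate reward model and then optimizes against it with PPO, DPO folds the preference signal directly into this single supervised loss.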

Quick Start & Requirements

  • Installation: Code lives in the src/ directory, organized by chapter; the notebooks can be run directly.
  • Prerequisites: A GPU is required for training; an NVIDIA RTX 3090 or 4090 is the recommended minimum (a quick environment check follows this list).
  • Resources: GPU discount coupons are available through an AIStackDC registration link.
  • Documentation: Accompanying video lectures on Bilibili are linked from the chapter descriptions.
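Before launching the training notebooks, it can help to verify that the environment meets the GPU requirement. A minimal check, assuming a PyTorch setup (which a from-scratch LLM tutorial in this style typically uses):

```python
import torch

# Quick environment check before running the training notebooks
assert torch.cuda.is_available(), "A CUDA GPU (e.g. RTX 3090/4090) is required"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB VRAM")
```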

Highlighted Details

  • End-to-end LLM training and fine-tuning from scratch.
  • Detailed explanations of MoE architectures, including DeepSeek's MLA (Multi-Head Latent Attention) algorithm; a minimal routing sketch follows this list.
  • Coverage of the evolution of activation functions and of inference optimization techniques.
  • Dedicated sections on code-LLM development and LLM deployment.
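The sketch below illustrates the two ideas named above: sparse top-k expert routing (the core of MoE layers) with SwiGLU-style gated experts (a late stage in the activation-function evolution the tutorial covers). It is an illustrative sketch under those assumptions, not the repository's implementation and not MLA itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """SwiGLU feed-forward block: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class TopKMoE(nn.Module):
    """Minimal top-k routed Mixture-of-Experts layer."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            SwiGLUExpert(d_model, d_ff) for _ in range(n_experts))

    def forward(self, x):                           # x: (n_tokens, d_model)
        logits = self.router(x)                     # (n_tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)        # renormalize over the k chosen
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 16 tokens through the sparse layer
tokens = torch.randn(16, 512)
moe = TopKMoE()
print(moe(tokens).shape)  # torch.Size([16, 512])
```

Only k of the n_experts feed-forward blocks run per token, which is how MoE models grow parameter count without a proportional increase in compute per token.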

Maintenance & Community

The project is maintained by bbruceyuan, with community engagement encouraged via WeChat, a personal blog, and a public WeChat account.

Licensing & Compatibility

The repository's licensing is not explicitly stated in the provided README.

Limitations & Caveats

Some sections, such as the nanoGPT implementation and activation function optimization, are marked as "todo," indicating incomplete content. The project is presented as an ongoing learning resource.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 69 stars in the last 30 days
