llms-from-scratch-cn by datawhalechina

Hands-on tutorial for building LLMs from scratch

created 1 year ago
3,385 stars

Top 14.7% on sourcepulse

Project Summary

This project provides a hands-on, code-first tutorial for building Large Language Models (LLMs) from scratch, targeting developers and researchers who want to understand LLM architecture and implementation. It offers a systematic learning path, demystifying LLM principles through practical coding exercises and detailed explanations, enabling users to build functional, albeit small-scale, LLMs.

How It Works

The project guides users through implementing a GPT-like LLM architecture using PyTorch. It breaks down the process into manageable steps, starting with foundational concepts like text processing and attention mechanisms, and progressing to building the core GPT model, pre-training, and fine-tuning. The approach emphasizes clear, runnable notebook code and detailed explanations to foster a deep understanding of LLM internals, rather than just API usage.
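As a rough illustration of the text-processing step mentioned above, the sketch below tokenizes a string, maps tokens to integer IDs, and builds the shifted input/target windows used for next-token prediction. It is a minimal sketch, not the tutorial's own code: the function names (`tokenize`, `build_vocab`, `make_training_pairs`), the regex-based word-level tokenizer, and the sample sentence are all assumptions made for this example, and it uses only the standard library rather than PyTorch.

```python
import re


def tokenize(text):
    # Split into words and single punctuation marks; whitespace is discarded.
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(tokens):
    # Map each unique token to an integer ID (sorted for determinism).
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}


def make_training_pairs(token_ids, context_length):
    # Each target window is the input window shifted one position to the right,
    # so the model learns to predict the next token at every position.
    pairs = []
    for start in range(len(token_ids) - context_length):
        inputs = token_ids[start:start + context_length]
        targets = token_ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Hypothetical sample text, chosen only for illustration.
text = "the cat sat on the mat."
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
pairs = make_training_pairs(ids, context_length=4)
```

Each pair holds an input window and the same window shifted by one token, which is the usual way to frame pre-training data for a GPT-like model. A real pipeline would typically use a subword tokenizer and batch the windows as tensors.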

Quick Start & Requirements

  • Install: No installation command is listed; the material is delivered primarily as Python notebooks.
  • Prerequisites: Basic PyTorch knowledge is sufficient. No specific hardware or advanced dependencies are mentioned for the core tutorials.
  • Resources: Links to detailed notebooks (Translated_Book) and concise starter notebooks (Codes) are provided.

Highlighted Details

  • Covers the step-by-step implementation of a GPT-like LLM.
  • Includes discussions and code for architectures like ChatGLM, Llama, and RWKV (V2-V6).
  • Offers detailed explanations of core LLM components such as attention mechanisms.
  • Provides exercises and solutions for practical reinforcement.
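To make the attention mechanism named in the list above more concrete, here is a dependency-free sketch of causal scaled dot-product attention. It is an illustration under stated assumptions, not the project's implementation: the tutorial works with PyTorch, whereas this version uses plain Python lists, omits the learned query/key/value projections, and the names `softmax` and `causal_attention` and the toy input `x` are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    # queries, keys, values: one float vector per sequence position.
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: position i only attends to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        # Pad with zeros so each row covers the full sequence length.
        all_weights.append(weights + [0.0] * (len(keys) - i - 1))
    return outputs, all_weights


# Toy embeddings used directly as queries, keys, and values (an assumption;
# a real model would first apply learned linear projections).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
```

The first position can only see itself, so its weight row is all on index 0, and every later row places zero weight on future positions. In a GPT-like model this operation is repeated across several heads and layers.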

Maintenance & Community

The project is associated with Datawhale, a community focused on data science education. It lists contributors for specific chapters and encourages community involvement through GitHub Issues and Discussions.

Licensing & Compatibility

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license restricts commercial use and requires derivative works to be shared under the same terms.

Limitations & Caveats

The project's primary educational goal is to teach LLM principles by building smaller models; it is not intended for training production-scale foundation models. Some chapters (fine-tuning, practical application) are marked as "upcoming."

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

548 stars in the last 90 days
