llms-from-scratch-cn  by datawhalechina

Hands-on tutorial for building LLMs from scratch

Created 1 year ago
3,537 stars

Top 13.7% on SourcePulse

Project Summary

This project provides a hands-on, code-first tutorial for building Large Language Models (LLMs) from scratch, targeting developers and researchers who want to understand LLM architecture and implementation. It offers a systematic learning path, demystifying LLM principles through practical coding exercises and detailed explanations, enabling users to build functional, albeit small-scale, LLMs.

How It Works

The project guides users through implementing a GPT-like LLM architecture using PyTorch. It breaks down the process into manageable steps, starting with foundational concepts like text processing and attention mechanisms, and progressing to building the core GPT model, pre-training, and fine-tuning. The approach emphasizes clear, runnable notebook code and detailed explanations to foster a deep understanding of LLM internals, rather than just API usage.
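The text-processing step mentioned above can be illustrated with a minimal sketch. It is not taken from the project's notebooks: it assumes a character-level vocabulary (GPT-style tutorials typically use subword tokenizers), and the helper names `build_vocab`, `encode`, `decode`, and `make_training_pairs` are hypothetical. It uses only the Python standard library so it runs without PyTorch.

```python
def build_vocab(text):
    """Map each distinct character to an integer ID (sorted for determinism)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Turn a string into a list of token IDs."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Turn a list of token IDs back into a string."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


def make_training_pairs(token_ids, context_length, stride=1):
    """Slide a window over the IDs; each target is its input shifted by one token."""
    pairs = []
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start:start + context_length]
        targets = token_ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Illustrative data only; a real tutorial would use a much larger corpus.
text = "hello world"
vocab = build_vocab(text)
ids = encode(text, vocab)
pairs = make_training_pairs(ids, context_length=4)
```

Each `(inputs, targets)` pair expresses the next-token prediction objective used in pre-training: at every position the model sees the input tokens up to that point and is asked to predict the token that follows.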

Quick Start & Requirements

  • Install: The tutorial is distributed primarily as Python notebooks.
  • Prerequisites: Basic PyTorch knowledge is sufficient. No specific hardware or advanced dependencies are mentioned for the core tutorials.
  • Resources: Links to detailed notebooks (Translated_Book) and concise starter notebooks (Codes) are provided.

Highlighted Details

  • Covers the step-by-step implementation of a GPT-like LLM.
  • Includes discussions and code for architectures like ChatGLM, Llama, and RWKV (V2-V6).
  • Offers detailed explanations of core LLM components such as attention mechanisms.
  • Provides exercises and solutions for practical reinforcement.
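As a rough illustration of the attention mechanism listed above, here is a minimal sketch of causal scaled dot-product self-attention. It is an assumption-laden teaching example, not the project's code: it is written in plain Python rather than PyTorch, omits the learned query/key/value projections, multiple heads, and dropout that a GPT-like model would have, and the function names `softmax` and `causal_self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i attends only to positions <= i."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score the query against visible keys only (positions 0..i).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        # Future positions receive zero weight (the causal mask).
        weights = softmax(scores) + [0.0] * (len(keys) - i - 1)
        # Output is the weighted sum of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy embeddings for three tokens; in this sketch Q, K, and V are all the raw inputs.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
```

Each row of `weights` sums to 1 and is zero to the right of the diagonal, so the first token can only attend to itself. That masking is what lets a GPT-like model be trained to predict the next token without seeing it.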

Maintenance & Community

The project is associated with Datawhale, a community focused on data science education. It lists contributors for specific chapters and encourages community involvement through GitHub Issues and Discussions.

Licensing & Compatibility

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license restricts commercial use and requires derivative works to be shared under the same terms.

Limitations & Caveats

The project's primary educational goal is to teach LLM principles by building smaller models; it is not intended for training production-scale foundation models. Some chapters (fine-tuning, practical application) are marked as "upcoming."

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 100 stars in the last 30 days
