llms-from-scratch-cn by datawhalechina

Hands-on tutorial for building LLMs from scratch

Created 1 year ago

3,886 stars

Top 12.4% on SourcePulse

Project Summary

This project provides a hands-on, code-first tutorial for building Large Language Models (LLMs) from scratch, targeting developers and researchers who want to understand LLM architecture and implementation. It offers a systematic learning path, demystifying LLM principles through practical coding exercises and detailed explanations, enabling users to build functional, albeit small-scale, LLMs.

How It Works

The project guides users through implementing a GPT-like LLM architecture using PyTorch. It breaks down the process into manageable steps, starting with foundational concepts like text processing and attention mechanisms, and progressing to building the core GPT model, pre-training, and fine-tuning. The approach emphasizes clear, runnable notebook code and detailed explanations to foster a deep understanding of LLM internals, rather than just API usage.
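The text-processing step that feeds pre-training can be illustrated with a small, framework-free sketch. It builds (input, target) pairs with a sliding window, where each target is the input shifted one token to the right. This is the next-token prediction objective that GPT-style models are trained on. The whitespace tokenizer and the helper names (`build_vocab`, `encode`, `sliding_window_pairs`) are hypothetical stand-ins. The project itself works in PyTorch, and a real pipeline would use a subword tokenizer.

```python
# Framework-free sketch of next-token training data preparation.
# Assumptions: a toy whitespace tokenizer and hypothetical helper names;
# a real pipeline would use a subword tokenizer and PyTorch tensors.

def build_vocab(text):
    """Map each unique whitespace-separated token to an integer id."""
    return {tok: i for i, tok in enumerate(sorted(set(text.split())))}

def encode(text, vocab):
    """Convert text to a list of token ids using the toy vocabulary."""
    return [vocab[tok] for tok in text.split()]

def sliding_window_pairs(ids, context_length, stride):
    """Return (input, target) pairs; each target is the input shifted by one token."""
    pairs = []
    # Stop early enough that the shifted target window still fits inside `ids`.
    for start in range(0, len(ids) - context_length, stride):
        inputs = ids[start:start + context_length]
        targets = ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs

text = "the model predicts the next token given the previous tokens"
vocab = build_vocab(text)
ids = encode(text, vocab)
pairs = sliding_window_pairs(ids, context_length=4, stride=4)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

With `stride` equal to `context_length`, the windows do not overlap. A smaller stride yields more, overlapping training examples from the same text.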

Quick Start & Requirements

Install: The tutorials are delivered primarily as Python notebooks; no separate installation procedure is listed.
Prerequisites: Basic PyTorch knowledge is sufficient. No specific hardware or advanced dependencies are mentioned for the core tutorials.
Resources: Links to detailed notebooks (Translated_Book) and concise starter notebooks (Codes) are provided.

Highlighted Details

Covers the step-by-step implementation of a GPT-like LLM.
Includes discussions and code for architectures like ChatGLM, Llama, and RWKV (V2-V6).
Offers detailed explanations of core LLM components such as attention mechanisms.
Provides exercises and solutions for practical reinforcement.
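Attention is the central component in the list above. A minimal sketch of single-head causal (masked) self-attention follows, written with plain Python lists rather than PyTorch tensors so that it runs without dependencies. The identity projection matrices are placeholders for learned query, key, and value weights, and the helper names are hypothetical, not taken from the project's notebooks.

```python
import math

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    """Numerically stable softmax; -inf entries receive zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with a causal mask."""
    queries, keys, values = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(keys[0]))  # sqrt(d_k) scaling
    weights = []
    for i, q in enumerate(queries):
        # Position i may only attend to positions 0..i; later ones are masked.
        scores = [
            sum(a * b for a, b in zip(q, k)) / scale if j <= i else -math.inf
            for j, k in enumerate(keys)
        ]
        weights.append(softmax(scores))
    # Each output row is an attention-weighted average of the value vectors.
    return matmul(weights, values), weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, embedding size 2
identity = [[1.0, 0.0], [0.0, 1.0]]       # placeholder for learned weights
context, weights = causal_self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

The first weight row is `[1.0, 0.0, 0.0]` because the first token can only attend to itself. A PyTorch version would replace the loops with batched tensor operations and typically apply the mask with `masked_fill` before the softmax. Multi-head attention runs several such heads in parallel and concatenates their outputs.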

Maintenance & Community

The project is associated with Datawhale, a community focused on data science education. It lists contributors for specific chapters and encourages community involvement through GitHub Issues and Discussions.

Licensing & Compatibility

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license restricts commercial use and requires derivative works to be shared under the same terms.

Limitations & Caveats

The project's primary educational goal is to teach LLM principles by building smaller models; it is not intended for training production-scale foundation models. Some chapters (fine-tuning, practical application) are marked as "upcoming."

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

93 stars in the last 30 days