Building-a-Small-LLM-from-Scratch by KaihuaTang

Tutorial for building LLMs from scratch using PyTorch

created 6 months ago
356 stars

Top 79.5% on sourcepulse

Project Summary

This repository provides a tutorial series for building a small Large Language Model (LLM) from scratch using PyTorch. It targets readers with basic Python, PyTorch, and deep learning knowledge and aims to demystify LLM components and training processes without relying on existing LLM libraries. The result is a hands-on understanding of LLM architecture and implementation.

How It Works

The tutorial breaks down LLM construction into fundamental components, implemented purely in PyTorch. It covers core elements like attention mechanisms, feed-forward networks, normalization layers, and tokenizers. The approach emphasizes building from first principles, allowing learners to grasp the underlying mechanics and practical challenges encountered in industry projects.
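To make the component list concrete, here is a minimal sketch of three of those pieces (a normalization layer, causal self-attention, and a feed-forward network) wired into one decoder block. It is not the tutorial's own code: RMSNorm, a single attention head, a ReLU feed-forward layer, and a pre-norm residual layout are assumptions chosen for brevity, the function names are hypothetical, and NumPy (also listed as a requirement) is used instead of PyTorch so the example runs without a framework install. Tokenizers are not covered here.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Divide each token vector by its root-mean-square, then apply a learned per-channel scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating to avoid overflow.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v, w_o):
    # Single-head scaled dot-product attention with a causal mask.
    seq_len, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)
    # Mask future positions (j > i) so token i attends only to tokens 0..i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v @ w_o

def feed_forward(x, w_up, w_down):
    # Position-wise two-layer MLP with a ReLU nonlinearity.
    return np.maximum(x @ w_up, 0.0) @ w_down

def decoder_block(x, p):
    # Pre-norm residual layout: x + Attn(Norm(x)), then x + FFN(Norm(x)).
    x = x + causal_self_attention(rms_norm(x, p["norm1"]), p["w_q"], p["w_k"], p["w_v"], p["w_o"])
    x = x + feed_forward(rms_norm(x, p["norm2"]), p["w_up"], p["w_down"])
    return x

# Usage with small random weights (illustrative sizes only).
rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 16, 64, 5
p = {"norm1": np.ones(d_model), "norm2": np.ones(d_model)}
for name, shape in [("w_q", (d_model, d_model)), ("w_k", (d_model, d_model)),
                    ("w_v", (d_model, d_model)), ("w_o", (d_model, d_model)),
                    ("w_up", (d_model, d_ff)), ("w_down", (d_ff, d_model))]:
    p[name] = rng.normal(0.0, 0.02, shape)
x = rng.normal(size=(seq_len, d_model))
y = decoder_block(x, p)
print(y.shape)  # (5, 16)
```

Because normalization and the feed-forward layer act on each position independently and attention is masked, changing the last token cannot alter the outputs at earlier positions; that property is what lets a decoder be trained on next-token prediction in parallel.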

Quick Start & Requirements

  • Install: Python, PyTorch, NumPy.
  • Prerequisites: Basic deep learning background.
  • Resources: Designed for local execution on personal machines.
  • Links: Zhihu (知乎), Bilibili (B站), Substack (under construction).

Highlighted Details

  • Focuses on building a small LLM from scratch, without external LLM libraries.
  • Covers practical industry challenges such as converting dynamic shapes to static shapes for ONNX export, optimizing inference speed, and handling FP16 numerical stability.
  • Explores specific model architectures and techniques such as DeepSeekV3's attention optimizations and LoRA fine-tuning.
  • Planned chapters include text pre-training, dialogue fine-tuning, and potentially multimodal models and multi-GPU deployment.
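The repository's FP16 material is not reproduced here, but one common FP16 pitfall can be sketched generically: float16's largest finite value is 65504, so exp() of a logit above roughly 11 overflows to inf and a naive softmax returns NaN. Subtracting the row maximum first is a standard fix (another is upcasting to float32 for the reduction). The function names and the logit values below are illustrative assumptions, and NumPy stands in for PyTorch.

```python
import numpy as np

def naive_softmax(x):
    # exp() is evaluated in the input dtype, so large float16 logits overflow to inf.
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def stable_softmax(x):
    # Subtracting the row max leaves the result mathematically unchanged
    # but keeps every exponent <= 0, so exp() stays within (0, 1].
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([10.0, 12.0, 15.0], dtype=np.float16)
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(logits)   # exp(12) and exp(15) overflow, producing inf / inf = nan
stable = stable_softmax(logits)     # finite probabilities that sum to about 1
print(naive)
print(stable)
```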
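For LoRA fine-tuning, the standard formulation keeps the pretrained weight W frozen and learns a low-rank update, so the layer computes x Wᵀ + (α/r) · x Aᵀ Bᵀ with A of shape (r, in) and B of shape (out, r). The sketch below is a hypothetical `LoRALinear` class in NumPy, not the tutorial's implementation; the rank r=4, scale α=8, and initialization std 0.01 are assumptions. Initializing B to zero makes the adapted layer start out identical to the base layer, and after training the update can be merged into W so inference costs nothing extra.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=4, alpha=8.0, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        out_features, in_features = weight.shape
        self.weight = weight                                # frozen, shape (out, in)
        self.A = rng.normal(0.0, 0.01, (r, in_features))    # trainable down-projection
        self.B = np.zeros((out_features, r))                # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (..., in_features) -> (..., out_features)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the update into W for inference: W' = W + (alpha / r) * B @ A
        return self.weight + self.scale * self.B @ self.A

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))
layer = LoRALinear(W, r=4, alpha=8.0, rng=rng)
x = rng.normal(size=(3, 16))
base_match = np.allclose(layer(x), x @ W.T)   # True: B = 0 at initialization
layer.B = rng.normal(size=layer.B.shape)      # stand-in for a trained B
merged_match = np.allclose(layer(x), x @ layer.merged_weight().T)
trainable = layer.A.size + layer.B.size       # r * (in + out) = 4 * (16 + 32)
print(base_match, merged_match, trainable)    # True True 192
```

Only A and B are updated during fine-tuning: 192 trainable values here versus 512 in the frozen 32 × 16 weight, and the gap widens quickly for realistic layer sizes.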

Maintenance & Community

The project is authored by Kaihua Tang, with Huaizheng Zhang also listed as an author. Community engagement channels are under construction (Bilibili, Substack).

Licensing & Compatibility

Content (text, images, code) is available for non-profit personal use and sharing. Commercial use, including paid courses or content platforms, requires explicit author approval.

Limitations & Caveats

The tutorial is a work in progress, with several planned chapters marked as "to be updated." Some advanced topics like multimodal networks and tensor parallelism are tentative.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 111 stars in the last 90 days
